Re: [PATCH] c++: Redundant -Wdeprecated-declarations warning in build_over_call (PR 67960)

2020-03-12 Thread Jason Merrill via Gcc-patches

On 3/12/20 3:28 PM, Patrick Palka wrote:

In build_over_call, we are emitting a redundant -Wdeprecated-declarations
warning about the deprecated callee function, first from mark_used and again
from build_addr_func <- decay_conversion <- cp_build_addr_expr <- mark_used.

It seems this warning coming from build_addr_func will always be redundant, so
we can safely use a warning_sentinel to disable the warning before calling
build_addr_func.  (And any deprecation warning that could come from
build_addr_func would be for FN, and so we wouldn't be suppressing too much.)

Tested on x86_64-pc-linux-gnu.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/67960
* call.c (build_over_call): Use a warning_sentinel to disable
warn_deprecated_decl before calling build_addr_func.


OK.


gcc/testsuite/ChangeLog:

PR c++/67960
g++.dg/diagnostic/pr67960.C: New test.
g++.dg/diagnostic/pr67960-2.C: New test.
---
  gcc/cp/call.c   |  5 +
  gcc/testsuite/g++.dg/diagnostic/pr67960-2.C | 13 +
  gcc/testsuite/g++.dg/diagnostic/pr67960.C   | 13 +
  3 files changed, 31 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr67960-2.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr67960.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 5767a8b8c4e..2f6cfd85175 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9062,6 +9062,11 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
  }
else
  {
+  /* If FN is marked deprecated, then we've already issued a deprecated-use
+warning from mark_used above, so avoid reduntantly issuing another one
+from build_addr_func.  */
+  warning_sentinel w (warn_deprecated_decl);
+
fn = build_addr_func (fn, complain);
if (fn == error_mark_node)
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr67960-2.C 
b/gcc/testsuite/g++.dg/diagnostic/pr67960-2.C
new file mode 100644
index 000..d2ba509534f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr67960-2.C
@@ -0,0 +1,13 @@
+// PR c++/67960
+// { dg-do compile }
+// { dg-additional-options "-Werror -fmax-errors=1" }
+__attribute__((deprecated)) void doNothing(){}
+
+int
+main()
+{
+  doNothing(); // { dg-error "is deprecated" }
+}
+
+// { dg-message "all warnings being treated as errors" "" { target *-*-* } 0 }
+// { dg-bogus "compilation terminated" "" { target *-*-* } 0 }
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr67960.C 
b/gcc/testsuite/g++.dg/diagnostic/pr67960.C
new file mode 100644
index 000..d7b1a2d0811
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr67960.C
@@ -0,0 +1,13 @@
+// PR c++/67960
+// { dg-do compile { target c++14 } }
+// { dg-additional-options "-Werror -fmax-errors=1" }
+[[deprecated]] void doNothing(){}
+
+int
+main()
+{
+  doNothing(); // { dg-error "is deprecated" }
+}
+
+// { dg-message "all warnings being treated as errors" "" { target *-*-* } 0 }
+// { dg-bogus "compilation terminated" "" { target *-*-* } 0 }





RE: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-12 Thread Yangfei (Felix)
> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Friday, March 13, 2020 7:50 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) 
> Subject: Re: [PATCH PR94026] combine missed opportunity to simplify
> comparisons with zero
> 
> Hi!
> 
> Please Cc: me on combine patches; you sent it nine days ago, and I didn't see 
> it
> until now.

OK.  

> On Wed, Mar 04, 2020 at 08:39:36AM +, Yangfei (Felix) wrote:
> >   This is a simple fix for PR94026.
> >   With this fix, combine will try make an extraction if we are in a equality
> comparison and this is an AND
> >   with a constant which is power of two minus one.  Shift here should be an
> constant.  For example, combine
> >   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare 
> > (zero_extract
> (x 2 9)) 0).
> 
> Why is that a good thing?

The reported test case is reduced from spec2017 541.leela_r.  I have pasted 
original code snippet on the bugzilla.  
We found other compilers like aocc/llvm can catch this pattern and simplify it. 
 

> (There should be thorough tests on many archs, showing it helps on average,
> and it doesn't regress anything.  I can do that for you, but not right now).

I only have aarch64 & x86_64 linux available and have tested this patch with 
spec17 on both platforms.  
No obvious improvement & regression witnessed.  This is expected as only one 
instruction is reduced here.  
It's appreciated if this can be tested on other archs.  

> The code needs more comments, and the commit message should say what is
> done and why you made those choices.
> In general, we should have *fewer* zero_extract, not more.

OK.  I can add more comments & commit message if finally we are inclined to go 
with this patch.  

Thanks,
Felix


Re: [PATCH] driver: Fix typos in options descriptions

2020-03-12 Thread Joseph Myers
On Thu, 30 Jan 2020, Lewis Hyatt wrote:

> Is this something that would be desirable to change for GCC 10? Attached
> patch would do so, and the output would become instead:

This patch is OK.  (It may mean some translation updates, but we generally 
do such bug fixes when the issue is found by translators in the course of 
working on translations; it's probably time for me to submit another 
updated .pot file to the TP.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] avoid treating more incompatible redeclarations as builtin-ins [PR94040]

2020-03-12 Thread Joseph Myers
On Thu, 5 Mar 2020, Martin Sebor wrote:

> Tested on x86_64-linux.  Is this acceptable for GCC 10?  How about 9?

OK for GCC 10.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] df: Don't abuse bb->aux (PR94148)

2020-03-12 Thread Segher Boessenkool
Hi,

On Thu, Mar 12, 2020 at 10:29:45PM +, Segher Boessenkool wrote:
> The df dataflow solvers use the aux field in the basic_block struct,
> although that is reserved for any use by passes.  And not only that,
> it is required that you set all such fields to NULL before calling
> the solvers, or you quietly get wrong results.
> 
> This changes the solvers to use a local array for last_change_age
> instead, just like it already had a local array for last_visit_age.
> 
> Tested on powerpc64-linux {-m32,-m64}.  Also tested with the tests for
> PR94042, which it also solves (I need to confirm that still though,
> there are other testsuite problems interfering with my testing).

I can confirm it also fixes PR94042, no extra patch to shrink-wrap is
needed even (as I thought before).


Segher


>   PR rtl-optimization/94148
>   PR rtl-optimization/94042
>   * df-core.c (BB_LAST_CHANGE_AGE): Delete.
>   (df_worklist_propagate_forward): New parameter last_change_age, use
>   that instead of bb->aux.
>   (df_worklist_propagate_backward): Ditto.
>   (df_worklist_dataflow_doublequeue): Use a local array last_change_age.


[PATCH] Add C++2a wait/notify_one/notify_all support to std::atomic<>

2020-03-12 Thread Thomas Rodgers via Gcc-patches
From f650ed07ed13c40624b72b7b4bebd967fa33f917 Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Thu, 12 Mar 2020 17:50:09 -0700
Subject: [PATCH] Add C++2a wait/notify_one/notify_all support to std::atomic<>

	* include/Makefile.am (bits_headers): Add new header.
	* include/Mamefile.in: Regenerate.
	* include/bits/atomic_base.h (__atomic_base<_Itp>:wait): Define.
	(__atomic_base<_Itp>:notify_one): Likewise.
	(__atomic_base<_Itp>:notify_all): Likewise.
	(__atomic_base<_Ptp*>:wait): Likewise.
	(__atomic_base<_Ptp*>:notify_one): Likewise.
	(__atomic_base<_Ptp*>:notify_all): Likewise.
	(__atomic_impl:wait): Likewise.
	(__atomic_impl:notify_one): Likewise.
	(__atomic_impl:notify_all): Likewise.
	(__atomic_float<_Fp>:wait): Likewise.
	(__atomic_float<_Fp>:notify_one): Likewise.
	(__atomic_float<_Fp>:notify_all): Likewise.
	(__atomic_ref<_Tp>:wait): Likewise.
	(__atomic_ref<_Tp>:notify_one): Likewise.
	(__atomic_ref<_Tp>:notify_all): Likewise.
	* include/bits/atomic_wait.h: New file.
	* include/std/atomic (atomic::wait): Define.
	(atomic::wait_one): Likewise.
	(atomic::wait_all): Likewise.
	(atomic<_Tp>::wait): Likewise.
	(atomic<_Tp>::wait_one): Likewise.
	(atomic<_Tp>::wait_all): Likewise.
	(atomic<_Tp*>::wait): Likewise.
	(atomic<_Tp*>::wait_one): Likewise.
	(atomic<_Tp*>::wait_all): Likewise.
	* testsuite/29_atomic/atomic/wait_notify/atomic_refs.cc: New test.
	* testsuite/29_atomic/atomic/wait_notify/bool.cc: Likewise.
	* testsuite/29_atomic/atomic/wait_notify/integrals.cc: Likewise.
	* testsuite/29_atomic/atomic/wait_notify/floats.cc: Likewise.
	* testsuite/29_atomic/atomic/wait_notify/pointers.cc: Likewise.
	* testsuite/29_atomic/atomic/wait_notify/generic.h: New File.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   2 +
 libstdc++-v3/include/bits/atomic_base.h   | 180 ++-
 libstdc++-v3/include/bits/atomic_wait.h   | 295 ++
 libstdc++-v3/include/std/atomic   |  38 +++
 .../atomic/wait_notify/atomic_refs.cc | 103 ++
 .../29_atomics/atomic/wait_notify/bool.cc |  57 
 .../29_atomics/atomic/wait_notify/floats.cc   |  32 ++
 .../29_atomics/atomic/wait_notify/generic.h   |  62 
 .../atomic/wait_notify/integrals.cc   |  56 
 .../29_atomics/atomic/wait_notify/pointers.cc |  59 
 11 files changed, 883 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/atomic_wait.h
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/atomic_refs.cc
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/bool.cc
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/floats.cc
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/generic.h
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/integrals.cc
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/pointers.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 80aeb3f8959..d195a721fd5 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -100,6 +100,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_wait.h \
 	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index eb437ad8d8d..4faaac5fb8d 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -445,6 +445,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_wait.h \
 	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
@@ -526,6 +527,7 @@ bits_headers = \
 	${bits_srcdir}/specfun.h \
 	${bits_srcdir}/sstream.tcc \
 	${bits_srcdir}/std_abs.h \
+	${bits_srcdir}/std_condvar.h \
 	${bits_srcdir}/std_function.h \
 	${bits_srcdir}/std_mutex.h \
 	${bits_srcdir}/stl_algo.h \
diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 87fe0bd6000..0581c41b30f 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -37,6 +37,11 @@
 #include 
 #include 
 
+#if __cplusplus > 201703L
+#include 
+#include 
+#endif
+
 #ifndef _GLIBCXX_ALWAYS_INLINE
 #define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
 #endif
@@ -134,7 +139,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __ret;
 }
 
-
   // Base types for atomics.
   template
 struct __atomic_base;
@@ -542,6 +546,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    __cmpexch_failure_order(__m));
   }
 
+#if __cplusplus > 201703L
+  _GLIBCXX_ALWAYS_INLINE void
+  wait(__int_type __old, memory_order __m = memory_order_seq_cst) 

Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread J.W. Jagersma via Gcc-patches
On 2020-03-12 20:26, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 07:42:30PM +0100, J.W. Jagersma wrote:
>> On 2020-03-12 16:32, Segher Boessenkool wrote:
>>> On Thu, Mar 12, 2020 at 02:08:18PM +0100, Richard Biener wrote:
> It wasn't clear from my message above, but: I was mostly worried about
> requiring the asm to treat memory operands in a certain way when the
> exception is thrown.  IMO it would be better to say that the values of
> memory operands are undefined on the exception edge.

 Rather unspecified.  So IMHO on the exception edge the asm() should
 still appear as a def for all outputs so the compiler cannot derive any
 actual values for them.  That of course also means that they must not
 appear to be must-defs since we may not DSE earlier stores for example.
>>>
>>> So make all outputs of an asm that may throw (w/ -fnon-call-exceptions)
>>> inout operands instead?  That works for registers exactly the same, too?
>>
>> Should gcc do this or would it be the user's responsibility?  For
>> memory operands I don't think anything changes if you replace "=m" by
>> "+m".  However changing each "=r" to "+r" would certainly generate less
>> optimal code.
> 
> [  "+m"(x)  means exactly the same as  "=m"(x) : "m"(x)  ]
> 
> Yes, but this is required for all operands that could keep their old
> value on exceptions.  Since this will only happen with
> -fnon-call-exceptions, which already has a significant performance cost,
> I don't think adding just a tiny bit more is so bad?  The benefit is
> that all asm will work, the user cannot do this wrong anymore.

I don't want to unnecessarily pessimize register outputs if it can be
avoided.  And even if you do change register outputs to in+out, they
are still more likely to contain some intermediate value that you would
want to discard on throwing.

If you think of it in terms of a function call, the closest equivalent
of register vs memory outputs would be:
reg = f ();
vs.
f ();
If f() throws, then in the first case, 'reg' is never assigned to. For
the second case, you can't expect 'mem' to remain unmodified.

There is also the case of "=rm" and similar which have not been
discussed so far, but are (in my experience) the most common operand
constraint.  I think these can be handled as "=r" if they end up in
memory since they are never offsettable, and are unlikely (impossible?)
to have side-effects if written to.  So I think the following behavior
is the most sensible:

If the output *may* reside in a register (this covers any constraint
combination that includes a register constraint, so also "=rm", etc),
then it is assigned via a temporary and its value discarded after a
throw.
Only pure memory operands, strictly "=m", "=o", etc, are converted to
in+out and assumed to be valid after throwing.

Does that sound reasonable?


Re: Remove redundant zero extend

2020-03-12 Thread Segher Boessenkool
Hi!

On Thu, Mar 12, 2020 at 02:43:06PM -0700, Jim Wilson wrote:
> I looked at combine because I'm familiar with that pass, but the ree
> pass might be the right place to handle this.

IMO, part of this should perhaps be done on Gimple already.  But the part
that should be done on RTL should be done *early*, it should start before
the loop optimisations, and it should be *finished* before combine.

>  I don't know if it has
> any support for handling if statements.  If not, maybe it could be
> extended to handle cases like this.

No, combine cannot do anything that crosses basic blocks.


Segher


Re: [PATCH PR94026] combine missed opportunity to simplify comparisons with zero

2020-03-12 Thread Segher Boessenkool
Hi!

Please Cc: me on combine patches; you sent it nine days ago, and I didn't
see it until now.

On Wed, Mar 04, 2020 at 08:39:36AM +, Yangfei (Felix) wrote:
>   This is a simple fix for PR94026.  
>   With this fix, combine will try make an extraction if we are in a equality 
> comparison and this is an AND
>   with a constant which is power of two minus one.  Shift here should be an 
> constant.  For example, combine
>   will transform (compare (and (lshiftrt x 8) 6) 0) to (compare (zero_extract 
> (x 2 9)) 0).  

Why is that a good thing?

(There should be thorough tests on many archs, showing it helps on
average, and it doesn't regress anything.  I can do that for you, but
not right now).

The code needs more comments, and the commit message should say what is
done and why you made those choices.

In general, we should have *fewer* zero_extract, not more.


Segher


Re: [RS6000] make PLT loads volatile

2020-03-12 Thread Alan Modra via Gcc-patches
On Thu, Mar 12, 2020 at 11:57:17AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Mar 12, 2020 at 01:18:50PM +1030, Alan Modra wrote:
> > With lazy PLT resolution the first load of a PLT entry may be a value
> > pointing at a resolver stub.  gcc's loop processing can result in the
> > PLT load in inline PLT calls being hoisted out of a loop in the
> > mistaken idea that this is an optimisation.  It isn't.  If the value
> > hoisted was that for a resolver stub then every call to that function
> > in the loop will go via the resolver, slowing things down quite
> > dramatically.
> > 
> > The PLT really is volatile, so teach gcc about that.
> 
> It would be nice if we could keep it cached after it has been resolved
> once, this has potential for regressing performance if we don't?  And
> LD_BIND_NOW should keep working just as fast as it is now, too?

Using a call-saved register to cache a load out of the PLT looks
really silly when the inline PLT call is turned back into a direct
call by the linker.  You end up with an unnecessary save and restore
of the register, plus copies from the register to r12.  What's the
chance of someone reporting that as a gcc "bug"?  :-)  Then there's
the possibility that shortening the number of instructions between two
calls of a small function runs into stalls.

How can we teach gcc about these unknowns?  ie. How to weight use of a
call-saved register to cache PLT loads against other possible uses of
that register in a loop?  It's quite likely not a good use, even when
gcc knows the PLT entry has been resolved..  Which means some gcc
infrastructure would be needed to do this sensibly and without the
necessary infrastructure, I think gcc hoisting a PLT load out of a
loop should never be done.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] avoid -Wredundant-tags on a first declaration in use (PR 93824)

2020-03-12 Thread Martin Sebor via Gcc-patches

On 3/12/20 11:03 AM, Martin Sebor wrote:

On 3/11/20 3:30 PM, Martin Sebor wrote:

On 3/11/20 2:10 PM, Jason Merrill wrote:

On 3/11/20 12:57 PM, Martin Sebor wrote:

On 3/9/20 6:08 PM, Jason Merrill wrote:

On 3/9/20 5:39 PM, Martin Sebor wrote:

On 3/9/20 1:40 PM, Jason Merrill wrote:

On 3/9/20 12:31 PM, Martin Sebor wrote:

On 2/28/20 1:24 PM, Jason Merrill wrote:

On 2/28/20 12:45 PM, Martin Sebor wrote:

On 2/28/20 9:58 AM, Jason Merrill wrote:

On 2/24/20 6:58 PM, Martin Sebor wrote:
-Wredundant-tags doesn't consider type declarations that are 
also

the first uses of the type, such as in 'void f (struct S);' and
issues false positives for those.  According to the reported 
that's

making it harder to use the warning to clean up LibreOffice.

The attached patch extends -Wredundant-tags to avoid these 
false
positives by relying on the same class_decl_loc_t::class2loc 
mapping
as -Wmismatched-tags.  The patch also somewhat improves the 
detection
of both issues in template declarations (though more work is 
still

needed there).


+ a new entry for it and return unless it's a 
declaration

+ involving a template that may need to be diagnosed by
+ -Wredundant-tags.  */
   *rdl = class_decl_loc_t (class_key, false, def_p);
-  return;
+  if (TREE_CODE (decl) != TEMPLATE_DECL)
+    return;


How can the first appearance of a class template be redundant?


I'm not sure I correctly understand the question.  The comment 
says

"involving a template" (i.e., not one of the first declaration of
a template).  The test case that corresponds to this test is:

   template  struct S7 { };
   struct S7 s7v;  // { dg-warning "\\\[-Wredundant-tags" }

where DECL is the TEPLATE_DECL of S7.

As I mentioned, more work is still needed to handle templates 
right
because some redundant tags are still not diagnosed.  For 
example:


   template  struct S7 { };
   template 
   using U = struct S7;   // missing warning


When we get here for an instance of a template, it doesn't make 
sense to treat it as a new type.


If decl is a template and type_decl is an instance of that 
template, do we want to (before the lookup) change type_decl to 
the template or the corresponding generic TYPE_DECL, which 
should already be in the table?


I'm struggling with how to do this.  Given type (a RECORD_TYPE) and
type_decl (a TEMPLATE_DECL) representing the use of a template, how
do I get the corresponding template (or its explicit or partial
specialization) in the three cases below?

   1) Instance of the primary:
  template  class A;
  struct A a;

   2) Instance of an explicit specialization:
  template  class B;
  template <> struct B;
  class B b;

   3) Instance of a partial specialization:
  template  class C;
  template  struct C;
  class C c;

By trial and (lots of) error I figured out that in both (1) and 
(2),

but not in (3), TYPE_MAIN_DECL (TYPE_TI_TEMPLATE (type)) returns
the template's type_decl.

Is there some function to call to get it in (3), or even better,
in all three cases?


I think you're looking for most_general_template.


I don't think that's quite what I'm looking for.  At least it doesn't
return the template or its specialization in all three cases above.


Ah, true, that function stops at specializations.  Oddly, I don't 
think there's currently a similar function that looks through them. 
You could create one that does a simple loop through 
DECL_TI_TEMPLATE like is_specialization_of.


Thanks for the tip.  Even with that I'm having trouble with partial
specializations.  For example in:

   template    struct S;
   template  class S;
   extern class  S s1;
   extern struct S s2;  // expect -Wmismatched-tags

how do I find the declaration of the partial specialization when given
the type in the extern declaration?  A loop in my find_template_for()
function (similar to is_specialization_of) only visits the implicit
specialization S (i.e., its own type) and the primary.


Is that a problem?  The name is from the primary template, so does it 
matter for this warning whether there's an explicit specialization 
involved?


I don't understand the question.  S is an instance of
the partial specialization.  To diagnose the right mismatch the warning
needs to know how to find the template (i.e., either the primary, or
the explicit or partial specialization) the instance corresponds to and
the class-key it was declared with.  As it is, while GCC does diagnose
the right declaration (that of s2), it does that thanks to a bug:
because it finds and uses the type and class-key used to declare s1.
If we get rid of s1 it doesn't diagnose anything.

I tried using DECL_TEMPLATE_SPECIALIZATIONS() to get the list of
the partial specializations but it doesn't like any of the arguments
I've given it (it ICEs).


With this fixed, here's the algorithm I tried:

1) for a type T of a template instantiation (s1 above), get the primary
    P that T was instantiated 

[PATCH] df: Don't abuse bb->aux (PR94148)

2020-03-12 Thread Segher Boessenkool
The df dataflow solvers use the aux field in the basic_block struct,
although that is reserved for any use by passes.  And not only that,
it is required that you set all such fields to NULL before calling
the solvers, or you quietly get wrong results.

This changes the solvers to use a local array for last_change_age
instead, just like it already had a local array for last_visit_age.

Tested on powerpc64-linux {-m32,-m64}.  Also tested with the tests for
PR94042, which it also solves (I need to confirm that still though,
there are other testsuite problems interfering with my testing).

PR rtl-optimization/94148
PR rtl-optimization/94042
* df-core.c (BB_LAST_CHANGE_AGE): Delete.
(df_worklist_propagate_forward): New parameter last_change_age, use
that instead of bb->aux.
(df_worklist_propagate_backward): Ditto.
(df_worklist_dataflow_doublequeue): Use a local array last_change_age.

---
 gcc/df-core.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/df-core.c b/gcc/df-core.c
index 346849e..9875a26 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -871,9 +871,6 @@ make_pass_df_finish (gcc::context *ctxt)
The general data flow analysis engine.
 */
 
-/* Return time BB when it was visited for last time.  */
-#define BB_LAST_CHANGE_AGE(bb) ((ptrdiff_t)(bb)->aux)
-
 /* Helper function for df_worklist_dataflow.
Propagate the dataflow forward.
Given a BB_INDEX, do the dataflow propagation
@@ -897,7 +894,8 @@ df_worklist_propagate_forward (struct dataflow *dataflow,
unsigned *bbindex_to_postorder,
bitmap pending,
sbitmap considered,
-  ptrdiff_t age)
+  vec _change_age,
+  int age)
 {
   edge e;
   edge_iterator ei;
@@ -908,7 +906,8 @@ df_worklist_propagate_forward (struct dataflow *dataflow,
   if (EDGE_COUNT (bb->preds) > 0)
 FOR_EACH_EDGE (e, ei, bb->preds)
   {
-if (age <= BB_LAST_CHANGE_AGE (e->src)
+   if (bbindex_to_postorder[e->src->index] < last_change_age.length ()
+   && age <= last_change_age[bbindex_to_postorder[e->src->index]]
&& bitmap_bit_p (considered, e->src->index))
   changed |= dataflow->problem->con_fun_n (e);
   }
@@ -942,7 +941,8 @@ df_worklist_propagate_backward (struct dataflow *dataflow,
 unsigned *bbindex_to_postorder,
 bitmap pending,
 sbitmap considered,
-   ptrdiff_t age)
+   vec _change_age,
+   int age)
 {
   edge e;
   edge_iterator ei;
@@ -953,7 +953,8 @@ df_worklist_propagate_backward (struct dataflow *dataflow,
   if (EDGE_COUNT (bb->succs) > 0)
 FOR_EACH_EDGE (e, ei, bb->succs)
   {
-if (age <= BB_LAST_CHANGE_AGE (e->dest)
+   if (bbindex_to_postorder[e->dest->index] < last_change_age.length ()
+   && age <= last_change_age[bbindex_to_postorder[e->dest->index]]
&& bitmap_bit_p (considered, e->dest->index))
   changed |= dataflow->problem->con_fun_n (e);
   }
@@ -991,10 +992,10 @@ df_worklist_propagate_backward (struct dataflow *dataflow,
worklists (we are processing WORKLIST and storing new BBs to visit in
PENDING).
 
-   As an optimization we maintain ages when BB was changed (stored in bb->aux)
-   and when it was last visited (stored in last_visit_age).  This avoids need
-   to re-do confluence function for edges to basic blocks whose source
-   did not change since destination was visited last time.  */
+   As an optimization we maintain ages when BB was changed (stored in
+   last_change_age) and when it was last visited (stored in last_visit_age).
+   This avoids need to re-do confluence function for edges to basic blocks
+   whose source did not change since destination was visited last time.  */
 
 static void
 df_worklist_dataflow_doublequeue (struct dataflow *dataflow,
@@ -1010,11 +1011,11 @@ df_worklist_dataflow_doublequeue (struct dataflow 
*dataflow,
   int age = 0;
   bool changed;
   vec last_visit_age = vNULL;
+  vec last_change_age = vNULL;
   int prev_age;
-  basic_block bb;
-  int i;
 
   last_visit_age.safe_grow_cleared (n_blocks);
+  last_change_age.safe_grow_cleared (n_blocks);
 
   /* Double-queueing. Worklist is for the current iteration,
  and pending is for the next. */
@@ -1032,30 +1033,30 @@ df_worklist_dataflow_doublequeue (struct dataflow 
*dataflow,
 
  bitmap_clear_bit (pending, index);
  bb_index = blocks_in_postorder[index];
- bb = BASIC_BLOCK_FOR_FN (cfun, bb_index);
  prev_age = last_visit_age[index];
  if (dir == DF_FORWARD)
   

Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-03-12 Thread Jeff Law via Gcc-patches
On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> > > I don't think each stanza of code should use it's own "noop-ness", no.
> > Richard's patch is actually better than mine in that regard as it handles 
> > mem
> > and
> > reg nop moves in an indentical way.  I don't think refactoring the cse.c 
> > code
> > is
> > advisable now though -- but I do want to fix the multiply-reported ICE on 
> > ARM
> > and
> > Richard's cse changes are the cleanest way to do that that I can see.
> 
> It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
> this patch does not make things worse.
> 
> > > I don't know if this patch makes matters worse or not.  It doesn't seem
> > > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > > rs6000, if that is true, yes I do object ;-)
> > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it
> > through
> > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86
> > native
> > and more.
> 
> I don't see anything rs6000 below?  Is it just this generic code?
> 
> > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
> > }
> >  
> >   /* Similarly, lots of targets don't allow no-op
> > -(set (mem x) (mem x)) moves.  */
> > +(set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> > +might be impossible for certain registers (like CC registers).  */
> >   else if (n_sets == 1
> > -  && MEM_P (trial)
> > +  && ! CALL_P (insn)
> > +  && (MEM_P (trial) || REG_P (trial))
> >&& MEM_P (dest)
> >&& rtx_equal_p (trial, dest)
> >&& !side_effects_p (dest)
> 
> This adds the !CALL_P (no space btw) condition, why is that?
> 
> (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
> patterns for that, or *should* have at least!)
I fixed the extraneous whitespace and committed the change.

THanks,
jeff
> 



Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread J.W. Jagersma via Gcc-patches
On 2020-03-12 10:59, Richard Sandiford wrote:
> The other case I mentioned was equivalent to:
> 
>  int
>  test_mem2 ()
>  {
>int i = 2;
>try
>  {
>asm volatile ("ud2; mov%z0 $1, %0" : "=m" (i));
>  }
>catch (const illegal_opcode&)
>  {
>if (i == 2) return 0;
>  }
>return i;
>  }
> 
> Is this test expected to pass?

Good point.  Yes, this should pass, and I thought it did, but it seems
I was mistaken.  To fix that would require transforming "=m" into "+m"
as Segher suggested.

> However, these "mem" tests are testing gimple register types, so they'll
> still be SSA names outside of the asm.  It would also be good to test what
> happens for non-register types, e.g. something like:
> 
>  int
>  test_mem3 ()
>  {
>int i[5] = { 1, 2, 3, 4, 5 };
>try
>  {
>asm volatile ("ud2; " : "=m" (i));
>  }
>catch (const illegal_opcode&)
>  {
>if (i[0] == 1 && ...) return 0;
>  }
>return ...;
>  }
> 
> and similarly for ud2 after the store.

I think I see what you mean.  Would such a test not also cover what the
current test_mem() function does?  If so then that could be removed.

Also in my current patch I used: (tree-eh.c:2104)

if (tree_could_throw_p (opval)
|| !is_gimple_reg_type (TREE_TYPE (opval))
|| !is_gimple_reg (get_base_address (opval)))

to determine the difference between a register and memory type.  Could
there potentially be a case where that identifies an operand as gimple
register type, but ends up compiled as a memory operand to an asm?

> It wasn't clear from my message above, but: I was mostly worried about
> requiring the asm to treat memory operands in a certain way when the
> exception is thrown.  IMO it would be better to say that the values of
> memory operands are undefined on the exception edge.

I'm not sure about the semantic difference between undefined and
unspecified.  But gcc should not write to any memory after a throw,
because that write operation itself may have side effects.  Likewise
asm memory operands should not be redirected to a temporary for the
same reason, and also because gcc can't possibly know which parts of
an (offsettable) memory operand are written to.


Re: [PATCH] Fix Hashtable node manipulation when custom pointer

2020-03-12 Thread Jonathan Wakely via Gcc-patches

On 12/03/20 19:33 +0100, François Dumont via Libstdc++ wrote:
I wonder if this fix is correct because if it is you spent more time 
making ext_ptr.cc works than fixing it :-)


I don't remember the details, but I think this is not correct. We need
to store the fancy pointer, not a raw pointer. Simply extracting a raw
pointer from the allocator's pointer type effectively means we don't
work with fancy pointer types.

So this makes it compile, but doesn't make it correct.

The status quo is that we don't support fancy pointers in any of our
node-based containers, but that's a bug that needs to be fixed.



    libstdc++ Hashtable: Fix node manipulation when using custom pointer


    Use std::__to_address to get NodeHandle when reinserting nodes.

    * include/bits/hashtable.h 
(_Hashtable<>::_M_reinsert_node): Use

    std::__to_address to access node pointer.
    (_Hashtable<>::_M_reinsert_node_multi): Likewise.
    (_Hashtable<>::_M_merge_unique): Likewise.
    * 
testsuite/23_containers/unordered_set/allocator/ext_ptr.cc: Run in

    C++11 and higher.
    * 
testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc: New.


It only impacts a feature not released so far so it is not a 


This code has been released since GCC 7.1.0, three years ago.


regression but we could perhaps fix it now, no ?

François




diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index b00319a668b..cbcbacef3b6 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -847,7 +847,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
else
  {
__ret.position
- = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
+ = _M_insert_unique_node(__k, __bkt, __code,
+ std::__to_address(__nh._M_ptr));
__nh._M_ptr = nullptr;
__ret.inserted = true;
  }
@@ -867,7 +868,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
const key_type& __k = __nh._M_key();
auto __code = this->_M_hash_code(__k);
auto __ret
- = _M_insert_multi_node(__hint._M_cur, __k, __code, __nh._M_ptr);
+ = _M_insert_multi_node(__hint._M_cur, __k, __code,
+std::__to_address(__nh._M_ptr));
__nh._M_ptr = nullptr;
return __ret;
  }
@@ -934,7 +936,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  if (_M_find_node(__bkt, __k, __code) == nullptr)
{
  auto __nh = __src.extract(__pos);
- _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr,
+ _M_insert_unique_node(__k, __bkt, __code,
+   std::__to_address(__nh._M_ptr),
__n_elt);
  __nh._M_ptr = nullptr;
  __n_elt = 1;
diff --git 
a/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc
new file mode 100644
index 000..3a2538eeb31
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc
@@ -0,0 +1,52 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target { c++11 } } }
+
+#include 
+#include 
+#include 
+#include 
+
+struct T { int i; };
+bool operator==(const T& l, const T& r) { return l.i == r.i; }
+struct H
+{ std::size_t operator()(const T& t) const noexcept { return t.i; } };
+struct E : std::equal_to { };
+
+using __gnu_test::CustomPointerAlloc;
+
+template class std::unordered_multiset>;
+
+void test01()
+{
+  typedef CustomPointerAlloc alloc_type;
+  typedef std::unordered_set test_type;
+  test_type v;
+  v.insert(T());
+  VERIFY( ++v.begin() == v.end() );
+
+  test_type v2 = v;
+  v2.insert(T{1});
+  v.merge(v2);
+  VERIFY( v.size() == 2 );
+}
+
+int main()
+{
+  test01();
+}
diff --git 
a/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/ext_ptr.cc 
b/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/ext_ptr.cc
index f6b908ac03e..9f1c1b2132c 100644
--- 

Re: Remove redundant zero extend

2020-03-12 Thread Jim Wilson
On Thu, Mar 12, 2020 at 2:38 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Mar 12, 2020 at 4:06 AM Jeff Law via Gcc-patches
>  wrote:
> >
> > On Wed, 2020-03-11 at 13:04 +, Nidal Faour via Gcc-patches wrote:
> > > This patch is a code density oriented and attempt to remove redundant 
> > > sign/zero
> > > extension from assignment statement.
> > > The approach taken is to use VRP data while expanding the assignment to 
> > > RTL to
> > > determine whether a sign/zero extension is necessary.
> > > Thought the motivation of the patch is code density but it also good for 
> > > speed.
> > >
> > > for example:
> > > extern unsigned int func ();
> > >
> > >   unsigned char
> > >   foo (unsigned int arg)
> > >   {
> > > if (arg == 2)
> > >   return 0;
> > >
> > > return (func() == arg || arg == 7);
> > >   }
> > >

When we reach combine, we have
  if func result == arg
 r73 = 1
  else
 r73 = arg-7 == 0
  r75 = zero_extend subreg:QI r73

On this platform a scc operation always produces 0 or 1, so the
zero_extend is redundant.  We would need something pushing the zero
extend back into the if_then_else and then noticing that it is
redundant on both branches, or something propagating the value ranges
forward.

We almost have the latter bit in combine.  Right before we start
combining instructions, when we set nonzero_sign_valid to 1, we have

(gdb) print reg_stat[73]
$1 = (reg_stat_type &) @0x28ad908: {last_death = 0x75de0500,
  last_set = 0x75de02c0, last_set_value = 0x75ddc490,
  last_set_table_tick = 6, last_set_label = 5, last_set_nonzero_bits = 1,
  last_set_sign_bit_copies = 31 '\037', last_set_mode = E_SImode,
  last_set_invalid = 0 '\000', sign_bit_copies = 31 '\037',
  nonzero_bits = 1, truncation_label = 0, truncated_to_mode = E_VOIDmode}
(gdb)

so we know that r73 has 31 valid sign bits and only one bit is
nonzero.  This info is still valid when we reach the zero extend insn.
Unfortunately, we don't use this info to do anything useful.  The
zero_extend is the only insn in its basic block (not counting the
unconditional branch that ends the block), so it doesn't get combined
with anything, and hence doesn't get simplified.  If it did get
combined with something, then expand_compound_operation would
eliminate the zero_extend and subreg.  That makes me wonder if there
is any value in trying to handle single-instruction combinations, just
to get the simplifications we can get from the nonzero_bits info, but
it isn't obvious how we would detect when a single insn combine is
better than the original insn.  Maybe rtx cost info can be used for
that.

I looked at combine because I'm familiar with that pass, but the ree
pass might be the right place to handle this.  I don't know if it has
any support for handling if statements.  If not, maybe it could be
extended to handle cases like this.

Jim


Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-03-12 Thread Jeff Law via Gcc-patches
On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:
> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> > > >   else if (n_sets == 1
> > > > -  && MEM_P (trial)
> > > > +  && ! CALL_P (insn)
> > > > +  && (MEM_P (trial) || REG_P (trial))
> > > >&& MEM_P (dest)
> > > >&& rtx_equal_p (trial, dest)
> > > >&& !side_effects_p (dest)
> > > 
> > > This adds the !CALL_P (no space btw) condition, why is that?
> > Because n_sets is not valid for CALL_P insns which resulted in a failure on
> > ppc. 
> > See find_sets_in_insn which ignores the set on the LHS of a call.  So 
> > imagine
> > if
> > we had a nop register set in parallel with a (set (reg) (call ...)).  We'd
> > end up
> > deleting the entire PARALLEL which is obviously wrong.
> 
> Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE
> sees a TOC set parallel with a call as a no-op because it is set to the
> same value (an unspec, not an unspec_volatile) that GCC can derive is
> already in the TOC reg?  Or is this some other case?
Not entirely sure.  Richard's message didn't include the precise details. 

> 
> The change sounds fine, fwiw.
> 
> > One could argue that find_sets_in_insn should be fixed as well.  I'd be
> > worried
> > about fallout from that.
> 
> Should it ignore all SETs in parallel with a call?  Or what do you want
> to fix there?
I was thinking the other way.  Make it include sets even those for the function
return value.  Having n_sets's meaning be different for CALL_INSNs vs other 
INSNs
just seems like a poor implementation decision because of the inconsistency.

jeff



Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-03-12 Thread Segher Boessenkool
On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:
> On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> > > else if (n_sets == 1
> > > -&& MEM_P (trial)
> > > +&& ! CALL_P (insn)
> > > +&& (MEM_P (trial) || REG_P (trial))
> > >  && MEM_P (dest)
> > >  && rtx_equal_p (trial, dest)
> > >  && !side_effects_p (dest)
> > 
> > This adds the !CALL_P (no space btw) condition, why is that?
> Because n_sets is not valid for CALL_P insns which resulted in a failure on 
> ppc. 
> See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine 
> if
> we had a nop register set in parallel with a (set (reg) (call ...)).  We'd 
> end up
> deleting the entire PARALLEL which is obviously wrong.

Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE
sees a TOC set parallel with a call as a no-op because it is set to the
same value (an unspec, not an unspec_volatile) that GCC can derive is
already in the TOC reg?  Or is this some other case?

The change sounds fine, fwiw.

> One could argue that find_sets_in_insn should be fixed as well.  I'd be 
> worried
> about fallout from that.

Should it ignore all SETs in parallel with a call?  Or what do you want
to fix there?


Segher


Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2020-03-12 Thread Sandra Loosemore

On 1/27/20 2:02 PM, Jeff Law wrote:


[snip]

While I think we've missed the boat for gcc-10, I think these patches 
should go forward in gcc-11. I'll own getting the paths sorted so that 
this problem is avoided.


I recently retested these patches on trunk, and I found some new 
regressions in the analyzer tests.


FAIL: default/gcc.sum:gcc.dg/analyzer/edges-1.c  (test for bogus 
messages, line 16)


This one may be a genuine analyzer bug?  It is apparently failing to 
prune the control flow paths per commit 
004f2c07b6d3fa543f0fe86c55a7b3c227de2bb6.


PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c  (test for 
warnings, line 39)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c  (test for 
warnings, line 46)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c (test for 
excess errors)


This one is hitting the "--Wanalyzer-too-complex" diagnostic and could 
probably be fixed by adjusting the parameters at the top of the testcase.


PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c  (test for 
warnings, line 28)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c  (test for 
warnings, line 37)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c (test for excess 
errors)


It's now printing "2 processed enodes" in both warnings instead of 3 or 
4 (respectively).  At first glance, it's not obvious what the 
significance of the the things being tested here is, so I'm not sure if 
this is an analyzer bug or just patterns that are too fragile.


David, WDYT?

-Sandra


Re: [PATCH] drop weakref attribute on function definitions (PR 92799)

2020-03-12 Thread Jeff Law via Gcc-patches
On Mon, 2020-03-09 at 14:33 -0600, Martin Sebor wrote:
> On 3/5/20 5:26 PM, Jeff Law wrote:
> > On Fri, 2020-02-14 at 15:41 -0700, Martin Sebor wrote:
> > > Because attribute weakref introduces a kind of a definition, it can
> > > only be applied to declarations of symbols that are not defined.  GCC
> > > normally issues a warning when the attribute is applied to a defined
> > > symbol, but PR 92799 shows that it misses some cases on which it then
> > > leads to an ICE.
> > > 
> > > The ICE was introduced in GCC 4.5.  Prior to then, GCC accepted such
> > > invalid definitions and silently dropped the weakref attribute.
> > > 
> > > The attached patch avoids the ICE while again dropping the invalid
> > > attribute from the definition, except with the (now) usual warning.
> > > 
> > > Tested on x86_64-linux.
> > > 
> > > I also looked for code bases that make use of attribute weakref to
> > > rebuild them as another test but couldn't find any.  (There are
> > > a couple of instances in the Linux kernel but they look #ifdef'd
> > > out).  Does anyone know of any that do use it that I could try to
> > > build on Linux?
> > So you added this check
> > 
> > ... || DECL_INITIAL (decl) != error_mark_node
> > 
> > Do you need to check that DECL_INITIAL is not NULL?  IIUC DECL_INITIAL in
> > this
> > context is a tri-state.
> > 
> > NULL -- DECL is not a function definition
> > error_mark_node -- it was a function definition, but the body was free'd
> > everything else -- the function definition
> 
> I've only seen two values come up for a function declared weakref in
> the test suite: error_mark_node and something with the TREE_CODE of
> BLOCK (the block where the weakref function is used when it's also
> explicitly defined in the code, and when the attribute is subsequently
> diagnosed by the warning).
So when it's a BLOCK it's giving you the outermost block scope for the function
which we usually use to generate debug info.

The key point is that we use ERROR_MARK_NODE because a non-NULL DECL_INITIAL
denotes an actual function definition, so we can't set it to NULL blindly.  See
cgraph_node::expand, cgraph_node:release_body, rest_of_handle_final and perhaps
other places.  Another example would be setting up the ifunc resolver.  It's a
real function so DECL_INITIAL must be non-NULL, but we don't really have a block
structure we can point to, so instead we set it to error_mark_node
(make_dispatcher_decl).

I wonder if the earlier node->definition check ultimately prevents DECL_INITIAL
from being NULL at this point.  According to cgraph.h that field is true when 
the
symbol corresponds to a definition in the current unit.  That would seem to
indicate yes.

So I think we can go forward with your patch as-is.

jeff




Re: [PATCH] RX removed control register cpen

2020-03-12 Thread Jeff Law via Gcc-patches
On Thu, 2020-03-12 at 19:49 +0200, Darius Galis wrote:
> Hello,
> 
> The following patch removes the CTRLREG_CPEN register variable.
> According to the RX software manual: 
> https://www.renesas.com/cn/en/doc/products/mpumcu/doc/rx_family/r01us0316ej0100-rxv3sm.pdf
> there is no control register called cpen.
> 
> Regression test is OK, without additional fails, and was tested with the
> following command:
> 
> make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim
> 
> Please let me know if this is OK, Thank you!
> Darius
> 
>  From ce78a5cde96d2ace6e06733320edbfb0c59c0e88 Mon Sep 17 00:00:00 2001
> From: Darius 
> Date: Tue, 10 Mar 2020 18:21:20 +0200
> Subject: Removed CTRLREG_CPEN register
> 
> ---
>   gcc/ChangeLog   | 5 +
>   gcc/config/rx/rx.c  | 1 -
>   gcc/config/rx/rx.md | 1 -
>   3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 6c4a505..ea9bbb2 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2020-03-10  Darius Galis  
> +
> + * config/rx/rx.md: Remove CTRLREG_CPEN
> + * config/rx/rx.c: Likewise.
THanks.  I verified that the CPEN register was removed in 2009.  Installed on 
the
trunk.

jeff
> 



[PATCH] c++: Redundant -Wdeprecated-declarations warning in build_over_call (PR 67960)

2020-03-12 Thread Patrick Palka via Gcc-patches
In build_over_call, we are emitting a redundant -Wdeprecated-declarations
warning about the deprecated callee function, first from mark_used and again
from build_addr_func <- decay_conversion <- cp_build_addr_expr <- mark_used.

It seems this warning coming from build_addr_func will always be redundant, so
we can safely use a warning_sentinel to disable the warning before calling
build_addr_func.  (And any deprecation warning that could come from
build_addr_func would be for FN, and so we wouldn't be suppressing too much.)

Tested on x86_64-pc-linux-gnu.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/67960
* call.c (build_over_call): Use a warning_sentinel to disable
warn_deprecated_decl before calling build_addr_func.

gcc/testsuite/ChangeLog:

PR c++/67960
g++.dg/diagnostic/pr67960.C: New test.
g++.dg/diagnostic/pr67960-2.C: New test.
---
 gcc/cp/call.c   |  5 +
 gcc/testsuite/g++.dg/diagnostic/pr67960-2.C | 13 +
 gcc/testsuite/g++.dg/diagnostic/pr67960.C   | 13 +
 3 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr67960-2.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/pr67960.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 5767a8b8c4e..2f6cfd85175 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9062,6 +9062,11 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
 }
   else
 {
+  /* If FN is marked deprecated, then we've already issued a deprecated-use
+warning from mark_used above, so avoid reduntantly issuing another one
+from build_addr_func.  */
+  warning_sentinel w (warn_deprecated_decl);
+
   fn = build_addr_func (fn, complain);
   if (fn == error_mark_node)
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr67960-2.C 
b/gcc/testsuite/g++.dg/diagnostic/pr67960-2.C
new file mode 100644
index 000..d2ba509534f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr67960-2.C
@@ -0,0 +1,13 @@
+// PR c++/67960
+// { dg-do compile }
+// { dg-additional-options "-Werror -fmax-errors=1" }
+__attribute__((deprecated)) void doNothing(){}
+
+int
+main()
+{
+  doNothing(); // { dg-error "is deprecated" }
+}
+
+// { dg-message "all warnings being treated as errors" "" { target *-*-* } 0 }
+// { dg-bogus "compilation terminated" "" { target *-*-* } 0 }
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr67960.C 
b/gcc/testsuite/g++.dg/diagnostic/pr67960.C
new file mode 100644
index 000..d7b1a2d0811
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr67960.C
@@ -0,0 +1,13 @@
+// PR c++/67960
+// { dg-do compile { target c++14 } }
+// { dg-additional-options "-Werror -fmax-errors=1" }
+[[deprecated]] void doNothing(){}
+
+int
+main()
+{
+  doNothing(); // { dg-error "is deprecated" }
+}
+
+// { dg-message "all warnings being treated as errors" "" { target *-*-* } 0 }
+// { dg-bogus "compilation terminated" "" { target *-*-* } 0 }
-- 
2.26.0.rc1



Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread Segher Boessenkool
On Thu, Mar 12, 2020 at 07:42:30PM +0100, J.W. Jagersma wrote:
> On 2020-03-12 16:32, Segher Boessenkool wrote:
> > On Thu, Mar 12, 2020 at 02:08:18PM +0100, Richard Biener wrote:
> >>> It wasn't clear from my message above, but: I was mostly worried about
> >>> requiring the asm to treat memory operands in a certain way when the
> >>> exception is thrown.  IMO it would be better to say that the values of
> >>> memory operands are undefined on the exception edge.
> >>
> >> Rather unspecified.  So IMHO on the exception edge the asm() should
> >> still appear as a def for all outputs so the compiler cannot derive any
> >> actual values for them.  That of course also means that they must not
> >> appear to be must-defs since we may not DSE earlier stores for example.
> > 
> > So make all outputs of an asm that may throw (w/ -fnon-call-exceptions)
> > inout operands instead?  That works for registers exactly the same, too?
> 
> Should gcc do this or would it be the user's responsibility?  For
> memory operands I don't think anything changes if you replace "=m" by
> "+m".  However changing each "=r" to "+r" would certainly generate less
> optimal code.

[  "+m"(x)  means exactly the same as  "=m"(x) : "m"(x)  ]

Yes, but this is required for all operands that could keep their old
value on exceptions.  Since this will only happen with
-fnon-call-exceptions, which already has a significant performance cost,
I don't think adding just a tiny bit more is so bad?  The benefit is
that all asm will work, the user cannot do this wrong anymore.


Segher


[PR c++/94147] Lamdas attached to global variables

2020-03-12 Thread Nathan Sidwell
This patch implements Jason's suggestion of pushing a lambda scope when 
parsing a global variable initializer.  That bit worked fine, but 
happened to cause g++.dg/opt/dump1.C to not give any 
used-but-not-defined warnings.


The reason was no_linkage_check, which considers any lambda that has an 
extra-scope to have linkage.  Which is technically correct.  Except that 
we think that all types that have linkage have external linkage.


Our representation of linkage and visibility is somewhat inaccurate, 
particularly when it comes to types.  We have TREE_PUBLIC, 
DECL_EXTERNAL, DECL_VISIBILITY, DECL_COMDAT, DECL_NOT_REALLY_EXTERN.  It 
could really do with a through cleanup, but that won't be a simple task.


The best I could come up with was seeing if the extra scope was a 
VAR_DECL, and if that was TREE_PUBLIC and the var was inline (its 
COMDATness is sadly not set at that point) or a template instantiation, 
then the lambda had linkage.  Otherwise it's as-if it has no-linkage 
from the POV of compiler internals.


This is an ABI change (so we should document it), but it's changing 
mangling from an unpredictable (in practice) counter, to something the 
ABI defines.  So I'm not concerned about mangling-changed warnings, or 
preserving the broken mangling under some ABI selection flag.  Code that 
did this worked by accident within a single TU.  It'll continue to work 
by design there, and across TUs.


booted on x86_64-linux, I'll commit in a few days if there are no comments.

nathan

--
Nathan Sidwell
2020-03-12  Nathan Sidwell  

	PR c++/94147 - mangling of lambdas assigned to globals
	* parser.c (cp_parser_init_declarator): Namespace-scope variables
	provide a lambda scope.
	* tree.c (no_linkage_check): Lambdas with a variable for extra
	scope have a linkage from the variable.

diff --git c/gcc/cp/parser.c w/gcc/cp/parser.c
index 24f71671469..91dc8ea46c3 100644
--- c/gcc/cp/parser.c
+++ w/gcc/cp/parser.c
@@ -20771,16 +20771,24 @@ cp_parser_init_declarator (cp_parser* parser,
   else
 	{
 	  /* We want to record the extra mangling scope for in-class
-	 initializers of class members and initializers of static data
-	 member templates.  The former involves deferring
-	 parsing of the initializer until end of class as with default
-	 arguments.  So right here we only handle the latter.  */
-	  if (!member_p && processing_template_decl && decl != error_mark_node)
+	 initializers of class members and initializers of static
+	 data member templates and namespace-scope initializers.
+	 The former involves deferring parsing of the initializer
+	 until end of class as with default arguments.  So right
+	 here we only handle the latter two.  */
+	  bool has_lambda_scope = false;
+
+	  if (decl != error_mark_node
+	  && !member_p
+	  && (processing_template_decl || DECL_NAMESPACE_SCOPE_P (decl)))
+	has_lambda_scope = true;
+
+	  if (has_lambda_scope)
 	start_lambda_scope (decl);
 	  initializer = cp_parser_initializer (parser,
 	   _direct_init,
 	   _non_constant_init);
-	  if (!member_p && processing_template_decl && decl != error_mark_node)
+	  if (has_lambda_scope)
 	finish_lambda_scope ();
 	  if (initializer == error_mark_node)
 	cp_parser_skip_to_end_of_statement (parser);
diff --git c/gcc/cp/tree.c w/gcc/cp/tree.c
index a412345e3bf..da2e7fdcca3 100644
--- c/gcc/cp/tree.c
+++ w/gcc/cp/tree.c
@@ -2794,9 +2794,23 @@ no_linkage_check (tree t, bool relaxed_p)
  fix it up later if not.  We need to check this even in templates so
  that we properly handle a lambda-expression in the signature.  */
   if (LAMBDA_TYPE_P (t)
-  && CLASSTYPE_LAMBDA_EXPR (t) != error_mark_node
-  && LAMBDA_TYPE_EXTRA_SCOPE (t) == NULL_TREE)
-return t;
+  && CLASSTYPE_LAMBDA_EXPR (t) != error_mark_node)
+{
+  tree extra = LAMBDA_TYPE_EXTRA_SCOPE (t);
+  if (!extra)
+	return t;
+
+  /* If the mangling scope is internal-linkage or not repeatable
+	 elsewhere, the lambda effectively has no linkage.  (Sadly
+	 we're not very careful with the linkages of types.)  */
+  if (TREE_CODE (extra) == VAR_DECL
+	  && !(TREE_PUBLIC (extra)
+	   && (processing_template_decl
+		   || (DECL_LANG_SPECIFIC (extra) && DECL_USE_TEMPLATE (extra))
+		   /* DECL_COMDAT is set too late for us to check.  */
+		   || DECL_VAR_DECLARED_INLINE_P (extra
+	return t;
+}
 
   /* Otherwise there's no point in checking linkage on template functions; we
  can't know their complete types.  */
diff --git c/gcc/testsuite/g++.dg/abi/lambda-vis.C w/gcc/testsuite/g++.dg/abi/lambda-vis.C
new file mode 100644
index 000..c3eb157dc20
--- /dev/null
+++ w/gcc/testsuite/g++.dg/abi/lambda-vis.C
@@ -0,0 +1,23 @@
+// { dg-do compile { target c++17 } }
+// { dg-options "-fno-inline" }
+
+template int sfoo (T); // { dg-warning "used but never defined" }
+template int gfoo (T); // { dg-warning "used but never defined" }
+template int ifoo 

Re: Re: [Fortran, OpenACC] Reject vars of different scope in $acc declare (PR94120)

2020-03-12 Thread Paul Richard Thomas via Gcc-patches
I would swear that I selected "reply all" but... Never mind, you put it right!

Cheers

Paul

On Thu, 12 Mar 2020 at 08:44, Tobias Burnus  wrote:
>
> (I assume that should have gone to gcc-patches@ and fortran@ as well.)
>
>
>  Forwarded Message 
> Subject:Re: [Fortran, OpenACC] Reject vars of different scope in $acc
> declare (PR94120)
> Date:   Tue, 10 Mar 2020 18:02:41 +
> From:   Paul Richard Thomas 
> To: Tobias Burnus 
>
>
>
> Hi Tobias,
>
> This looks to be straightforward enough to me :-) OK for trunk.
>
> Thanks
>
> Paul
>
> On Tue, 10 Mar 2020 at 17:18, Tobias Burnus  wrote:
> > [This fixes a bunch of issues found when actually only
> > wanting to add a check for the following restriction.]
> >
> > OpenACC's "Declare Directive" has the following restriction:
> >
> > "A declare directive must be in the same scope
> >as the declaration of any var that appears in
> >the data clauses of the directive."
> >
> > The gfortran now rejects a "var" declared is in a different
> > namespace than the "$acc declare". (Use-associated variables
> > are already rejected.) Testing showed that a straight-forward
> > check fails if the result variable is the function name – as
> > then the function symbol is in the parent scope. — Extending
> > the failing test to use a result variable showed that the
> > current is-a-module-variable check didn't work and when fixing,
> > one was running into a likewise issue.
> >
> > The reason that I exclude 's' being a module is that at
> > resolution time, an is-variable check is done.
> >
> >
> > The other changes are because the following restriction was
> > mishandled:
> >
> > "In a Fortran module declaration section, only
> >create, copyin, device_resident, and link clauses are allowed."
> >
> > But all examples where for variables using those in a module
> > procedure …
> >
> >
> > OK for the trunk?
> >
> > Cheers,
> >
> > Tobias
> >
> > PS: For those who wounder what happens in a BLOCK DATA construct:
> > gfortran outrightly rejects 'acc declare'. (It probably should
> > work for COMMON blocks with 'declare device_resident' – but
> > currently the spec only permits it in program/subroutine/function
> > + declaration part of a module.)
> >
> > PPS: The PR shows (for C) that one can construct a test case,
> > which violates the OpenACC restriction and actually fails with
> > an ICE. I have a draft patch for C (see PR) but not yet one for
> > C++, hence, I start with the Fortran side. – I currently still
> > struggle to write a same-scope check in C++.
> > [The C test case in turn was a fallout of debugging an
> > ICE-on-valid-code issue …]
> >
> > -
> > Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / 
> > Germany
> > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> > Alexander Walter
>
>
>
> --
> "If you can't explain it simply, you don't understand it well enough"
> - Albert Einstein
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-03-12 Thread Jeff Law via Gcc-patches
On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> 
> > > I don't know if this patch makes matters worse or not.  It doesn't seem
> > > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > > rs6000, if that is true, yes I do object ;-)
> > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it
> > through
> > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86
> > native
> > and more.
> 
> I don't see anything rs6000 below?  Is it just this generic code?
It's just generic code.  THe rs6000 issue is fixed by the !CALL_P condition.

> 
> > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
> > }
> >  
> >   /* Similarly, lots of targets don't allow no-op
> > -(set (mem x) (mem x)) moves.  */
> > +(set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> > +might be impossible for certain registers (like CC registers).  */
> >   else if (n_sets == 1
> > -  && MEM_P (trial)
> > +  && ! CALL_P (insn)
> > +  && (MEM_P (trial) || REG_P (trial))
> >&& MEM_P (dest)
> >&& rtx_equal_p (trial, dest)
> >&& !side_effects_p (dest)
> 
> This adds the !CALL_P (no space btw) condition, why is that?
Because n_sets is not valid for CALL_P insns which resulted in a failure on 
ppc. 
See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine if
we had a nop register set in parallel with a (set (reg) (call ...)).  We'd end 
up
deleting the entire PARALLEL which is obviously wrong.

One could argue that find_sets_in_insn should be fixed as well.  I'd be worried
about fallout from that.

jeff



Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread J.W. Jagersma via Gcc-patches
On 2020-03-12 16:32, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 02:08:18PM +0100, Richard Biener wrote:
>>> It wasn't clear from my message above, but: I was mostly worried about
>>> requiring the asm to treat memory operands in a certain way when the
>>> exception is thrown.  IMO it would be better to say that the values of
>>> memory operands are undefined on the exception edge.
>>
>> Rather unspecified.  So IMHO on the exception edge the asm() should
>> still appear as a def for all outputs so the compiler cannot derive any
>> actual values for them.  That of course also means that they must not
>> appear to be must-defs since we may not DSE earlier stores for example.
> 
> So make all outputs of an asm that may throw (w/ -fnon-call-exceptions)
> inout operands instead?  That works for registers exactly the same, too?

Should gcc do this or would it be the user's responsibility?  For
memory operands I don't think anything changes if you replace "=m" by
"+m".  However changing each "=r" to "+r" would certainly generate less
optimal code.

>> If we manage to get the unspecified values correct in SSA then we don't
>> need to say whether the asm may clobber them before or after throwing.
> 
> Yeah.  Which users will *never* get right, that is, it would be hard to
> use any such interface correctly.
> 
> 
> Segher
> 


[PATCH] Fix Hashtable node manipulation when custom pointer

2020-03-12 Thread François Dumont via Gcc-patches
I wonder if this fix is correct because if it is you spent more time 
making ext_ptr.cc works than fixing it :-)


    libstdc++ Hashtable: Fix node manipulation when using custom pointer


    Use std::__to_address to get NodeHandle when reinserting nodes.

    * include/bits/hashtable.h 
(_Hashtable<>::_M_reinsert_node): Use

    std::__to_address to access node pointer.
    (_Hashtable<>::_M_reinsert_node_multi): Likewise.
    (_Hashtable<>::_M_merge_unique): Likewise.
    * 
testsuite/23_containers/unordered_set/allocator/ext_ptr.cc: Run in

    C++11 and higher.
    * 
testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc: New.


It only impacts a feature not released so far so it is not a regression 
but we could perhaps fix it now, no ?


François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index b00319a668b..cbcbacef3b6 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -847,7 +847,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	else
 	  {
 		__ret.position
-		  = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
+		  = _M_insert_unique_node(__k, __bkt, __code,
+	  std::__to_address(__nh._M_ptr));
 		__nh._M_ptr = nullptr;
 		__ret.inserted = true;
 	  }
@@ -867,7 +868,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	const key_type& __k = __nh._M_key();
 	auto __code = this->_M_hash_code(__k);
 	auto __ret
-	  = _M_insert_multi_node(__hint._M_cur, __k, __code, __nh._M_ptr);
+	  = _M_insert_multi_node(__hint._M_cur, __k, __code,
+ std::__to_address(__nh._M_ptr));
 	__nh._M_ptr = nullptr;
 	return __ret;
   }
@@ -934,7 +936,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  if (_M_find_node(__bkt, __k, __code) == nullptr)
 		{
 		  auto __nh = __src.extract(__pos);
-		  _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr,
+		  _M_insert_unique_node(__k, __bkt, __code,
+	std::__to_address(__nh._M_ptr),
 	__n_elt);
 		  __nh._M_ptr = nullptr;
 		  __n_elt = 1;
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc b/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc
new file mode 100644
index 000..3a2538eeb31
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multiset/allocator/ext_ptr.cc
@@ -0,0 +1,52 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target { c++11 } } }
+
+#include 
+#include 
+#include 
+#include 
+
+struct T { int i; };
+bool operator==(const T& l, const T& r) { return l.i == r.i; }
+struct H
+{ std::size_t operator()(const T& t) const noexcept { return t.i; } };
+struct E : std::equal_to { };
+
+using __gnu_test::CustomPointerAlloc;
+
+template class std::unordered_multiset>;
+
+void test01()
+{
+  typedef CustomPointerAlloc alloc_type;
+  typedef std::unordered_set test_type;
+  test_type v;
+  v.insert(T());
+  VERIFY( ++v.begin() == v.end() );
+
+  test_type v2 = v;
+  v2.insert(T{1});
+  v.merge(v2);
+  VERIFY( v.size() == 2 );
+}
+
+int main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/ext_ptr.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/ext_ptr.cc
index f6b908ac03e..9f1c1b2132c 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/ext_ptr.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_set/allocator/ext_ptr.cc
@@ -15,10 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// This test fails to compile since C++17 (see xfail-if below) so we can only
-// do a "run" test for C++11 and C++14, and a "compile" test for C++17 and up.
-// { dg-do run { target { c++11_only || c++14_only } } }
-// { dg-do compile { target c++17 } }
+// { dg-do run { target { c++11 } } }
 
 #include 
 #include 
@@ -27,14 +24,12 @@
 
 struct T { int i; };
 bool operator==(const T& l, const T& r) { return l.i == r.i; }
-struct H { std::size_t operator()(const T& t) const noexcept { return t.i; }
-};
+struct H
+{ std::size_t operator()(const T& t) const noexcept { return t.i; } };
 struct E : std::equal_to { };
 
 using __gnu_test::CustomPointerAlloc;
 
-// { 

Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-03-12 Thread Segher Boessenkool
On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> > I don't think each stanza of code should use it's own "noop-ness", no.
> Richard's patch is actually better than mine in that regard as it handles mem 
> and
> reg nop moves in an indentical way.  I don't think refactoring the cse.c code 
> is
> advisable now though -- but I do want to fix the multiply-reported ICE on ARM 
> and
> Richard's cse changes are the cleanest way to do that that I can see.

It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
this patch does not make things worse.

> > I don't know if this patch makes matters worse or not.  It doesn't seem
> > suitable for stage 4 though.  And Richard said the cse.c part breaks
> > rs6000, if that is true, yes I do object ;-)
> The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it 
> through
> my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native
> and more.

I don't see anything rs6000 below?  Is it just this generic code?

> @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
>   }
>  
> /* Similarly, lots of targets don't allow no-op
> -  (set (mem x) (mem x)) moves.  */
> +  (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
> +  might be impossible for certain registers (like CC registers).  */
> else if (n_sets == 1
> -&& MEM_P (trial)
> +&& ! CALL_P (insn)
> +&& (MEM_P (trial) || REG_P (trial))
>  && MEM_P (dest)
>  && rtx_equal_p (trial, dest)
>  && !side_effects_p (dest)

This adds the !CALL_P (no space btw) condition, why is that?

(Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
patterns for that, or *should* have at least!)


Segher


Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-03-12 Thread Jeff Law via Gcc-patches
On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote:
> > On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:
> > > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> > > > Yea, it's closely related.  In your case you need to effectively ignore
> > > > the nop insn to get the optimization you want.  In mine that nop insn
> > > > causes an ICE.
> > > > 
> > > > I think we could take your cse bits + adding a !CALL_P separately from
> > > > the simplify-rtx stuff which Segher objected to.  THat'd likely solve
> > > > the ARM ICEs and take you a tiny step forward on optimizing that SVE
> > > > case.  Thoughts?
> > > 
> > > CSE should consistently keep track of what insns are no-op moves (in its
> > > definition, all passes have a slightly different definition of this),
> > > and use that everywhere consistently.
> > So does that mean you object to the cse.c portion of Richard's patch?
> 
> It's more a "what we need to do in the future" thing, it is stage 4 now,
> it is too big a change to do now.
I suspect you're referring to the simplify-rtx bits from his patch which I agree
are not appropriate for stage4.  The cse bits from that patch are are simple.

> 
> What patch?  The "34" patch?  https://gcc.gnu.org/r278411 .
> 
> I don't think each stanza of code should use it's own "noop-ness", no.
Richard's patch is actually better than mine in that regard as it handles mem 
and
reg nop moves in an indentical way.  I don't think refactoring the cse.c code is
advisable now though -- but I do want to fix the multiply-reported ICE on ARM 
and
Richard's cse changes are the cleanest way to do that that I can see.



> 
> I don't know if this patch makes matters worse or not.  It doesn't seem
> suitable for stage 4 though.  And Richard said the cse.c part breaks
> rs6000, if that is true, yes I do object ;-)
The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it 
through
my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native
and more.

Concretely I'm proposing the following patch to address 90275 and its 
duplicates.


PR rtl-optimization/90275
* cse.c (cse_insn): Delete no-op register moves too.

PR rtl-optimization/90275
* gcc.c-torture/compile/pr90275.c: New test.

diff --git a/gcc/cse.c b/gcc/cse.c
index 79ee0ce80e3..1779bb9a333 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4625,7 +4625,7 @@ cse_insn (rtx_insn *insn)
   for (i = 0; i < n_sets; i++)
 {
   bool repeat = false;
-  bool mem_noop_insn = false;
+  bool noop_insn = false;
   rtx src, dest;
   rtx src_folded;
   struct table_elt *elt = 0, *p;
@@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
}
 
  /* Similarly, lots of targets don't allow no-op
-(set (mem x) (mem x)) moves.  */
+(set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
+might be impossible for certain registers (like CC registers).  */
  else if (n_sets == 1
-  && MEM_P (trial)
+  && ! CALL_P (insn)
+  && (MEM_P (trial) || REG_P (trial))
   && MEM_P (dest)
   && rtx_equal_p (trial, dest)
   && !side_effects_p (dest)
@@ -5334,7 +5336,7 @@ cse_insn (rtx_insn *insn)
   || insn_nothrow_p (insn)))
{
  SET_SRC (sets[i].rtl) = trial;
- mem_noop_insn = true;
+ noop_insn = true;
  break;
}
 
@@ -5562,8 +5564,8 @@ cse_insn (rtx_insn *insn)
  sets[i].rtl = 0;
}
 
-  /* Similarly for no-op MEM moves.  */
-  else if (mem_noop_insn)
+  /* Similarly for no-op moves.  */
+  else if (noop_insn)
{
  if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
cse_cfg_altered = true;
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c 
b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
new file mode 100644
index 000..83e0df77226
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
@@ -0,0 +1,27 @@
+a, b, c;
+
+long long d;
+
+e() {
+
+  char f;
+
+  for (;;) {
+
+c = a = c ? 5 : 0;
+
+if (f) {
+
+  b = a;
+
+  f = d;
+
+}
+
+(d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
+
+  }
+
+}
+
+


Re: Re: [PATCH,rs6000] Add command line and builtin compatibility

2020-03-12 Thread Segher Boessenkool
Hi Carl,

On Thu, Mar 12, 2020 at 09:57:06AM -0700, Carl Love wrote:
> > From the GCC manual:
> > 
> >   -mmfcrfp4  2.01
> >   -mpopcntb  p5  2.02
> >   -mfprndp5+ 2.04  ("info gcc" says 2.03, that's wrong?  But the
> > ISA
> > says this is 2.02 even?  Now what!)
> >   -mcmpb p6  2.05
> >   -mpopcntd  p7  2.06
> > 
> > (and there are more, in fact).
> 
> 
> 
> > It would be better if you disallowed this option combination?  Don't
> > allow an options that says ISA X insns are allowed, but ISA Y insns
> > are
> > not, with Y < X.  In this case X is 2.06 and Y is 2.02 or 2.03 or
> > 2.04.
> 
> My fix was very specific to the builtin and the command line argument
> as pointed out in the bug report.  
> 
> Sounds like you are implying we should take a much more general
> approach to checking the command line args against the ISA level.  Such
> a test would need to go somewhere else not in the builtin expansion
> call.  We may want a routine that just does these checks which can then
> be called from the appropriate place.  Let me go look at the GCC manual
> and see about what all needs testing.

Yes, but only for this fprnd vs. 2.06 (vsx) situation.  Like we already
have:

  if (TARGET_DIRECT_MOVE && !TARGET_VSX)
{
  if (rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE)
error ("%qs requires %qs", "-mdirect-move", "-mvsx");
  rs6000_isa_flags &= ~OPTION_MASK_DIRECT_MOVE;
}

(and many other cases there), we could do this there as well (so, don't
allow -mvsx (maybe via a -mcpu= etc.) at the same time as -mno-fprnd).

Do you see problems with that?


Segher


[PATCH] RX removed control register cpen

2020-03-12 Thread Darius Galis

Hello,

The following patch removes the CTRLREG_CPEN register variable.
According to the RX software manual: 
https://www.renesas.com/cn/en/doc/products/mpumcu/doc/rx_family/r01us0316ej0100-rxv3sm.pdf
there is no control register called cpen.

Regression test is OK, without additional fails, and was tested with the
following command:

make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim

Please let me know if this is OK, Thank you!
Darius

From ce78a5cde96d2ace6e06733320edbfb0c59c0e88 Mon Sep 17 00:00:00 2001
From: Darius 
Date: Tue, 10 Mar 2020 18:21:20 +0200
Subject: Removed CTRLREG_CPEN register

---
 gcc/ChangeLog   | 5 +
 gcc/config/rx/rx.c  | 1 -
 gcc/config/rx/rx.md | 1 -
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6c4a505..ea9bbb2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-03-10  Darius Galis  
+
+   * config/rx/rx.md: Remove CTRLREG_CPEN
+   * config/rx/rx.c: Likewise.
+
 2020-03-09  Martin Liska  
 
 	PR target/93800

diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index 0a51124..151ad39 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -641,7 +641,6 @@ rx_print_operand (FILE * file, rtx op, int letter)
case CTRLREG_PSW:   fprintf (file, "psw"); break;
case CTRLREG_USP:   fprintf (file, "usp"); break;
case CTRLREG_FPSW:  fprintf (file, "fpsw"); break;
-   case CTRLREG_CPEN:  fprintf (file, "cpen"); break;
case CTRLREG_BPSW:  fprintf (file, "bpsw"); break;
case CTRLREG_BPC:   fprintf (file, "bpc"); break;
case CTRLREG_ISP:   fprintf (file, "isp"); break;
diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index 0cf5025..df08a9e 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -79,7 +79,6 @@
(CTRLREG_PSW0)
(CTRLREG_USP2)
(CTRLREG_FPSW   3)
-   (CTRLREG_CPEN   4)
(CTRLREG_BPSW   8)
(CTRLREG_BPC9)
(CTRLREG_ISP   10)
--
1.9.1

--
Ing. Darius Galiș
Software Engineer - CyberTHOR Studios Ltd.



[committed] maintainer-scripts: Fix up gcc_release without -l, where mkdir was using umask 077 after migration

2020-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

After the sourceware migration, seems the gcc_release -u gccadmin ...
created directories had 2700 permissions instead of 2755.

Fxed thusly, committed to trunk.

2020-03-12  Jakub Jelinek  

* gcc_release (upload_files): Without -l, pass -m 755 to the mkdir
command invoked through ssh.

--- maintainer-scripts/gcc_release.jj   2020-02-27 09:31:57.481096944 +0100
+++ maintainer-scripts/gcc_release  2020-03-12 18:12:54.061688369 +0100
@@ -398,7 +398,7 @@ upload_files() {
   # Make sure the directory exists on the server.
   if [ $LOCAL -eq 0 ]; then
 ${SSH} -l ${GCC_USERNAME} ${GCC_HOSTNAME} \
-  mkdir -p "${FTP_PATH}/diffs"
+  mkdir -m 755 -p "${FTP_PATH}/diffs"
 UPLOAD_PATH="${GCC_USERNAME}@${GCC_HOSTNAME}:${FTP_PATH}"
   else
 mkdir -p "${FTP_PATH}/diffs" \

Jakub



Re: [PATCH] avoid -Wredundant-tags on a first declaration in use (PR 93824)

2020-03-12 Thread Martin Sebor via Gcc-patches

On 3/11/20 3:30 PM, Martin Sebor wrote:

On 3/11/20 2:10 PM, Jason Merrill wrote:

On 3/11/20 12:57 PM, Martin Sebor wrote:

On 3/9/20 6:08 PM, Jason Merrill wrote:

On 3/9/20 5:39 PM, Martin Sebor wrote:

On 3/9/20 1:40 PM, Jason Merrill wrote:

On 3/9/20 12:31 PM, Martin Sebor wrote:

On 2/28/20 1:24 PM, Jason Merrill wrote:

On 2/28/20 12:45 PM, Martin Sebor wrote:

On 2/28/20 9:58 AM, Jason Merrill wrote:

On 2/24/20 6:58 PM, Martin Sebor wrote:
-Wredundant-tags doesn't consider type declarations that are 
also

the first uses of the type, such as in 'void f (struct S);' and
issues false positives for those.  According to the reported 
that's

making it harder to use the warning to clean up LibreOffice.

The attached patch extends -Wredundant-tags to avoid these false
positives by relying on the same class_decl_loc_t::class2loc 
mapping
as -Wmismatched-tags.  The patch also somewhat improves the 
detection
of both issues in template declarations (though more work is 
still

needed there).


+ a new entry for it and return unless it's a 
declaration

+ involving a template that may need to be diagnosed by
+ -Wredundant-tags.  */
   *rdl = class_decl_loc_t (class_key, false, def_p);
-  return;
+  if (TREE_CODE (decl) != TEMPLATE_DECL)
+    return;


How can the first appearance of a class template be redundant?


I'm not sure I correctly understand the question.  The comment 
says

"involving a template" (i.e., not one of the first declaration of
a template).  The test case that corresponds to this test is:

   template  struct S7 { };
   struct S7 s7v;  // { dg-warning "\\\[-Wredundant-tags" }

where DECL is the TEPLATE_DECL of S7.

As I mentioned, more work is still needed to handle templates 
right

because some redundant tags are still not diagnosed.  For example:

   template  struct S7 { };
   template 
   using U = struct S7;   // missing warning


When we get here for an instance of a template, it doesn't make 
sense to treat it as a new type.


If decl is a template and type_decl is an instance of that 
template, do we want to (before the lookup) change type_decl to 
the template or the corresponding generic TYPE_DECL, which 
should already be in the table?


I'm struggling with how to do this.  Given type (a RECORD_TYPE) and
type_decl (a TEMPLATE_DECL) representing the use of a template, how
do I get the corresponding template (or its explicit or partial
specialization) in the three cases below?

   1) Instance of the primary:
  template  class A;
  struct A a;

   2) Instance of an explicit specialization:
  template  class B;
  template <> struct B;
  class B b;

   3) Instance of a partial specialization:
  template  class C;
  template  struct C;
  class C c;

By trial and (lots of) error I figured out that in both (1) and (2),
but not in (3), TYPE_MAIN_DECL (TYPE_TI_TEMPLATE (type)) returns
the template's type_decl.

Is there some function to call to get it in (3), or even better,
in all three cases?


I think you're looking for most_general_template.


I don't think that's quite what I'm looking for.  At least it doesn't
return the template or its specialization in all three cases above.


Ah, true, that function stops at specializations.  Oddly, I don't 
think there's currently a similar function that looks through them. 
You could create one that does a simple loop through 
DECL_TI_TEMPLATE like is_specialization_of.


Thanks for the tip.  Even with that I'm having trouble with partial
specializations.  For example in:

   template    struct S;
   template  class S;
   extern class  S s1;
   extern struct S s2;  // expect -Wmismatched-tags

how do I find the declaration of the partial specialization when given
the type in the extern declaration?  A loop in my find_template_for()
function (similar to is_specialization_of) only visits the implicit
specialization S (i.e., its own type) and the primary.


Is that a problem?  The name is from the primary template, so does it 
matter for this warning whether there's an explicit specialization 
involved?


I don't understand the question.  S is an instance of
the partial specialization.  To diagnose the right mismatch the warning
needs to know how to find the template (i.e., either the primary, or
the explicit or partial specialization) the instance corresponds to and
the class-key it was declared with.  As it is, while GCC does diagnose
the right declaration (that of s2), it does that thanks to a bug:
because it finds and uses the type and class-key used to declare s1.
If we get rid of s1 it doesn't diagnose anything.

I tried using DECL_TEMPLATE_SPECIALIZATIONS() to get the list of
the partial specializations but it doesn't like any of the arguments
I've given it (it ICEs).


With this fixed, here's the algorithm I tried:

1) for a type T of a template instantiation (s1 above), get the primary
   P that T was instantiated from using
   P = TYPE_MAIN_DECL 

Re: Re: [PATCH,rs6000] Add command line and builtin compatibility

2020-03-12 Thread Carl Love via Gcc-patches
Segher:

> 
> From the GCC manual:
> 
>   -mmfcrfp4  2.01
>   -mpopcntb  p5  2.02
>   -mfprndp5+ 2.04  ("info gcc" says 2.03, that's wrong?  But the
> ISA
> says this is 2.02 even?  Now what!)
>   -mcmpb p6  2.05
>   -mpopcntd  p7  2.06
> 
> (and there are more, in fact).



> 
> It would be better if you disallowed this option combination?  Don't
> allow an options that says ISA X insns are allowed, but ISA Y insns
> are
> not, with Y < X.  In this case X is 2.06 and Y is 2.02 or 2.03 or
> 2.04.

My fix was very specific to the builtin and the command line argument
as pointed out in the bug report.  

Sounds like you are implying we should take a much more general
approach to checking the command line args against the ISA level.  Such
a test would need to go somewhere else not in the builtin expansion
call.  We may want a routine that just does these checks which can then
be called from the appropriate place.  Let me go look at the GCC manual
and see about what all needs testing.

   Carl 



Re: [RS6000] make PLT loads volatile

2020-03-12 Thread Segher Boessenkool
Hi!

On Thu, Mar 12, 2020 at 01:18:50PM +1030, Alan Modra wrote:
> With lazy PLT resolution the first load of a PLT entry may be a value
> pointing at a resolver stub.  gcc's loop processing can result in the
> PLT load in inline PLT calls being hoisted out of a loop in the
> mistaken idea that this is an optimisation.  It isn't.  If the value
> hoisted was that for a resolver stub then every call to that function
> in the loop will go via the resolver, slowing things down quite
> dramatically.
> 
> The PLT really is volatile, so teach gcc about that.

It would be nice if we could keep it cached after it has been resolved
once, this has potential for regressing performance if we don't?  And
LD_BIND_NOW should keep working just as fast as it is now, too?


Segher


Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread Segher Boessenkool
On Thu, Mar 12, 2020 at 02:08:18PM +0100, Richard Biener wrote:
> > It wasn't clear from my message above, but: I was mostly worried about
> > requiring the asm to treat memory operands in a certain way when the
> > exception is thrown.  IMO it would be better to say that the values of
> > memory operands are undefined on the exception edge.
> 
> Rather unspecified.  So IMHO on the exception edge the asm() should
> still appear as a def for all outputs so the compiler cannot derive any
> actual values for them.  That of course also means that they must not
> appear to be must-defs since we may not DSE earlier stores for example.

So make all outputs of an asm that may throw (w/ -fnon-call-exceptions)
inout operands instead?  That works for registers exactly the same, too?

> If we manage to get the unspecified values correct in SSA then we don't
> need to say whether the asm may clobber them before or after throwing.

Yeah.  Which users will *never* get right, that is, it would be hard to
use any such interface correctly.


Segher


Re: [PATCH] arm: correct constraints on movsi_compare0 [PR91913]

2020-03-12 Thread Richard Earnshaw

On 10/02/2020 15:51, Richard Earnshaw wrote:


The peephole that detects a mov of one register to another followed by
a comparison of the original register against zero is only used in Arm
state; but the instruction that matches this is generic to all 32-bit
compilation states.  That instruction lacks support for SP which is
permitted in Arm state, but has restrictions in Thumb2 code.

This patch fixes the problem by allowing SP when in ARM state for all
registers; in Thumb state it allows SP only as a source when the
register really is copied to another target.

* config/arm/arm.md (movsi_compare0): Allow SP as a source register
in Thumb state and also as a destination in Arm state.  Add T16
variants.
---
  gcc/config/arm/arm.md | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)



I've now backported this (along with Jakub's test) to gcc-8 and gcc-9.

R.


[PATCH] aarch64: Fix another bug in aarch64_add_offset_1 [PR94121]

2020-03-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 12, 2020 at 12:27:48PM +0100, Andreas Schwab wrote:
> I'm getting this ICE with -mabi=ilp32:
> 
> during RTL pass: fwprop1
> /opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c: In function 'bar':
> /opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c:16:1: internal compiler 
> error: in decompose, at rtl.h:2279

That is a preexisting issue, caused by another bug in the same function.
When mode is SImode and moffset is 0x8000 (or anything else with the
bit 31 set), we need to sign-extend it.

Fixed thusly, ok for trunk if it passes bootstrap/regtest on aarch64-linux?

2020-03-12  Jakub Jelinek  

PR target/94121
* config/aarch64/aarch64.c (aarch64_add_offset_1): Use gen_int_mode
instead of GEN_INT.

--- gcc/config/aarch64/aarch64.c.jj 2020-03-12 15:05:20.610726090 +0100
+++ gcc/config/aarch64/aarch64.c2020-03-12 15:18:35.390025244 +0100
@@ -3757,7 +3757,8 @@ aarch64_add_offset_1 (scalar_int_mode mo
   if (emit_move_imm)
 {
   gcc_assert (temp1 != NULL_RTX || can_create_pseudo_p ());
-  temp1 = aarch64_force_temporary (mode, temp1, GEN_INT (moffset));
+  temp1 = aarch64_force_temporary (mode, temp1,
+  gen_int_mode (moffset, mode));
 }
   insn = emit_insn (offset < 0
? gen_sub3_insn (dest, src, temp1)


Jakub



Re: [C/C++, OpenACC] Reject vars of different scope in acc declare (PR94120)

2020-03-12 Thread Frederik Harwath
Tobias Burnus  writes:

Hi Tobias,

> Fortran patch: https://gcc.gnu.org/pipermail/gcc-patches/current/541774.html
>
> "A declare directive must be in the same scope
>   as the declaration of any var that appears in
>   the data clauses of the directive."
>
> ("A declare directive is used […] following a variable
>declaration in C or C++".)
>
> NOTE for C++: This patch assumes that variables in a namespace
> are handled in the same way as those which are at
> global (namespace) scope; however, the OpenACC specification's
> wording currently is "In C or C++ global scope, only …".
> Hence, one can argue about this part of the patch; but as
> it fixes an ICE and is a very sensible extension – the other
> option is to reject it – I believe it is fine.
> (On the OpenACC side, this is now Issue 288.)

Sounds reasonable to me.

> +bool
> +c_check_oacc_same_scope (tree decl)
> +{
> +  struct c_binding *b = I_SYMBOL_BINDING (DECL_NAME (decl));
> +  return b != NULL && B_IN_CURRENT_SCOPE (b);
> +}

Is the function really specific to OpenACC? If not, then "_oacc"
could be dropped from its name. How about "c_check_current_scope"?

> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 24f71671469..8f09eb0d375 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> [...]
> -   if (global_bindings_p ())
> +   if (current_binding_level->kind == sk_namespace)
> [...]
> -  if (error || global_bindings_p ())
> +  if (error || current_binding_level->kind == sk_namespace)
>  return NULL_TREE;

So - just to be sure - the new namespace condition subsumes the old
"global_bindings_p" condition because the global scope is also a namespace,
right? Yes, now I see that you have a test case that demonstrates that
the declare directive still works for global variables with those changes.

> diff --git a/gcc/testsuite/g++.dg/declare-pr94120.C 
> b/gcc/testsuite/g++.dg/declare-pr94120.C
> new file mode 100644
> index 000..8515c4ff875
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/declare-pr94120.C
> @@ -0,0 +1,30 @@
> +/* { dg-do compile }  */
> +
> +/* PR middle-end/94120  */
> +
> +int b[8];
> +#pragma acc declare create (b)

Looks good to me.

Frederik
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-03-12 Thread Christophe Lyon via Gcc-patches
Hi,


On Thu, 27 Feb 2020 at 18:03, Kyrill Tkachov
 wrote:
>
> Hi Mihail,
>
> On 2/27/20 2:44 PM, Mihail Ionescu wrote:
> > Hi Kyrill,
> >
> > On 02/27/2020 11:09 AM, Kyrill Tkachov wrote:
> >> Hi Mihail,
> >>
> >> On 2/27/20 10:27 AM, Mihail Ionescu wrote:
> >>> Hi,
> >>>
> >>> This patch adds support for the bf16 vector create, get, set,
> >>> duplicate and reinterpret intrinsics.
> >>> ACLE documents are at https://developer.arm.com/docs/101028/latest
> >>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
> >>>
> >>> Regression tested on arm-none-eabi.
> >>>
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2020-02-27  Mihail Ionescu  
> >>>
> >>> * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the
> >>> beginning of the file.
> >>> (vcreate_bf16, vcombine_bf16): New.
> >>> (vdup_n_bf16, vdupq_n_bf16): New.
> >>> (vdup_lane_bf16, vdup_laneq_bf16): New.
> >>> (vdupq_lane_bf16, vdupq_laneq_bf16): New.
> >>> (vduph_lane_bf16, vduph_laneq_bf16): New.
> >>> (vset_lane_bf16, vsetq_lane_bf16): New.
> >>> (vget_lane_bf16, vgetq_lane_bf16): New.
> >>> (vget_high_bf16, vget_low_bf16): New.
> >>> (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New.
> >>> (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New.
> >>> (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New.
> >>> (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New.
> >>> (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New.
> >>> (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New.
> >>> (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New.
> >>> (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New.
> >>> (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New.
> >>> (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New.
> >>> (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New.
> >>> (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New.
> >>> (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New.
> >>> (vreinterpretq_bf16_p128): New.
> >>> (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New.
> >>> (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New.
> >>> (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New.
> >>> (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New.
> >>> (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New.
> >>> (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New.
> >>> (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New.
> >>> (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New.
> >>> (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New.
> >>> (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New.
> >>> (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New.
> >>> (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New.
> >>> (vreinterpretq_p128_bf16): New.
> >>> * config/arm/arm_neon_builtins.def (VDX): Add V4BF.
> >>> (V_elem): Likewise.
> >>> (V_elem_l): Likewise.
> >>> (VD_LANE): Likewise.
> >>> (VQX) Add V8BF.
> >>> (V_DOUBLE): Likewise.
> >>> (VDQX): Add V4BF and V8BF.
> >>> (V_two_elem, V_three_elem, V_four_elem): Likewise.
> >>> (V_reg): Likewise.
> >>> (V_HALF): Likewise.
> >>> (V_double_vector_mode): Likewise.
> >>> (V_cmp_result): Likewise.
> >>> (V_uf_sclr): Likewise.
> >>> (V_sz_elem): Likewise.
> >>> (Is_d_reg): Likewise.
> >>> (V_mode_nunits): Likewise.
> >>> * config/arm/neon.md (neon_vdup_lane): Enable for BFloat.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> 2020-02-27  Mihail Ionescu  
> >>>
> >>> * gcc.target/arm/bf16_dup.c: New test.
> >>> * gcc.target/arm/bf16_reinterpret.c: Likewise.
> >>>
> >>> Is it ok for trunk?
> >>
> >> This looks mostly ok with a few nits...
> >>
> >>
> >>>
> >>> Regards,
> >>> Mihail
> >>>
> >>>
> >>> ### Attachment also inlined for ease of reply
> >>> ###
> >>>
> >>>
> >>> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> >>> index
> >>> 09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c
> >>> 100644
> >>> --- a/gcc/config/arm/arm_neon.h
> >>> +++ b/gcc/config/arm/arm_neon.h
> >>> @@ -42,6 +42,15 @@ extern "C" {
> >>>  #include 
> >>>  #include 
> >>>
> >>> +#ifdef __ARM_BIG_ENDIAN
> >>> +#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
> >>> +#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) -
> >>> 1))
> >>> +#define __arm_laneq(__vec, __idx) (__idx ^
> >>> (__ARM_NUM_LANES(__vec)/2 - 1))
> >>> +#else
> >>> +#define __arm_lane(__vec, __idx) __idx
> >>> +#define __arm_laneq(__vec, __idx) __idx
> >>> +#endif
> >>> +
> >>>  typedef __simd64_int8_t int8x8_t;
> >>>  typedef __simd64_int16_t int16x4_t;
> >>>  typedef __simd64_int32_t 

Re: [PATCH] maintainer-scripts: Fix jit documentation build with update_web_docs_git

2020-03-12 Thread David Malcolm via Gcc-patches
On Thu, 2020-03-12 at 13:16 +0100, Jakub Jelinek wrote:
> Hi!
> 
> scripts/update_web_docs_git -r 9.3.0 -d gcc-9.3.0
> failed after the sourceware upgrade, there is no python-sphinx10
> package and
> python3-sphinx is new enough that the docs build succeeded.
> 
> Ok for trunk?
> 
> 2020-03-12  Jakub Jelinek  
> 
>   * update_web_docs_git: Use SPHINXBUILD=/usr/bin/sphinx-build
> rather
>   than SPHINXBUILD=/usr/bin/sphinx-1.0-build.
> 
> --- maintainer-scripts/update_web_docs_git.jj 2020-01-14
> 09:23:17.677789918 +0100
> +++ maintainer-scripts/update_web_docs_git2020-03-12
> 13:12:47.052639530 +0100
> @@ -183,15 +183,16 @@ done
>  # defaulting to "sphinx-build".
>  #
>  # sphinx is packaged in Fedora and EPEL 6 within "python-sphinx",
> +# in RHEL 8 within "python3-sphinx",
>  # and in openSUSE within "python-Sphinx".
>  #
>  # For EPEL6, python-sphinx is sphinx 0.6.6, which is missing various
>  # directives (e.g. ":c:macro:"), so we need the variant
>  # python-sphinx10 package.  The latter installs its executable as
>  #   /usr/bin/sphinx-1.0-build
> -# so we need to override SPHINXBUILD with this when invoking "make".
> +# so we needed to override SPHINXBUILD with this when invoking
> "make".
>  pushd gcc/gcc/jit/docs
> -make SPHINXBUILD=/usr/bin/sphinx-1.0-build html || true
> +make SPHINXBUILD=/usr/bin/sphinx-build html || true

The Makefile in question has:
  SPHINXBUILD   = sphinx-build
so presumably the SPHINXBUILD=something here is only in case someone
wants to override it.

There's a case for removing it altogether, but I think the patch is OK
as is (depends on whether we want to keep that historical information
in the script, or just in the git history, I suppose)

Dave

>  popd
>  cp -a gcc/gcc/jit/docs/_build/html jit
>  mkdir -p $DOCSDIR/jit
> 
>   Jakub



Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

Hi.

There's updated version of the patch.
Changes from the previous version:
- comment added to ld_plugin_symbol
- new section renamed to ext_symtab
- assert added for loop iterations in produce_symtab and 
produce_symtab_extension

Martin
>From ad24565cfb3166fdd9995381b25c1f558c7bcf8c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  

	* lto-section-in.c: Add extension_symtab.
	* lto-streamer-out.c (write_symbol_extension_info):
	New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
	...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c|   3 +-
 gcc/lto-streamer-out.c  |  76 +-
 gcc/lto-streamer.h  |   1 +
 include/lto-symtab.h|  12 +++
 lto-plugin/lto-plugin.c | 165 +++-
 5 files changed, 237 insertions(+), 20 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..78b015be696 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "mode_table",
   "hsa",
   "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "ext_symtab"
 };
 
 /* Hooks so that the ipa passes can call into the lto front end to get
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..e459ec91856 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,12 +2778,32 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+   ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+}
+  lto_write_data (_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
SET and VSET are cgraph/varpool node sets we are outputting.  */
 
-static void
+static unsigned int
 produce_symtab (struct output_block *ob)
 {
+  unsigned int streamed_symbols = 0;
   struct streamer_tree_cache_d *cache = ob->writer_cache;
   char *section_name = lto_get_section_name (LTO_section_symtab, NULL, 0, NULL);
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
@@ -2804,6 +2825,7 @@ produce_symtab (struct output_block *ob)
   if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
   write_symbol (cache, node->decl, , false);
+  ++streamed_symbols;
 }
   for (lsei = lsei_start (encoder);
!lsei_end_p (lsei); lsei_next ())
@@ -2813,8 +2835,55 @@ produce_symtab (struct output_block *ob)
   if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
   write_symbol (cache, node->decl, , false);
+  ++streamed_symbols;
+}
+
+  lto_end_section ();
+
+  return streamed_symbols;
+}
+
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob,
+			  unsigned int previous_streamed_symbols)
+{
+  unsigned int streamed_symbols = 0;
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+	 NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+ 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

On 3/12/20 2:15 PM, Richard Biener wrote:

#elif defined __LITTLE_ENDIAN__


Hmm, the macro is not defined. Even a simple test-case shows that:

$ cat le.c
#ifdef __LITTLE_ENDIAN__
asdfa
#endif

$ gcc le.c -c
[no error]


Re: [PATCH][GCC][AArch64]: Break apart paradoxical subregs for VSTRUCT writes (PR target/94052)

2020-03-12 Thread Tamar Christina
Hi Richard,

I have updated the patch, changelog is the same.

bootstrapped and regtested on aarch64-none-linux-gnu and no issues.

OK for gcc 9 and 8?

Thanks,
Tamar

The 03/10/2020 11:49, Richard Sandiford wrote:
> Tamar Christina  writes:
> > Hi All,
> >
> > This works around an ICE in reload where from expand we get the following 
> > RTL
> > generated for VSTRUCT mode writes:
> >
> > (insn 446 354 445 2 (set (reg:CI 383)
> >  (subreg:CI (reg:V4SI 291) 0)) "small.i":146:22 3408 {*aarch64_movci}
> >  (nil))
> >
> > This sequence is trying to say two things:
> >
> > 1) liveliness: It's trying to say that eventually the whole CI reg will be
> >written to. It does this by generating the paradoxical subreg.
> > 2) write data: It's trying to in the same instruction also write the V4SI 
> > mode
> >component at offset 0 in the CI reg.
> >
> > Reload is unable to understand this concept and so it attempts to handle 
> > this
> > instruction by breaking apart the instruction, first writing the data and 
> > then
> > tries to reload the paradoxical part.  This gets it to the same instruction
> > again and eventually we ICE since we reach the limit of no. reloads.
> 
> reload/LRA does understand the concept.  It just isn't handling this
> particular case very well :-)
> 
> The pre-RA insn is:
> 
> (insn 210 218 209 6 (set (reg/v:OI 182 [ vres ])
> (subreg:OI (reg:V4SI 289) 0)) "pr94052.C":157:31 3518 {*aarch64_movoi}
>  (expr_list:REG_DEAD (reg:V4SI 289)
> (nil)))
> 
> IRA allocates a hard register to r182 but not r289:
> 
>   126:r182 l032  ...
>   ...
>   129:r289 l0   mem  ...
> 
> This is because memory appears to be cheaper:
> 
>   a129(r289,l2) costs: FP_LO8_REGS:136,136 FP_LO_REGS:136,136 FP_REGS:136,136 
> MEM:110,110
> 
> That's probably a bug in itself (possibly in the target costs), but it
> shouldn't be a correctness issue.
> 
> LRA then handles insn 210 like this:
> 
>   Creating newreg=317, assigning class ALL_REGS to slow/invalid mem r317
>   Creating newreg=318, assigning class ALL_REGS to slow/invalid mem r318
>   210: r182:OI=r318:OI
>   REG_DEAD r289:V4SI
> Inserting slow/invalid mem reload before:
>   355: r317:V4SI=[sfp:DI+0x60]
>   356: r318:OI=r317:V4SI#0
> 
> 1 Non pseudo reload: reject++
>   alt=0,overall=1,losers=0,rld_nregs=0
>  Choosing alt 0 in insn 210:  (0) =w  (1) w {*aarch64_movoi}
>   Change to class FP_REGS for r318
> 
> That looks OK so far, given the allocation.  But later we have:
> 
> ** Assignment #2: **
> 
>  Assigning to 318 (cl=FP_REGS, orig=318, freq=44, tfirst=318, 
> tfreq=44)...
>Assign 32 to subreg reload r318 (freq=44)
>  Assigning to 317 (cl=ALL_REGS, orig=317, freq=44, tfirst=317, 
> tfreq=44)...
>Assign 0 to subreg reload r317 (freq=44)
> 
> So LRA is assigning a GPR (x0) to the new V4SI register r317.  It's this
> allocation that induces the cycling, because we get:
> 
>Cycle danger: overall += LRA_MAX_REJECT
>   alt=0,overall=606,losers=1,rld_nregs=2
> 0 Spill pseudo into memory: reject+=3
> Using memory insn operand 0: reject+=3
> 0 Non input pseudo reload: reject++
> Cycle danger: overall += LRA_MAX_REJECT
>   alt=1,overall=619,losers=2,rld_nregs=2
> 1 Spill pseudo into memory: reject+=3
> Using memory insn operand 1: reject+=3
>   alt=2,overall=12,losers=1,rld_nregs=0
>  Choosing alt 2 in insn 356:  (0) w  (1) Utv {*aarch64_movoi}
>   Creating newreg=319, assigning class NO_REGS to r319
>   356: r318:OI=r319:OI
>   REG_DEAD r317:V4SI
> Inserting insn reload before:
>   357: r319:OI=r317:V4SI#0
> 
> 0 Non input pseudo reload: reject++
>   alt=0,overall=13,losers=2,rld_nregs=4
> 0 Non pseudo reload: reject++
>   alt=1,overall=7,losers=1,rld_nregs=2
> 0 Non input pseudo reload: reject++
> 1 Spill pseudo into memory: reject+=3
> Using memory insn operand 1: reject+=3
> alt=2,overall=19,losers=2 -- refuse
>  Choosing alt 1 in insn 357:  (0) Utv  (1) w {*aarch64_movoi}
>   Creating newreg=320, assigning class FP_REGS to r320
>   357: r319:OI=r320:OI
> Inserting insn reload before:
>   358: r320:OI=r317:V4SI#0
> 
> and we keep oscillating between those two choices (alt 2 vs alt 1).
> This wouldn't have happened if we'd allocated an FPR instead.
> 
> I think the problem here is that we're always trying to reload the
> subreg rather than the inner register, even though the allocation for
> the inner register isn't valid for the subreg.  There is code in
> simplify_operand_subreg to detect this kind of situation, but it
> seems to be missing a check for hard_regno_mode_ok.  The first patch
> below seems to fix that.
> 
> > This patch fixes it by in the backend when we see such a 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Richard Biener via Gcc-patches
On Thu, Mar 12, 2020 at 2:11 PM Martin Liška  wrote:
>
> On 3/12/20 1:55 PM, Jan Hubicka wrote:
> >> gcc/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-section-in.c: Add extension_symtab.
> >>  * lto-streamer-out.c (write_symbol_extension_info):
> >>  New.
> >>  (produce_symtab_extension): New.
> >>  (produce_asm_for_decls): Stream also produce_symtab_extension.
> >>  * lto-streamer.h (enum lto_section_type): New section.
> >>
> >> include/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-symtab.h (enum gcc_plugin_symbol_type): New.
> >>  (enum gcc_plugin_symbol_section_flags): Likewise.
> >>
> >> lto-plugin/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-plugin.c (LTO_SECTION_PREFIX): Rename to
> >>  ...
> >>  (LTO_SYMTAB_PREFIX): ... this.
> >>  (LTO_SECTION_PREFIX_LEN): Rename to ...
> >>  (LTO_SYMTAB_PREFIX_LEN): ... this.
> >>  (LTO_SYMTAB_EXT_PREFIX): New.
> >>  (LTO_SYMTAB_EXT_PREFIX_LEN): New.
> >>  (LTO_LTO_PREFIX): New.
> >>  (LTO_LTO_PREFIX_LEN): New.
> >>  (parse_table_entry): Fill up unused to zero.
> >>  (parse_table_entry_extension): New.
> >>  (parse_symtab_extension): New.
> >>  (finish_conflict_resolution): Change type
> >>  for resolution.
> >>  (clear_new_symtab_flags): New.
> >>  (write_resolution): Support new get_symbols_v4.
> >>  (process_symtab): Use new macro name.
> >>  (process_symtab_extension): New.
> >>  (claim_file_handler): Parse also process_symtab_extension.
> >>  (onload): Call new add_symbols_v2.
> > Hi,
> > thanks for updating the patch.  Two minor nits
> >> ---
> >>   gcc/lto-section-in.c|   3 +-
> >>   gcc/lto-streamer-out.c  |  64 +++-
> >>   gcc/lto-streamer.h  |   1 +
> >>   include/lto-symtab.h|  12 +++
> >>   lto-plugin/lto-plugin.c | 165 +++-
> >>   5 files changed, 226 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >> index c17dd69dbdd..7bf59c513fc 100644
> >> --- a/gcc/lto-section-in.c
> >> +++ b/gcc/lto-section-in.c
> >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >> "mode_table",
> >> "hsa",
> >> "lto",
> >> -  "ipa_sra"
> >> +  "ipa_sra",
> >> +  "extension_symtab"
> > I would call it symtab_ext so alphabetically it is coupled with symtab.

Maybe ext_symtab then, proper enlish for the long form would be
extended_symtab I guess.

> It's problematic because of this collision where we search for prefix:
>
> if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
>
> Note that the section names have suffix:
> .gnu.lto_.symtab.48f1e54b1c048b0f
>
> > I wonder if moving it up in the enum would make it physically appear
> > next to .symtab in object file so we save a bit of i/o?
>
> It's not about names order, but streaming order. Which should be fine:
>
> $ readelf -SW /tmp/bss.o
> ...
>[10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS 
> 0001dd 0002de 00   E  0   0  1
>[11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS 
> 0004bb 49 00   E  0   0  1
>[12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS
>  000504 06 00   E  0   0  1
>[13] .gnu.lto_.optsPROGBITS 00050a 51 00   
> E  0   0  1
> ...
>
>
> >> +static void
> >> +produce_symtab_extension (struct output_block *ob)
> >> +{
> >> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
> >> + NULL, 0, NULL);
> >> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
> >> +  lto_symtab_encoder_iterator lsei;
> >> +
> >> +  lto_begin_section (section_name, false);
> >> +  free (section_name);
> >> +
> >> +  /* Write the symbol table.
> >> + First write everything defined and then all declarations.
> >> + This is necessary to handle cases where we have duplicated symbols.  
> >> */
> >> +  for (lsei = lsei_start (encoder);
> >> +   !lsei_end_p (lsei); lsei_next ())
> >> +{
> >> +  symtab_node *node = lsei_node (lsei);
> >> +
> >> +  if (DECL_EXTERNAL (node->decl) || 
> >> !node->output_to_lto_symbol_table_p ())
> >> +continue;
> >> +  write_symbol_extension_info (node->decl);
> >> +}
> >> +  for (lsei = lsei_start (encoder);
> >> +   !lsei_end_p (lsei); lsei_next ())
> >> +{
> >> +  symtab_node *node = lsei_node (lsei);
> >> +
> >> +  if (!DECL_EXTERNAL (node->decl) || 
> >> !node->output_to_lto_symbol_table_p ())
> >> +continue;
> >> +  write_symbol_extension_info (node->decl);
> >> +}
> >> +
> >> +  lto_end_section ();
> >> +}
> >
> > It seems fragile to me to duplicate the loops that needs to be kept in
> > sync because breaking that will likely get bit suprising outcome (i.e.
> > bootstrap will work since we do not care 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

On 3/12/20 1:55 PM, Jan Hubicka wrote:

gcc/ChangeLog:

2020-03-12  Martin Liska  

* lto-section-in.c: Add extension_symtab.
* lto-streamer-out.c (write_symbol_extension_info):
New.
(produce_symtab_extension): New.
(produce_asm_for_decls): Stream also produce_symtab_extension.
* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  

* lto-symtab.h (enum gcc_plugin_symbol_type): New.
(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  

* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
...
(LTO_SYMTAB_PREFIX): ... this.
(LTO_SECTION_PREFIX_LEN): Rename to ...
(LTO_SYMTAB_PREFIX_LEN): ... this.
(LTO_SYMTAB_EXT_PREFIX): New.
(LTO_SYMTAB_EXT_PREFIX_LEN): New.
(LTO_LTO_PREFIX): New.
(LTO_LTO_PREFIX_LEN): New.
(parse_table_entry): Fill up unused to zero.
(parse_table_entry_extension): New.
(parse_symtab_extension): New.
(finish_conflict_resolution): Change type
for resolution.
(clear_new_symtab_flags): New.
(write_resolution): Support new get_symbols_v4.
(process_symtab): Use new macro name.
(process_symtab_extension): New.
(claim_file_handler): Parse also process_symtab_extension.
(onload): Call new add_symbols_v2.

Hi,
thanks for updating the patch.  Two minor nits

---
  gcc/lto-section-in.c|   3 +-
  gcc/lto-streamer-out.c  |  64 +++-
  gcc/lto-streamer.h  |   1 +
  include/lto-symtab.h|  12 +++
  lto-plugin/lto-plugin.c | 165 +++-
  5 files changed, 226 insertions(+), 19 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..7bf59c513fc 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
"mode_table",
"hsa",
"lto",
-  "ipa_sra"
+  "ipa_sra",
+  "extension_symtab"

I would call it symtab_ext so alphabetically it is coupled with symtab.


It's problematic because of this collision where we search for prefix:

if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)

Note that the section names have suffix:
.gnu.lto_.symtab.48f1e54b1c048b0f


I wonder if moving it up in the enum would make it physically appear
next to .symtab in object file so we save a bit of i/o?


It's not about names order, but streaming order. Which should be fine:

$ readelf -SW /tmp/bss.o
...
  [10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS 0001dd 
0002de 00   E  0   0  1
  [11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS 
0004bb 49 00   E  0   0  1
  [12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS
 000504 06 00   E  0   0  1
  [13] .gnu.lto_.optsPROGBITS 00050a 51 00   E  
0   0  1
...



+static void
+produce_symtab_extension (struct output_block *ob)
+{
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+ First write everything defined and then all declarations.
+ This is necessary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+   !lsei_end_p (lsei); lsei_next ())
+{
+  symtab_node *node = lsei_node (lsei);
+
+  if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+   continue;
+  write_symbol_extension_info (node->decl);
+}
+  for (lsei = lsei_start (encoder);
+   !lsei_end_p (lsei); lsei_next ())
+{
+  symtab_node *node = lsei_node (lsei);
+
+  if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p 
())
+   continue;
+  write_symbol_extension_info (node->decl);
+}
+
+  lto_end_section ();
+}


It seems fragile to me to duplicate the loops that needs to be kept in
sync because breaking that will likely get bit suprising outcome (i.e.
bootstrap will work since we do not care about symbol types). Perhaps it
would be more robust to simply push the extension bits into a vector in
write_symbol for later streaming.  Or at least add comment that loops
needs to be kept in sync.


I'll add comment.


diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..804684449cf 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,17 @@ struct ld_plugin_symbol
  {
char *name;
char *version;
-  int def;
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char 

Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread Richard Biener via Gcc-patches
On Thu, Mar 12, 2020 at 11:01 AM Richard Sandiford
 wrote:
>
> "J.W. Jagersma"  writes:
> > On 2020-03-09 13:13, Richard Sandiford wrote:
> >> Thanks for doing this.
> >
> > Hi Richard, thanks for your response.
> >
> >> "J.W. Jagersma"  writes:
> >>> On 2020-03-07 20:20, Segher Boessenkool wrote:
>  Some comments:
> 
> > +When non-call exceptions (@option{-fnon-call-exceptions}) are enabled, 
> > a
> > +@code{volatile asm} statement is also allowed to throw exceptions.  If 
> > it
> > +does, then the compiler assumes that its output operands have not been 
> > written
> > +yet.
> 
>  That reads as if the compiler assumes the outputs retain their original
>  value, but that isn't true (I hope!)  The compiler assumes the output
>  are clobbered, but it doesn't assume they are assigned any definite
>  value?
> >>>
> >>> Register outputs are assumed to be clobbered, yes.  For memory outputs
> >>> this is not the case, if the asm writes it before throwing then the
> >>> memory operand retains this value.  It should be the user's
> >>> responsibility to ensure that an asm has no side-effects if it throws.
> >>
> >> I guess one problem is that something like "=" explicitly says that
> >> the operand is clobbered "early", and I think it would be fair for
> >> "early" to include clobbers before the exception.  So IMO we should
> >> allow at least early-clobbered memory outputs to be clobbered by the
> >> exception.
> >
> > Is "=" not equivalent to "=m" in every case?  As I understand it, the
> > earlyclobber modifier is only relevant for register outputs.  It does
> > not specify anything about clobbering the output itself, it only says
> > that an input could be clobbered if it is allocated in the same
> > register (or if a memory input uses this register as index).
> >
> >> And if we do that, then I'm not sure there's much benefit in trying to
> >> treat the non-earlyclobber memory case specially.
> >>
> >> It would be good to have testcases for the output cases.  E.g. for:
> >>
> >> int foo;
> >> int bar = 0;
> >> try
> >>   {
> >> foo = 1;
> >> asm volatile ("..." : "=m" (foo));
> >>   }
> >> catch (...whatever...)
> >>   {
> >> bar = foo;
> >>   }
> >> ...use bar...
> >>
> >> What does "bar = foo" read?  Is it always undefined behaviour if executed?
> >> Or does it always read "foo" from memory?  Can it be optimised to "bar = 
> >> 1"?
> >> Is "foo = 1" dead code?
> >
> > These are very good points.  But I am not sure how to test for all of
> > these.  My test case now looks as follows:
> >
> > // PR inline-asm/93981
> > // { dg-do run }
> > // { dg-options "-fnon-call-exceptions -O3" }
> > // { dg-xfail-run-if "" { ! *-linux-gnu } }
> >
> > #include 
> >
> > struct illegal_opcode { };
> >
> > extern "C" void
> > sigill (int)
> > {
> >   throw illegal_opcode ( );
> > }
> >
> > int
> > test_mem ()
> > {
> >   int i = 2;
> >   try
> > {
> >   asm volatile ("mov%z0 $1, %0; ud2" : "=m" (i));
> > }
> >   catch (const illegal_opcode&)
> > {
> >   if (i == 1) return 0;
> > }
> >   return i;
> > }
> >
> > int
> > test_reg ()
> > {
> >   int i = 2;
> >   try
> > {
> >   asm volatile ("mov%z0 $1, %0; ud2" : "=r" (i));
> > }
> >   catch (const illegal_opcode&)
> > {
> >   if (i == 2) return 0;
> > }
> >   return i;
> > }
> >
> > int
> > main ()
> > {
> >   std::signal (SIGILL, sigill);
> >   return test_reg () + test_mem ();
> > }
> >
> > I think that should cover most of it.  Am I missing anything?
>
> The other case I mentioned was equivalent to:
>
>  int
>  test_mem2 ()
>  {
>int i = 2;
>try
>  {
>asm volatile ("ud2; mov%z0 $1, %0" : "=m" (i));
>  }
>catch (const illegal_opcode&)
>  {
>if (i == 2) return 0;
>  }
>return i;
>  }
>
> Is this test expected to pass?
>
> However, these "mem" tests are testing gimple register types, so they'll
> still be SSA names outside of the asm.  It would also be good to test what
> happens for non-register types, e.g. something like:
>
>  int
>  test_mem3 ()
>  {
>int i[5] = { 1, 2, 3, 4, 5 };
>try
>  {
>asm volatile ("ud2; " : "=m" (i));
>  }
>catch (const illegal_opcode&)
>  {
>if (i[0] == 1 && ...) return 0;
>  }
>return ...;
>  }
>
> and similarly for ud2 after the store.
>
> It wasn't clear from my message above, but: I was mostly worried about
> requiring the asm to treat memory operands in a certain way when the
> exception is thrown.  IMO it would be better to say that the values of
> 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Jan Hubicka
> gcc/ChangeLog:
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-section-in.c: Add extension_symtab.
>   * lto-streamer-out.c (write_symbol_extension_info):
>   New.
>   (produce_symtab_extension): New.
>   (produce_asm_for_decls): Stream also produce_symtab_extension.
>   * lto-streamer.h (enum lto_section_type): New section.
> 
> include/ChangeLog:
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-symtab.h (enum gcc_plugin_symbol_type): New.
>   (enum gcc_plugin_symbol_section_flags): Likewise.
> 
> lto-plugin/ChangeLog:
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-plugin.c (LTO_SECTION_PREFIX): Rename to
>   ...
>   (LTO_SYMTAB_PREFIX): ... this.
>   (LTO_SECTION_PREFIX_LEN): Rename to ...
>   (LTO_SYMTAB_PREFIX_LEN): ... this.
>   (LTO_SYMTAB_EXT_PREFIX): New.
>   (LTO_SYMTAB_EXT_PREFIX_LEN): New.
>   (LTO_LTO_PREFIX): New.
>   (LTO_LTO_PREFIX_LEN): New.
>   (parse_table_entry): Fill up unused to zero.
>   (parse_table_entry_extension): New.
>   (parse_symtab_extension): New.
>   (finish_conflict_resolution): Change type
>   for resolution.
>   (clear_new_symtab_flags): New.
>   (write_resolution): Support new get_symbols_v4.
>   (process_symtab): Use new macro name.
>   (process_symtab_extension): New.
>   (claim_file_handler): Parse also process_symtab_extension.
>   (onload): Call new add_symbols_v2.
Hi,
thanks for updating the patch.  Two minor nits
> ---
>  gcc/lto-section-in.c|   3 +-
>  gcc/lto-streamer-out.c  |  64 +++-
>  gcc/lto-streamer.h  |   1 +
>  include/lto-symtab.h|  12 +++
>  lto-plugin/lto-plugin.c | 165 +++-
>  5 files changed, 226 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> index c17dd69dbdd..7bf59c513fc 100644
> --- a/gcc/lto-section-in.c
> +++ b/gcc/lto-section-in.c
> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>"mode_table",
>"hsa",
>"lto",
> -  "ipa_sra"
> +  "ipa_sra",
> +  "extension_symtab"
I would call it symtab_ext so alphabetically it is coupled with symtab.
I wonder if moving it up in the enum would make it physically appear
next to .symtab in object file so we save a bit of i/o?
> +static void
> +produce_symtab_extension (struct output_block *ob)
> +{
> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
> +  NULL, 0, NULL);
> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
> +  lto_symtab_encoder_iterator lsei;
> +
> +  lto_begin_section (section_name, false);
> +  free (section_name);
> +
> +  /* Write the symbol table.
> + First write everything defined and then all declarations.
> + This is necessary to handle cases where we have duplicated symbols.  */
> +  for (lsei = lsei_start (encoder);
> +   !lsei_end_p (lsei); lsei_next ())
> +{
> +  symtab_node *node = lsei_node (lsei);
> +
> +  if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p 
> ())
> + continue;
> +  write_symbol_extension_info (node->decl);
> +}
> +  for (lsei = lsei_start (encoder);
> +   !lsei_end_p (lsei); lsei_next ())
> +{
> +  symtab_node *node = lsei_node (lsei);
> +
> +  if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p 
> ())
> + continue;
> +  write_symbol_extension_info (node->decl);
> +}
> +
> +  lto_end_section ();
> +}

It seems fragile to me to duplicate the loops that needs to be kept in
sync because breaking that will likely get bit suprising outcome (i.e.
bootstrap will work since we do not care about symbol types). Perhaps it
would be more robust to simply push the extension bits into a vector in
write_symbol for later streaming.  Or at least add comment that loops
needs to be kept in sync.
> diff --git a/include/plugin-api.h b/include/plugin-api.h
> index 09e1202df07..804684449cf 100644
> --- a/include/plugin-api.h
> +++ b/include/plugin-api.h
> @@ -87,7 +87,17 @@ struct ld_plugin_symbol
>  {
>char *name;
>char *version;
> -  int def;
> +#ifdef __BIG_ENDIAN__
> +  char unused;
> +  char section_flags;
> +  char symbol_type;
> +  char def;
> +#else
> +  char def;
> +  char symbol_type;
> +  char section_flags;
> +  char unused;
> +#endif
Perhaps at least drop a comment why this is done this way (i.e.
compatibility with V3 version of API) so in 10 years we know?

Bit more systematic way would be to add a new hook to query extended
properties of a given symbol. I.e. something like
void *get_symbol_property (symbol, enum property)

Honza


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

Hi.

I'm sending extended version of the patch where I address Honza's note
about backward compatibility. So I add a next section _lto.extension_symtab
where I put new bits. With that, new LTO bytecode can be opened with older
LTO plugin.

Moreover, I've also tested lto.exp tests and binutils master. And I can confirm
that patches work with H.J.'s counterpart:
https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/plugin/master

$ cat bss.c
int global_zero;
int global_one = 1;

int main() { return 0; }

$ gcc -c -flto bss.c

Old bintuils (with new plugin):

$ ./binutils/nm-new /tmp/bss.o --plugin 
/home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0
 T global_one
 T global_zero
 T main

H.J.'s binutils (with new plugin):

$ ./binutils/nm-new /tmp/bss.o --plugin 
/home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0
 D global_one
 B global_zero
 T main

System binutils (with old plugin):

$ nm /tmp/bss.o
 T global_one
 T global_zero
 T main

Hope I'm done now.
Thoughts?

Thanks,
Martin
>From 8968bbdf627dc26056f1cb344e253eedae0efca1 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  

	* lto-section-in.c: Add extension_symtab.
	* lto-streamer-out.c (write_symbol_extension_info):
	New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
	...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c|   3 +-
 gcc/lto-streamer-out.c  |  64 +++-
 gcc/lto-streamer.h  |   1 +
 include/lto-symtab.h|  12 +++
 lto-plugin/lto-plugin.c | 165 +++-
 5 files changed, 226 insertions(+), 19 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..7bf59c513fc 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "mode_table",
   "hsa",
   "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "extension_symtab"
 };
 
 /* Hooks so that the ipa passes can call into the lto front end to get
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..2fd20252ba5 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,6 +2778,25 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+   ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+}
+  lto_write_data (_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
SET and VSET are cgraph/varpool node sets we are outputting.  */
 
@@ -2818,6 +2838,45 @@ produce_symtab (struct output_block *ob)
   lto_end_section ();
 }
 
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob)
+{
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+	 NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+ First write everything defined and then all declarations.
+ This is necessary to 

[PATCH] maintainer-scripts: Fix jit documentation build with update_web_docs_git

2020-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

scripts/update_web_docs_git -r 9.3.0 -d gcc-9.3.0
failed after the sourceware upgrade, there is no python-sphinx10 package and
python3-sphinx is new enough that the docs build succeeded.

Ok for trunk?

2020-03-12  Jakub Jelinek  

* update_web_docs_git: Use SPHINXBUILD=/usr/bin/sphinx-build rather
than SPHINXBUILD=/usr/bin/sphinx-1.0-build.

--- maintainer-scripts/update_web_docs_git.jj   2020-01-14 09:23:17.677789918 
+0100
+++ maintainer-scripts/update_web_docs_git  2020-03-12 13:12:47.052639530 
+0100
@@ -183,15 +183,16 @@ done
 # defaulting to "sphinx-build".
 #
 # sphinx is packaged in Fedora and EPEL 6 within "python-sphinx",
+# in RHEL 8 within "python3-sphinx",
 # and in openSUSE within "python-Sphinx".
 #
 # For EPEL6, python-sphinx is sphinx 0.6.6, which is missing various
 # directives (e.g. ":c:macro:"), so we need the variant
 # python-sphinx10 package.  The latter installs its executable as
 #   /usr/bin/sphinx-1.0-build
-# so we need to override SPHINXBUILD with this when invoking "make".
+# so we needed to override SPHINXBUILD with this when invoking "make".
 pushd gcc/gcc/jit/docs
-make SPHINXBUILD=/usr/bin/sphinx-1.0-build html || true
+make SPHINXBUILD=/usr/bin/sphinx-build html || true
 popd
 cp -a gcc/gcc/jit/docs/_build/html jit
 mkdir -p $DOCSDIR/jit

Jakub



[PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature

2020-03-12 Thread Dennis Zhang
Hi all,

This patch is part of a series that adds support for the ARMv8.m Custom 
Datapath Extension.
This patch defines the options cdecp0-cdecp7 for CLI to enable the CDE on 
corresponding coprocessor 0-7.
It also adds new check-effective for CDE feature.

ISA has been announced at 
https://developer.arm.com/architectures/instruction-sets/custom-instructions

Regtested and bootstrapped.

Is it OK to commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-03-11  Dennis Zhang  

* config.gcc: Add arm_cde.h.
* config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
__ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
* config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
* config/arm/arm.c (arm_option_reconfigure_globals): Configure
arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
* config/arm/arm.h (TARGET_CDE): New macro.
* config/arm/arm_cde.h: New file.
* doc/invoke.texi: Document cdecp[0-7] options.

gcc/testsuite/ChangeLog:

2020-03-11  Dennis Zhang  

* gcc.target/arm/pragma_cde.c: New test.
* lib/target-supports.exp (arm_v8m_main_cde): New check effective.
(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2df4b36d190..43967b7d1ff 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -346,7 +346,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 38edaff17a2..77753015b34 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -227,6 +227,12 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
 }
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_CDE", TARGET_CDE);
+  cpp_undef (pfile, "__ARM_FEATURE_CDE_COPROC");
+  if (TARGET_CDE)
+builtin_define_with_int_value ("__ARM_FEATURE_CDE_COPROC",
+   arm_arch_cde_coproc);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
 		  TARGET_BF16_FP);
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 96f584da325..5a7498e18db 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -207,6 +207,16 @@ define feature i8mm
 # Brain half-precision floating-point extension. Optional from v8.2-A.
 define feature bf16
 
+# Arm Custom Datapath Extension (CDE).
+define feature cdecp0
+define feature cdecp1
+define feature cdecp2
+define feature cdecp3
+define feature cdecp4
+define feature cdecp5
+define feature cdecp6
+define feature cdecp7
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -670,6 +680,14 @@ begin arch armv8-m.main
  option fp.dp add FPv5 FP_DBL
  option nofp remove ALL_FP
  option nodsp remove armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8-m.main
 
 begin arch armv8-r
@@ -701,6 +719,14 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add mve armv7em
  option mve.fp add mve FPv5 fp16 mve_float armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8.1-m.main
 
 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e562..9f1e1ec5c88 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1021,6 +1021,13 @@ int arm_arch_i8mm = 0;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 int arm_arch_bf16 = 0;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+int arm_arch_cde = 0;
+int arm_arch_cde_coproc = 0;
+const int arm_arch_cde_coproc_bits[] = {
+  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
+};
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3740,6 +3747,21 @@ arm_option_reconfigure_globals (void)
   arm_fp16_format = ARM_FP16_FORMAT_IEEE;
 }
 
+  arm_arch_cde = 0;
+  arm_arch_cde_coproc = 0;
+  int cde_bits[] = {isa_bit_cdecp0, isa_bit_cdecp1, isa_bit_cdecp2,
+		isa_bit_cdecp3, isa_bit_cdecp4, isa_bit_cdecp5,
+		isa_bit_cdecp6, isa_bit_cdecp7};
+  for (int i = 0, e = ARRAY_SIZE (cde_bits); i < 

Re: [PATCH] aarch64: Fix ICE in aarch64_add_offset_1 [PR94121]

2020-03-12 Thread Andreas Schwab
I'm getting this ICE with -mabi=ilp32:

during RTL pass: fwprop1
/opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c: In function 'bar':
/opt/gcc/gcc-20200312/gcc/testsuite/gcc.dg/pr94121.c:16:1: internal compiler 
error: in decompose, at rtl.h:2279
0xca5063 wi::int_traits >::decompose(long*, 
unsigned int, std::pair const&)
../../gcc/rtl.h:2279
0xca5063 wide_int_ref_storage::wide_int_ref_storage 
>(std::pair const&)
../../gcc/wide-int.h:1024
0xca5063 generic_wide_int 
>::generic_wide_int >(std::pair const&)
../../gcc/wide-int.h:782
0xca5063 poly_int<2u, generic_wide_int > 
>::poly_int >(std::pair const&)
../../gcc/poly-int.h:670
0xca5063 wi::to_poly_wide(rtx_def const*, machine_mode)
../../gcc/rtl.h:2364
0xca5063 neg_poly_int_rtx
../../gcc/simplify-rtx.c:64
0xcab637 simplify_binary_operation_1
../../gcc/simplify-rtx.c:2677
0xcacc87 simplify_binary_operation(rtx_code, machine_mode, rtx_def*, rtx_def*)
../../gcc/simplify-rtx.c:2291
0xcacd33 simplify_gen_binary(rtx_code, machine_mode, rtx_def*, rtx_def*)
../../gcc/simplify-rtx.c:189
0x163797f propagate_rtx_1
../../gcc/fwprop.c:520
0x16384b3 propagate_rtx
../../gcc/fwprop.c:752
0x1639b83 forward_propagate_and_simplify
../../gcc/fwprop.c:1421
0x1639b83 forward_propagate_into
../../gcc/fwprop.c:1490
0x163a74f fwprop
../../gcc/fwprop.c:1580

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH v3][ARM][GCC][3/x]: MVE ACLE intrinsics framework patch.

2020-03-12 Thread Kyrill Tkachov

Hi Srinath,

On 3/10/20 6:19 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

This patch addresses all the comments in patch version v2.
(version v2) 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540417.html





Hello,

This patch is part of MVE ACLE intrinsics framework.

The patch supports the use of emulation for the single-precision 
arithmetic
operations for MVE. This changes are to support the MVE ACLE 
intrinsics which

operates on vector floating point arithmetic operations.

Please refer to Arm reference manual [1] for more details.
[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on target arm-none-eabi and armeb-none-eabi and 
found no regressions.


Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2020-03-06  Andre Vieira 
    Srinath Parvathaneni 

    * config/arm/arm.c (arm_libcall_uses_aapcs_base): Modify 
function to add

    emulator calls for dobule precision arithmetic operations for MVE.

2020-03-06  Srinath Parvathaneni 

    * gcc.target/arm/mve/intrinsics/mve_libcall1.c: New test.
    * gcc.target/arm/mve/intrinsics/mve_libcall2.c: Likewise.


### Attachment also inlined for ease of reply    
###



diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
c28a475629c7fbad48730beed5550e0cffdf2e1b..40db35a2a8b6dedb4f536b4995e80c8b9a38b588 
100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5754,9 +5754,25 @@ arm_libcall_uses_aapcs_base (const_rtx libcall)
   /* Values from double-precision helper functions are returned 
in core

  registers if the selected core only supports single-precision
  arithmetic, even if we are using the hard-float ABI.  The 
same is

-    true for single-precision helpers, but we will never be using the
-    hard-float ABI on a CPU which doesn't support single-precision
-    operations in hardware.  */
+    true for single-precision helpers except in case of MVE, 
because in
+    MVE we will be using the hard-float ABI on a CPU which 
doesn't support
+    single-precision operations in hardware.  In MVE the 
following check

+    enables use of emulation for the single-precision arithmetic
+    operations.  */
+  if (TARGET_HAVE_MVE)
+   {
+ add_libcall (libcall_htab, optab_libfunc (add_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sdiv_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (smul_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (neg_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (sub_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (eq_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (lt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (le_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (ge_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (gt_optab, SFmode));
+ add_libcall (libcall_htab, optab_libfunc (unord_optab, SFmode));
+   }
   add_libcall (libcall_htab, optab_libfunc (add_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (sdiv_optab, DFmode));
   add_libcall (libcall_htab, optab_libfunc (smul_optab, DFmode));
diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c

new file mode 100644
index 
..f89301228c577291fc3095420df1937e1a0c7104

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_libcall1.c
@@ -0,0 +1,70 @@
+/* { dg-do compile  } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve 
-mfloat-abi=hard -mthumb -mfpu=auto" } */

+
+float
+foo (float a, float b, float c)
+{
+  return a + b + c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fadd" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fadd" 2 } } */



What is the point of repeating the scan-assembler directives here? The 
first scan-assembler should be redundant given the scan-assembler-times ?


Otherwise ok.

Thanks,

Kyrill



+
+float
+foo1 (float a, float b, float c)
+{
+  return a - b - c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fsub" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fsub" 2 } } */
+
+float
+foo2 (float a, float b, float c)
+{
+  return a * b * c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fmul" }  } */
+/* { dg-final { scan-assembler-times "bl\\t__aeabi_fmul" 2 } } */
+
+float
+foo3 (float b, float c)
+{
+  return b / c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fdiv" }  } */
+
+int
+foo4 (float b, float c)
+{
+  return b < c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fcmplt" }  } */
+
+int
+foo5 (float b, float c)
+{
+  return b > c;
+}
+
+/* { dg-final { scan-assembler "bl\\t__aeabi_fcmpgt" }  } */
+
+int
+foo6 

Re: [PATCH v3][ARM][GCC][2/x]: MVE ACLE intrinsics framework patch.

2020-03-12 Thread Kyrill Tkachov

Hi Srinath,

On 3/10/20 6:19 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

This patch addresses all the comments in patch version v2.
(version v2) 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540416.html





Hello,

This patch is part of MVE ACLE intrinsics framework.
This patches add support to update (read/write) the APSR (Application 
Program Status Register)
register and FPSCR (Floating-point Status and Control Register) 
register for MVE.

This patch also enables thumb2 mov RTL patterns for MVE.

A new feature bit vfp_base is added. This bit is enabled for all VFP, 
MVE and MVE with floating point
extensions. This bit is used to enable the macro TARGET_VFP_BASE. For 
all the VFP instructions, RTL patterns,
status and control registers are guarded by TARGET_HAVE_FLOAT. But 
this patch modifies that and the
common instructions, RTL patterns, status and control registers 
bewteen MVE and VFP are guarded by

TARGET_VFP_BASE macro.

The RTL pattern set_fpscr and get_fpscr are updated to use 
VFPCC_REGNUM because few MVE intrinsics

set/get carry bit of FPSCR register.

Please refer to Arm reference manual [1] for more details.
[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on target arm-none-eabi and armeb-none-eabi and 
found no regressions.


Ok for trunk?



Ok, but make sure it bootstraps on arm-none-linux-gnueabihf (as with the 
other patches in this series)


Thanks,

Kyrill




Thanks,
Srinath
gcc/ChangeLog:

2020-03-06  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * common/config/arm/arm-common.c (arm_asm_auto_mfpu): When 
vfp_base
    feature bit is on and -mfpu=auto is passed as compiler option, 
do not
    generate error on not finding any match fpu. Because in this 
case fpu

    is not required.
    * config/arm/arm-cpus.in (vfp_base): Define feature bit, this 
bit is

    enabled for MVE and also for all VFP extensions.
    (VFPv2): Modify fgroup to enable vfp_base feature bit when 
ever VFPv2

    is enabled.
    (MVE): Define fgroup to enable feature bits mve, vfp_base and 
armv7em.
    (MVE_FP): Define fgroup to enable feature bits is fgroup MVE 
and FPv5

    along with feature bits mve_float.
    (mve): Modify add options in armv8.1-m.main arch for MVE.
    (mve.fp): Modify add options in armv8.1-m.main arch for MVE with
    floating point.
    * config/arm/arm.c (use_return_insn): Replace the
    check with TARGET_VFP_BASE.
    (thumb2_legitimate_index_p): Replace TARGET_HARD_FLOAT with
    TARGET_VFP_BASE.
    (arm_rtx_costs_internal): Replace "TARGET_HARD_FLOAT || 
TARGET_HAVE_MVE"
    with TARGET_VFP_BASE, to allow cost calculations for copies in 
MVE as

    well.
    (arm_get_vfp_saved_size): Replace TARGET_HARD_FLOAT with
    TARGET_VFP_BASE, to allow space calculation for VFP registers 
in MVE

    as well.
    (arm_compute_frame_layout): Likewise.
    (arm_save_coproc_regs): Likewise.
    (arm_fixed_condition_code_regs): Modify to enable using 
VFPCC_REGNUM

    in MVE as well.
    (arm_hard_regno_mode_ok): Replace "TARGET_HARD_FLOAT || 
TARGET_HAVE_MVE"

    with equivalent macro TARGET_VFP_BASE.
    (arm_expand_epilogue_apcs_frame): Likewise.
    (arm_expand_epilogue): Likewise.
    (arm_conditional_register_usage): Likewise.
    (arm_declare_function_name): Add check to skip printing .fpu 
directive
    in assembly file when TARGET_VFP_BASE is enabled and 
fpu_to_print is

    "softvfp".
    * config/arm/arm.h (TARGET_VFP_BASE): Define.
    * config/arm/arm.md (arch): Add "mve" to arch.
    (eq_attr "arch" "mve"): Enable on TARGET_HAVE_MVE is true.
    (vfp_pop_multiple_with_writeback): Replace "TARGET_HARD_FLOAT
    || TARGET_HAVE_MVE" with equivalent macro TARGET_VFP_BASE.
    * config/arm/constraints.md (Uf): Define to allow modification 
to FPCCR

    in MVE.
    * config/arm/thumb2.md (thumb2_movsfcc_soft_insn): Modify 
target guard

    to not allow for MVE.
    * config/arm/unspecs.md (UNSPEC_GET_FPSCR): Move to volatile 
unspecs

    enum.
    (VUNSPEC_GET_FPSCR): Define.
    * config/arm/vfp.md (thumb2_movhi_vfp): Add support for VMSR 
and VMRS
    instructions which move to general-purpose Register from 
Floating-point

    Special register and vice-versa.
    (thumb2_movhi_fp16): Likewise.
    (thumb2_movsi_vfp): Add support for VMSR and VMRS instructions 
along
    with MCR and MRC instructions which set and get Floating-point 
Status

    and Control Register (FPSCR).
    (movdi_vfp): Modify pattern to enable Single-precision scalar 
float move

    in MVE.
    (thumb2_movdf_vfp): Modify pattern to enable Double-precision 
scalar

    float move patterns in MVE.
    (thumb2_movsfcc_vfp): Modify pattern to enable single float 
conditional
    

Re: [PATCH v3][ARM][GCC][1/x]: MVE ACLE intrinsics framework patch.

2020-03-12 Thread Kyrill Tkachov

Hi Srinath,

On 3/10/20 6:19 PM, Srinath Parvathaneni wrote:

Hello Kyrill,

This patch addresses all the comments in patch version v2.
(version v2) 
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html




Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE 
intrinsics.


Header file arm_mve.h is added to source code, which contains the 
definitions of MVE ACLE intrinsics
and different data types used in MVE. Machine description file mve.md 
is also added which contains the

RTL patterns defined for MVE.

A new reigster "p0" is added which is used in by MVE predicated 
patterns. A new register class "VPR_REG"

is added and its contents are defined in REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move 
patterns. The prefix of neon functions

which are also used by MVE is changed from "neon_" to "simd_".
eg: neon_immediate_valid_for_move changed to 
simd_immediate_valid_for_move.


In the patch standard patterns mve_move, mve_store and move_load for 
MVE are added and neon.md and vfp.md

files are modified to support this common patterns.

Please refer to Arm reference manual [1] for more details.

[1] https://developer.arm.com/docs/ddi0553/latest

Regression tested on target arm-none-eabi and armeb-none-eabi and 
found no regressions.


Ok for trunk?



This is ok but please bootstrap it on arm-none-linux-gnueabihf as well.

Thanks,

Kyrill




Thanks,
Srinath

gcc/ChangeLog:

2020-03-06  Andre Vieira 
    Mihail Ionescu  
    Srinath Parvathaneni 

    * config.gcc (arm_mve.h): Include mve intrinsics header file.
    * config/arm/aout.h (p0): Add new register name for MVE predicated
    cases.
    * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define 
macro

    common to Neon and MVE.
    (ARM_BUILTIN_NEON_LANE_CHECK): Renamed to 
ARM_BUILTIN_SIMD_LANE_CHECK.

    (arm_init_simd_builtin_types): Disable poly types for MVE.
    (arm_init_neon_builtins): Move a check to arm_init_builtins 
function.

    (arm_init_builtins): Use ARM_BUILTIN_SIMD_LANE_CHECK instead of
    ARM_BUILTIN_NEON_LANE_CHECK.
    (mve_dereference_pointer): Add function.
    (arm_expand_builtin_args): Call to mve_dereference_pointer 
when MVE is

    enabled.
    (arm_expand_neon_builtin): Moved to arm_expand_builtin function.
    (arm_expand_builtin): Moved from arm_expand_neon_builtin function.
    * config/arm/arm-c.c (__ARM_FEATURE_MVE): Define macro for MVE 
and MVE

    with floating point enabled.
    * config/arm/arm-protos.h (neon_immediate_valid_for_move): 
Renamed to

    simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Renamed from
    neon_immediate_valid_for_move function.
    * config/arm/arm.c (arm_options_perform_arch_sanity_checks): 
Generate

    error if vfpv2 feature bit is disabled and mve feature bit is also
    disabled for HARD_FLOAT_ABI.
    (use_return_insn): Check to not push VFP regs for MVE.
    (aapcs_vfp_allocate): Add MVE check to have same Procedure 
Call Standard

    as Neon.
    (aapcs_vfp_allocate_return_reg): Likewise.
    (thumb2_legitimate_address_p): Check to return 0 on valid Thumb-2
    address operand for MVE.
    (arm_rtx_costs_internal): MVE check to determine cost of rtx.
    (neon_valid_immediate): Rename to simd_valid_immediate.
    (simd_valid_immediate): Rename from neon_valid_immediate.
    (simd_valid_immediate): MVE check on size of vector is 128 bits.
    (neon_immediate_valid_for_move): Rename to
    simd_immediate_valid_for_move.
    (simd_immediate_valid_for_move): Rename from
    neon_immediate_valid_for_move.
    (neon_immediate_valid_for_logic): Modify call to 
neon_valid_immediate

    function.
    (neon_make_constant): Modify call to neon_valid_immediate 
function.
    (neon_vector_mem_operand): Return VFP register for POST_INC or 
PRE_DEC

    for MVE.
    (output_move_neon): Add MVE check to generate vldm/vstm 
instrcutions.
    (arm_compute_frame_layout): Calculate space for saved VFP 
registers for

    MVE.
    (arm_save_coproc_regs): Save coproc registers for MVE.
    (arm_print_operand): Add case 'E' to print memory operands for 
MVE.
    (arm_print_operand_address): Check to print register number 
for MVE.
    (arm_hard_regno_mode_ok): Check for arm hard regno mode ok for 
MVE.

    (arm_modes_tieable_p): Check to allow structure mode for MVE.
    (arm_regno_class): Add VPR_REGNUM check.
    (arm_expand_epilogue_apcs_frame): MVE check to calculate 
epilogue code

    for APCS frame.
    (arm_expand_epilogue): MVE check for enabling pop instructions in
    epilogue.
    (arm_print_asm_arch_directives): Modify function to disable 
print of

    

[committed] libstdc++: Fix test failure due to -Wnonnull warnings

2020-03-12 Thread Jonathan Wakely via Gcc-patches
This test fails in the Fedora RPM build (but not elsewhere, for unknown
reasons). The warning is correct, we're passing a null pointer.

* testsuite/tr1/8_c_compatibility/cstdlib/functions.cc: Do not pass
a null pointer to functions with nonnull(1) attribute.

Tested x86_64-linux, committed to master.

commit fcc443b97e19d9c8a2d8ccdfa4cc20682165827e
Author: Jonathan Wakely 
Date:   Thu Mar 12 11:03:04 2020 +

libstdc++: Fix test failure due to -Wnonnull warnings

This test fails in the Fedora RPM build (but not elsewhere, for unknown
reasons). The warning is correct, we're passing a null pointer.

* testsuite/tr1/8_c_compatibility/cstdlib/functions.cc: Do not pass
a null pointer to functions with nonnull(1) attribute.

diff --git a/libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdlib/functions.cc 
b/libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdlib/functions.cc
index 89c078ffe6f..227a7580bcf 100644
--- a/libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdlib/functions.cc
+++ b/libstdc++-v3/testsuite/tr1/8_c_compatibility/cstdlib/functions.cc
@@ -30,7 +30,7 @@ void test01()
 #if _GLIBCXX_USE_C99_STDLIB
 
   long long i = 0;
-  const char* s = 0;
+  const char* s = "";
   char** endptr = 0;
   int base = 0;
 


Re: [PATCH 6/6] i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV

2020-03-12 Thread H.J. Lu via Gcc-patches
On Wed, Mar 11, 2020 at 8:53 PM Jeff Law  wrote:
>
> On Sat, 2020-02-29 at 06:16 -0800, H.J. Lu wrote:
> > There is no need to set mode attribute to XImode since ix86_output_ssemov
> > can properly encode xmm16-xmm31 registers with and without AVX512VL.
> >
> > Remove ext_sse_reg_operand since it is no longer needed.
> >
> >   PR target/89229
> >   * config/i386/i386.c (ix86_output_ssemov): Handle MODE_V1DF and
> >   MODE_V2SF.
> >   * config/i386/mmx.md (MMXMODE:*mov_internal): Call
> >   ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
> >   check.
> >   * config/i386/predicates.md (ext_sse_reg_operand): Removed.
> This is OK.   I think once this is in, patch #2 becomes OK because this patch
> adds V2SF handling in ix86_output_ssemov.

I need to take out the ext_sse_reg_operand removal since it is still being
used.  I added MODE_DI to to ix86_output_ssemov.

> Similarly I think patch #4 is OK once this one goes in since it adds V1DF as
> well.
>
> So perhaps an integration plan would be to immediately install #6, followed 
> 24hrs
> later by patch #4, then 24hrs after patch #2.
>
> Then we can work on patch #5 and patch #3 where I think we go with patch #5 
> plus
> the MODE_SI hunk from patch #3.  THen 24hrs after that the remaining bits of
> patch #3.

I am enclosing the updated 5 remaining patches.  I will check in the
first one and
check in the rest one patch every 24hrs.

> I think that covers the whole series.
>

Thanks.

-- 
H.J.
From 555880dad82a9b511945250c0436ee05c4962f65 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 14 Feb 2020 11:07:34 -0800
Subject: [PATCH 1/5] i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV

There is no need to set mode attribute to XImode since ix86_output_ssemov
can properly encode xmm16-xmm31 registers with and without AVX512VL.

	PR target/89229
	* config/i386/i386.c (ix86_output_ssemov): Handle MODE_DI,
	MODE_V1DF and MODE_V2SF.
	* config/i386/mmx.md (MMXMODE:*mov_internal): Call
	ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
	check.
---
 gcc/config/i386/i386.c | 19 +++
 gcc/config/i386/mmx.md | 29 ++---
 2 files changed, 21 insertions(+), 27 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7bbfbb4c5a7..6d83855692f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5118,6 +5118,25 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands)
 case MODE_V4SF:
   return ix86_get_ssemov (operands, 16, insn_mode, mode);
 
+case MODE_DI:
+  /* Handle broken assemblers that require movd instead of movq. */
+  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
+	  && (GENERAL_REG_P (operands[0])
+	  || GENERAL_REG_P (operands[1])))
+	return "%vmovd\t{%1, %0|%0, %1}";
+  else
+	return "%vmovq\t{%1, %0|%0, %1}";
+
+case MODE_V1DF:
+  gcc_assert (!TARGET_AVX);
+  return "movlpd\t{%1, %0|%0, %1}";
+
+case MODE_V2SF:
+  if (TARGET_AVX && REG_P (operands[0]))
+	return "vmovlps\t{%1, %d0|%d0, %1}";
+  else
+	return "%vmovlps\t{%1, %0|%0, %1}";
+
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e1c8b0af4c7..c3f195bb34a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -118,29 +118,7 @@ (define_insn "*mov_internal"
   return standard_sse_constant_opcode (insn, operands);
 
 case TYPE_SSEMOV:
-  switch (get_attr_mode (insn))
-	{
-	case MODE_DI:
-	  /* Handle broken assemblers that require movd instead of movq.  */
-	  if (!HAVE_AS_IX86_INTERUNIT_MOVQ
-	  && (GENERAL_REG_P (operands[0]) || GENERAL_REG_P (operands[1])))
-	return "%vmovd\t{%1, %0|%0, %1}";
-	  return "%vmovq\t{%1, %0|%0, %1}";
-	case MODE_TI:
-	  return "%vmovdqa\t{%1, %0|%0, %1}";
-	case MODE_XI:
-	  return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
-
-	case MODE_V2SF:
-	  if (TARGET_AVX && REG_P (operands[0]))
-	return "vmovlps\t{%1, %0, %0|%0, %0, %1}";
-	  return "%vmovlps\t{%1, %0|%0, %1}";
-	case MODE_V4SF:
-	  return "%vmovaps\t{%1, %0|%0, %1}";
-
-	default:
-	  gcc_unreachable ();
-	}
+  return ix86_output_ssemov (insn, operands);
 
 default:
   gcc_unreachable ();
@@ -189,10 +167,7 @@ (define_insn "*mov_internal"
  (cond [(eq_attr "alternative" "2")
 	  (const_string "SI")
 	(eq_attr "alternative" "11,12")
-	  (cond [(ior (match_operand 0 "ext_sse_reg_operand")
-			  (match_operand 1 "ext_sse_reg_operand"))
-			(const_string "XI")
-		 (match_test "mode == V2SFmode")
+	  (cond [(match_test "mode == V2SFmode")
 		   (const_string "V4SF")
 		 (ior (not (match_test "TARGET_SSE2"))
 			  (match_test "optimize_function_for_size_p (cfun)"))
-- 
2.24.1

From d02ae1b84bb6dcc30230808a57e12d49d6f4a853 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 14 Feb 2020 10:32:06 -0800
Subject: [PATCH 2/5] i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV

There is no need to set mode attribute to XImode 

Re: Strange read for simple_object_elf_find_sections for .a file

2020-03-12 Thread Martin Liška

The answer is simple:


diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index fbe2c6885f0..83f1ac38d37 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -1033,6 +1033,7 @@ static int
 process_lto_section (void *data, const char *name, off_t offset, off_t len)
 {
   struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  offset += obj->file->offset;
   if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
 {
   if (offset != lseek (obj->file->fd, offset, SEEK_SET))

Sorry for the noise.
Martin


Strange read for simple_object_elf_find_sections for .a file

2020-03-12 Thread Martin Liška

Hi.

I noticed a strange error during parsing of .gnu.lto_.lto section
in LTO plugin:

$ cat bss.c
int global_zero;
int global_one = 1;

int main() { return 0; }

$ gcc -flto bss.c -c
$ gcc bss.o
read version 9.0

while:

$ ar r x.a bss.o
ar: creating x.a

$ gcc x.a
read version 13617.13368

$ objdump -s -j .gnu.lto_.lto.655b2ec59df60b19 x.a
In archive x.a:

bss.o: file format elf64-x86-64

Contents of section .gnu.lto_.lto.655b2ec59df60b19:
  0900 01000100

which is the same as objdump of bss.o

Any idea why that happens?
Thanks,
Martin
>From f12831ebf6ed68b569f6d0c6a9243dfa15ba3b40 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 12 Mar 2020 11:30:17 +0100
Subject: [PATCH] Test x.

---
 lto-plugin/lto-plugin.c | 57 +++--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..fbe2c6885f0 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX ".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN (sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -1010,6 +1012,52 @@ process_offload_section (void *data, const char *name, off_t offset, off_t len)
   return 1;
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct lto_section lto_header;
+
+
+/* Find and parse .lto section of an object file.  */
+
+static int
+process_lto_section (void *data, const char *name, off_t offset, off_t len)
+{
+  struct plugin_objfile *obj = (struct plugin_objfile *)data;
+  if (strncmp (name, LTO_LTO_PREFIX, LTO_LTO_PREFIX_LEN) == 0)
+{
+  if (offset != lseek (obj->file->fd, offset, SEEK_SET))
+	goto err;
+
+  ssize_t got = read (obj->file->fd, _header,
+			  sizeof (struct lto_section));
+  if (got != sizeof (struct lto_section))
+	goto err;
+
+  fprintf (stderr, "read version %d.%d\n", lto_header.major_version,
+	   lto_header.minor_version);
+
+  return 0;
+}
+
+  return 1;
+
+err:
+  if (message)
+message (LDPL_FATAL, "%s: corrupt object file", obj->file->name);
+  obj->found = 0;
+  return 0;
+}
+
 /* Callback used by gold to check if the plugin will claim FILE. Writes
the result in CLAIMED. */
 
@@ -1055,8 +1103,13 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
   if (!obj.objfile && !err)
 goto err;
 
-  if (obj.objfile)
-errmsg = simple_object_find_sections (obj.objfile, process_symtab, , );
+   if (obj.objfile)
+{
+  errmsg = simple_object_find_sections (obj.objfile, process_symtab, , );
+
+  if (!errmsg)
+	errmsg = simple_object_find_sections (obj.objfile, process_lto_section, , );
+}
 
   if (!obj.objfile || errmsg)
 {
-- 
2.25.1



Re: [PATCH] libstdc++: Fix the return type of __cxa_finalize according to the Itanium C++ ABI

2020-03-12 Thread Jonathan Wakely via Gcc-patches

Please CC libstd...@gcc.gnu.org for all libstdc++ patches, as per
https://gcc.gnu.org/lists.html

On 11/03/20 21:24 -0700, Fangrui Song wrote:

Alternatively, we can delete it, because no user code should call it.
It may be weird that libc is expected to define this function.
This function is a language runtime interface that has nothing to do
with a libc.


It's coupled with __cxa_atexit, and the ABI says:

"It is expected that implementations supporting both C and C++ will
integrate this capability into the libc atexit implementation so that
C-only DSOs will nevertheless interact with C++ programs in a
C++-standard-conforming manner."


---
libstdc++-v3/libsupc++/cxxabi.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/libsupc++/cxxabi.h b/libstdc++-v3/libsupc++/cxxabi.h
index 50298205daa..000713ecdf8 100644
--- a/libstdc++-v3/libsupc++/cxxabi.h
+++ b/libstdc++-v3/libsupc++/cxxabi.h
@@ -127,7 +127,7 @@ namespace __cxxabiv1
  int
  __cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;

-  int
+  void
  __cxa_finalize(void*);


The change is correct, but I think we should wait for stage 1 (after
the GCC 10 release) to apply it. If I forget, please remind us in a
couple of months.



Re: [PATCH v2] generate EH info for volatile asm statements (PR93981)

2020-03-12 Thread Richard Sandiford
"J.W. Jagersma"  writes:
> On 2020-03-09 13:13, Richard Sandiford wrote:
>> Thanks for doing this.
>
> Hi Richard, thanks for your response.
>
>> "J.W. Jagersma"  writes:
>>> On 2020-03-07 20:20, Segher Boessenkool wrote:
 Some comments:

> +When non-call exceptions (@option{-fnon-call-exceptions}) are enabled, a
> +@code{volatile asm} statement is also allowed to throw exceptions.  If it
> +does, then the compiler assumes that its output operands have not been 
> written
> +yet.

 That reads as if the compiler assumes the outputs retain their original
 value, but that isn't true (I hope!)  The compiler assumes the output
 are clobbered, but it doesn't assume they are assigned any definite
 value?
>>>
>>> Register outputs are assumed to be clobbered, yes.  For memory outputs
>>> this is not the case, if the asm writes it before throwing then the
>>> memory operand retains this value.  It should be the user's
>>> responsibility to ensure that an asm has no side-effects if it throws.
>> 
>> I guess one problem is that something like "=" explicitly says that
>> the operand is clobbered "early", and I think it would be fair for
>> "early" to include clobbers before the exception.  So IMO we should
>> allow at least early-clobbered memory outputs to be clobbered by the
>> exception.
>
> Is "=" not equivalent to "=m" in every case?  As I understand it, the
> earlyclobber modifier is only relevant for register outputs.  It does
> not specify anything about clobbering the output itself, it only says
> that an input could be clobbered if it is allocated in the same
> register (or if a memory input uses this register as index).
>
>> And if we do that, then I'm not sure there's much benefit in trying to
>> treat the non-earlyclobber memory case specially.
>> 
>> It would be good to have testcases for the output cases.  E.g. for:
>> 
>> int foo;
>> int bar = 0;
>> try
>>   {
>> foo = 1;
>> asm volatile ("..." : "=m" (foo));
>>   }
>> catch (...whatever...)
>>   {
>> bar = foo;
>>   }
>> ...use bar...
>> 
>> What does "bar = foo" read?  Is it always undefined behaviour if executed?
>> Or does it always read "foo" from memory?  Can it be optimised to "bar = 1"?
>> Is "foo = 1" dead code?
>
> These are very good points.  But I am not sure how to test for all of
> these.  My test case now looks as follows:
>
> // PR inline-asm/93981
> // { dg-do run }
> // { dg-options "-fnon-call-exceptions -O3" }
> // { dg-xfail-run-if "" { ! *-linux-gnu } }
>
> #include 
>
> struct illegal_opcode { };
>
> extern "C" void
> sigill (int)
> {
>   throw illegal_opcode ( );
> }
>
> int
> test_mem ()
> {
>   int i = 2;
>   try
> {
>   asm volatile ("mov%z0 $1, %0; ud2" : "=m" (i));
> }
>   catch (const illegal_opcode&)
> {
>   if (i == 1) return 0;
> }
>   return i;
> }
>
> int
> test_reg ()
> {
>   int i = 2;
>   try
> {
>   asm volatile ("mov%z0 $1, %0; ud2" : "=r" (i));
> }
>   catch (const illegal_opcode&)
> {
>   if (i == 2) return 0;
> }
>   return i;
> }
>
> int
> main ()
> {
>   std::signal (SIGILL, sigill);
>   return test_reg () + test_mem ();
> }
>
> I think that should cover most of it.  Am I missing anything?

The other case I mentioned was equivalent to:

 int
 test_mem2 ()
 {
   int i = 2;
   try
 {
   asm volatile ("ud2; mov%z0 $1, %0" : "=m" (i));
 }
   catch (const illegal_opcode&)
 {
   if (i == 2) return 0;
 }
   return i;
 }

Is this test expected to pass?

However, these "mem" tests are testing gimple register types, so they'll
still be SSA names outside of the asm.  It would also be good to test what
happens for non-register types, e.g. something like:

 int
 test_mem3 ()
 {
   int i[5] = { 1, 2, 3, 4, 5 };
   try
 {
   asm volatile ("ud2; " : "=m" (i));
 }
   catch (const illegal_opcode&)
 {
   if (i[0] == 1 && ...) return 0;
 }
   return ...;
 }

and similarly for ud2 after the store.

It wasn't clear from my message above, but: I was mostly worried about
requiring the asm to treat memory operands in a certain way when the
exception is thrown.  IMO it would be better to say that the values of
memory operands are undefined on the exception edge.

Thanks,
Richard


Re: [RFC][gcc][vect] PR 91246: Prototype for vectorization of loops with breaks

2020-03-12 Thread Richard Biener via Gcc-patches
On Thu, Mar 12, 2020 at 10:15 AM Andre Vieira (lists)
 wrote:
>
> Hi all,
>
> This is a prototype I have put together to look at the feasibility and
> profitability of vectorizing loops with breaks as suggested in PR 91246.
> I am posting this here for comments as I am curious what your opinions
> are on my approach.  I don't expect much attention to this in stage 4,
> but I wanted to get it out now before I forget about it.
>
> The idea is that during ifcvt it checks whether it can safely transform
> the break away, version the loop and in the "to vectorize" version
> replace it with a hopefully a reduceable comparison and MIN_EXPR.
> Currently I have limited the cases it can transform to loops that meet
> all of the following conditions:
> - is a inner-loop with one break point,
> - the break condition is an EQ_EXPR or NE_EXPR of integers,
> - the entire loop is vectorize, i.e. it has no scalar epilogue,
> - the loop has no writes to memory and no variable escapes the loop,
> - all memory accesses are valid (i.e. within bounds) even if the loop
> doesn't break early.
>
> To check the last condition I copied and modified the checks used to
> warn for out of bounds accesses.  I modified these checks such that they
> are more conservative, we want to reject the transformation if we can't
> guarantee the accesses are within bounds for any possible range. I have
> noticed these range checks aren't very good yet, I am hoping that as
> project ranger evolves this will get easier. However, I am also worried
> that if they start to take control flow into consideration and
> "understand" the break, they will use it to reason that with the break
> in place the range of loop induction variables are limited. This
> obviously makes it unusable for us to check whether removing the break
> will cause out of bounds accesses.  Ideally the new ranger would allow
> us to "ignore" certain expressions in the range calculation. All this
> was with targets in mind that do not support masking, for targets with
> the ability to mask vector operations we could do without such analysis,
> as we may be able to prevent memory accesses taking place after the
> break condition has been met.

Not looking at the patch but just throwing in thoughts.

I think at some point we need to stop relying on a separate if-conversion
pass since that requires a GIMPLE representation for the scalar
if-converted loop which is really difficult if you start do apply predication
to more than just loads and stores.

That is, I'd like to see us experimenting with doing the if-conversion
"on the fly" during vectorization.  That probably means doing what
if-conversion does but represent the result in some non-IL data
structure - since we're going to do all-SLP it'll be the SLP graph,
maybe simply attach a vector of predicates to each node.
The benefit is that we'll eventually be able to if-convert for
non-loop vectorization as well.

Possibly easiest is of course to try do away with if-conversion as it
is now and only have the existing feature-set moved to the vectorizer.
I'd probably really start with copying the if-conversion analysis code
and simply skip its "code-generation", then handle multiple BBs
in all of the vectorizer code and see how analysis and code generation
would have to be adjusted.  Given the general direction I'd focus
on full-SLP testcases but in reality I cannot promise I'll finish
the only-SLP transition in time for GCC 11 (I hope to have a good
chunk ready for the Cauldron though).

> In trying to improve performance of this transformation I made a small
> change in the vectorizer, such that if we are dealing with such loops
> with breaks the ifcvt pass stores the variable holding the break
> condition in the loop struct under 'early_break_cond'. The vectorizer
> can then use this variable to check whether we have hit the break
> condition every iteration, preventing us to continue to go through the
> loop if we know we can stop early. This will benefit cases where we
> would break early, on the other hand if we break late, then we now
> introduced extra checks within the loop... However there is no way for
> us to know, one small improvement would be to use the known iteration
> count to decide whether or not to add these checks, one could argue that
> if we know the number of iterations to be small there is little point in
> adding these.  Thoughts welcome...
>
> Another issue I encountered was that the early loop unroller often gets
> in the way of things. For this reason I tried to teach it not to unroll
> inner loops which we can remove breaks from. Unfortunately the early
> loop unroller is performed before some other simplification
> transformations and loop canonicalisation, I have seen cases where these
> transformations were crucial in making our memory access analysis accept
> loops. I suggest disabling the early loop unroller to check whether this
> pass is able to handle certain loops, i.e. pass
> 

Re: Remove redundant zero extend

2020-03-12 Thread Richard Biener via Gcc-patches
On Thu, Mar 12, 2020 at 4:06 AM Jeff Law via Gcc-patches
 wrote:
>
> On Wed, 2020-03-11 at 13:04 +, Nidal Faour via Gcc-patches wrote:
> > This patch is a code density oriented and attempt to remove redundant 
> > sign/zero
> > extension from assignment statement.
> > The approach taken is to use VRP data while expanding the assignment to RTL 
> > to
> > determine whether a sign/zero extension is necessary.
> > Thought the motivation of the patch is code density but it also good for 
> > speed.
> >
> > for example:
> > extern unsigned int func ();
> >
> >   unsigned char
> >   foo (unsigned int arg)
> >   {
> > if (arg == 2)
> >   return 0;
> >
> > return (func() == arg || arg == 7);
> >   }
> >
> > the result of the comparison in the return will yield a False or True, which
> > will be converted to integer and then casting to unsigned char.
> > this casting from integer to unsigned char is redundant because the value is
> > either 0 or 1.
> >
> > this patch is targeting the RISCV-32bit only.
> > This patch has been tested on a real embedded project and saved about 0.2% 
> > of
> > code size.
> >
> > P.S. I have an FSF license under Western Digital incorporation.
> > P.S. I've attached the patch as a file in case my email client corrupted the
> > patch itself
> Just an FYI.  We're at stage4 in our development cycle, as a result most
> developers are focused on regression bugfixing until the gcc-10 release is 
> made.
> I've put your patch into the queue of things to evaluate for gcc-11.

Note there was also more general development in the past with regarding to
zext/sext removal and using VRP data for that.  Google for 'type promotion
pass' in the ml archives.  Quickly scanning your patch makes it look quite
ad-hoc to me.

Richard.

> Thanks,
> jeff
> >
>


[RFC][gcc][vect] PR 91246: Prototype for vectorization of loops with breaks

2020-03-12 Thread Andre Vieira (lists)

Hi all,

This is a prototype I have put together to look at the feasibility and 
profitability of vectorizing loops with breaks as suggested in PR 91246. 
I am posting this here for comments as I am curious what your opinions 
are on my approach.  I don't expect much attention to this in stage 4, 
but I wanted to get it out now before I forget about it.


The idea is that during ifcvt it checks whether it can safely transform 
the break away, version the loop and in the "to vectorize" version 
replace it with a hopefully a reduceable comparison and MIN_EXPR. 
Currently I have limited the cases it can transform to loops that meet 
all of the following conditions:

- is a inner-loop with one break point,
- the break condition is an EQ_EXPR or NE_EXPR of integers,
- the entire loop is vectorize, i.e. it has no scalar epilogue,
- the loop has no writes to memory and no variable escapes the loop,
- all memory accesses are valid (i.e. within bounds) even if the loop 
doesn't break early.


To check the last condition I copied and modified the checks used to 
warn for out of bounds accesses.  I modified these checks such that they 
are more conservative, we want to reject the transformation if we can't 
guarantee the accesses are within bounds for any possible range. I have 
noticed these range checks aren't very good yet, I am hoping that as 
project ranger evolves this will get easier. However, I am also worried 
that if they start to take control flow into consideration and 
"understand" the break, they will use it to reason that with the break 
in place the range of loop induction variables are limited. This 
obviously makes it unusable for us to check whether removing the break 
will cause out of bounds accesses.  Ideally the new ranger would allow 
us to "ignore" certain expressions in the range calculation. All this 
was with targets in mind that do not support masking, for targets with 
the ability to mask vector operations we could do without such analysis, 
as we may be able to prevent memory accesses taking place after the 
break condition has been met.


In trying to improve performance of this transformation I made a small 
change in the vectorizer, such that if we are dealing with such loops 
with breaks the ifcvt pass stores the variable holding the break 
condition in the loop struct under 'early_break_cond'. The vectorizer 
can then use this variable to check whether we have hit the break 
condition every iteration, preventing us to continue to go through the 
loop if we know we can stop early. This will benefit cases where we 
would break early, on the other hand if we break late, then we now 
introduced extra checks within the loop... However there is no way for 
us to know, one small improvement would be to use the known iteration 
count to decide whether or not to add these checks, one could argue that 
if we know the number of iterations to be small there is little point in 
adding these.  Thoughts welcome...


Another issue I encountered was that the early loop unroller often gets 
in the way of things. For this reason I tried to teach it not to unroll 
inner loops which we can remove breaks from. Unfortunately the early 
loop unroller is performed before some other simplification 
transformations and loop canonicalisation, I have seen cases where these 
transformations were crucial in making our memory access analysis accept 
loops. I suggest disabling the early loop unroller to check whether this 
pass is able to handle certain loops, i.e. pass 
'-fdisable-tree-cunrolli', you can even go as far as use 
-fdisable-tree-cunrolli= for more accurate 
benchmarking/analysis.


I made some changes to the testcase on the PR ticket such that the 
compiler has enough information to determine all memory accesses are 
within bounds, see testcase below. This is one of those cases that 
requires '-fdisable-tree-cunrolli'.


#include 
#define SIZE 8
int s, sum;

int *g_board;
int *g_moves;

static void f(int *moves, int *board, int cnt, int lastmove, int ko)
{
for (int i = 0; i < cnt; i++) {
if (moves[i] == ko) continue;

bool found = false;
for (int j = 0; j < SIZE; j++) {
if ((lastmove + board[j]) == moves[j]) {
found = true;
break;
}
}

if (!found) {
s *= 20;
}

if (s >= 40) {
sum += s;
}
}
}

void foo(int cnt, int lastmove, int ko)
{
  int board[SIZE];
  int moves[SIZE];
  for (int i = 0; i < SIZE; ++i)
{
  board[i] = g_board[i];
  moves[i] = g_moves[i];
}

  f (moves, board, cnt, lastmove, ko);
}


I have not found this transformation to have a big impact on performance 
so far.


Kind Regards,
Andre
diff --git a/gcc/cfganal.h b/gcc/cfganal.h
index 849e537eddbf8e551b5af26a48a4468155bb7635..cccba9f9628b7aab2febea98cb9cece11c99900c 100644
--- a/gcc/cfganal.h
+++ b/gcc/cfganal.h
@@ -81,6 +81,8 @@ extern void bitmap_union_of_preds 

Fwd: Re: [Fortran, OpenACC] Reject vars of different scope in $acc declare (PR94120)

2020-03-12 Thread Tobias Burnus

(I assume that should have gone to gcc-patches@ and fortran@ as well.)


 Forwarded Message 
Subject:Re: [Fortran, OpenACC] Reject vars of different scope in $acc
declare (PR94120)
Date:   Tue, 10 Mar 2020 18:02:41 +
From:   Paul Richard Thomas 
To: Tobias Burnus 



Hi Tobias,

This looks to be straightforward enough to me :-) OK for trunk.

Thanks

Paul

On Tue, 10 Mar 2020 at 17:18, Tobias Burnus  wrote:

[This fixes a bunch of issues found when actually only
wanting to add a check for the following restriction.]

OpenACC's "Declare Directive" has the following restriction:

"A declare directive must be in the same scope
   as the declaration of any var that appears in
   the data clauses of the directive."

The gfortran now rejects a "var" declared is in a different
namespace than the "$acc declare". (Use-associated variables
are already rejected.) Testing showed that a straight-forward
check fails if the result variable is the function name – as
then the function symbol is in the parent scope. — Extending
the failing test to use a result variable showed that the
current is-a-module-variable check didn't work and when fixing,
one was running into a likewise issue.

The reason that I exclude 's' being a module is that at
resolution time, an is-variable check is done.


The other changes are because the following restriction was
mishandled:

"In a Fortran module declaration section, only
   create, copyin, device_resident, and link clauses are allowed."

But all examples where for variables using those in a module
procedure …


OK for the trunk?

Cheers,

Tobias

PS: For those who wounder what happens in a BLOCK DATA construct:
gfortran outrightly rejects 'acc declare'. (It probably should
work for COMMON blocks with 'declare device_resident' – but
currently the spec only permits it in program/subroutine/function
+ declaration part of a module.)

PPS: The PR shows (for C) that one can construct a test case,
which violates the OpenACC restriction and actually fails with
an ICE. I have a draft patch for C (see PR) but not yet one for
C++, hence, I start with the Fortran side. – I currently still
struggle to write a same-scope check in C++.
[The C test case in turn was a fallout of debugging an
ICE-on-valid-code issue …]

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter




--
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] tree-dse: Fix mem* head trimming if call has lhs [PR94130]

2020-03-12 Thread Richard Biener
On Thu, 12 Mar 2020, Jakub Jelinek wrote:

> On Thu, Mar 12, 2020 at 09:20:08AM +0100, Richard Biener wrote:
> > On Thu, 12 Mar 2020, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > As the testcase shows, if DSE decides to head trim 
> > > {mem{set,cpy,move},strncpy}
> > > and the call has lhs, it is incorrect to leave the lhs as is, because it
> > > will then point to the adjusted address (base + head_trim) instead of the
> > > original base.
> > > The following patch fixes that by dropping the lhs of the call and 
> > > assigning
> > > lhs the original base in a following statement.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > OK, but I don't see where you need anything additional frmo gimplify.h?
> 
> unshare_expr here:
> > > + gassign *newop = gimple_build_assign (lhs, unshare_expr (*where));
> 
> While SSA_NAMEs obviously don't need to be unshared, if it is a gimple
> invariant, it might need to be.

Ah, ok.

Richard.


Re: [PATCH] tree-dse: Fix mem* head trimming if call has lhs [PR94130]

2020-03-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 12, 2020 at 09:20:08AM +0100, Richard Biener wrote:
> On Thu, 12 Mar 2020, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > As the testcase shows, if DSE decides to head trim 
> > {mem{set,cpy,move},strncpy}
> > and the call has lhs, it is incorrect to leave the lhs as is, because it
> > will then point to the adjusted address (base + head_trim) instead of the
> > original base.
> > The following patch fixes that by dropping the lhs of the call and assigning
> > lhs the original base in a following statement.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> OK, but I don't see where you need anything additional frmo gimplify.h?

unshare_expr here:
> > +   gassign *newop = gimple_build_assign (lhs, unshare_expr (*where));

While SSA_NAMEs obviously don't need to be unshared, if it is a gimple
invariant, it might need to be.

Jakub



Re: [PATCH] doc: Fix up ASM_OUTPUT_ALIGNED_DECL_LOCAL description

2020-03-12 Thread Richard Biener
On Thu, 12 Mar 2020, Jakub Jelinek wrote:

> Hi!
> 
> When looking into PR94134, I've noticed bugs in the
> ASM_OUTPUT_ALIGNED_DECL_LOCAL documentation.  varasm.c has:
> #if defined ASM_OUTPUT_ALIGNED_DECL_LOCAL
>   unsigned int align = symtab_node::get (decl)->definition_alignment ();
>   ASM_OUTPUT_ALIGNED_DECL_LOCAL (asm_out_file, decl, name,
>  size, align);
>   return true;
> #elif defined ASM_OUTPUT_ALIGNED_LOCAL
>   unsigned int align = symtab_node::get (decl)->definition_alignment ();
>   ASM_OUTPUT_ALIGNED_LOCAL (asm_out_file, name, size, align);
>   return true;
> #else
>   ASM_OUTPUT_LOCAL (asm_out_file, name, size, rounded);
>   return false;
> #endif
> and the ASM_OUTPUT_ALIGNED_LOCAL documentation properly mentions:
> Like @code{ASM_OUTPUT_LOCAL} and mentions the same macro in another place.
> The ASM_OUTPUT_ALIGNED_DECL_LOCAL description mentions non-existing macros
> ASM_OUTPUT_ALIGNED_DECL and ASM_OUTPUT_DECL instead of the right ones
> ASM_OUTPUT_ALIGNED_LOCAL and ASM_OUTPUT_LOCAL.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK

> 2020-03-12  Jakub Jelinek  
> 
>   * doc/tm.texi.in (ASM_OUTPUT_ALIGNED_DECL_LOCAL): Change
>   ASM_OUTPUT_ALIGNED_DECL in description to ASM_OUTPUT_ALIGNED_LOCAL
>   and ASM_OUTPUT_DECL to ASM_OUTPUT_LOCAL.
>   * doc/tm.texi: Regenerated.
> 
> --- gcc/doc/tm.texi.in.jj 2020-01-12 11:54:36.555411266 +0100
> +++ gcc/doc/tm.texi.in2020-03-11 15:25:47.794562267 +0100
> @@ -5410,11 +5410,11 @@ as the number of bits.
>  @end defmac
>  
>  @defmac ASM_OUTPUT_ALIGNED_DECL_LOCAL (@var{stream}, @var{decl}, @var{name}, 
> @var{size}, @var{alignment})
> -Like @code{ASM_OUTPUT_ALIGNED_DECL} except that @var{decl} of the
> +Like @code{ASM_OUTPUT_ALIGNED_LOCAL} except that @var{decl} of the
>  variable to be output, if there is one, or @code{NULL_TREE} if there
>  is no corresponding variable.  If you define this macro, GCC will use it
> -in place of both @code{ASM_OUTPUT_DECL} and
> -@code{ASM_OUTPUT_ALIGNED_DECL}.  Define this macro when you need to see
> +in place of both @code{ASM_OUTPUT_LOCAL} and
> +@code{ASM_OUTPUT_ALIGNED_LOCAL}.  Define this macro when you need to see
>  the variable's decl in order to chose what to output.
>  @end defmac
>  
> --- gcc/doc/tm.texi.jj2020-01-24 22:34:36.096644965 +0100
> +++ gcc/doc/tm.texi   2020-03-11 15:26:07.899268563 +0100
> @@ -8384,11 +8384,11 @@ as the number of bits.
>  @end defmac
>  
>  @defmac ASM_OUTPUT_ALIGNED_DECL_LOCAL (@var{stream}, @var{decl}, @var{name}, 
> @var{size}, @var{alignment})
> -Like @code{ASM_OUTPUT_ALIGNED_DECL} except that @var{decl} of the
> +Like @code{ASM_OUTPUT_ALIGNED_LOCAL} except that @var{decl} of the
>  variable to be output, if there is one, or @code{NULL_TREE} if there
>  is no corresponding variable.  If you define this macro, GCC will use it
> -in place of both @code{ASM_OUTPUT_DECL} and
> -@code{ASM_OUTPUT_ALIGNED_DECL}.  Define this macro when you need to see
> +in place of both @code{ASM_OUTPUT_LOCAL} and
> +@code{ASM_OUTPUT_ALIGNED_LOCAL}.  Define this macro when you need to see
>  the variable's decl in order to chose what to output.
>  @end defmac
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-dse: Fix mem* head trimming if call has lhs [PR94130]

2020-03-12 Thread Richard Biener
On Thu, 12 Mar 2020, Jakub Jelinek wrote:

> Hi!
> 
> As the testcase shows, if DSE decides to head trim {mem{set,cpy,move},strncpy}
> and the call has lhs, it is incorrect to leave the lhs as is, because it
> will then point to the adjusted address (base + head_trim) instead of the
> original base.
> The following patch fixes that by dropping the lhs of the call and assigning
> lhs the original base in a following statement.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, but I don't see where you need anything additional frmo gimplify.h?

Thanks,
Richard.

> 2020-03-12  Jakub Jelinek  
> 
>   PR tree-optimization/94130
>   * tree-ssa-dse.c: Include gimplify.h.
>   (increment_start_addr): If stmt has lhs, drop the lhs from call and
>   set it after the call to the original value of the first argument.
>   Formatting fixes.
>   (decrement_count): Formatting fix.
> 
>   * gcc.c-torture/execute/pr94130.c: New test.
> 
> --- gcc/tree-ssa-dse.c.jj 2020-03-03 11:04:46.367821907 +0100
> +++ gcc/tree-ssa-dse.c2020-03-11 13:57:38.671845186 +0100
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-ssa-dse.h"
>  #include "builtins.h"
>  #include "gimple-fold.h"
> +#include "gimplify.h"
>  
>  /* This file implements dead store elimination.
>  
> @@ -422,29 +423,38 @@ decrement_count (gimple *stmt, int decre
>gcc_assert (TREE_CODE (*countp) == INTEGER_CST);
>*countp = wide_int_to_tree (TREE_TYPE (*countp), (TREE_INT_CST_LOW 
> (*countp)
>   - decrement));
> -
>  }
>  
>  static void
>  increment_start_addr (gimple *stmt, tree *where, int increment)
>  {
> +  if (tree lhs = gimple_call_lhs (stmt))
> +if (where == gimple_call_arg_ptr (stmt, 0))
> +  {
> + gassign *newop = gimple_build_assign (lhs, unshare_expr (*where));
> + gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> + gsi_insert_after (, newop, GSI_SAME_STMT);
> + gimple_call_set_lhs (stmt, NULL_TREE);
> + update_stmt (stmt);
> +  }
> +
>if (TREE_CODE (*where) == SSA_NAME)
>  {
>tree tem = make_ssa_name (TREE_TYPE (*where));
>gassign *newop
> -= gimple_build_assign (tem, POINTER_PLUS_EXPR, *where,
> + = gimple_build_assign (tem, POINTER_PLUS_EXPR, *where,
>  build_int_cst (sizetype, increment));
>gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>gsi_insert_before (, newop, GSI_SAME_STMT);
>*where = tem;
> -  update_stmt (gsi_stmt (gsi));
> +  update_stmt (stmt);
>return;
>  }
>  
>*where = build_fold_addr_expr (fold_build2 (MEM_REF, char_type_node,
> - *where,
> - build_int_cst (ptr_type_node,
> -increment)));
> +   *where,
> +   build_int_cst (ptr_type_node,
> +  increment)));
>  }
>  
>  /* STMT is builtin call that writes bytes in bitmap ORIG, some bytes are dead
> --- gcc/testsuite/gcc.c-torture/execute/pr94130.c.jj  2020-03-11 
> 14:01:49.431180291 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr94130.c 2020-03-11 
> 14:01:37.654352419 +0100
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/94130 */
> +
> +int
> +main ()
> +{
> +  int a[8];
> +  char *b = __builtin_memset (a, 0, sizeof (a));
> +  a[0] = 1;
> +  a[1] = 2;
> +  a[2] = 3;
> +  if (b != (char *) a)
> +__builtin_abort ();
> +  else
> +asm volatile ("" : : "g" (a) : "memory");
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] tree-optimization/94103 avoid CSE of loads with padding

2020-03-12 Thread Richard Biener


VN currently replaces a load of a 16 byte entity 128 bits of precision
(TImode) with the result of a load of a 16 byte entity with 80 bits of
mode precision (XFmode).  That will go downhill since if the padding
bits are not actually filled with memory contents those bits are
missing.

I'm conservative on the optimization side for now since I'd like
to see actual testcases.  The fix is likely not comprehensive
either since I only touch a single place in VN.  At least the
fix is somewhat obvious and independent of the mode representation
issue of XFmode on x86.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2020-03-12  Richard Biener  

PR tree-optimization/94103
* tree-ssa-sccvn.c (visit_reference_op_load): Avoid type
punning when the mode precision is not sufficient.

* gcc.target/i386/pr94103.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr94103.c | 17 +
 gcc/tree-ssa-sccvn.c| 23 ---
 2 files changed, 33 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94103.c

diff --git a/gcc/testsuite/gcc.target/i386/pr94103.c 
b/gcc/testsuite/gcc.target/i386/pr94103.c
new file mode 100644
index 000..91b5fc64b5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr94103.c
@@ -0,0 +1,17 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-options "-O3" } */
+
+int main()
+{
+  long double x;
+  unsigned long u[2] = {0xUL, 0xUL};
+  __builtin_memcpy(, , sizeof x);
+  __builtin_memcpy(, , sizeof u);
+  ++*(unsigned char *)
+  (void)-x;
+  __builtin_memcpy(, , sizeof u);
+  if (u[1] != 0xUL
+  || u[0] != 0xEEEFUL)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index b7174cd603f..150ddad3e69 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4899,13 +4899,22 @@ visit_reference_op_load (tree lhs, tree op, gimple 
*stmt)
   if (result
   && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
 {
-  /* We will be setting the value number of lhs to the value number
-of VIEW_CONVERT_EXPR  (result).
-So first simplify and lookup this expression to see if it
-is already available.  */
-  gimple_match_op res_op (gimple_match_cond::UNCOND,
- VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
-  result = vn_nary_build_or_lookup (_op);
+  /* Avoid the type punning in case the result mode has padding where
+the op we lookup has not.  */
+  if (maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
+   GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)
+   result = NULL_TREE;
+  else
+   {
+ /* We will be setting the value number of lhs to the value number
+of VIEW_CONVERT_EXPR  (result).
+So first simplify and lookup this expression to see if it
+is already available.  */
+ gimple_match_op res_op (gimple_match_cond::UNCOND,
+ VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
+ result = vn_nary_build_or_lookup (_op);
+   }
+
   /* When building the conversion fails avoid inserting the reference
  again.  */
   if (!result)
-- 
2.16.4


[PATCH] doc: Fix up ASM_OUTPUT_ALIGNED_DECL_LOCAL description

2020-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

When looking into PR94134, I've noticed bugs in the
ASM_OUTPUT_ALIGNED_DECL_LOCAL documentation.  varasm.c has:
#if defined ASM_OUTPUT_ALIGNED_DECL_LOCAL
  unsigned int align = symtab_node::get (decl)->definition_alignment ();
  ASM_OUTPUT_ALIGNED_DECL_LOCAL (asm_out_file, decl, name,
 size, align);
  return true;
#elif defined ASM_OUTPUT_ALIGNED_LOCAL
  unsigned int align = symtab_node::get (decl)->definition_alignment ();
  ASM_OUTPUT_ALIGNED_LOCAL (asm_out_file, name, size, align);
  return true;
#else
  ASM_OUTPUT_LOCAL (asm_out_file, name, size, rounded);
  return false;
#endif
and the ASM_OUTPUT_ALIGNED_LOCAL documentation properly mentions:
Like @code{ASM_OUTPUT_LOCAL} and mentions the same macro in another place.
The ASM_OUTPUT_ALIGNED_DECL_LOCAL description mentions non-existing macros
ASM_OUTPUT_ALIGNED_DECL and ASM_OUTPUT_DECL instead of the right ones
ASM_OUTPUT_ALIGNED_LOCAL and ASM_OUTPUT_LOCAL.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2020-03-12  Jakub Jelinek  

* doc/tm.texi.in (ASM_OUTPUT_ALIGNED_DECL_LOCAL): Change
ASM_OUTPUT_ALIGNED_DECL in description to ASM_OUTPUT_ALIGNED_LOCAL
and ASM_OUTPUT_DECL to ASM_OUTPUT_LOCAL.
* doc/tm.texi: Regenerated.

--- gcc/doc/tm.texi.in.jj   2020-01-12 11:54:36.555411266 +0100
+++ gcc/doc/tm.texi.in  2020-03-11 15:25:47.794562267 +0100
@@ -5410,11 +5410,11 @@ as the number of bits.
 @end defmac
 
 @defmac ASM_OUTPUT_ALIGNED_DECL_LOCAL (@var{stream}, @var{decl}, @var{name}, 
@var{size}, @var{alignment})
-Like @code{ASM_OUTPUT_ALIGNED_DECL} except that @var{decl} of the
+Like @code{ASM_OUTPUT_ALIGNED_LOCAL} except that @var{decl} of the
 variable to be output, if there is one, or @code{NULL_TREE} if there
 is no corresponding variable.  If you define this macro, GCC will use it
-in place of both @code{ASM_OUTPUT_DECL} and
-@code{ASM_OUTPUT_ALIGNED_DECL}.  Define this macro when you need to see
+in place of both @code{ASM_OUTPUT_LOCAL} and
+@code{ASM_OUTPUT_ALIGNED_LOCAL}.  Define this macro when you need to see
 the variable's decl in order to chose what to output.
 @end defmac
 
--- gcc/doc/tm.texi.jj  2020-01-24 22:34:36.096644965 +0100
+++ gcc/doc/tm.texi 2020-03-11 15:26:07.899268563 +0100
@@ -8384,11 +8384,11 @@ as the number of bits.
 @end defmac
 
 @defmac ASM_OUTPUT_ALIGNED_DECL_LOCAL (@var{stream}, @var{decl}, @var{name}, 
@var{size}, @var{alignment})
-Like @code{ASM_OUTPUT_ALIGNED_DECL} except that @var{decl} of the
+Like @code{ASM_OUTPUT_ALIGNED_LOCAL} except that @var{decl} of the
 variable to be output, if there is one, or @code{NULL_TREE} if there
 is no corresponding variable.  If you define this macro, GCC will use it
-in place of both @code{ASM_OUTPUT_DECL} and
-@code{ASM_OUTPUT_ALIGNED_DECL}.  Define this macro when you need to see
+in place of both @code{ASM_OUTPUT_LOCAL} and
+@code{ASM_OUTPUT_ALIGNED_LOCAL}.  Define this macro when you need to see
 the variable's decl in order to chose what to output.
 @end defmac
 

Jakub



[PATCH] tree-dse: Fix mem* head trimming if call has lhs [PR94130]

2020-03-12 Thread Jakub Jelinek via Gcc-patches
Hi!

As the testcase shows, if DSE decides to head trim {mem{set,cpy,move},strncpy}
and the call has lhs, it is incorrect to leave the lhs as is, because it
will then point to the adjusted address (base + head_trim) instead of the
original base.
The following patch fixes that by dropping the lhs of the call and assigning
lhs the original base in a following statement.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-03-12  Jakub Jelinek  

PR tree-optimization/94130
* tree-ssa-dse.c: Include gimplify.h.
(increment_start_addr): If stmt has lhs, drop the lhs from call and
set it after the call to the original value of the first argument.
Formatting fixes.
(decrement_count): Formatting fix.

* gcc.c-torture/execute/pr94130.c: New test.

--- gcc/tree-ssa-dse.c.jj   2020-03-03 11:04:46.367821907 +0100
+++ gcc/tree-ssa-dse.c  2020-03-11 13:57:38.671845186 +0100
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa-dse.h"
 #include "builtins.h"
 #include "gimple-fold.h"
+#include "gimplify.h"
 
 /* This file implements dead store elimination.
 
@@ -422,29 +423,38 @@ decrement_count (gimple *stmt, int decre
   gcc_assert (TREE_CODE (*countp) == INTEGER_CST);
   *countp = wide_int_to_tree (TREE_TYPE (*countp), (TREE_INT_CST_LOW (*countp)
- decrement));
-
 }
 
 static void
 increment_start_addr (gimple *stmt, tree *where, int increment)
 {
+  if (tree lhs = gimple_call_lhs (stmt))
+if (where == gimple_call_arg_ptr (stmt, 0))
+  {
+   gassign *newop = gimple_build_assign (lhs, unshare_expr (*where));
+   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+   gsi_insert_after (, newop, GSI_SAME_STMT);
+   gimple_call_set_lhs (stmt, NULL_TREE);
+   update_stmt (stmt);
+  }
+
   if (TREE_CODE (*where) == SSA_NAME)
 {
   tree tem = make_ssa_name (TREE_TYPE (*where));
   gassign *newop
-= gimple_build_assign (tem, POINTER_PLUS_EXPR, *where,
+   = gimple_build_assign (tem, POINTER_PLUS_EXPR, *where,
   build_int_cst (sizetype, increment));
   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
   gsi_insert_before (, newop, GSI_SAME_STMT);
   *where = tem;
-  update_stmt (gsi_stmt (gsi));
+  update_stmt (stmt);
   return;
 }
 
   *where = build_fold_addr_expr (fold_build2 (MEM_REF, char_type_node,
- *where,
- build_int_cst (ptr_type_node,
-increment)));
+ *where,
+ build_int_cst (ptr_type_node,
+increment)));
 }
 
 /* STMT is builtin call that writes bytes in bitmap ORIG, some bytes are dead
--- gcc/testsuite/gcc.c-torture/execute/pr94130.c.jj2020-03-11 
14:01:49.431180291 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr94130.c   2020-03-11 
14:01:37.654352419 +0100
@@ -0,0 +1,16 @@
+/* PR tree-optimization/94130 */
+
+int
+main ()
+{
+  int a[8];
+  char *b = __builtin_memset (a, 0, sizeof (a));
+  a[0] = 1;
+  a[1] = 2;
+  a[2] = 3;
+  if (b != (char *) a)
+__builtin_abort ();
+  else
+asm volatile ("" : : "g" (a) : "memory");
+  return 0;
+}

Jakub



Re: [PATCH] c++: Fix wrong conversion error with non-viable overload [PR94124]

2020-03-12 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 12, 2020 at 01:58:20AM -0400, Jason Merrill wrote:
> > +  if (reuse && nelts < CONSTRUCTOR_NELTS (new_init))
> > +   {
> > + vec *v = NULL;
> > + if (nelts)
> 
> vec_alloc does nothing if nelts is 0, so this test seems unnecessary. OK
> either way.

I wasn't sure, but now I've verified it doesn't do anything but v = NULL,
which is exactly what we want.  Dropped the " = NULL" and if (nelts)
and committed, thanks.
> 
> > +   vec_alloc (v, nelts);

Jakub