date:20160324

[committed] fix comment for SSA_PROP_NOT_INTERESTING

2016-03-24 Thread Aldy Hernandez

It's not obvious in the comment at the top of the file that 
PROP_NOT_INTERESTING may be simulated again.  Interestingly enough, 
Diego's GCC summit presentation on the propagator has much better docs 
than our internal docs/comments.  We should probably merge a lot from 
there.  Anyways...


Committed as obvious.  I hope nobody cares that I'm improving docs and 
comments :).


Aldy
commit a76f8d620455a7edba07e66d37651f4e17cdeff1
Author: Aldy Hernandez 
Date:   Fri Mar 25 00:50:46 2016 -0500

* tree-ssa-propagate.c: Enhance docs for
SSA_PROP_NOT_INTERESTING.

diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
index 3277e49..c4535a4 100644
--- a/gcc/tree-ssa-propagate.c
+++ b/gcc/tree-ssa-propagate.c
@@ -55,6 +55,8 @@
 
SSA_PROP_NOT_INTERESTING: Statement S produces nothing of
interest and does not affect any of the work lists.
+   The statement may be simulated again if any of its input
+   operands change in future iterations of the simulator.
 
SSA_PROP_VARYING: The value produced by S cannot be determined
at compile time.  Further simulation of S is not required.
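
The difference between the two return values in the comment above can be illustrated with a toy simulator. This is only a sketch under loud assumptions: the names (`visit_stmt`, `propagate`, the `LAT_*` lattice) are hypothetical, the only statement form is "dst = src + 1", and the driver rescans all statements instead of following SSA edges as the real propagator does. It shows a statement that first returns NOT_INTERESTING being simulated again once its input changes, while a VARYING statement is never revisited.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the GCC definitions.  */
enum prop_result { PROP_NOT_INTERESTING, PROP_INTERESTING, PROP_VARYING };
enum lattice { LAT_UNDEFINED, LAT_CONSTANT, LAT_VARYING };

struct value { enum lattice kind; int cst; };

/* Toy statement "vals[dst] = vals[src] + 1"; each dst has one definition.  */
struct stmt { int dst, src; bool finished; int visits; };

static enum prop_result
visit_stmt (struct stmt *s, struct value *vals)
{
  const struct value in = vals[s->src];
  struct value *out = &vals[s->dst];

  s->visits++;
  if (in.kind == LAT_UNDEFINED)
    return PROP_NOT_INTERESTING;	/* Nothing known yet; may be revisited.  */
  if (in.kind == LAT_VARYING
      || (out->kind == LAT_CONSTANT && out->cst != in.cst + 1))
    {
      out->kind = LAT_VARYING;
      return PROP_VARYING;		/* Never simulated again.  */
    }
  if (out->kind == LAT_CONSTANT)
    return PROP_NOT_INTERESTING;	/* Same constant as last time.  */
  out->kind = LAT_CONSTANT;
  out->cst = in.cst + 1;
  return PROP_INTERESTING;
}

/* Simplified driver: rescan until no statement reports a change.  */
static void
propagate (struct stmt *stmts, int n, struct value *vals)
{
  bool changed;
  do
    {
      changed = false;
      for (int i = 0; i < n; i++)
	{
	  if (stmts[i].finished)
	    continue;
	  enum prop_result r = visit_stmt (&stmts[i], vals);
	  if (r == PROP_VARYING)
	    stmts[i].finished = true;
	  if (r != PROP_NOT_INTERESTING)
	    changed = true;
	}
    }
  while (changed);
}
```

With the user of a value placed before its definition, the user is visited once with an undefined input (not interesting) and again after the definition produces a constant.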

[committed] fix typo in pure attribute documentation

2016-03-24 Thread Aldy Hernandez


Committed as obvious.
commit cdc4f177c26c1949be630634a72a6622250624a8
Author: Aldy Hernandez 
Date:   Thu Mar 24 22:55:07 2016 -0500

* doc/extend.texi: Fix typo in documentation to pure attribute.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 623a5d0..6e27029 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3045,7 +3045,7 @@ int square (int) __attribute__ ((pure));
 says that the hypothetical function @code{square} is safe to call
 fewer times than the program says.
 
-Some of common examples of pure functions are @code{strlen} or @code{memcmp}.
+Some common examples of pure functions are @code{strlen} or @code{memcmp}.
 Interesting non-pure functions are functions with infinite loops or those
 depending on volatile memory or other system resource, that may change between
 two consecutive calls (such as @code{feof} in a multithreading environment).

Re: out of bounds access in insn-automata.c

2016-03-24 Thread Aldy Hernandez


On 03/24/2016 10:02 AM, Alexander Monakov wrote:

Hi,

On Thu, 24 Mar 2016, Bernd Schmidt wrote:

On 03/24/2016 11:17 AM, Aldy Hernandez wrote:

On 03/23/2016 10:25 AM, Bernd Schmidt wrote:

It looks like this block of code is written by a helper function that is
really intended for other purposes than for maximal_insn_latency. Might
be worth changing to
   int insn_code = dfa_insn_code (as_a <rtx_insn *> (insn));
   gcc_assert (insn_code <= DFA__ADVANCE_CYCLE);


dfa_insn_code_* and friends can return > DFA__ADVANCE_CYCLE so I can't
put that assert on the helper function.


So don't use the helper function? Just emit the block above directly.


Let me chime in :)  The function under scrutiny, maximal_insn_latency, was
added as part of selective scheduling merge; at the same time,
output_default_latencies was factored out of
output_internal_insn_latency_func, and the pair of new functions
output_internal_maximal_insn_latency_func/output_maximal_insn_latency_func
tried to mirror existing pair of
output_internal_insn_latency_func/output_insn_latency_func.

In particular, output_insn_latency_func also invokes
output_internal_insn_code_evaluation (twice, for each argument).  This means
that generated 'insn_latency' can also call 'internal_insn_latency' with
DFA__ADVANCE_CYCLE in arguments.  However, 'internal_insn_latency' then has a
specially emitted 'if' statement that checks if either of the arguments is
' >= DFA__ADVANCE_CYCLE', and returns 0 in that case.

So ultimately pre-existing code was checking ' > DFA__ADVANCE_CYCLE' first and
' >= DFA__ADVANCE_CYCLE' second (for no good reason as far as I can see), and
when the new '_maximal_' functions were introduced, the second check was not
duplicated in the new copy.

So as long as we are not looking to hack it up further, I'd like to clean up
both functions at the same time.  If calling the 'internal_' variants with
DFA__ADVANCE_CYCLE is rare, extending 'default_insn_latencies' by 1 zero
element corresponding to DFA__ADVANCE_CYCLE is a simple suitable fix. If
either DFA__ADVANCE_CYCLE is not guaranteed to be rare, or extending the table
in that style is undesired, I suggest creating a variant of
'output_internal_insn_code_evaluation' that performs a '>=' rather than '>'
test in the first place, and use it in both output_insn_latency_func and
output_maximal_insn_latency_func.  If acknowledged, I volunteer to regstrap on
x86_64 and submit that in stage1.

Thoughts?


If Bernd is fine with this, I'm happy to retract my patch and any 
possible followups.  I'm just interested in having no path causing a 
possible out of bounds access.  If your patch will do that, I'm cool.


Aldy
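
The two fixes Alexander proposes can be sketched side by side. This is a minimal illustration, not the generated insn-automata.c code: the insn codes, table contents, and function names are hypothetical, and only the shape of the bounds handling is taken from the discussion above.

```c
#include <assert.h>

/* Hypothetical insn codes; DFA__ADVANCE_CYCLE is one past the last real insn.  */
enum { INSN_A, INSN_B, INSN_C, DFA__ADVANCE_CYCLE };

/* Option 1: extend the table by one zero element so that indexing it
   with DFA__ADVANCE_CYCLE is valid after a '>' check.  */
static const unsigned char padded_latencies[DFA__ADVANCE_CYCLE + 1]
  = { 1, 3, 2, 0 };

static int
maximal_latency_padded (int insn_code)
{
  if (insn_code > DFA__ADVANCE_CYCLE)
    return 0;
  return padded_latencies[insn_code];
}

/* Option 2: keep the table unpadded and test with '>=' before indexing.  */
static const unsigned char unpadded_latencies[DFA__ADVANCE_CYCLE]
  = { 1, 3, 2 };

static int
maximal_latency_checked (int insn_code)
{
  if (insn_code >= DFA__ADVANCE_CYCLE)
    return 0;
  return unpadded_latencies[insn_code];
}
```

Either variant returns 0 for DFA__ADVANCE_CYCLE without reading past the end of the table; the sketch assumes insn codes are never negative.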

[committed] Fix bswapdi2 pattern in pa.md

2016-03-24 Thread John David Anglin

This fixes PR target/70319.  We need to use a scratch register to avoid 
clobbering operand 1.

Tested on hppa64-hp-hpux11.11.  Committed to trunk and gcc-5.

Dave
--
John David Anglin   dave.ang...@bell.net


2016-03-24  John David Anglin  

PR target/70319
* config/pa/pa.md (bswapdi2): Use a scratch register.

Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 234427)
+++ config/pa/pa.md (working copy)
@@ -1229,9 +1229,10 @@
 
 (define_insn "bswapdi2"
   [(set (match_operand:DI 0 "register_operand" "=&r")
-   (bswap:DI (match_operand:DI 1 "register_operand" "+r")))]
+   (bswap:DI (match_operand:DI 1 "register_operand" "r")))
+   (clobber (match_scratch:DI 2 "=r"))]
   "TARGET_64BIT"
-  "permh,3210 %1,%1\;hshl %1,8,%0\;hshr,u %1,8,%1\;or %0,%1,%0"
+  "permh,3210 %1,%2\;hshl %2,8,%0\;hshr,u %2,8,%2\;or %0,%2,%0"
   [(set_attr "type" "multi")
(set_attr "length" "16")])
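
To see why the four-instruction sequence byte-swaps a 64-bit value, and why a scratch register is enough to leave operand 1 intact, here is a small C model. It is a sketch based on my reading of the PA-RISC halfword instructions (permh,3210 reverses the four 16-bit halfwords; hshl and hshr,u shift each halfword independently); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models "permh,3210": reverse the order of the four 16-bit halfwords.  */
static uint64_t
permh_3210 (uint64_t x)
{
  return (x << 48)
	 | ((x & 0xffff0000ULL) << 16)
	 | ((x >> 16) & 0xffff0000ULL)
	 | (x >> 48);
}

/* Models "hshl r,8,t": shift every halfword left by 8 within its lane.  */
static uint64_t
hshl8 (uint64_t x)
{
  return (x << 8) & 0xff00ff00ff00ff00ULL;
}

/* Models "hshr,u r,8,t": unsigned right shift by 8 within each lane.  */
static uint64_t
hshr_u8 (uint64_t x)
{
  return (x >> 8) & 0x00ff00ff00ff00ffULL;
}

/* The patched sequence: only the scratch and the output are written.  */
static uint64_t
bswapdi2_model (uint64_t op1)
{
  uint64_t scratch = permh_3210 (op1);	/* permh,3210 %1,%2 */
  uint64_t op0 = hshl8 (scratch);	/* hshl %2,8,%0 */
  scratch = hshr_u8 (scratch);		/* hshr,u %2,8,%2 */
  return op0 | scratch;			/* or %0,%2,%0 */
}
```

For example, `bswapdi2_model (0x0102030405060708)` yields `0x0807060504030201`, and the input value is only ever read.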

check-target-libgomp wall time, without vs. with offloading (was: Also test -O0 for OpenACC C, C++ offloading test cases)

2016-03-24 Thread Thomas Schwinge

Hi!

On Wed, 23 Mar 2016 20:02:01 +0100, Jakub Jelinek  wrote:
> On Tue, Mar 22, 2016 at 11:23:43AM +0100, Thomas Schwinge wrote:
> > As discussed in
> > 
> > (and similar to what we're already doing for Fortran, and similar to what
> > recently got committed to libgomp/testsuite/libgomp.hsa.c/c.exp), it has
> > been helpful to also run C, C++ offloading test cases with -O0 in
> > addition to the -O2 default.  Making my earlier gomp-4_0-branch patch
> > conceptually simpler, I came up with the following; OK for trunk?
> 
> How big difference in make check-target-libgomp time is that?
> Without PTX offloading I bet zero, but with PTX offloading configured, is it
> 10% or 50% slower?

15 %.  The major part of the total time is still spent in Fortran
testing...  ;-/

Offloading compilation is slow; I suppose because of having to invoke
several tools (LTO streaming -> mkoffload -> offload compilers,
assemblers, linkers -> combine the resulting images; but I have not done
a detailed analysis on that).  I used the following patch to gather the
following numbers:

Baseline, without offloading:

TIME 1458823399 START [...]/libgomp.c/c.exp
TIME 1458823544 (145) END [...]/libgomp.c/c.exp
TIME 1458823544 START [...]/libgomp.c++/c++.exp
TIME 1458823672 (128) END [...]/libgomp.c++/c++.exp
TIME 1458823672 START [...]/libgomp.fortran/fortran.exp
TIME 1458824080 (408) END [...]/libgomp.fortran/fortran.exp
TIME 1458824080 START [...]/libgomp.graphite/graphite.exp
TIME 1458824083 (3) END [...]/libgomp.graphite/graphite.exp
TIME 1458824083 START [...]/libgomp.hsa.c/c.exp
TIME 1458824083 (0) END [...]/libgomp.hsa.c/c.exp
TIME 1458824084 START [...]/libgomp.oacc-c/c.exp
TIME 1458824109 (25) END [...]/libgomp.oacc-c/c.exp
TIME 1458824109 START [...]/libgomp.oacc-c++/c++.exp
TIME 1458824141 (32) END [...]/libgomp.oacc-c++/c++.exp
TIME 1458824141 START [...]/libgomp.oacc-fortran/fortran.exp
TIME 1458824215 (74) END [...]/libgomp.oacc-fortran/fortran.exp

Total: 680 s (OpenMP) + 130 s (OpenACC) = 810 s.

With OpenMP IntelMIC (emulated) and OpenACC nvptx offloading:

TIME 1458824215 START [...]/libgomp.c/c.exp
TIME 1458824461 (246) END [...]/libgomp.c/c.exp
TIME 1458824461 START [...]/libgomp.c++/c++.exp
TIME 1458824664 (203) END [...]/libgomp.c++/c++.exp
TIME 1458824664 START [...]/libgomp.fortran/fortran.exp
TIME 1458825269 (605) END [...]/libgomp.fortran/fortran.exp
TIME 1458825269 START [...]/libgomp.graphite/graphite.exp
TIME 1458825272 (3) END [...]/libgomp.graphite/graphite.exp
TIME 1458825273 START [...]/libgomp.hsa.c/c.exp
TIME 1458825273 (0) END [...]/libgomp.hsa.c/c.exp
TIME 1458825273 START [...]/libgomp.oacc-c/c.exp
TIME 1458825533 (260) END [...]/libgomp.oacc-c/c.exp
TIME 1458825533 START [...]/libgomp.oacc-c++/c++.exp
TIME 1458825860 (327) END [...]/libgomp.oacc-c++/c++.exp
TIME 1458825860 START [...]/libgomp.oacc-fortran/fortran.exp
TIME 1458826459 (599) END [...]/libgomp.oacc-fortran/fortran.exp

Total: 1050 s (OpenMP; + 54 %) + 1190 s (OpenACC; + 815 %) = 2240 s (+ 177 %).

Patched with "Also test -O0 for OpenACC C, C++ offloading test cases",
that results in the following changes (with offloading only):

TIME 1458834409 START [...]/libgomp.oacc-c/c.exp
TIME 1458834814 (405) END [...]/libgomp.oacc-c/c.exp
TIME 1458834814 START [...]/libgomp.oacc-c++/c++.exp
TIME 1458835338 (524) END [...]/libgomp.oacc-c++/c++.exp

Total: 1050 s (OpenMP) + 1530 s (OpenACC; + 29 %) = 2580 s (+ 15 %).
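
The percentages quoted in the totals can be rechecked from the rounded figures. The helper below is only an illustrative sketch (the function name is made up); it computes the increase from one total to another, rounded to the nearest percent.

```c
#include <assert.h>

/* Percentage increase from BASE to NEW_TOTAL, rounded to the nearest
   integer.  Both arguments are wall-clock totals in seconds.  */
static int
percent_increase (int base, int new_total)
{
  return (int) (100.0 * (new_total - base) / base + 0.5);
}
```

Using the rounded totals above: 680 s to 1050 s is +54 %, 130 s to 1190 s is +815 %, 810 s to 2240 s is +177 %, 1190 s to 1530 s is +29 %, and 2240 s to 2580 s is +15 %.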

--- libgomp/testsuite/lib/libgomp-dg.exp
+++ libgomp/testsuite/lib/libgomp-dg.exp
@@ -1,3 +1,18 @@
+rename dg-init dg-init_
+proc dg-init { } {
+dg-init_
+global CLOCK_START
+set CLOCK_START [clock seconds]
+verbose "TIME $CLOCK_START START [info script]" 0
+}
+rename dg-finish dg-finish_
+proc dg-finish { } {
+dg-finish_
+set CLOCK [clock seconds]
+global CLOCK_START
+verbose "TIME $CLOCK ([expr $CLOCK - $CLOCK_START]) END [info script]" 0
+}
+
 proc libgomp-dg-test { prog do_what extra_tool_flags } {
 return [gcc-dg-test-1 libgomp_target_compile $prog $do_what $extra_tool_flags]
 }


Grüße
 Thomas

Re: Also test -O0 for OpenACC C, C++ offloading test cases

2016-03-24 Thread Thomas Schwinge

Hi!

On Wed, 23 Mar 2016 19:57:50 +0100, Bernd Schmidt  wrote:
> Ok with [...].

Thanks for the review; committed in r234471:

commit 02662647911b3296b07d7f4e3e3ed0200619da48
Author: tschwinge 
Date:   Thu Mar 24 21:29:55 2016 +

Also test -O0 for OpenACC C, C++ offloading test cases

libgomp/
* testsuite/libgomp.oacc-c++/c++.exp: Set up torture testing, use
gcc-dg-runtest.
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Specify
-fno-builtin-acc_on_device instead of -O0.
* testsuite/libgomp.oacc-c-c++-common/acc-on-device.c: Skip for
-O0.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
Don't specify -O2.
* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@234471 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  | 36 ++
 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 29 +
 .../libgomp.oacc-c-c++-common/acc-on-device-2.c|  5 ++-
 .../libgomp.oacc-c-c++-common/acc-on-device.c  |  3 +-
 .../kernels-alias-ipa-pta-2.c  |  2 +-
 .../kernels-alias-ipa-pta-3.c  |  2 +-
 .../kernels-alias-ipa-pta.c|  2 +-
 .../libgomp.oacc-c-c++-common/loop-auto-1.c|  5 +--
 .../libgomp.oacc-c-c++-common/loop-dim-default.c   |  6 ++--
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-1.c |  5 +--
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-2.c |  5 +--
 .../libgomp.oacc-c-c++-common/loop-gwv-1.c |  5 +--
 .../libgomp.oacc-c-c++-common/loop-red-g-1.c   |  5 +--
 .../libgomp.oacc-c-c++-common/loop-red-gwv-1.c |  5 +--
 .../libgomp.oacc-c-c++-common/loop-red-v-1.c   |  5 +--
 .../libgomp.oacc-c-c++-common/loop-red-v-2.c   |  5 +--
 .../libgomp.oacc-c-c++-common/loop-red-w-1.c   |  5 +--
 .../libgomp.oacc-c-c++-common/loop-red-w-2.c   |  5 +--
 .../testsuite/libgomp.oacc-c-c++-common/loop-v-1.c |  5 +--
 .../testsuite/libgomp.oacc-c-c++-common/loop-w-1.c |  5 +--
 .../libgomp.oacc-c-c++-common/loop-wv-1.c  |  5 +--
 .../libgomp.oacc-c-c++-common/routine-g-1.c|  5 +--
 .../libgomp.oacc-c-c++-common/routine-gwv-1.c  |  5 +--
 .../libgomp.oacc-c-c++-common/routine-v-1.c|  5 +--
 .../libgomp.oacc-c-c++-common/routine-w-1.c|  5 +--
 .../libgomp.oacc-c-c++-common/routine-wv-1.c   |  5 +--
 libgomp/testsuite/libgomp.oacc-c/c.exp | 29 +
 27 files changed, 147 insertions(+), 57 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 5f2c401..e0cd567 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,39 @@
+2016-03-24  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c++/c++.exp: Set up torture testing, use
+   gcc-dg-runtest.
+   * testsuite/libgomp.oacc-c/c.exp: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Specify
+   -fno-builtin-acc_on_device instead of -O0.
+   * testsuite/libgomp.oacc-c-c++-common/acc-on-device.c: Skip for
+   -O0.
+   * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c:
+   Likewise.
+   *

Re: Fix 69650, bogus line numbers from libcpp

2016-03-24 Thread Jeff Law


On 03/24/2016 09:20 AM, Bernd Schmidt wrote:



On 03/23/2016 03:21 PM, Richard Biener wrote:

On Wed, Mar 23, 2016 at 2:15 PM, Bernd Schmidt wrote:

On 03/23/2016 01:41 PM, Richard Biener wrote:


Btw, the issue in the PR is also fixed with a simple

Index: libcpp/line-map.c
===
--- libcpp/line-map.c   (revision 234415)
+++ libcpp/line-map.c   (working copy)
@@ -543,7 +543,7 @@ linemap_add (struct line_maps *set, enum
   to_file);

 /* A TO_FILE of NULL is special - we use the natural
values.  */
-  if (error || to_file == NULL)
+  if (to_file == NULL)
  {
to_file = ORDINARY_MAP_FILE_NAME (from);
to_line = SOURCE_LINE (from, from[1].start_location);



I looked at that, but that made it hard to add the testcase as the line
numbers no longer match the dg-error directives. By moving this code we can
ignore the erroneous #line directive, and for this one testcase at least,
that makes the line numbers (and caret diagnostics etc.) come out right.


After some more digging and looking at your patch I'd approve that if it
would emit a warning rather than an error - so can you please adjust it?


Like this? No one has yet approved any better wording for the message,
so given that you said "it's not a regression" I've left it, but I would
now prefer "linemarker ignored due to incorrect nesting".


Bernd

cpp-leave.diff


PR lto/69650
* directives.c (do_linemarker): Test for file left but not entered
here.
* line-map.c (linemap_add): Not here.

PR lto/69650
* gcc.dg/pr69650.c: New test.

OK.

Also OK if you want to fixup the message.

jeff

[PATCH ARM v2] PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc generated by -pg

2016-03-24 Thread Charles Baylis

When compiling with -mlong-calls and -pg, calls to the __gnu_mcount_nc
function are not generated as long calls.

This is the sequel to this patch
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00881.html

This patch fixes the following problems with the previous patch.
. Nested functions now work (thanks to Richard E for spotting this)
. Thumb-1 now works

This patch works by adding new patterns (one for ARM/Thumb-2 and one
for Thumb-1) which are placed in the prologue as a placeholder for
some RTL which describes the address. This is either a SYMBOL_REF for
targets with MOVW/MOVT, or a literal pool reference for other targets.
The implementation of ARM_FUNCTION_PROFILER is changed to search for
this insn so that the address of the __gnu_mcount_nc function can
be loaded using an appropriate sequence for the target.

I also tried generating the profiling call sequence directly in the
prologue, but this requires some unpleasant hacks to prevent spurious
register pushes from ASM_OUTPUT_REG_PUSH.

Tested with no new regressions on arm-unknown-linux-gnueabihf on QEMU.
The generated code sequences have been inspected for normal and nested
functions on ARM v6, ARM v7, Thumb-1, and Thumb-2 targets.

This does not fix a regression, so I don't expect to apply it for
GCC 6.  Is it OK for when stage 1 re-opens?

gcc/ChangeLog:

2016-03-24  Charles Baylis  

* config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
* config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
(arm_expand_prologue): Likewise.
(thumb1_expand_prologue): Likewise.
(arm_output_long_call_to_profile_func): Likewise.
(arm_emit_long_call_profile): Likewise.
* config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
* config/arm/arm.md (arm_long_call_profile): New pattern.
* config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
define.
* config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
* config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.

gcc/testsuite/ChangeLog:

2016-03-24  Charles Baylis  

* gcc.target/arm/pr69770.c: New test.
From 5a39451f34be9b6ca98b3460bf40d879d6ee61a5 Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Thu, 24 Mar 2016 20:43:25 +
Subject: [PATCH] PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc
 generated by -pg

gcc/ChangeLog:

2016-03-24  Charles Baylis  

* config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
* config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
(arm_expand_prologue): Likewise.
(thumb1_expand_prologue): Likewise.
(arm_output_long_call_to_profile_func): Likewise.
(arm_emit_long_call_profile): Likewise.
* config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
* config/arm/arm.md (arm_long_call_profile): New pattern.
* config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
	define.
* config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
* config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.

gcc/testsuite/ChangeLog:

2016-03-24  Charles Baylis  

* gcc.target/arm/pr69770.c: New test.

Change-Id: I9b8de01fea083f17f729c3801f83174bedb3b0c6

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0083673..324c9f4 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -343,6 +343,7 @@ extern void arm_register_target_pragmas (void);
 extern void arm_cpu_cpp_builtins (struct cpp_reader *);
 
 extern bool arm_is_constant_pool_ref (rtx);
+void arm_emit_long_call_profile ();
 
 /* Flags used to identify the presence of processor capabilities.  */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c868490..040b255 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -21426,6 +21426,22 @@ output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
+static void
+arm_emit_long_call_profile_insn ()
+{
+  rtx sym_ref = gen_rtx_SYMBOL_REF (Pmode, "__gnu_mcount_nc");
+  /* if movt/movw are not available, use a constant pool */
+  if (!arm_arch_thumb2)
+  {
+sym_ref = force_const_mem(Pmode, sym_ref);
+  }
+  rtvec vec = gen_rtvec (1, sym_ref);
+  rtx tmp =
+gen_rtx_UNSPEC_VOLATILE (VOIDmode, vec, VUNSPEC_LONG_CALL_PROFILE);
+  emit_insn (tmp);
+}
+
+
 /* Generate the prologue instructions for entry into an ARM or Thumb-2
function.  */
 void
@@ -21789,6 +21805,10 @@ arm_expand_prologue (void)
   arm_load_pic_register (mask);
 }
 
+  if (crtl->profile && TARGET_LONG_CALLS
+  && ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS)
+arm_emit_long_call_profile_insn ();
+
   /* If we are profiling, make sure no instructions are scheduled before
  the call to

[gomp-nvptx 1/2] libgomp: avoid malloc calls in gomp_nvptx_main

2016-03-24 Thread Alexander Monakov

Avoid calling malloc where it's easy to use stack storage instead: device
malloc is very slow in CUDA.  This cuts about 60-80 microseconds from target
region entry/exit time, slimming down empty target regions from ~95 to ~17
microseconds (as measured on a GTX Titan).
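
The allocation pattern the patch switches to can be sketched in plain C. This is an illustration only: the struct and function names are hypothetical stand-ins for the libgomp ones, and `__builtin_alloca` (a GCC/Clang builtin) is used so the sketch needs no extra header. The point is that zeroed stack storage replaces a cleared heap allocation, and nothing has to be freed, as long as the storage is not used after the allocating function returns.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for gomp_thread and gomp_thread_pool.  */
struct worker { int id; void *data; };
struct worker_pool { struct worker **threads; unsigned threads_size; };

/* Build the per-thread array and the pool on the stack, then run FN.
   The storage is only valid until this function returns.  */
static unsigned
run_with_stack_pool (unsigned ntids, unsigned (*fn) (struct worker_pool *))
{
  struct worker *thrs = __builtin_alloca (ntids * sizeof (*thrs));
  memset (thrs, 0, ntids * sizeof (*thrs));	/* replaces a cleared malloc */

  struct worker_pool *pool = __builtin_alloca (sizeof (*pool));
  pool->threads = __builtin_alloca (ntids * sizeof (*pool->threads));
  for (unsigned tid = 0; tid < ntids; tid++)
    {
      thrs[tid].id = (int) tid;
      pool->threads[tid] = thrs + tid;
    }
  pool->threads_size = ntids;

  return fn (pool);	/* no free calls needed afterwards */
}

/* Example callback: sum the ids of all workers in the pool.  */
static unsigned
sum_ids (struct worker_pool *pool)
{
  unsigned sum = 0;
  for (unsigned i = 0; i < pool->threads_size; i++)
    sum += (unsigned) pool->threads[i]->id;
  return sum;
}
```

The trade-off is that the amount of stack used grows with `ntids`, so this only suits cases where that count is known to be small and bounded.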

* config/nvptx/target.c (GOMP_teams): Do not call 'free'.
* config/nvptx/team.c (gomp_nvptx_main): Use 'alloca' instead of
'malloc' to obtain storage.  Do not call 'free'.
* team.c (gomp_free_thread) [__nvptx__]: Do not call 'free'.
---
 libgomp/ChangeLog.gomp-nvptx  | 7 +++
 libgomp/config/nvptx/target.c | 1 -
 libgomp/config/nvptx/team.c   | 9 +
 libgomp/team.c| 4 +++-
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index dbf4710..38ea7f7 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -43,7 +43,6 @@ GOMP_teams (unsigned int num_teams, unsigned int thread_limit)
   else if (block_id >= num_teams)
 {
   gomp_free_thread (nvptx_thrs);
-  free (nvptx_thrs);
   asm ("exit;");
 }
   gomp_num_teams_var = num_teams - 1;
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index b9f9f9f..933f5a0 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -29,6 +29,7 @@
 
 #include "libgomp.h"
 #include <stdlib.h>
+#include <string.h>
 
 struct gomp_thread *nvptx_thrs __attribute__((shared));
 
@@ -46,10 +47,11 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
   /* Starting additional threads is not supported.  */
   gomp_global_icv.dyn_var = true;
 
-  nvptx_thrs = gomp_malloc_cleared (ntids * sizeof (*nvptx_thrs));
+  nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
+  memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
 
-  struct gomp_thread_pool *pool = gomp_malloc (sizeof (*pool));
-  pool->threads = gomp_malloc (ntids * sizeof (*pool->threads));
+  struct gomp_thread_pool *pool = alloca (sizeof (*pool));
+  pool->threads = alloca (ntids * sizeof (*pool->threads));
   for (tid = 0; tid < ntids; tid++)
pool->threads[tid] = nvptx_thrs + tid;
   pool->threads_size = ntids;
@@ -63,7 +65,6 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
   fn (fn_data);
 
   gomp_free_thread (nvptx_thrs);
-  free (nvptx_thrs);
 }
   else
 {
diff --git a/libgomp/team.c b/libgomp/team.c
index 9a43a10..e301345 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -274,10 +274,12 @@ gomp_free_thread (void *arg __attribute__((unused)))
  gomp_mutex_unlock (&gomp_managed_threads_lock);
 #endif
}
-  free (pool->threads);
   if (pool->last_team)
free_team (pool->last_team);
+#ifndef __nvptx__
+  free (pool->threads);
   free (pool);
+#endif
   thr->thread_pool = NULL;
 }
   if (thr->ts.level == 0 && __builtin_expect (thr->ts.team != NULL, 0))

[gomp-nvptx 0/2] gomp_nvptx_main tweaks

2016-03-24 Thread Alexander Monakov

I have committed two nvptx libgomp tweaks to amonakov/gomp-nvptx branch, one
to improve efficiency, another to work around a Maxwell-specific driver bug.

Alexander Monakov (2):
  libgomp: avoid malloc calls in gomp_nvptx_main
  libgomp: avoid triggering a driver bug on sm_50

 libgomp/ChangeLog.gomp-nvptx  | 12 
 libgomp/config/nvptx/target.c |  1 -
 libgomp/config/nvptx/team.c   | 15 ++-
 libgomp/team.c|  4 +++-
 4 files changed, 25 insertions(+), 7 deletions(-)

[gomp-nvptx 2/2] libgomp: avoid triggering a driver bug on sm_50

2016-03-24 Thread Alexander Monakov

Loops lacking exit edges can trigger an NVIDIA driver sm_50 code generation
bug, which manifested as stack pointer (SASS register R1) corruption in this
case. Adjusting source by hand to arrange a cheap exit branch seems to be the
most reasonable workaround.  NVIDIA bug ID 200177879.
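
The shape of the workaround can be sketched as follows. This is a hedged illustration with hypothetical names: `keep_running` stands in for `nvptx_thrs`, and `demo_next_job` exists only so the sketch terminates. In the real code the pointer is never null while the loop runs, so the added exit edge is never taken; here the demo clears it after three jobs purely to make the example finish.

```c
#include <assert.h>
#include <stddef.h>

static int dummy_storage;

/* Stand-in for nvptx_thrs: non-null for the whole run in the real code,
   but opaque enough that the loop keeps a real exit edge.  */
static int *volatile keep_running = &dummy_storage;

/* Worker loop written as do/while instead of for (;;).  */
static int
worker_loop (int (*next_job) (void))
{
  int done = 0;
  do
    {
      int job = next_job ();
      if (job == 0)
	continue;	/* In a do/while, this jumps to the condition.  */
      done += job;
    }
  while (keep_running);
  return done;
}

/* Demo-only job source: hands out 3, 2, 1, then clears the sentinel.  */
static int jobs_left = 3;

static int
demo_next_job (void)
{
  if (jobs_left == 0)
    {
      keep_running = NULL;
      return 0;
    }
  return jobs_left--;
}
```

Note that `continue` inside a do/while still evaluates the loop condition, so converting `for (;;)` to this form does not change which iterations run as long as the sentinel stays non-null.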

* config/nvptx/team.c (gomp_thread_start): Work around NVIDIA driver
bug by adding an exit edge to the loop.
---
 libgomp/ChangeLog.gomp-nvptx | 5 +
 libgomp/config/nvptx/team.c  | 6 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 933f5a0..0291539 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -84,7 +84,7 @@ gomp_thread_start (struct gomp_thread_pool *pool)
   gomp_sem_init (&thr->release, 0);
   thr->thread_pool = pool;
 
-  for (;;)
+  do
 {
   gomp_simple_barrier_wait (&pool->threads_dock);
   if (!thr->fn)
@@ -96,6 +96,10 @@ gomp_thread_start (struct gomp_thread_pool *pool)
   gomp_team_barrier_wait_final (&thr->ts.team->barrier);
   gomp_finish_task (task);
 }
+  /* Work around an NVIDIA driver bug: when generating sm_50 machine code,
+ it can trash stack pointer R1 in loops lacking exit edges.  Add a cheap
+ artificial exit that the driver would not be able to optimize out.  */
+  while (nvptx_thrs);
 }
 
 /* Launch a team.  */

C++ PATCH for c++/70386 (constexpr ICE with -Wall and PMF)

2016-03-24 Thread Jason Merrill

The compiler passes around PMF temporaries as bare CONSTRUCTORs, so we 
need to be prepared for that.


Tested x86_64-pc-linux-gnu, applying to trunk.

commit ebbe164ea708e318be4aa911cc9e98fa333dcd02
Author: Jason Merrill 
Date:   Thu Mar 24 14:52:25 2016 -0400

	PR c++/70386
	* constexpr.c (cxx_eval_bare_aggregate): Handle PMFs.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 2d30a84..8ea7111 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2268,8 +2268,19 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
   vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (t);
   bool changed = false;
   gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (t));
+  tree type = TREE_TYPE (t);
 
-  verify_ctor_sanity (ctx, TREE_TYPE (t));
+  constexpr_ctx new_ctx;
+  if (TYPE_PTRMEMFUNC_P (type))
+{
+  /* We don't really need the ctx->ctor business for a PMF, but it's
+	 simpler to use the same code.  */
+  new_ctx = *ctx;
+  new_ctx.ctor = build_constructor (type, NULL);
+  new_ctx.object = NULL_TREE;
+  ctx = &new_ctx;
+};
+  verify_ctor_sanity (ctx, type);
   vec<constructor_elt, va_gc> **p = &CONSTRUCTOR_ELTS (ctx->ctor);
   vec_alloc (*p, vec_safe_length (v));
 
@@ -2280,7 +2291,6 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
   FOR_EACH_CONSTRUCTOR_ELT (v, i, index, value)
 {
   tree orig_value = value;
-  constexpr_ctx new_ctx;
   init_subob_ctx (ctx, new_ctx, index, value);
   if (new_ctx.ctor != ctx->ctor)
 	/* If we built a new CONSTRUCTOR, attach it now so that other
@@ -2334,7 +2344,7 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
   CONSTRUCTOR_NO_IMPLICIT_ZERO (t) = false;
   TREE_CONSTANT (t) = constant_p;
   TREE_SIDE_EFFECTS (t) = side_effects_p;
-  if (VECTOR_TYPE_P (TREE_TYPE (t)))
+  if (VECTOR_TYPE_P (type))
 t = fold (t);
   return t;
 }
diff --git a/gcc/testsuite/g++.dg/expr/pmf-2.C b/gcc/testsuite/g++.dg/expr/pmf-2.C
new file mode 100644
index 000..79e36cf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/pmf-2.C
@@ -0,0 +1,18 @@
+// PR c++/70386
+// { dg-options "-Wall" }
+
+struct A { void f () {} };
+struct B : public A {};
+struct C : public A {};
+struct D : public B, public C {};
+
+typedef void (C::*cp) ();
+typedef void (D::*dp) ();
+
+int
+main ()
+{
+  cp c = &C::f;
+  dp d = c;
+  return (cp () == d);
+}

Re: [AArch64] Emit square root using the Newton series

2016-03-24 Thread Evandro Menezes


On 03/17/16 17:46, Evandro Menezes wrote:
This patch refactors the function to emit the reciprocal square root 
approximation to also emit the square root approximation.


   2016-03-23  Evandro Menezes 
Wilco Dijkstra  

   gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros.
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_rsqrt): Replace with
   "aarch64_emit_approx_sqrt".
(AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro.
* config/aarch64/aarch64.c
(exynosm1_tunings): Use the new macro.
(aarch64_emit_approx_sqrt): Define new function.
(aarch64_override_options_after_change_1): Handle new option.
* config/aarch64/aarch64.md
(rsqrt<mode>2): Use new function instead.
(sqrt<mode>2): New expansion and insn definitions.
* config/aarch64/aarch64-simd.md: Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-sqrt): Add new option description.
* doc/invoke.texi (mlow-precision-sqrt): Likewise.

This version of the patch cleans up the changes to the MD files and 
fixes some bugs introduced in it since the first proposal.
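
The idea behind the patch, computing sqrt(x) as x * (1/sqrt(x)) with the reciprocal square root refined by Newton steps, can be sketched in portable C. Assumptions are worth stating loudly: the hardware estimate instruction is replaced here by the well-known integer bit trick for an initial guess, `float` is assumed to be a 32-bit IEEE-754 type, and the function names and step count are illustrative, not what the AArch64 back end emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rough initial estimate of 1/sqrt(x) for positive, normal X.
   Stand-in for a hardware estimate instruction.  */
static float
rsqrt_estimate (float x)
{
  uint32_t bits;
  float y;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* Refine the estimate with STEPS Newton iterations:
   y' = y * (1.5 - 0.5 * x * y * y).  */
static float
approx_rsqrt (float x, int steps)
{
  float y = rsqrt_estimate (x);
  for (int i = 0; i < steps; i++)
    y = y * (1.5f - 0.5f * x * y * y);
  return y;
}

/* sqrt(x) = x * rsqrt(x); zero is special-cased because the product
   would otherwise involve an infinite reciprocal.  */
static float
approx_sqrt (float x, int steps)
{
  if (x == 0.0f)
    return 0.0f;
  return x * approx_rsqrt (x, steps);
}
```

Each Newton step roughly squares the relative error, so a few steps after a coarse estimate approach single-precision accuracy; the sketch does not handle negative, infinite, or NaN inputs.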


Thanks for your feedback,

--
Evandro Menezes

From 712e330bf651393bb788e85ebe7b3d9a37f54ae7 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 17 Mar 2016 17:39:55 -0500
Subject: [PATCH] [AArch64] Emit square root using the Newton series

2016-03-23  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros.
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_rsqrt): Replace with "aarch64_emit_approx_sqrt".
	(AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro.
	* config/aarch64/aarch64.c
	(exynosm1_tunings): Use the new macro.
	(aarch64_emit_approx_sqrt): Define new function.
	(aarch64_override_options_after_change_1): Handle new option.
	* config/aarch64/aarch64.md
	(rsqrt<mode>2): Use new function instead.
	(sqrt<mode>2): New expansion and insn definitions.
	* config/aarch64/aarch64-simd.md: Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-sqrt): Add new option description.
	* doc/invoke.texi (mlow-precision-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |   5 +-
 gcc/config/aarch64/aarch64-simd.md  |  13 ++-
 gcc/config/aarch64/aarch64-tuning-flags.def |   3 +-
 gcc/config/aarch64/aarch64.c| 129 ++--
 gcc/config/aarch64/aarch64.md   |  11 ++-
 gcc/config/aarch64/aarch64.opt  |   9 +-
 gcc/doc/invoke.texi |  10 +++
 7 files changed, 147 insertions(+), 33 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index dced209..24c2125 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_SQRT \
+  (AARCH64_EXTRA_TUNE_APPROX_SQRT_DF | AARCH64_EXTRA_TUNE_APPROX_SQRT_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -361,7 +364,7 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..47ccb18 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -405,7 +405,7 @@
 		 UNSPEC_RSQRT))]
   "TARGET_SIMD"
 {
-  aarch64_emit_approx_rsqrt (operands[0], operands[1]);
+  aarch64_emit_approx_sqrt (operands[0], operands[1], true);
   DONE;
 })
 
@@ -4307,7 +4307,16 @@
 
 ;; sqrt
 
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
+DONE;
+})
+
+(define_insn "*sqrt2"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
 (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
   "TARGET_SIMD"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..725a79c 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,5 @@
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs",

Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts

2016-03-24 Thread Martin Jambor

Hi,

On Mon, Mar 21, 2016 at 06:22:17PM +0800, Chung-Lin Tang wrote:
> Hi Martin, I think you're the one to CC for this,
> as I mentioned in the first email, this has been build tested, however I did
> not know if I could test this without a Radeon card.  If convenient,
> could you or anyone familiar with the setup do a make check-target-libgomp
> with this patch series?
> 
> Thanks,
> Chung-Lin
> 
> 
> * plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
> to 'hsa_error_msg', for clarity.
> (hsa_fatal): Likewise.
> (hsa_error): New function.
> (init_hsa_context): Change return type to bool, adjust to return
> false on error.
> (queue_callback): Adjust to call hsa_error.
> (GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
> return value.
> (GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
> return false on error.
> (get_agent_info): Adjust to return NULL on error.
> (destroy_hsa_program): Change return type to bool, adjust to
> return false on error.
> (GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
> (destroy_module): Change return type to bool, adjust to
> return false on error.
> (GOMP_OFFLOAD_unload_image): Likewise.
> (GOMP_OFFLOAD_fini_device): Likewise.
> (GOMP_OFFLOAD_alloc): Change to return NULL when called.
> (GOMP_OFFLOAD_free): Change to return false when called.
> (GOMP_OFFLOAD_dev2host): Likewise.
> (GOMP_OFFLOAD_host2dev): Likewise.
> (GOMP_OFFLOAD_dev2dev): Likewise.

On the whole, I am fine with the patch but there are two issues:

First, and generally, when you change the return type of a function,
you must document what return values mean in the comment of the
function.  Most importantly, it must be immediately apparent whether a
function returns true or false on failure from its comment.  So please
fix that.
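To illustrate the commenting convention asked for here, a minimal sketch follows.  The function example_destroy_resource and its handle argument are made up for this example and are not part of plugin-hsa.c; the only point is that the comment says which return value means failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from plugin-hsa.c.  Release the
   resource identified by *HANDLE.  Return TRUE on success.  Return
   FALSE on failure, in which case a message has been printed to
   stderr and *HANDLE is left unchanged.  */

bool
example_destroy_resource (int *handle)
{
  if (handle == NULL || *handle < 0)
    {
      fprintf (stderr, "example: invalid handle\n");
      return false;
    }

  /* Mark the handle as released.  */
  *handle = -1;
  return true;
}
```

With the meaning of the return value spelled out in the comment, a caller can write `if (!example_destroy_resource (&h)) return false;` without having to read the function body.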

Second...

> Index: libgomp/plugin/plugin-hsa.c
> ===
> --- libgomp/plugin/plugin-hsa.c   (revision 234358)
> +++ libgomp/plugin/plugin-hsa.c   (working copy)
> @@ -175,10 +175,10 @@ hsa_warn (const char *str, hsa_status_t status)
>if (!debug)
>  return;
>  
> -  const char *hsa_error;
> -  hsa_status_string (status, &hsa_error);
> +  const char *hsa_error_msg;
> +  hsa_status_string (status, &hsa_error_msg);
>  
> -  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error);
> +  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error_msg);
>  }
>  
>  /* Report a fatal error STR together with the HSA error corresponding to STATUS
> @@ -187,12 +187,25 @@ hsa_warn (const char *str, hsa_status_t status)
>  static void
>  hsa_fatal (const char *str, hsa_status_t status)
>  {
> -  const char *hsa_error;
> -  hsa_status_string (status, &hsa_error);
> +  const char *hsa_error_msg;
> +  hsa_status_string (status, &hsa_error_msg);
>GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
> -  hsa_error);
> +  hsa_error_msg);
>  }
>  
> +/* Like hsa_fatal, except only report error message, and return FALSE
> +   for propagating error processing to outside of plugin.  */
> +
> +static bool
> +hsa_error (const char *str, hsa_status_t status)
> +{
> +  const char *hsa_error_msg;
> +  hsa_status_string (status, &hsa_error_msg);
> +  GOMP_PLUGIN_error ("HSA fatal error: %s\nRuntime message: %s", str,
> +  hsa_error_msg);
> +  return false;
> +}
> +
>  struct hsa_kernel_description
>  {
>const char *name;

...

>  /* Callback of dispatch queues to report errors.  */
> @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status,
>   hsa_queue_t *queue __attribute__ ((unused)),
>   void *data __attribute__ ((unused)))
>  {
> -  hsa_fatal ("Asynchronous queue error", status);
> +  hsa_error ("Asynchronous queue error", status);
>  }

...I believe this hunk is wrong.  Errors reported in this way mean
that something is very wrong and generally happen during execution of
code on HSA GPU, i.e. within GOMP_OFFLOAD_run.  And since you left
calls in create_single_kernel_dispatch, which is called as a part of
GOMP_OFFLOAD_run, intact, I believe you actually want to leave
hsa_fatal here too.

Thanks,

Martin

Re: [PATCH 1/4, libgomp] Resolve deadlock on plugin exit

2016-03-24 Thread Martin Jambor

Hi,

On Mon, Mar 21, 2016 at 06:21:02PM +0800, Chung-Lin Tang wrote:
> Hi, this is the set of patches from 
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01411.html
> revised again, this time also with audits for the HSA plugin.
> 
> The changes are pretty minor, mainly that the unload_image hook now
> receives similar error handling treatment.
> 
> Tested again without regressions for nvptx and intelmic, however
> while I was able to build the toolchain with HSA offloading support, I was
> unsure how I could test it, as I currently don't have any AMD hardware (not
> aware if there's an emulator like intelmic).  I would be grateful if
> the HSA folks can run them for me.

I have just tested the whole patch-set on my HSA box (i.e. gomp.exp
tests and all libgomp tests on trunk + some extra testing on the hsa
branch) and found no issues.

I have had a very superficial look over the patch and have no
objections but since I am not familiar with the issue this addresses
and because I do not have detailed understanding of the internals
of copying data to/from devices, my opinion should not really count
much.

Nevertheless, thanks for thinking about HSA and making me aware of the
change,

Martin


> 
> Thanks,
> Chung-Lin
> 
> ChangeLog for the libgomp proper parts, patch as attached.
> 
> 2016-03-20  Chung-Lin Tang  
> 
> * target.c (gomp_device_copy): New function.
> (gomp_copy_host2dev): Likewise.
> (gomp_copy_dev2host): Likewise.
> (gomp_free_device_memory): Likewise.
> (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev().
> (gomp_map_pointer): Likewise.
> (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle
> NULL value from alloc_func plugin hook.
> (gomp_unmap_tgt): Adjust to call gomp_free_device_memory().
> (gomp_copy_from_async): Adjust to call gomp_copy_dev2host().
> (gomp_unmap_vars): Likewise.
> (gomp_update): Adjust to call gomp_copy_dev2host() and
> gomp_copy_host2dev() functions.
> (gomp_unload_image_from_device): Handle false value from
> unload_image_func plugin hook.
> (gomp_init_device): Handle false value from init_device_func
> plugin hook.
> (gomp_exit_data): Adjust to call gomp_copy_dev2host().
> (omp_target_free): Adjust to call gomp_free_device_memory().
> (omp_target_memcpy): Handle return values from host2dev_func,
> dev2host_func, and dev2dev_func plugin hooks.
> (omp_target_memcpy_rect_worker): Likewise.
> (gomp_target_fini): Handle false value from fini_device_func
> plugin hook.
> * libgomp.h (struct gomp_device_descr): Adjust return type of
> init_device_func, fini_device_func, unload_image_func, free_func,
> dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'.
> * oacc-host.c (host_init_device): Change return type to bool.
> (host_fini_device): Likewise.
> (host_unload_image): Likewise.
> (host_free): Likewise.
> (host_dev2host): Likewise.
> (host_host2dev): Likewise.
> * oacc-mem.c (acc_free): Handle plugin hook fatal error case.
> (acc_memcpy_to_device): Likewise.
> (acc_memcpy_from_device): Likewise.
> (delete_copyout): Add libfnname parameter, handle free_func
> hook fatal error case.
> (acc_delete): Adjust delete_copyout call.
> (acc_copyout): Likewise.
> (update_dev_host): Move gomp_mutex_unlock to after
> host2dev/dev2host hook calls.
>

Re: [patch] libstdc++/69945 Add __gnu_cxx::__freeres hook

2016-03-24 Thread Jonathan Wakely

On 16/03/16 16:29 +0100, Mark Wielaard wrote:

On Thu, 2016-03-03 at 16:34 +0100, Mark Wielaard wrote:

On Wed, 2016-02-24 at 18:35 +, Jonathan Wakely wrote:
> This adds a new function to libsupc++ which will free the memory still
> in use by the pool used for allocating exceptions when malloc fails.
>
> This is similar to glibc's __libc_freeres, which valgrind (and other
> tools?) use to tell glibc to deallocate everything before exiting.
>
> I initially called it __gnu_cxx::__free_eh_pool() but I figured we
> might have other memory in use at some later date, and we wouldn't
> want valgrind to have to start calling a second function, nor make a
> function called __free_eh_pool() actually free other things.

I tested this on x86_64-pc-linux-gnu with Ivo's valgrind patch from
https://bugs.kde.org/show_bug.cgi?id=345307 and it works pretty nicely.
No more spurious still reachable memory issues with memcheck.

Is there any possibility to get this backported for 5.4?

If there is anything I can do to help move this patch forward, please
let me know.

Sorry for stalling, other things distracted me. It's committed to
trunk now though.

Re: [C++ PATCH] Diagnose constexpr overflow (PR c++/70323)

2016-03-24 Thread Jason Merrill


On 03/23/2016 02:34 PM, Jason Merrill wrote:

For GCC 7 we should do constexpr evaluation on the unfolded function,
but for GCC 6 this is OK.


And here's a fix for the -Wall case.

Tested x86_64-pc-linux-gnu, applying to trunk.


commit 75f153ad9c455c7f2340b6da6791e5a9a0787a8e
Author: Jason Merrill 
Date:   Thu Mar 24 12:53:32 2016 -0400

	PR c++/70323
	* constexpr.c (cxx_eval_call_expression): Don't cache result if
	*overflow_p.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 8427513..2d30a84 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1448,7 +1448,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 
   if (result == error_mark_node)
 	*non_constant_p = true;
-  if (*non_constant_p)
+  if (*non_constant_p || *overflow_p)
 	result = error_mark_node;
   else if (!result)
 	result = void_node;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C
new file mode 100644
index 000..d166787
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C
@@ -0,0 +1,11 @@
+// PR c++/70323
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wall" }
+
+constexpr int overflow_if_0 (int i) { return __INT_MAX__ + !i; }
+constexpr int overflow_if_1 (int i) { return __INT_MAX__ + i; }
+
+constexpr bool i0_0 = overflow_if_0 (0);   // { dg-error "overflow in constant expression" }
+constexpr bool i0_1 = overflow_if_0 (1);
+constexpr bool i1_0 = overflow_if_1 (0);
+constexpr bool i1_1 = overflow_if_1 (1);   // { dg-error "overflow in constant expression" }

Re: [PATCH] Fix 69845

2016-03-24 Thread Jeff Law


On 03/22/2016 11:40 AM, Richard Henderson wrote:

In PR68142 you added a check for overflow + __INT_MIN__.
I can't figure out why the check for __INT_MIN__, except
that it seems specific to the test case you examined.

And indeed, this test case shows how things go wrong
with other distributed folding leading to overflow.

I added two tests, one signed, one unsigned.  The second
verifies that we do still fold for the defined-overflow case.
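For readers less familiar with the distinction the two tests rely on, here is a small C sketch.  It is not part of the patch and the function names are made up.  It only shows why a fold that may overflow has to be avoided for signed types, while unsigned arithmetic has defined wrap-around.

```c
#include <limits.h>
#include <stdbool.h>

/* Return true if A + B would overflow the range of int.  The check is
   done without performing the possibly-overflowing addition, because
   signed overflow is undefined behavior in C.  */
bool
signed_add_would_overflow (int a, int b)
{
  if (b > 0)
    return a > INT_MAX - b;
  return a < INT_MIN - b;
}

/* Unsigned arithmetic wraps modulo UINT_MAX + 1, so this addition is
   always well defined.  */
unsigned int
unsigned_add_wraps (unsigned int a, unsigned int b)
{
  return a + b;
}
```

A transformation that introduces an addition like the first one where the source had none can turn a valid program into an invalid one; for the second it cannot.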

Ok?

Richi ack'd.  I went ahead and committed this to the trunk.

jeff

Re: [DOC Patch] Add sample for @cc constraint

2016-03-24 Thread Sandra Loosemore


On 03/24/2016 09:00 AM, Bernd Schmidt wrote:

In principle we probably should have an example, but once again I have
some problems with the style of the added documentation. I prefer
concise writing without unnecessary repetition. Any other reviewers can
of course override me, but the following is my opinion on these changes.

More problematic than a lack of documentation is that I haven't been
able to find an executable testcase. If you could adapt your example for
use in gcc.target/i386, that would be even more important.


FAOD, I've been keeping my mouth shut on this patch because I am not at 
all familiar with low-level x86 features, the example makes little sense 
to me, and I can't make any useful suggestions of my own about how to 
improve this section of the documentation.  :-(  Generally, though, I 
agree with Bernd's preference for conciseness and not wandering off into 
side discussions or repetition of material already covered elsewhere.


-Sandra

[PATCH GCC]Reduce compilation time for IVOPT by skipping cost computation in use group

2016-03-24 Thread Bin Cheng

Hi,
Quite a lot of time is used when IVOPT computes cost for (use, candidate) pairs.  As a 
matter of fact, some pairs are very similar to each other, and we can abstract 
and compute cost only once for these pairs.  This patch does so: the idea 
is to skip cost computation for sub-uses in each group.  Of course it may 
result in different assembly code for some complicated cases because it 
estimates cost rather than doing the real computation.  I double-checked one 
such case and the change in generated assembly is not a regression.  For an 
IVOPT-heavy program (spec2k/173), this patch reduces IVOPT's compilation time 
by 7~8%, as well as the memory consumption on my development machine.
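One way to picture the idea is the following sketch.  It is an assumption-laden illustration, not the code in tree-ssa-loop-ivopts.c: the types use and use_group, the counter expensive_calls and both functions are hypothetical.  The expensive routine runs once per group and its result is reused as the estimate for every sub-use.

```c
#include <stddef.h>

/* Hypothetical types; these are not the structures used by IVOPT.  */
struct use { int cost; };
struct use_group { struct use *uses; size_t n_uses; };

/* Counts how often the expensive routine ran.  */
unsigned long expensive_calls;

/* Stand-in for an expensive cost computation for one pair.  */
int
compute_cost (size_t group_id, int cand)
{
  expensive_calls++;
  return (int) group_id * 10 + cand;
}

/* Compute the cost once for GROUP and reuse that value as the
   estimate for all of its sub-uses.  */
void
estimate_group_costs (struct use_group *group, size_t group_id, int cand)
{
  if (group->n_uses == 0)
    return;

  int cost = compute_cost (group_id, cand);
  for (size_t i = 0; i < group->n_uses; i++)
    group->uses[i].cost = cost;
}
```

The trade-off is the one described above: sub-uses get an estimate rather than an individually computed cost, so the chosen candidates can differ in complicated cases.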

Bootstrap & test on x86_64.

As for spec2k6 data on x86_64: maybe because I ran spec2k6 compiled with the patched 
GCC in an unclean environment, some cases regressed by a small amount (< 1%).  I 
manually compared assembly code for several cases, including the ones with the 
largest regression (still within 1%).  I could confirm that the generated assembly 
code is exactly the same as with unpatched GCC, except for function 
emit_library_call_value_1 in 403.gcc/calls.c.

In this case, the difference in the IVOPT dumps is as below:

$ diff -y trunk/calls.c.154t.ivopts patch/calls.c.154t.ivopts 

  :                                                        :
  # val_21 = PHI                                           # val_21 = PHI
  _811 = (void *) ivtmp.322_829;                           _811 = (void *) ivtmp.322_829;
  MEM[base: _811, offset: -48B] = val_21;                | MEM[base: _811, offset: -32B] = val_21;
  _810 = (void *) ivtmp.322_829;                           _810 = (void *) ivtmp.322_829;
  MEM[base: _810, offset: -40B] = mode_163;              | MEM[base: _810, offset: -24B] = mode_163;
  _182 = function_arg (_so_far, mode_163, 0B, 1);          _182 = function_arg (_so_far, mode_163, 0B, 1);
  _809 = (void *) ivtmp.322_829;                           _809 = (void *) ivtmp.322_829;
  MEM[base: _809, offset: -32B] = _182;                  | MEM[base: _809, offset: -16B] = _182;
  _807 = (void *) ivtmp.322_829;                           _807 = (void *) ivtmp.322_829;
  MEM[base: _807, offset: -24B] = 0;                     | MEM[base: _807, offset: -8B] = 0;
  _185 = (struct args_size *) ivtmp.322_829;             | _801 = ivtmp.322_829 + 16;
  _801 = ivtmp.322_829 + 18446744073709551600;           <
  _800 = (struct args_size *) _801;                        _800 = (struct args_size *) _801;
  _186 = _800;                                           | _185 = _800;
                                                         > _186 = (struct args_size *) ivtmp.322_829;
  _187 = _182 != 0B;                                       _187 = _182 != 0B;
  _188 = (int) _187;                                       _188 = (int) _187;
  locate_and_pad_parm (mode_163, 0B, _188, 0B, _size, _1   locate_and_pad_parm (mode_163, 0B, _188, 0B, _size, _1
  _802 = (void *) ivtmp.322_829;                           _802 = (void *) ivtmp.322_829;
  _190 = MEM[base: _802, offset: 8B];                    | _190 = MEM[base: _802, offset: 24B];
  if (_190 != 0B)                                          if (_190 != 0B)
    goto ;                                                   goto ;
  else                                                     else
    goto ;                                                   goto ;

  :                                                        :
  fancy_abort ("calls.c", 3724, &__FUNCTION__);            fancy_abort ("calls.c", 3724, &__FUNCTION__);

It's only an offset difference in the IV.  Below is the difference in the generated 
assembly:
$ diff -y trunk/calls.S patch/calls.S 
.L489:                           .L489:
    leaq    -80(%rbp), %rdi          leaq    -80(%rbp), %rdi
    xorl    %edx, %edx               xorl    %edx, %edx
    movl    $1, %ecx                 movl    $1, %ecx
    movl    %r13d, %esi              movl    %r13d, %esi
    movq    %rax, -48(%r15)    <
    movl    %r13d, -40(%r15)   <
    call    function_arg       <
    movl    $0, -24(%r15)      <
    movq    %rax, -32(%r15)          movq    %rax, -32(%r15)
                               >     movl    %r13d, -24(%r15)
                               >     call    function_arg
    xorl    %edx, %edx               xorl    %edx, %edx
    pushq

Re: [PR70366] fix chromium build failure with LTO due to segfault in inline_call

2016-03-24 Thread Jeff Law


On 03/24/2016 08:43 AM, Prathamesh Kulkarni wrote:

Hi,
The following fix suggested by Richard fixes chromium build failing
due to segfault in inline_call.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
Cross-tested on arm*-*-* and aarch64*-*-*.
Ok for trunk ?

OK.
jeff

[PATCH] Remove incorrect warning for parallel firstprivate clause

2016-03-24 Thread Tom de Vries


Hi,

This patch fixes an incorrect warning for the oacc firstprivate clause.

Consider this test-case:
...
void
foo (void)
{
  int i;

#pragma acc parallel
  {
i = 1;
  }
}
...


When compiling with -fopenacc -Wuninitialized, we get an 'is used 
uninitialized' warning for variable 'i', which is confusing given that 
'i' is not used, but only set in the parallel region.


The warning occurs because there's an implicit firstprivate(i) clause on 
the parallel region, and that firstprivate clause generates a read of i 
before the region, and a write to i in the region.
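A rough plain-C analogy of that lowering is sketched below.  It is not what omp-low.c emits; region_body and run_region_firstprivate are hypothetical names used only to show where the read comes from.

```c
/* Stand-in for the body of the parallel region.  I_PRIVATE is the
   region's own copy of the variable.  */
int
region_body (int i_private)
{
  /* The only access inside the region is a write.  */
  i_private = 1;
  return i_private;
}

/* Analogy for firstprivate(i): the value of I is read before the
   region starts and becomes the initial value of the private copy.  */
int
run_region_firstprivate (int i)
{
  /* This read happens outside the region; it is the access that an
     uninitialized-use warning would point at.  */
  int initial_value = i;
  return region_body (initial_value);
}
```

The user never wrote the read of i, which is why a warning about it is confusing.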


The patch silences the warning by marking the variable in the 
firstprivate clause with TREE_NO_WARNING.


Build and reg-tested with goacc.exp, gomp.exp and target-libgomp.

OK for trunk if bootstrap and reg-test succeed?

Thanks,
- Tom
Remove incorrect warning for parallel firstprivate clause

2016-03-24  Tom de Vries  

	* omp-low.c (lower_omp_target): Set TREE_NO_WARNING for oacc
	firstprivate clause.

	* c-c++-common/goacc/uninit-firstprivate-clause.c: New test.
	* gfortran.dg/goacc/uninit-firstprivate-clause.f95: New test.

---
 gcc/omp-low.c  |  5 -
 .../goacc/uninit-firstprivate-clause.c | 25 ++
 .../goacc/uninit-firstprivate-clause.f95   | 18 
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d107961..41eb3c8 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -16068,7 +16068,10 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		  {
 		gcc_assert (is_gimple_omp_oacc (ctx->stmt));
 		if (!is_reference (var))
-		  var = build_fold_addr_expr (var);
+		  {
+			TREE_NO_WARNING (var) = 1;
+			var = build_fold_addr_expr (var);
+		  }
 		else
 		  talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
 		gimplify_assign (x, var, );
diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-firstprivate-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-firstprivate-clause.c
new file mode 100644
index 000..3d3a03e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-firstprivate-clause.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wuninitialized" } */
+
+void
+foo (void)
+{
+  int i;
+
+#pragma acc parallel
+  {
+i = 1;
+  }
+}
+
+
+void
+foo2 (void)
+{
+  int i;
+
+#pragma acc parallel firstprivate (i)
+  {
+i = 1;
+  }
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-firstprivate-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-firstprivate-clause.f95
new file mode 100644
index 000..c18765b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-firstprivate-clause.f95
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-additional-options "-Wuninitialized" }
+
+subroutine test
+  INTEGER :: i
+
+  !$acc parallel
+  i = 1
+  !$acc end parallel
+end subroutine test
+
+subroutine test2
+  INTEGER :: i
+
+  !$acc parallel firstprivate (i)
+  i = 1
+  !$acc end parallel
+end subroutine test2

[PATCH] Remove incorrect warning for kernels copy clause

2016-03-24 Thread Tom de Vries


Hi,

This patch fixes an incorrect warning for the oacc copy clause.

Consider this test-case:
...
void
foo (void)
{
  int i;

#pragma acc kernels
  {
i = 1;
  }
}
...


When compiling with -fopenacc -Wuninitialized, we get an 'is used 
uninitialized' warning for variable 'i', which is confusing given that 
'i' is not used, but only set in the kernels region.


The warning occurs because there's an implicit copy(i) clause on the 
kernels region, and that copy generates a read of i before the region, 
and a write to i in the region.
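As a rough plain-C analogy (not the code the compiler generates; kernels_body and run_region_copy are hypothetical names), a copy clause behaves like a copy-in before the region and a copy-out after it:

```c
/* Stand-in for the kernels region, operating on the device copy.  */
void
kernels_body (int *dev_i)
{
  /* The only access inside the region is a write.  */
  *dev_i = 1;
}

/* Analogy for copy(i): copy in, run the region, copy out.  */
void
run_region_copy (int *host_i)
{
  /* Copy-in: this read of the host variable is the access that an
     uninitialized-use warning would point at.  */
  int dev_i = *host_i;

  kernels_body (&dev_i);

  /* Copy-out: the value written in the region reaches the host.  */
  *host_i = dev_i;
}
```

Only the copy-in reads the host variable, and its value is overwritten before anything uses it.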


The patch silences the warning by marking the variable in the copy 
clause with TREE_NO_WARNING.


Build and reg-tested with goacc.exp, gomp.exp and target-libgomp.

OK for trunk if bootstrap and reg-test succeed?

Thanks,
- Tom
Remove incorrect warning for kernels copy clause

2016-03-24  Tom de Vries  

	* omp-low.c (lower_omp_target): Set TREE_NO_WARNING for oacc copy
	clause.

	* c-c++-common/goacc/uninit-copy-clause.c: New test.
	* gfortran.dg/goacc/uninit-copy-clause.f95: New test.

---
 gcc/omp-low.c  |  6 +++-
 .../c-c++-common/goacc/uninit-copy-clause.c| 38 ++
 .../gfortran.dg/goacc/uninit-copy-clause.f95   | 29 +
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3fd6eb3..d107961 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -16083,7 +16083,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			|| map_kind == GOMP_MAP_POINTER
 			|| map_kind == GOMP_MAP_TO_PSET
 			|| map_kind == GOMP_MAP_FORCE_DEVICEPTR)
-		  gimplify_assign (avar, var, );
+		  {
+			if (is_gimple_omp_oacc (ctx->stmt))
+			  TREE_NO_WARNING (var) = 1;
+			gimplify_assign (avar, var, );
+		  }
 		avar = build_fold_addr_expr (avar);
 		gimplify_assign (x, avar, );
 		if ((GOMP_MAP_COPY_FROM_P (map_kind)
diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
new file mode 100644
index 000..b3cc445
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-copy-clause.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wuninitialized" } */
+
+void
+foo (void)
+{
+  int i;
+
+#pragma acc kernels
+  {
+i = 1;
+  }
+
+}
+
+void
+foo2 (void)
+{
+  int i;
+
+#pragma acc kernels copy (i)
+  {
+i = 1;
+  }
+
+}
+
+void
+foo3 (void)
+{
+  int i;
+
+#pragma acc kernels copyin(i)
+  {
+i = 1;
+  }
+
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
new file mode 100644
index 000..b2aae1d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-copy-clause.f95
@@ -0,0 +1,29 @@
+! { dg-do compile }
+! { dg-additional-options "-Wuninitialized" }
+
+subroutine foo
+  integer :: i
+
+  !$acc kernels
+  i = 1
+  !$acc end kernels
+
+end subroutine foo
+
+subroutine foo2
+  integer :: i
+
+  !$acc kernels copy (i)
+  i = 1
+  !$acc end kernels
+
+end subroutine foo2
+
+subroutine foo3
+  integer :: i
+
+  !$acc kernels copyin (i)
+  i = 1
+  !$acc end kernels
+
+end subroutine foo3

Re: [gomp4] Some additional OpenACC reduction tests

2016-03-24 Thread Thomas Schwinge

Hi!

On Wed, 29 Jul 2015 18:23:12 +0100, Julian Brown  
wrote:
> This is a set of 19 new tests for OpenACC reductions, covering several
> ways of performing reductions over the parallel and loop directives
> using gang or worker/vector level parallelism.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
> @@ -0,0 +1,40 @@
> +#include <assert.h>
> +
> +/* Test of reduction on both parallel and loop directives (workers and vectors
> +   in gang-partitioned mode, int type with XOR).  */
> +
> +int
> +main (int argc, char *argv[])
> +{
> +  int i, j, arr[32768], res = 0, hres = 0;
> +
> +  for (i = 0; i < 32768; i++)
> +arr[i] = i;
> +
> +  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
> +reduction(^:res)
> +  {
> +#pragma acc loop gang
> +for (j = 0; j < 32; j++)
> +  {
> + #pragma acc loop worker vector reduction(^:res)
> + for (i = 0; i < 1024; i++)
> +   res ^= arr[j * 1024 + i];
> +
> + #pragma acc loop worker vector reduction(^:res)
> + for (i = 0; i < 1024; i++)
> +   res ^= arr[j * 1024 + (1023 - i)];
> +  }
> +  }
> +
> +  for (j = 0; j < 32; j++)
> +for (i = 0; i < 1024; i++)
> +  {
> +hres ^= arr[j * 1024 + i];
> + hres ^= arr[j * 1024 + (1023 - i)];
> +  }
> +
> +  assert (res == hres);
> +
> +  return 0;
> +}

Given the interpretation of the OpenACC specification that the current
implementation of OpenACC reductions in GCC is based upon (which we're
currently re-visiting), it had been necessary to add data clauses next
to parallel constructs' reduction clauses -- but not for this test case.
I now found why; it just happened to ;-) always pass, because apparently
the two XOR loops' iterations just cancelled their values, so in the end,
we'd always get an "unremarkable" result of zero for both res and hres.
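The cancellation can be reproduced on the host without any OpenACC.  The sketch below is a hypothetical helper, not part of the test suite: it repeats the host-side loop of the test case with a SCALE factor applied to the first XOR only.

```c
#define N_OUTER 32
#define N_INNER 1024

/* XOR-reduce the array the way the host check in the test case does.
   SCALE multiplies the values fed into the first XOR only; the second
   XOR walks the same block in reverse order.  */
int
host_xor_reduction (int scale)
{
  static int arr[N_OUTER * N_INNER];
  int res = 0;

  for (int i = 0; i < N_OUTER * N_INNER; i++)
    arr[i] = i;

  for (int j = 0; j < N_OUTER; j++)
    for (int i = 0; i < N_INNER; i++)
      {
        res ^= scale * arr[j * N_INNER + i];
        res ^= arr[j * N_INNER + (N_INNER - 1 - i)];
      }

  return res;
}
```

With a scale of 1 every element is XORed in twice and the result is zero, so a reduction that never wrote res back would still compare equal.  With a scale of 3 the two passes no longer cancel and the expected value is nonzero.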
In gomp-4_0-branch r234461, I have now committed the following:

commit 8fff8ae7117c21d6b4a701a63cdd4634950418d1
Author: tschwinge 
Date:   Thu Mar 24 16:54:55 2016 +

Improve libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c

libgomp/
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
Make failure observable.  Add data clause next to parallel
construct's reduction clause.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@234461 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp  | 6 ++
 .../testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c | 6 +++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 53ae315..b10ae94 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2016-03-24  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
+   Make failure observable.  Add data clause next to parallel
+   construct's reduction clause.
+
 2016-03-11  Cesar Philippidis  
 
* testsuite/libgomp.oacc-c-c++-common/vprop.c: New test.
diff --git 
libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
index a7a75a9..5e4590f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
@@ -12,14 +12,14 @@ main (int argc, char *argv[])
 arr[i] = i;
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-  reduction(^:res)
+reduction(^:res) copy(res)
   {
 #pragma acc loop gang
 for (j = 0; j < 32; j++)
   {
#pragma acc loop worker vector reduction(^:res)
for (i = 0; i < 1024; i++)
- res ^= arr[j * 1024 + i];
+ res ^= 3 * arr[j * 1024 + i];
 
#pragma acc loop worker vector reduction(^:res)
for (i = 0; i < 1024; i++)
@@ -30,7 +30,7 @@ main (int argc, char *argv[])
   for (j = 0; j < 32; j++)
 for (i = 0; i < 1024; i++)
   {
-hres ^= arr[j * 1024 + i];
+   hres ^= 3 * arr[j * 1024 + i];
hres ^= arr[j * 1024 + (1023 - i)];
   }
 


Grüße
 Thomas



Re: rs6000 stack_tie mishap again

2016-03-24 Thread Jeff Law


On 03/24/2016 02:17 AM, Olivier Hainque wrote:

[snip]

So, aside from the dependency issue which needs to be fixed somehow, I
think it would make sense to consider using a strong blockage mechanism in
expand_epilogue.


That's what we both said here
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html

and David agreed too
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html

but if you can have the alias analysis changes accepted that would be
even better.


I'd really like to come to a resolution we're confident is robust,
because these are really very nasty bugs.
The robust solution is to have a scheduling barrier just before the 
point where the stack is deallocated.


The alternative some folks have suggested would be for the generic parts 
of the compiler to add the scheduling barrier before the stack pointer 
adjustment.  I wouldn't object to that.



Jeff

Re: [RS6000, PATCH] PR70052, ICE compiling _Decimal128 test case

2016-03-24 Thread David Edelsohn

On Thu, Mar 24, 2016 at 7:01 AM, Alan Modra  wrote:
> This fixes the PR70052 ICE by modifying easy_fp_constant to correctly
> return false for decimal floating point zero.  0.0D is not an all-zero
> bit pattern, at least, not the canonical form.
>
> I've also taken on Mike's suggestion of using a mode dependent
> constraint for insns that currently use "j".  Note that
> "easy_fp_constant" is already part of "input_operand" so in the usual
> case we ought to be prevented from generating 0.0D immediate
> constants.  However, in the past I've seen reload do some nasty tricks
> when pseudos don't get hard regs, and believe that a pseudo that is
> known to be equal to 0.0D may have the constant substituted with only
> constraints being checked, not the operand predicates.  So either the
> "j" constraint needs fixing to reject decimal float (as I had in my
> original patch) or not used with decimal float (Mike's approach).
> I left in a small tidy for "j" from my original patch.
>
> Bootstrapped and regression tested powerpc64le-linux.  OK to apply?
>
> gcc/
> PR target/70052
> * config/rs6000/constraints.md (j): Simplify.
> * config/rs6000/predicates.md (easy_fp_constant): Exclude
> decimal float 0.D.
> * config/rs6000/rs6000.md (zero_fp): New mode_attr.  Use in place
> of "j" in all constraints.

The patch did not convert all "j" constraints, so the ChangeLog needs
to be a little clearer to explain which alternatives required the
change.

> (movtd_64bit_nodm): Delete "j" constraint alternative.
> gcc/testsuite/
> * gcc.dg/dfp/pr70052.c: New.

Okay with that clarification.

Thanks, David

[testsuite, committed] Add goacc/uninit-use-device-clause.{c,f95}

2016-03-24 Thread Tom de Vries


Hi,

I've committed attached patch, which adds testcases that test the 
-Wuninitialized warning for the use-device-clause on the openacc 
directive host_data.


I found an issue with compilation of the host_data directive on C/C++, 
which I've filed as PR70388 - '[openacc] host_data use_device triggers 
error mentioning use_device_ptr'.


I've added xfails related to that PR.

Thanks,
- Tom
Add goacc/uninit-use-device-clause.{c,f95}

2016-03-24  Tom de Vries  

	* c-c++-common/goacc/uninit-use-device-clause.c: New test.
	* gfortran.dg/goacc/uninit-use-device-clause.f95: New test.

---
 .../c-c++-common/goacc/uninit-use-device-clause.c  | 14 ++
 .../gfortran.dg/goacc/uninit-use-device-clause.f95 | 10 ++
 2 files changed, 24 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-use-device-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-use-device-clause.c
new file mode 100644
index 000..c5d327c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-use-device-clause.c
@@ -0,0 +1,14 @@
+/* Test fails due to PR70388.  */
+/* { dg-do compile } */
+/* { dg-excess-errors "PR70388" { xfail *-*-* } } */
+/* { dg-additional-options "-Wuninitialized" } */
+
+void
+foo (void)
+{
+  int i;
+
+#pragma acc host_data use_device(i) /* { dg-warning "is used uninitialized in this function" "" { xfail *-*-* } } */
+  {
+  }
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-use-device-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-use-device-clause.f95
new file mode 100644
index 000..873eea7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-use-device-clause.f95
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-additional-options "-Wuninitialized" }
+
+subroutine test
+  integer :: i
+
+  !$acc host_data use_device(i) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end host_data
+end subroutine test
+

[testsuite, committed] Add goacc/uninit-dim-clause.{c,f95}

2016-03-24 Thread Tom de Vries


Hi,

I've committed attached patch, which adds testcases that test the 
-Wuninitialized warning for the num_gangs, num_workers, vector_length 
clauses on the openacc directive parallel.


Thanks,
- Tom
Add goacc/uninit-dim-clause.{c,f95}

2016-03-24  Tom de Vries  

	* c-c++-common/goacc/uninit-dim-clause.c: New test.
	* gfortran.dg/goacc/uninit-dim-clause.f95: New test.

---
 gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c  | 19 +++
 gcc/testsuite/gfortran.dg/goacc/uninit-dim-clause.f95 | 17 +
 2 files changed, 36 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
new file mode 100644
index 000..0a006e3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wuninitialized" } */
+
+#include 
+
+int
+main (void)
+{
+  int i, j, k;
+
+  #pragma acc parallel num_gangs(i) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+
+  #pragma acc parallel num_workers(j) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+
+  #pragma acc parallel vector_length(k) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-dim-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-dim-clause.f95
new file mode 100644
index 000..b87d26f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-dim-clause.f95
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-additional-options "-Wuninitialized" }
+
+program test
+  implicit none
+  integer :: i, j, k
+
+  !$acc parallel num_gangs(i) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end parallel
+
+  !$acc parallel num_workers(j) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end parallel
+
+  !$acc parallel vector_length(k) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end parallel
+
+end program test

[testsuite, committed] Add missing initialization in goacc/host_data-tree.f95

2016-03-24 Thread Tom de Vries


Hi,

I've run the goacc.exp testcases with -Wuninitialized and found one more 
missing initialization. Fixed and committed as attached.


Thanks,
- Tom
Add missing initialization in goacc/host_data-tree.f95

2016-03-24  Tom de Vries  

	* gfortran.dg/goacc/host_data-tree.f95: Add missing initialization.

---
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95
index 23aba8c..4a11b9d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95
@@ -3,7 +3,7 @@
 
 program test
   implicit none
-  integer :: i
+  integer :: i = 1
 
   !$acc host_data use_device(i)
   !$acc end host_data

Re: [PATCH] 69517 - [5/6 regression] SEGV on a VLA with excess initializer elements

2016-03-24 Thread Jason Merrill


On 03/23/2016 03:47 PM, Martin Sebor wrote:

Thanks for the comments.


2) It hardwires a rather arbitrarily restrictive limit of 64 KB
on the size of the biggest C++ VLA.  (This could stand to be
improved and made more intelligent, and perhaps integrated
with stack checking via -fstack-limit, after the GCC 6
release.)


The bounds checking should share code with build_new_1.


I agree that sharing the same code is the right long-term approach.

I had initially started out down that path, by factoring out code
from build_new_1 into a general function that I had thought could
be shared between it and cp_finish_decl.  But I ended up abandoning
that approach for two reasons:

a) The checking expression built for array new is quite a bit less
involved because only the major dimension of the array requires
runtime checking (the others must be constant and are checked at
compile time).  In contrast, all VLA dimensions are potentially
dynamic and so must be checked at runtime.

b) While (a) can be solved by making the checking function smart
and general enough, it felt too intrusive and potentially risky
to change array new at this stage.

That said, I'm happy to do this refactoring in stage 1.


Fair enough.  I don't think we can impose an arbitrary 64K limit, 
however, as that is a lot smaller than the 8MB default stack size, and 
programs can use setrlimit to increase the stack further.  For GCC 6 let's 
not impose any limit beyond non-negative/overflowing, and as you say we 
can do something better in GCC 7.



3) By throwing an exception for erroneous VLAs the patch largely
defeats the VLA Sanitizer.  The sanitizer is still useful in
C++ 98 mode where the N3639 VLA runtime checking is disabled,
and when exceptions are disabled via -fno-exceptions.
Disabling  the VLA checking in C++ 98 mode doesn't seem like
a useful feature, but I didn't feel like reverting what was
a deliberate decision.


What deliberate decision?  The old code checked for C++14 mode because
the feature was part of the C++14 working paper.  What's the rationale
for C++11 as the cutoff?


Sorry, I had misremembered the original C++ 14 check as one for
C++ 11.  Are you suggesting to restore the checking only for C++
14 mode, or for all modes?  Either is easy enough to do though,
IMO, it should be safe in all modes and I would expect it to be
preferable to undefined behavior.


I think all modes.


+ /* Iterate over all non-empty initializers in this array, recursively
+building expressions to see if the elements of each are in excess
+of the (runtime) bounds of of the array type.  */
+ FOR_EACH_VEC_SAFE_ELT (v, i, ce)
+   {
+ if (1) // !vec_safe_is_empty (CONSTRUCTOR_ELTS (ce->value)))
+   {
+ tree subcheck = build_vla_check (TREE_TYPE (type),
+  ce->value,
+  vlasize, max_vlasize,
+  cst_size);
+ check = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
+  check, subcheck);
+   }
+   }


The if (1) seems left over from development.

It looks like this will multiply *cst_size by the sub-array length once 
for each element of the outer array, leading to a too-large result.  And 
also generate redundant code to check the bounds of the sub-array 
multiple times.


It seems to me that we want to use the existing check for excess 
initializers in build_vec_init, in the if (length_check) section, though 
as you mention in 70019 that needs to be improved to handle STRING_CST.


Also, I think we should check for invalid bounds in 
compute_array_index_type, next to the UBsan code.  Checking bounds only 
from cp_finish_decl means that we don't check uses of VLA types other 
than variable declarations.



+ /* Avoid instrumenting constexpr functions.  Those must
+be checked statically, and the (non-constexpr) dynamic
+instrumentation would cause them to be rejected.  */


Hmm, this sounds wrong; constexpr functions can also be called with 
non-constant arguments, and the instrumentation should be folded away 
when evaluating a call with constant arguments.


Jason

Re: [DOC Patch] Add sample for @cc constraint

2016-03-24 Thread Bernd Schmidt

In principle we probably should have an example, but once again I have 
some problems with the style of the added documentation. I prefer 
concise writing without unnecessary repetition. Any other reviewers can 
of course override me, but the following is my opinion on these changes.


More problematic than a lack of documentation is that I haven't been 
able to find an executable testcase. If you could adapt your example for 
use in gcc.target/i386, that would be even more important.


On 03/13/2016 05:00 AM, David Wohlferd wrote:

Index: extend.texi
===
--- extend.texi (revision 234163)
+++ extend.texi (working copy)
@@ -8047,6 +8047,7 @@

  Because of the special nature of the flag output operands, the constraint
  may not include alternatives.
+Do not clobber flags if they are being used as outputs.


I don't think the manual should point out the obvious. I'd be surprised 
if this wasn't documented or at least strongly implied elsewhere for 
normal operands.



+For builds that don't support flag output operands,


This feels strange, we should just be documenting the capabilities of 
this feature. Other parts of the docs already show what to do without 
it. Hence, reduce the example to this (plus the surrounding setup stuff):



+/* Avoid the redundant setc/testb and use the carry flag directly.  */
+asm ("bt $0, %1"
+  : "=@@ccc" (a)
+  : "r" (b));
+
+#endif



+Note: On the x86 platform, flags are normally considered clobbered by
+extended asm whether the @code{"cc"} clobber is specified or not.


Is it really necessary or helpful to mention that here? Not only is it 
not strictly correct (an output operand is not also considered 
clobbered), but to me it breaks the flow because you're left wondering 
how that sentence relates to the example (it doesn't).



  @anchor{InputOperands}
@@ -8260,6 +8298,8 @@
  On other machines, condition code handling is different,
  and specifying @code{"cc"} has no effect. But
  it is valid no matter what the target.
+For platform-specific uses of flags, see also
+@ref{FlagOutputOperands,Flag Output Operands}.


Is this likely to be helpful? Someone who's looking at how to use flag 
outputs probably isn't looking in the "Clobbers" section?



Bernd

Re: Fix 69650, bogus line numbers from libcpp

2016-03-24 Thread Bernd Schmidt




On 03/23/2016 03:21 PM, Richard Biener wrote:

On Wed, Mar 23, 2016 at 2:15 PM, Bernd Schmidt  wrote:

On 03/23/2016 01:41 PM, Richard Biener wrote:


Btw, the issue in the PR is also fixed with a simple

Index: libcpp/line-map.c
===
--- libcpp/line-map.c   (revision 234415)
+++ libcpp/line-map.c   (working copy)
@@ -543,7 +543,7 @@ linemap_add (struct line_maps *set, enum
   to_file);

 /* A TO_FILE of NULL is special - we use the natural values.  */
-  if (error || to_file == NULL)
+  if (to_file == NULL)
  {
to_file = ORDINARY_MAP_FILE_NAME (from);
to_line = SOURCE_LINE (from, from[1].start_location);



I looked at that, but that made it hard to add the testcase as the line
numbers no longer match the dg-error directives. By moving this code we can
ignore the erroneous #line directive, and for this one testcase at least,
that makes the line numbers (and caret diagnostics etc.) come out right.


After some more digging and looking at your patch I'd approve that if it would
emit a warning rather than an error - so can you please adjust it?


Like this? No one has yet approved any better wording for the message, 
so given that you said "it's not a regression" I've left it, but I would 
now prefer "linemarker ignored due to incorrect nesting".



Bernd

	PR lto/69650
	* directives.c (do_linemarker): Test for file left but not entered
	here.
	* line-map.c (linemap_add): Not here.

	PR lto/69650
	* gcc.dg/pr69650.c: New test.

Index: gcc/testsuite/gcc.dg/pr69650.c
===
--- gcc/testsuite/gcc.dg/pr69650.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr69650.c	(working copy)
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+# 9 "somefile" 2 /* { dg-warning "left but not entered" } */
+not_a_type a; /* { dg-error "unknown type" } */
Index: libcpp/directives.c
===
--- libcpp/directives.c	(revision 234341)
+++ libcpp/directives.c	(working copy)
@@ -1046,6 +1046,19 @@
 
   skip_rest_of_line (pfile);
 
+  if (reason == LC_LEAVE)
+{
+  const line_map_ordinary *from;  
+  if (MAIN_FILE_P (map)
+	  || (new_file
+	  && (from = INCLUDED_FROM (pfile->line_table, map)) != NULL
+	  && filename_cmp (ORDINARY_MAP_FILE_NAME (from), new_file) != 0))
+	{
+	  cpp_warning (pfile, CPP_W_NONE,
+		 "file \"%s\" left but not entered", new_file);
+	  return;
+	}
+}
   /* Compensate for the increment in linemap_add that occurs in
  _cpp_do_file_change.  We're currently at the start of the line
  *following* the #line directive.  A separate source_location for this
Index: libcpp/line-map.c
===
--- libcpp/line-map.c	(revision 234341)
+++ libcpp/line-map.c	(working copy)
@@ -512,43 +512,23 @@
 	 "included", this variable points the map in use right before the
 	 #include "included", inside the same "includer" file.  */
   line_map_ordinary *from;
-  bool error;
 
-  if (MAIN_FILE_P (map - 1))
-	{
-	  /* So this _should_ mean we are leaving the main file --
-	 effectively ending the compilation unit. But to_file not
-	 being NULL means the caller thinks we are leaving to
-	 another file. This is an erroneous behaviour but we'll
-	 try to recover from it. Let's pretend we are not leaving
-	 the main file.  */
-	  error = true;
-  reason = LC_RENAME;
-  from = map - 1;
-	}
-  else
-	{
-	  /* (MAP - 1) points to the map we are leaving. The
-	 map from which (MAP - 1) got included should be the map
-	 that comes right before MAP in the same file.  */
-	  from = INCLUDED_FROM (set, map - 1);
-	  error = to_file && filename_cmp (ORDINARY_MAP_FILE_NAME (from),
-	   to_file);
-	}
+  linemap_assert (!MAIN_FILE_P (map - 1));
+  /* (MAP - 1) points to the map we are leaving. The
+	 map from which (MAP - 1) got included should be the map
+	 that comes right before MAP in the same file.  */
+  from = INCLUDED_FROM (set, map - 1);
 
-  /* Depending upon whether we are handling preprocessed input or
-	 not, this can be a user error or an ICE.  */
-  if (error)
-	fprintf (stderr, "line-map.c: file \"%s\" left but not entered\n",
-		 to_file);
-
   /* A TO_FILE of NULL is special - we use the natural values.  */
-  if (error || to_file == NULL)
+  if (to_file == NULL)
 	{
 	  to_file = ORDINARY_MAP_FILE_NAME (from);
 	  to_line = SOURCE_LINE (from, from[1].start_location);
 	  sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
 	}
+  else
+	linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
+  to_file) == 0);
 }
 
   map->sysp = sysp;

[PATCH, HSA]: Fix PR hsa/70399

2016-03-24 Thread Martin Liška

Hello.

The current HSA back end handles memory stores incorrectly. Although we
properly identify that an immediate operand needs to respect the type of
the memory store instruction it belongs to, the binary representation of
the operand is not updated.

The following patch delays emission of the binary representation and updates
hsa_op_immed::m_brig_repr_size every time the m_type field of the operand
is updated.
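
As a rough illustration of the idea (not the actual hsa_op_immed class), the sketch below models an immediate operand in plain C.  The struct, its fields and both function names are hypothetical, and the little-endian byte order is an assumption made for the example.  The point is only that the byte size is recomputed whenever the type changes and that the bytes are produced late, so the emitted representation always matches the final type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an immediate operand; these are not the real
   hsa_op_immed members.  */
struct imm_operand
{
  unsigned type_bits;	/* Width of the operand type: 8, 16, 32 or 64.  */
  size_t repr_size;	/* Bytes to emit; kept in sync with type_bits.  */
  uint64_t value;	/* The immediate value itself.  */
};

/* Change the operand type and update the representation size with it,
   so the two can never disagree.  */
static void
imm_set_type (struct imm_operand *op, unsigned type_bits)
{
  op->type_bits = type_bits;
  op->repr_size = type_bits / 8;
}

/* Called late, once the type is final.  Writes the low repr_size bytes
   of the value to BUF in little-endian order (an assumption of this
   sketch) and returns the number of bytes written.  */
static size_t
imm_emit_to_buffer (const struct imm_operand *op, unsigned char *buf)
{
  for (size_t i = 0; i < op->repr_size; i++)
    buf[i] = (unsigned char) (op->value >> (8 * i));
  return op->repr_size;
}
```

If the bytes were produced when the operand is created, a later imm_set_type from 64 to 16 bits would leave an 8-byte representation behind for a 2-byte store, which is the kind of mismatch described above.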

I've been testing the patch, ready after it finishes?

Thanks,
Martin
>From 8fa067df55566d7a52ad6070a8844d434519fe46 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 24 Mar 2016 15:41:59 +0100
Subject: [PATCH] Fix PR hsa/70399

gcc/ChangeLog:

2016-03-24  Martin Liska  

	* hsa-brig.c (hsa_op_immed::emit_to_buffer): Emit either
	a tree value or an immediate integer value to a buffer
	that is eventually copied to a BRIG section.
	(emit_immediate_operand): Call the function here.
	* hsa-gen.c (hsa_op_immed::hsa_op_immed): Remove this
	early emission to buffer.
	(hsa_op_immed::set_type): Update size of m_brig_repr_size.
	(gen_hsa_insns_for_store): Use hsa_op_immed::set_type.
	* hsa.h (hsa_op_immed::emit_to_buffer): Update signature.
---
 gcc/hsa-brig.c | 112 +++--
 gcc/hsa-gen.c  |  34 +++---
 gcc/hsa.h  |   2 +-
 3 files changed, 76 insertions(+), 72 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 9b6c0b8..8d18b0f 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -933,61 +933,88 @@ emit_immediate_scalar_to_buffer (tree value, char *data, unsigned need_len)
 }
 
 void
-hsa_op_immed::emit_to_buffer (tree value)
+hsa_op_immed::emit_to_buffer ()
 {
-  unsigned total_len = m_brig_repr_size;
+  if (m_tree_value != NULL_TREE)
+{
+  unsigned total_len = m_brig_repr_size;
 
-  /* As we can have a constructor with fewer elements, fill the memory
- with zeros.  */
-  m_brig_repr = XCNEWVEC (char, total_len);
-  char *p = m_brig_repr;
+  /* As we can have a constructor with fewer elements, fill the memory
+	 with zeros.  */
+  m_brig_repr = XCNEWVEC (char, total_len);
+  char *p = m_brig_repr;
 
-  if (TREE_CODE (value) == VECTOR_CST)
-{
-  int i, num = VECTOR_CST_NELTS (value);
-  for (i = 0; i < num; i++)
+  if (TREE_CODE (m_tree_value) == VECTOR_CST)
+	{
+	  int i, num = VECTOR_CST_NELTS (m_tree_value);
+	  for (i = 0; i < num; i++)
+	{
+	  tree v = VECTOR_CST_ELT (m_tree_value, i);
+	  unsigned actual = emit_immediate_scalar_to_buffer (v, p, 0);
+	  total_len -= actual;
+	  p += actual;
+	}
+	  /* Vectors should have the exact size.  */
+	  gcc_assert (total_len == 0);
+	}
+  else if (TREE_CODE (m_tree_value) == STRING_CST)
+	memcpy (m_brig_repr, TREE_STRING_POINTER (m_tree_value),
+		TREE_STRING_LENGTH (m_tree_value));
+  else if (TREE_CODE (m_tree_value) == COMPLEX_CST)
 	{
+	  gcc_assert (total_len % 2 == 0);
 	  unsigned actual;
 	  actual
-	= emit_immediate_scalar_to_buffer (VECTOR_CST_ELT (value, i), p, 0);
-	  total_len -= actual;
+	= emit_immediate_scalar_to_buffer (TREE_REALPART (m_tree_value), p,
+	   total_len / 2);
+
+	  gcc_assert (actual == total_len / 2);
 	  p += actual;
+
+	  actual
+	= emit_immediate_scalar_to_buffer (TREE_IMAGPART (m_tree_value), p,
+	   total_len / 2);
+	  gcc_assert (actual == total_len / 2);
 	}
-  /* Vectors should have the exact size.  */
-  gcc_assert (total_len == 0);
-}
-  else if (TREE_CODE (value) == STRING_CST)
-memcpy (m_brig_repr, TREE_STRING_POINTER (value),
-	TREE_STRING_LENGTH (value));
-  else if (TREE_CODE (value) == COMPLEX_CST)
-{
-  gcc_assert (total_len % 2 == 0);
-  unsigned actual;
-  actual
-	= emit_immediate_scalar_to_buffer (TREE_REALPART (value), p,
-	   total_len / 2);
-
-  gcc_assert (actual == total_len / 2);
-  p += actual;
-
-  actual
-	= emit_immediate_scalar_to_buffer (TREE_IMAGPART (value), p,
-	   total_len / 2);
-  gcc_assert (actual == total_len / 2);
+  else if (TREE_CODE (m_tree_value) == CONSTRUCTOR)
+	{
+	  unsigned len = vec_safe_length (CONSTRUCTOR_ELTS (m_tree_value));
+	  for (unsigned i = 0; i < len; i++)
+	{
+	  tree v = CONSTRUCTOR_ELT (m_tree_value, i)->value;
+	  unsigned actual = emit_immediate_scalar_to_buffer (v, p, 0);
+	  total_len -= actual;
+	  p += actual;
+	}
+	}
+  else
+	emit_immediate_scalar_to_buffer (m_tree_value, p, total_len);
 }
-  else if (TREE_CODE (value) == CONSTRUCTOR)
+  else
 {
-  unsigned len = vec_safe_length (CONSTRUCTOR_ELTS (value));
-  for (unsigned i = 0; i < len; i++)
+  hsa_bytes bytes;
+
+  switch (m_brig_repr_size)
 	{
-	  tree v = CONSTRUCTOR_ELT (value, i)->value;
-	  unsigned actual = emit_immediate_scalar_to_buffer (v, p, 0);
-	  total_len -= actual;
-	  p += actual;
+	case 1:
+	  bytes.b8 = (uint8_t) m_int_value;
+	  break;
+	case 2:
+	  bytes.b16 = (uint16_t)

Re: out of bounds access in insn-automata.c

2016-03-24 Thread Alexander Monakov

Hi,

On Thu, 24 Mar 2016, Bernd Schmidt wrote:
> On 03/24/2016 11:17 AM, Aldy Hernandez wrote:
> > On 03/23/2016 10:25 AM, Bernd Schmidt wrote:
> > > It looks like this block of code is written by a helper function that is
> > > really intended for other purposes than for maximal_insn_latency. Might
> > > be worth changing to
> > >   int insn_code = dfa_insn_code (as_a  (insn));
> > >   gcc_assert (insn_code <= DFA__ADVANCE_CYCLE);
> >
> > dfa_insn_code_* and friends can return > DFA__ADVANCE_CYCLE so I can't
> > put that assert on the helper function.
> 
> So don't use the helper function? Just emit the block above directly.

Let me chime in :)  The function under scrutiny, maximal_insn_latency, was
added as part of selective scheduling merge; at the same time,
output_default_latencies was factored out of
output_internal_insn_latency_func, and the pair of new functions
output_internal_maximal_insn_latency_func/output_maximal_insn_latency_func
tried to mirror existing pair of
output_internal_insn_latency_func/output_insn_latency_func.

In particular, output_insn_latency_func also invokes
output_internal_insn_code_evaluation (twice, for each argument).  This means
that generated 'insn_latency' can also call 'internal_insn_latency' with
DFA__ADVANCE_CYCLE in arguments.  However, 'internal_insn_latency' then has a
specially emitted 'if' statement that checks if either of the arguments is 
' >= DFA__ADVANCE_CYCLE', and returns 0 in that case.

So ultimately the pre-existing code was checking ' > DFA__ADVANCE_CYCLE' first and
' >= DFA__ADVANCE_CYCLE' second (for no good reason as far as I can see), and
when the new '_maximal_' functions were introduced, the second check was not
duplicated in the new copy.

So as long as we are not looking to hack it up further, I'd like to clean up
both functions at the same time.  If calling the 'internal_' variants with
DFA__ADVANCE_CYCLE is rare, extending 'default_insn_latencies' by 1 zero
element corresponding to DFA__ADVANCE_CYCLE is a simple suitable fix. If
either DFA__ADVANCE_CYCLE is not guaranteed to be rare, or extending the table
in that style is undesired, I suggest creating a variant of
'output_internal_insn_code_evaluation' that performs a '>=' rather than '>'
test in the first place, and use it in both output_insn_latency_func and
output_maximal_insn_latency_func.  If acknowledged, I volunteer to regstrap on
x86_64 and submit that in stage1.

Thoughts?

Thanks.
Alexander
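
The two fixes proposed above can be sketched in plain C as follows.  This is not the code genautomata emits: the table contents, the value chosen for DFA__ADVANCE_CYCLE and both function names are invented for the example.  The first variant keeps the '>' test and pads the table with one zero entry; the second uses a '>=' test so the table is never indexed with DFA__ADVANCE_CYCLE at all.

```c
/* Hypothetical stand-ins for what genautomata emits; the real codes and
   tables live in the generated insn-automata.c.  */
enum { DFA__ADVANCE_CYCLE = 3 };

/* One latency per real insn code, plus a trailing 0 for
   DFA__ADVANCE_CYCLE so that indexing with it stays in bounds.  */
static const unsigned char default_latencies[DFA__ADVANCE_CYCLE + 1]
  = { 1, 2, 4, 0 };

/* Variant 1: keep the '>' test and rely on the extended table.  */
static int
max_latency_extended_table (int insn_code)
{
  if (insn_code > DFA__ADVANCE_CYCLE)
    return 0;
  return default_latencies[insn_code];
}

/* Variant 2: test with '>=' so that DFA__ADVANCE_CYCLE never reaches
   the table lookup; the padding entry is then not needed.  */
static int
max_latency_ge_check (int insn_code)
{
  if (insn_code >= DFA__ADVANCE_CYCLE)
    return 0;
  return default_latencies[insn_code];
}
```

Both variants return 0 for DFA__ADVANCE_CYCLE; they differ only in whether that case costs a table entry or a slightly different comparison.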

[PR70366] fix chromium build failure with LTO due to segfault in inline_call

2016-03-24 Thread Prathamesh Kulkarni

Hi,
The following fix suggested by Richard fixes chromium build failing
due to segfault in inline_call.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
Cross-tested on arm*-*-* and aarch64*-*-*.
Ok for trunk ?

Thanks,
Prathamesh
diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index 5dc0b5a..f966fb0 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -329,8 +329,7 @@ inline_call (struct cgraph_edge *e, bool update_original,
 {
   struct gcc_options opts = global_options;
 
-  cl_optimization_restore (,
-TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)));
+  cl_optimization_restore (, opts_for_fn (to->decl));
   opts.x_flag_strict_aliasing = false;
   if (dump_file)
fprintf (dump_file, "Dropping flag_strict_aliasing on %s:%i\n",



Re: [PATCH] Slightly improve TARGET_STV splitters (PR target/70321)

2016-03-24 Thread Uros Bizjak

On Wed, Mar 23, 2016 at 8:35 AM, Uros Bizjak  wrote:
> On Tue, Mar 22, 2016 at 10:37 PM, Jakub Jelinek  wrote:
>> Hi!
>>
>> As the PR mentions, DImode AND/IOR/XOR patterns often result in too ugly
>> code, regression from when the patterns weren't there (before STV has been
>> added).  This patch attempts to improve it a little bit by improving the
>> splitter for these, rather than always generating two SImode AND/IOR/XOR
>> instructions, if the last operand's subword is either 0 or -1, optimize
>> the corresponding instruction in the pair to nothing, or to clearing, or
>> negation.  More improvement can be IMHO only achieved by moving the STV
>> pass before combiner and split patterns we don't adjust into vector patterns
>> into corresponding SImode patterns, so that the combiner can handle them,
>> but that sounds like stage1 material.

The following patch fixes:

FAIL: gcc.c-torture/execute/bitfld-3.c   -O1  (internal compiler error)
FAIL: gcc.c-torture/execute/bitfld-3.c   -O1  (test for excess errors)

We should not expand post reload via gen_andsi3, since it can generate
movzbl with an unsupported QImode register.

2016-03-24  Uros Bizjak  

* config/i386/i386.md (*anddi3_doubleword): Generate AND insn
using ix86_expand_binary_operator instead of gen_andsi3.

Bootstrap and regression test are in progress; I will commit the patch when
the regtest finishes.

Uros.
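
For readers unfamiliar with the splitter improvement quoted above, the following plain C sketch models the idea for the AND case only.  It is not the i386.md pattern: and_half and and_di_split are hypothetical names, and the code works on values rather than on RTL.  A 64-bit AND with a constant is split into two 32-bit halves, and a half whose constant subword is 0 or -1 needs no AND instruction at all.

```c
#include <stdint.h>

/* Hypothetical model of one half of the split, not the i386.md pattern:
   AND a 32-bit half with the matching half of the constant,
   special-casing the subword values 0 and -1.  */
static uint32_t
and_half (uint32_t src, uint32_t mask)
{
  if (mask == 0)
    return 0;		/* AND with 0: just clear the destination.  */
  if (mask == UINT32_MAX)
    return src;		/* AND with -1: nothing to do beyond a copy.  */
  return src & mask;	/* General case: a real 32-bit AND.  */
}

/* Split a 64-bit AND into its low and high 32-bit halves.  */
static uint64_t
and_di_split (uint64_t src, uint64_t mask)
{
  uint32_t lo = and_half ((uint32_t) src, (uint32_t) mask);
  uint32_t hi = and_half ((uint32_t) (src >> 32), (uint32_t) (mask >> 32));
  return ((uint64_t) hi << 32) | lo;
}
```

The same case analysis applies to IOR and XOR with different outcomes for the special subwords (for example, XOR with -1 becomes a negation), which is what the quoted description refers to.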

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 51e9a6e..339a134 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8154,7 +8154,7 @@
   ix86_expand_move (SImode, [0]);
 }
   else if (operands[2] != constm1_rtx)
-emit_insn (gen_andsi3 (operands[0], operands[1], operands[2]));
+ix86_expand_binary_operator (AND, SImode, [0]);
   else if (operands[5] == constm1_rtx)
 emit_note (NOTE_INSN_DELETED);
   if (operands[5] == const0_rtx)
@@ -8163,7 +8163,7 @@
   ix86_expand_move (SImode, [3]);
 }
   else if (operands[5] != constm1_rtx)
-emit_insn (gen_andsi3 (operands[3], operands[4], operands[5]));
+ix86_expand_binary_operator (AND, SImode, [3]);
   DONE;
 })

Re: out of bounds access in insn-automata.c

2016-03-24 Thread Bernd Schmidt




On 03/24/2016 11:17 AM, Aldy Hernandez wrote:

On 03/23/2016 10:25 AM, Bernd Schmidt wrote:

It looks like this block of code is written by a helper function that is
really intended for other purposes than for maximal_insn_latency. Might
be worth changing to
  int insn_code = dfa_insn_code (as_a  (insn));
  gcc_assert (insn_code <= DFA__ADVANCE_CYCLE);


dfa_insn_code_* and friends can return > DFA__ADVANCE_CYCLE so I can't
put that assert on the helper function.


So don't use the helper function? Just emit the block above directly.


Bernd

Re: [patch] avoid double evaluation of PIC_OFFSET_TABLE_REGNUM

2016-03-24 Thread Bernd Schmidt


On 03/24/2016 11:32 AM, Aldy Hernandez wrote:

On x86, PIC_OFFSET_TABLE_REGNUM calls a function
(ix86_use_pseudo_pic_reg) so its value can theoretically change between
its first and second use in the following conditional:

if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
   && fixed_regs[PIC_OFFSET_TABLE_REGNUM])

Since the macro can return -1 on x86, the second use can cause an out of
bounds access.

In practice ix86_use_pseudo_pic_reg() is probably a pure function, since
we really shouldn't be changing the semantics of the pic register
mid-flight, but it's probably safer to just avoid calling the function
twice.

OK pending tests?


Ok for stage 1.


Bernd
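
The hazard described in the quoted message, and the fix of evaluating the macro once, can be sketched in plain C.  Everything here is a stand-in: PIC_REG, NO_REG, fixed[] and pic_reg () are hypothetical names playing the roles of PIC_OFFSET_TABLE_REGNUM, INVALID_REGNUM, fixed_regs[] and ix86_use_pseudo_pic_reg-style logic.

```c
#include <stdbool.h>

#define N_REGS 8
#define NO_REG (~0u)	/* Hypothetical stand-in for INVALID_REGNUM.  */

static bool fixed[N_REGS] = { [3] = true };
static unsigned calls;			/* Counts macro evaluations.  */
static unsigned current_pic_reg = 3;	/* May be NO_REG.  */

/* A function-backed "macro", in the spirit of PIC_OFFSET_TABLE_REGNUM
   on x86: every use is a call and could in theory return a new value.  */
static unsigned
pic_reg (void)
{
  calls++;
  return current_pic_reg;
}
#define PIC_REG pic_reg ()

/* Evaluate the macro once and reuse the result, so the value that was
   compared against NO_REG is the same one used as the array index.  */
static bool
pic_reg_is_fixed (void)
{
  unsigned regno = PIC_REG;
  return regno != NO_REG && fixed[regno];
}
```

With the original form, `PIC_REG != NO_REG && fixed[PIC_REG]`, the second expansion is a separate call; caching the result in a local removes both the extra call and the theoretical out-of-bounds index.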

[PATCH] Fix PR70396

2016-03-24 Thread Richard Biener


I am testing the following obvious patch on x86_64-unknown-linux-gnu.

Richard.

2016-03-24  Richard Biener  

PR tree-optimization/70396
* tree-vect-stmts.c (vectorizable_comparison): Use
get_vectype_for_scalar_type.

* gcc.dg/torture/pr70396.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 234453)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -7790,8 +7789,8 @@ vectorizable_comparison (gimple *stmt, g
   /* Invariant comparison.  */
   if (!vectype)
 {
-  vectype = build_vector_type (TREE_TYPE (rhs1), nunits);
-  if (tree_to_shwi (TYPE_SIZE_UNIT (vectype)) != current_vector_size)
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1));
+  if (TYPE_VECTOR_SUBPARTS (vectype) != nunits)
return false;
 }
   else if (nunits != TYPE_VECTOR_SUBPARTS (vectype))
Index: gcc/testsuite/gcc.dg/torture/pr70396.c
===
--- gcc/testsuite/gcc.dg/torture/pr70396.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70396.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+struct S2 {
+signed f1 : 3;
+};
+int a[];
+int b, c;
+char d;
+void fn1() {
+struct S2 e;
+b / e.f1;
+c = 2;
+for (; c; c++) {
+   d = 0;
+   a[c] = ~e.f1 != d;
+}
+}

Re: [PATCH] Properly assign to packet header (PR hsa/70394)

2016-03-24 Thread Martin Jambor

Hi,

On Thu, Mar 24, 2016 at 12:48:34PM +0100, Martin Liska wrote:
> Hello.
> 
> Following patch initializes whole packet->header field, which is eventually 
> stored
> to a packet in atomic manner. The function mechanism was adopted from the HSA 
> runtime
> manual.
> 
> I've been running bootstrap and regression tests.
> Ready to be installed after it finishes?
> 
> Thanks,
> Martin
> 
> libgomp/ChangeLog:
> 
> 2016-03-24  Martin Liska  
> 
>   * plugin/plugin-hsa.c (packet_store_release): New function
>   that is taken from the HSA runtime manual.
>   (GOMP_OFFLOAD_run): Use the function.

OK, thanks,

Martin
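
For context, a release store of a packet header can be sketched in C as below.  This is an illustration and not the libgomp plugin code: the struct layout is simplified and hypothetical (the first 32-bit word holds the 16-bit header in its low half and the 16-bit setup field in its high half), and the GCC builtin __atomic_store_n is used for the store.

```c
#include <stdint.h>

/* Hypothetical, simplified packet: the first word combines the 16-bit
   header (low half) and the 16-bit setup field (high half).  */
struct packet
{
  uint32_t header_and_setup;
  uint32_t payload;
};

/* Publish the header and the rest of the first word in one atomic
   store with release semantics, so that all earlier writes to the
   packet body are visible before the header is.  */
static void
packet_store_release (uint32_t *first_word, uint16_t header, uint16_t rest)
{
  __atomic_store_n (first_word, header | ((uint32_t) rest << 16),
		    __ATOMIC_RELEASE);
}
```

Typical use is to fill in every other field of the packet first and call packet_store_release last; a consumer that sees the new header is then guaranteed to see the completed body as well.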

[testsuite, committed] Add goacc/uninit-if-clause.{c,f95}

2016-03-24 Thread Tom de Vries


Hi,

I've committed attached patch, which adds testcases that test the 
-Wuninitialized warning for the if clause on various openacc directives.


I found an issue with line-numbers of the warning in C++, which I've 
filed as PR70392 - '[openacc] inconsistent line numbers in uninitialised 
warnings for if clause'.


Some of the warning scans are xfailed in C++ due to that PR.

Thanks,
- Tom
Add goacc/uninit-if-clause.{c,f95}

2016-03-24  Tom de Vries  

	* c-c++-common/goacc/uninit-if-clause.c: New test.
	* gfortran.dg/goacc/uninit-if-clause.f95: New test.

---
 .../c-c++-common/goacc/uninit-if-clause.c  | 38 ++
 .../gfortran.dg/goacc/uninit-if-clause.f95 | 20 
 2 files changed, 58 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-if-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-if-clause.c
new file mode 100644
index 000..55caa4c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-if-clause.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wuninitialized" } */
+/* { dg-excess-errors "PR70392" { xfail c++ } } */
+
+#include 
+
+int
+main (void)
+{
+  int l, l2, l3, l4;
+  bool b, b2, b3, b4;
+  int i, i2;
+
+  #pragma acc parallel if(l) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+
+  #pragma acc parallel if(b) /* { dg-warning "is used uninitialized in this function" "" { xfail c++ } } */
+  ;
+
+  #pragma acc kernels if(l2) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+
+  #pragma acc kernels if(b2) /* { dg-warning "is used uninitialized in this function" "" { xfail c++ } } */
+  ;
+
+  #pragma acc data if(l3) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+
+  #pragma acc data if(b3) /* { dg-warning "is used uninitialized in this function" "" { xfail c++ } } */
+  ;
+
+  #pragma acc update if(l4) self(i) /* { dg-warning "is used uninitialized in this function" } */
+  ;
+
+  #pragma acc update if(b4) self(i2) /* { dg-warning "is used uninitialized in this function" "" { xfail c++ } } */
+  ;
+
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/uninit-if-clause.f95 b/gcc/testsuite/gfortran.dg/goacc/uninit-if-clause.f95
new file mode 100644
index 000..60dc53e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/uninit-if-clause.f95
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-additional-options "-Wuninitialized" }
+
+program test
+  implicit none
+  logical :: b, b2, b3, b4
+  integer :: data, data2
+
+  !$acc parallel if(b) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end parallel
+
+  !$acc kernels if(b2) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end kernels
+
+  !$acc data if(b3) ! { dg-warning "is used uninitialized in this function" }
+  !$acc end data
+
+  !$acc update if(b4) self(data2) ! { dg-warning "is used uninitialized in this function" }
+
+end program test

[testsuite, committed] Add missing initializations in oacc testcases

2016-03-24 Thread Tom de Vries


Hi,

I've run the goacc.exp testcases with -Wuninitialized and found a few 
more missing initializations. Fixed and committed as attached.


Thanks,
- Tom
Add missing initializations in oacc testcases

2016-03-24  Tom de Vries  

	* gfortran.dg/goacc/data-tree.f95: Add missing initialization.
	* gfortran.dg/goacc/kernels-tree.f95: Same.
	* gfortran.dg/goacc/parallel-tree.f95: Same.

---
 gcc/testsuite/gfortran.dg/goacc/data-tree.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/data-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/data-tree.f95
index 23745f3..44efc8a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/data-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/data-tree.f95
@@ -4,7 +4,7 @@
 program test
   implicit none
   integer :: q, i, j, k, m, n, o, p, r, s, t, u, v, w
-  logical :: l
+  logical :: l = .true.
 
   !$acc data if(l) copy(i), copyin(j), copyout(k), create(m) &
   !$acc present(o), pcopy(p), pcopyin(r), pcopyout(s), pcreate(t) &
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index fac5b85..4ec66de 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -4,7 +4,7 @@
 program test
   implicit none
   integer :: q, i, j, k, m, n, o, p, r, s, t, u, v, w
-  logical :: l
+  logical :: l = .true.
 
   !$acc kernels if(l) async copy(i), copyin(j), copyout(k), create(m) &
   !$acc present(o), pcopy(p), pcopyin(r), pcopyout(s), pcreate(t) &
diff --git a/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
index 9037f6c..5b2e01d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
@@ -6,7 +6,7 @@
 program test
   implicit none
   integer :: q, i, j, k, m, n, o, p, r, s, t, u, v, w
-  logical :: l
+  logical :: l = .true.
 
   !$acc parallel if(l) async num_gangs(i) num_workers(i) vector_length(i) &
   !$acc reduction(max:q), copy(i), copyin(j), copyout(k), create(m) &

[PATCH] Fix PR70370

2016-03-24 Thread Richard Biener


The following patch fixes a missed gimplification of components
of registers used as asm outputs with non-memory constraints.

The solution is to handle !allows_mem during gimplification rather
than leaving it up to RTL expansion.  I'm only handling the case
where we'd otherwise ICE because of invalid GIMPLE, not all
!allows_mem cases we could handle to simplify things at this stage
(in particular not the COMPONENT_REF case in the testcase or the
whole-aggregate case there).

This is a wrong-code bug when checking is disabled as the asm
misses virtual operands in that case.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-03-24  Richard Biener  

PR middle-end/70370
* gimplify.c (gimplify_asm_expr): Handle !allows_mem outputs
with register bases.

* gcc.dg/torture/pr70370.c: New testcase.

Index: gcc/gimplify.c
===
*** gcc/gimplify.c  (revision 234415)
--- gcc/gimplify.c  (working copy)
*** gimplify_asm_expr (tree *expr_p, gimple_
*** 5191,5196 
--- 5205,5236 
  ret = tret;
}
  
+   /* If the constraint does not allow memory make sure we gimplify
+  it to a register if it is not already but its base is.  This
+happens for complex and vector components.  */
+   if (!allows_mem)
+   {
+ tree op = TREE_VALUE (link);
+ if (! is_gimple_val (op)
+ && is_gimple_reg_type (TREE_TYPE (op))
+ && is_gimple_reg (get_base_address (op)))
+   {
+ tree tem = create_tmp_reg (TREE_TYPE (op));
+ tree ass;
+ if (is_inout)
+   {
+ ass = build2 (MODIFY_EXPR, TREE_TYPE (tem),
+   tem, unshare_expr (op));
+ gimplify_and_add (ass, pre_p);
+   }
+ ass = build2 (MODIFY_EXPR, TREE_TYPE (tem), op, tem);
+ gimplify_and_add (ass, post_p);
+ 
+ TREE_VALUE (link) = tem;
+ tret = GS_OK;
+   }
+   }
+ 
vec_safe_push (outputs, link);
TREE_CHAIN (link) = NULL_TREE;
  
Index: gcc/testsuite/gcc.dg/torture/pr70370.c
===
*** gcc/testsuite/gcc.dg/torture/pr70370.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70370.c  (working copy)
***
*** 0 
--- 1,45 
+ /* { dg-do compile } */
+ 
+ _Complex float
+ test1 (_Complex float f)
+ {
+   __asm__ ("" : "+r" (__real f));
+   return f;
+ }
+ 
+ _Complex float
+ test2 (_Complex float f)
+ {
+   __asm__ ("" : "=r" (__real f));
+   return f;
+ }
+ 
+ struct X { int i; };
+ 
+ struct X 
+ test3 (struct X x)
+ {
+   __asm__ ("" : "=r" (x.i));
+   return x;
+ }
+ 
+ struct X
+ test4 (struct X x)
+ {
+   __asm__ ("" : "+r" (x.i));
+   return x;
+ }
+ 
+ struct X 
+ test5 (struct X x)
+ {
+   __asm__ ("" : "=r" (x));
+   return x;
+ }
+ 
+ struct X
+ test6 (struct X x)
+ {
+   __asm__ ("" : "+r" (x));
+   return x;
+ }

[PATCH] Properly assign to packet header (PR hsa/70394)

2016-03-24 Thread Martin Liška

Hello.

The following patch initializes the whole packet->header field, which is
eventually stored to the packet atomically. The function was adopted from
the HSA runtime manual.
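
As a rough illustration of the packing described above, here is a minimal
standalone sketch. It assumes GCC's __atomic builtins. The names
pack_header_and_setup and packet_store_release_sketch are hypothetical and
not part of the patch, and the casts to uint32_t are a choice of this
sketch to keep the shift out of signed int arithmetic. Placing the header
in the low 16 bits matches the first two bytes of the packet only on a
little-endian host.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): combine the 16-bit header and
   the 16-bit setup field into one 32-bit word, header in the low half.
   The casts avoid shifting a value that was promoted to signed int.  */
static uint32_t
pack_header_and_setup (uint16_t header, uint16_t rest)
{
  return (uint32_t) header | ((uint32_t) rest << 16);
}

/* Publish both fields with a single release store, so a consumer that
   observes the new header also observes the setup bits.  */
static void
packet_store_release_sketch (uint32_t *packet, uint16_t header, uint16_t rest)
{
  __atomic_store_n (packet, pack_header_and_setup (header, rest),
		    __ATOMIC_RELEASE);
}
```

A caller would pass the address of the first 32-bit word of the packet,
as the patch does with its (uint32_t *) cast.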

Bootstrap and regression tests are running.
Ready to be installed once they finish?

Thanks,
Martin

libgomp/ChangeLog:

2016-03-24  Martin Liska  

* plugin/plugin-hsa.c (packet_store_release): New function
that is taken from the HSA runtime manual.
(GOMP_OFFLOAD_run): Use the function.
---
 libgomp/plugin/plugin-hsa.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index f7ef600..0b3b67a 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -1185,6 +1185,14 @@ failure:
   return false;
 }
 
+/* Atomically store pair of uint16_t values (HEADER and REST) to a PACKET.  */
+
+void
+packet_store_release (uint32_t* packet, uint16_t header, uint16_t rest)
+{
+  __atomic_store_n (packet, header | (rest << 16), __ATOMIC_RELEASE);
+}
+
 /* Part of the libgomp plugin interface.  Run a kernel on device N and pass it
an array of pointers in VARS as a parameter.  The kernel is identified by
FN_PTR which must point to a kernel_info structure.  */
@@ -1232,7 +1240,6 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
   + index % agent->command_q->size;
 
   memset (((uint8_t *) packet) + 4, 0, sizeof (*packet) - 4);
-  packet->setup |= (uint16_t) 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS;
   packet->grid_size_x = kla->gdims[0];
   uint32_t wgs = kla->wdims[0];
   if (wgs == 0)
@@ -1275,7 +1282,9 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr, void *vars, void **args)
 
   HSA_DEBUG ("Going to dispatch kernel %s\n", kernel->name);
 
-  __atomic_store_n ((uint16_t *) (&packet->header), header, __ATOMIC_RELEASE);
+  packet_store_release ((uint32_t *) packet, header,
+   1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS);
+
   hsa_signal_store_release (agent->command_q->doorbell_signal, index);
 
   /* TODO: GPU agents in Carrizo APUs cannot properly update L2 cache for
-- 
2.7.1

[committed] Fix up pr70290.C testcase

2016-03-24 Thread Jakub Jelinek

Hi!

I've noticed pr70290.C testcase fails on powerpc*, for the usual reasons,
extra ABI warnings.  -w -Wno-psabi is the standard way to deal with this.
Committed as obvious.

2016-03-24  Jakub Jelinek  

PR target/70290
* g++.dg/ext/pr70290.C: Add -Wno-psabi -w to dg-options.  Formatting.

--- gcc/testsuite/g++.dg/ext/pr70290.C.jj   2016-03-23 10:41:12.0 +0100
+++ gcc/testsuite/g++.dg/ext/pr70290.C  2016-03-24 12:24:56.111834140 +0100
@@ -1,16 +1,18 @@
+/* PR target/70290 */
 /* { dg-do compile } */
+/* { dg-options "-Wno-psabi -w" } */
 /* { dg-additional-options "-mavx512vl" { target { i?86-*-* x86_64-*-* } } } */
 
 typedef int vec __attribute__((vector_size(32)));
 
 vec
-test1 (vec x,vec y)
+test1 (vec x, vec y)
 {
   return (x < y) ? 1 : 0;
 }
 
 vec
-test2 (vec x,vec y)
+test2 (vec x, vec y)
 {
   vec zero = { };
   vec one = zero + 1;


Jakub

[RS6000, PATCH] PR70052, ICE compiling _Decimal128 test case

2016-03-24 Thread Alan Modra

This fixes the PR70052 ICE by modifying easy_fp_constant to correctly
return false for decimal floating point zero.  0.0D is not an all-zero
bit pattern, at least, not the canonical form.

I've also taken on Mike's suggestion of using a mode dependent
constraint for insns that currently use "j".  Note that
"easy_fp_constant" is already part of "input_operand" so in the usual
case we ought to be prevented from generating 0.0D immediate
constants.  However, in the past I've seen reload do some nasty tricks
when pseudos don't get hard regs, and believe that a pseudo that is
known to be equal to 0.0D may have the constant substituted with only
constraints being checked, not the operand predicates.  So either the
"j" constraint needs fixing to reject decimal float (as I had in my
original patch) or it must not be used with decimal float (Mike's approach).
I left in a small tidy for "j" from my original patch.

Bootstrapped and regression tested powerpc64le-linux.  OK to apply?

gcc/
PR target/70052
* config/rs6000/constraints.md (j): Simplify.
* config/rs6000/predicates.md (easy_fp_constant): Exclude
decimal float 0.D.
* config/rs6000/rs6000.md (zero_fp): New mode_attr.  Use in place
of "j" in all constraints.
(movtd_64bit_nodm): Delete "j" constraint alternative.
gcc/testsuite/
* gcc.dg/dfp/pr70052.c: New.

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 9eca757..ea15764 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -272,4 +272,4 @@ usually better to use @samp{m} or @samp{es} in @code{asm} statements)"
 
 (define_constraint "j"
   "Zero vector constant"
-  (match_test "op == const0_rtx || op == CONST0_RTX (GET_MODE (op))"))
+  (match_test "op == const0_rtx || op == CONST0_RTX (mode)"))
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 072291e..71fac76 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -527,13 +527,14 @@
   && mode != DImode)
 return 1;
 
+  /* 0.0D is not all zero bits.  */
+  if (DECIMAL_FLOAT_MODE_P (mode))
+return 0;
+
   /* The constant 0.0 is easy under VSX.  */
   if (TARGET_VSX && SCALAR_FLOAT_MODE_P (mode) && op == CONST0_RTX (mode))
 return 1;
 
-  if (DECIMAL_FLOAT_MODE_P (mode))
-return 0;
-
   /* If we are using V.4 style PIC, consider all constants to be hard.  */
   if (flag_pic && DEFAULT_ABI == ABI_V4)
 return 0;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d4678af..d47f93e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -428,6 +428,16 @@
(SD "REAL_VALUE_TO_TARGET_DECIMAL32")
(DD "REAL_VALUE_TO_TARGET_DECIMAL64")])
 
+; Whether 0.0 has an all-zero bit pattern
+(define_mode_attr zero_fp [(SF "j")
+  (DF "j")
+  (TF "j")
+  (IF "j")
+  (KF "j")
+  (SD "wn")
+  (DD "wn")
+  (TD "wn")])
+
 ; Definitions for load to 32-bit fpr register
 (define_mode_attr f32_lr  [(SF "f")  (SD "wz")])
 (define_mode_attr f32_lr2 [(SF "wb") (SD "wn")])
@@ -6472,7 +6482,7 @@
 
 (define_insn "mov_hardfloat"
   [(set (match_operand:FMOVE32 0 "nonimmediate_operand" 
"=!r,!r,m,f,,,!r,,Z,?,?r,*c*l,!r,*h")
-   (match_operand:FMOVE32 1 "input_operand" 
"r,m,r,f,,j,j,Z,,r,,r,h,0"))]
+   (match_operand:FMOVE32 1 "input_operand" 
"r,m,r,fZ,,r,,r,h,0"))]
   "(gpc_reg_operand (operands[0], mode)
|| gpc_reg_operand (operands[1], mode))
&& (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT)"
@@ -6612,7 +6622,7 @@
 
 (define_insn "*mov_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand" 
"=m,d,d,,Z,,o,,,!r,Y,r,!r")
-   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,Z,,o,,,j,j,r,Y,r"))]
+   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,Z,,o,r,Y,r"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && 
TARGET_DOUBLE_FLOAT 
&& (gpc_reg_operand (operands[0], mode)
|| gpc_reg_operand (operands[1], mode))"
@@ -6650,7 +6660,7 @@
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*mov_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand" 
"=m,d,d,,o,,Z,,,!r,Y,r,!r,*c*l,!r,*h,r,wg,r,")
-   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,o,,Z,,,j,j,r,Y,r,r,h,0,wg,r,,r"))]
+   (match_operand:FMOVE64 1 "input_operand" 
"d,m,d,o,,Z,r,Y,r,r,h,0,wg,r,,r"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
&& (gpc_reg_operand (operands[0], mode)
|| gpc_reg_operand (operands[1], mode))"
@@ -6713,7 +6723,7 @@
 
 (define_insn_and_split "*mov_64bit_dm"
   [(set (match_operand:FMOVE128_FPR 0

[patch] avoid double evaluation of PIC_OFFSET_TABLE_REGNUM

2016-03-24 Thread Aldy Hernandez

On x86, PIC_OFFSET_TABLE_REGNUM calls a function 
(ix86_use_pseudo_pic_reg) so its value can theoretically change between 
its first and second use in the following conditional:


   if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
  && fixed_regs[PIC_OFFSET_TABLE_REGNUM])

Since the macro can return -1 on x86, the second use can cause an out of 
bounds access.


In practice ix86_use_pseudo_pic_reg() is probably a pure function, since 
we really shouldn't be changing the semantics of the pic register 
mid-flight, but it's probably safer to just avoid calling the function 
twice.
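
A minimal standalone sketch of the pattern, not the GCC code itself: every
name ending in _sketch, plus pic_regnum_hook, hook_value and hook_calls, is
a hypothetical stand-in for this illustration. The macro expands to a
function call, so reading it into a local guarantees that the value tested
against the invalid marker is the same value used as the array index.

```c
#include <stdbool.h>

#define INVALID_REGNUM_SKETCH (~0u)
#define NUM_REGS_SKETCH 8

static bool fixed_regs_sketch[NUM_REGS_SKETCH];
static int hook_value = -1;	/* -1 stands for "no PIC register".  */
static int hook_calls;		/* Counts evaluations of the macro.  */

/* Hypothetical stand-in for a target hook hidden behind a macro.  */
static int
pic_regnum_hook (void)
{
  hook_calls++;
  return hook_value;
}

#define PIC_REGNUM_SKETCH (pic_regnum_hook ())

/* Evaluate the macro exactly once; the bounds-sensitive index below
   reuses the value that was just checked.  */
static bool
pic_reg_is_fixed (void)
{
  unsigned regnum = PIC_REGNUM_SKETCH;
  return regnum != INVALID_REGNUM_SKETCH && fixed_regs_sketch[regnum];
}
```

With hook_value left at -1 the function returns false without touching the
array, and hook_calls increases by one per call rather than two.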


OK pending tests?
+   * builtins.c (expand_builtin_nonlocal_goto): Avoid evaluating
+   PIC_OFFSET_TABLE_REGNUM twice.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 058ecc3..d4f7e94 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1101,8 +1101,8 @@ expand_builtin_nonlocal_goto (tree exp)
 to targets with a nonlocal_goto pattern; they are free
 to implement it in their own way.  Note also that this is
 a no-op if the GP register is a global invariant.)  */
-  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
- && fixed_regs[PIC_OFFSET_TABLE_REGNUM])
+  unsigned regnum = PIC_OFFSET_TABLE_REGNUM;
+  if (regnum != INVALID_REGNUM && fixed_regs[regnum])
emit_use (pic_offset_table_rtx);
 
   emit_indirect_jump (r_label);

Re: out of bounds access in insn-automata.c

2016-03-24 Thread Aldy Hernandez


On 03/23/2016 10:25 AM, Bernd Schmidt wrote:

On 03/23/2016 07:32 AM, Aldy Hernandez wrote:


int
maximal_insn_latency (rtx insn)
{
   int insn_code;

   if (insn == 0)
 insn_code = DFA__ADVANCE_CYCLE;


   else
 {
   insn_code = dfa_insn_code (as_a <rtx_insn *> (insn));
   if (insn_code > DFA__ADVANCE_CYCLE)
 return 0;
 }
   return internal_maximal_insn_latency (insn_code, insn);
}

In the case where insn==0, insn_code is set to the size of
default_latencies[] which will get accessed in the return.
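
The off-by-one can be sketched in isolation as follows. This is an
illustration under assumptions, not the generated insn-automata.c: the
table contents, the value 3, and all names ending in _sketch are made up.
The only property carried over from the description above is that the
advance-cycle code equals the number of table entries.

```c
#include <stddef.h>

/* Assumed for illustration: the advance-cycle code equals the number of
   entries in the latency table, so it is one past the last valid index.  */
enum { DFA_ADVANCE_CYCLE_SKETCH = 3 };

static const int default_latencies_sketch[DFA_ADVANCE_CYCLE_SKETCH]
  = { 1, 2, 4 };

/* Return nonzero if INSN_CODE may be used to index the table.  */
static int
latency_index_in_bounds (int insn_code)
{
  size_t n = (sizeof default_latencies_sketch
	      / sizeof default_latencies_sketch[0]);
  return insn_code >= 0 && (size_t) insn_code < n;
}
```

Under these assumptions latency_index_in_bounds (DFA_ADVANCE_CYCLE_SKETCH)
is zero, which is the case reached when insn == 0.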

Does insn==0 never happen?


I suspect it never happens in this function. I'd add a gcc_assert to
that effect and try a bootstrap/test. Hmm, it seems to be a sel-sched
only thing so a normal bootstrap would be meaningless, but from the
context it looks fairly clearly like insn is always nonnull.


Vlad.  Bernd.  Thanks for your input.

I've added the assert to the caller (maximal_insn_latency) because, as 
you mentioned, the helper function is used for other things in which 
insn==0 can happen.


Now we generate:


int
maximal_insn_latency (rtx insn)
{
  int insn_code;
  gcc_assert (insn != 0);   // --- Added code.

  if (insn == 0)
insn_code = DFA__ADVANCE_CYCLE;


  else
{
  insn_code = dfa_insn_code (as_a <rtx_insn *> (insn));
  if (insn_code > DFA__ADVANCE_CYCLE)
return 0;
}
  return internal_maximal_insn_latency (insn_code, insn);
}



It looks like this block of code is written by a helper function that is
really intended for other purposes than for maximal_insn_latency. Might
be worth changing to
  int insn_code = dfa_insn_code (as_a <rtx_insn *> (insn));
  gcc_assert (insn_code <= DFA__ADVANCE_CYCLE);


dfa_insn_code_* and friends can return > DFA__ADVANCE_CYCLE so I can't 
put that assert on the helper function.


While I was at it, I changed the helper function comment to reflect what 
it has been generating.  It was wrong.


First round of tests was ok, but test machine died.  OK pending tests?

Aldy
+
+   * genautomata.c (output_maximal_insn_latency_func): Assert that
+   insn is non-null.
+   (output_internal_insn_code_evaluation): Fix comment to match
+   generated code.

diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index e3a6c59..91abd2e 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -8113,14 +8113,14 @@ output_internal_trans_func (void)
 
 /* Output code
 
-  if (insn != 0)
+  if (insn == 0)
+insn_code = DFA__ADVANCE_CYCLE;
+  else
 {
   insn_code = dfa_insn_code (insn);
   if (insn_code > DFA__ADVANCE_CYCLE)
 return code;
 }
-  else
-insn_code = DFA__ADVANCE_CYCLE;
 
   where insn denotes INSN_NAME, insn_code denotes INSN_CODE_NAME, and
   code denotes CODE.  */
@@ -8527,6 +8527,7 @@ output_maximal_insn_latency_func (void)
   "maximal_insn_latency", INSN_PARAMETER_NAME);
   fprintf (output_file, "{\n  int %s;\n",
   INTERNAL_INSN_CODE_NAME);
+  fprintf (output_file, "  gcc_assert (%s != 0);\n", INSN_PARAMETER_NAME);
   output_internal_insn_code_evaluation (INSN_PARAMETER_NAME,
INTERNAL_INSN_CODE_NAME, 0);
   fprintf (output_file, "  return %s (%s, %s);\n}\n\n",

Re: rs6000 stack_tie mishap again

2016-03-24 Thread Olivier Hainque

Hello Alan,

> On 24 Mar 2016, at 05:10, Alan Modra  wrote:
> 
>>if (could_be_prologue_epilogue
>>&& prologue_epilogue_contains (insn))
>>  continue;

> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html

Ah, interesting, thanks!

>> My rough understanding is that we probably really care about frame_related
>> insns only here, at least on targets where the flag is supposed to be set
>> accurately.
> 
> Also, possibly just prologue insns.  So you might be able to modify
> init_alias_analysis just to ignore the prologue and skip any need for
> yet another hook.

Which would be good.

> Let's see what rth thinks.

Definitely.

> He did say the patch might need to be redone.  :)
> https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

And here we have a case :)

> [snip]
>> So, aside from the dependency issue which needs to be fixed somehow, I
>> think it would make sense to consider using a strong blockage mechanism in
>> expand_epilogue.
> 
> That's what we both said here
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html
> 
> and David agreed too
> https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html
> 
> but if you can have the alias analysis changes accepted that would be
> even better.

I'd really like to come to a resolution we're confident is robust,
because these are really very nasty bugs.

To tell the truth, my current feeling is that relying on the frame_related
bit still seems fragile (*) so I'd be happier with something stronger.

(*) First, it's not easy to be positive that all the insns we'd need to
catch are not frame_related, even if only looking at the current rs6000
epilogue expander. Second, it's very easy to consider flipping one such
bit for whatever reason and not think about this kind of implication.

Thanks for your feedback on this!

Olivier

Re: rs6000 stack_tie mishap again

2016-03-24 Thread Olivier Hainque

> On 24 Mar 2016, at 05:58, Alan Modra  wrote:
> 
> On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote:
>> The description and
>> references to prior SPE prologue and epilogue changes do not confirm a
>> wider problem.
> 
> There's a good chance this affects ABI_V4 large stack frames too.
> If restoring regs inline we'll be using r11 as a base, just like SPE
> does with moderate sized stack frames when restoring 64-bit regs.

Exactly. If I'm not mistaken, the set of problematic cases encompasses
everything which uses in the epilogue, as a base, a register which
might have been used last to designate a global object in the function
body. There are such uses of at least r11 not limited to SPE.

Olivier
