date:20150908

[PATCH] Convert SPARC to LRA

2015-09-08 Thread David Miller


The following patch converts the sparc backend over to LRA.

The three major obstacles to overcome were:

1) The funky "U" constraint.  It was a register constraint, but
   did not evaluate to a register class, and was used to help
   handle unaligned integer register pairs.

   It turns out to be unnecessary, since GENERAL_REGS plus
   HARD_REGNO_MODE_OK() do the job just fine now.

2) Sparc generates unreasonable amounts of subregging because
   it did not define PROMOTE_MODE().  All of the subreg LRA
   problems I was running into went away once I simply added
   the define.

3) The sethi/or patterns accepting direct symbol references and
   similar should not be available when flag_pic.
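To make item 1 concrete, here is a minimal C model of the kind of restriction HARD_REGNO_MODE_OK expresses for 64-bit integer values on 32-bit SPARC: the value lives in a register pair, and ldd/std need that pair to start at an even-numbered register. The function name, the enum and the 0..31 register numbering are assumptions made for illustration; this is not the sparc backend's code.

```c
#include <stdbool.h>

/* Size in bytes of an integer value in this simplified model.  */
enum value_size { SIZE_32 = 4, SIZE_64 = 8 };

/* Hypothetical model of a HARD_REGNO_MODE_OK-style test for SPARC
   integer registers, numbered 0..31 here (%g0-%g7, %o0-%o7, %l0-%l7,
   %i0-%i7).  Not the sparc backend's actual code.  */
static bool
int_regno_mode_ok (unsigned regno, enum value_size size, bool arch32)
{
  if (regno > 31)
    return false;
  /* On 32-bit SPARC a 64-bit value occupies REGNO and REGNO + 1, and
     ldd/std require the pair to start at an even register.  */
  if (arch32 && size == SIZE_64)
    return regno % 2 == 0;
  /* Otherwise the value fits in a single register.  */
  return true;
}
```

With a per-register check of this kind doing the even/odd filtering, the constraint's register class can simply be GENERAL_REGS, which is what the message says makes the special "U" handling unnecessary.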

The testsuite runs really well, there are no regressions and in
fact the LRA conversion fixes some failures.

I'm therefore reasonably confident in these changes, but I will
not apply them just yet to give the other sparc maintainers some
time to review and give feedback.

2015-09-08  David S. Miller  

* config/sparc/constraints.md: Make "U" constraint a real register
constraint.
* config/sparc/sparc.c (TARGET_LRA_P): Define.
(D_MODES, DF_MODES): Add missing cast.
(TF_MODES, TF_MODES_NO_S): Include T_MODE.
(OF_MODES, OF_MODES_NO_S): Include O_MODE.
(sparc_register_move_cost): Decrease Niagara/UltraSPARC memory
cost to 8.
* config/sparc/sparc.h (PROMOTE_MODE): Define.
* config/sparc/sparc.md (*movsi_lo_sum, *movsi_high): Do not
provide these insns when flag_pic.

diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index e12efa1..7a18879 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -44,6 +44,8 @@
 (define_register_constraint "h" "(TARGET_V9 && TARGET_V8PLUS ? I64_REGS : NO_REGS)"
  "64-bit global or out register in V8+ mode")
 
+(define_register_constraint "U" "(TARGET_ARCH32 ? GENERAL_REGS : NO_REGS)")
+
 ;; Floating-point constant constraints
 
 (define_constraint "G"
@@ -135,51 +137,6 @@
   (match_code "mem")
   (match_test "memory_ok_for_ldd (op)")))
 
-;; This awkward register constraint is necessary because it is not
-;; possible to express the "must be even numbered register" condition
-;; using register classes.  The problem is that membership in a
-;; register class requires that all registers of a multi-regno
-;; register be included in the set.  It is add_to_hard_reg_set
-;; and in_hard_reg_set_p which populate and test regsets with these
-;; semantics.
-;;
-;; So this means that we would have to put both the even and odd
-;; register into the register class, which would not restrict things
-;; at all.
-;;
-;; Using a combination of GENERAL_REGS and HARD_REGNO_MODE_OK is not a
-;; full solution either.  In fact, even though IRA uses the macro
-;; HARD_REGNO_MODE_OK to calculate which registers are prohibited from
-;; use in certain modes, it still can allocate an odd hard register
-;; for DImode values.  This is due to how IRA populates the table
-;; ira_useful_class_mode_regs[][].  It suffers from the same problem
-;; as using a register class to describe this restriction.  Namely, it
-;; sets both the odd and even part of an even register pair in the
-;; regset.  Therefore IRA can and will allocate odd registers for
-;; DImode values on 32-bit.
-;;
-;; There are legitimate cases where DImode values can end up in odd
-;; hard registers, the most notable example is argument passing.
-;;
-;; What saves us is reload and the DImode splitters.  Both are
-;; necessary.  The odd register splitters cannot match if, for
-;; example, we have a non-offsetable MEM.  Reload will notice this
-;; case and reload the address into a single hard register.
-;;
-;; The real downfall of this awkward register constraint is that it does
-;; not evaluate to a true register class like a bonafide use of
-;; define_register_constraint would.  This currently means that we cannot
-;; use LRA on Sparc, since the constraint processing of LRA really depends
-;; upon whether an extra constraint is for registers or not.  It uses
-;; reg_class_for_constraint, and checks it against NO_REGS.
-(define_constraint "U"
- "Pseudo-register or hard even-numbered integer register"
- (and (match_test "TARGET_ARCH32")
-  (match_code "reg")
-  (ior (match_test "REGNO (op) < FIRST_PSEUDO_REGISTER")
-  (not (match_test "reload_in_progress && reg_renumber [REGNO (op)] < 0")))
-  (match_test "register_ok_for_ldd (op)")))
-
 ;; Equivalent to 'T' but available in 64-bit mode
 (define_memory_constraint "W"
  "Memory reference for 'e' constraint floating-point register"
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index ed8a166..b41800c 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -808,6 +808,9 @@ char sparc_hard_reg_printed[8];
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE sparc_can_eliminate
 
+#undef TARGET_LRA_P
+#d

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread Ian Lance Taylor

Mike Stump  writes:

> Not a big issue, but slightly better if (O_CLOEXEC>>32) != 0 is also
> true.  See, if AIX should ever define this to a sensible value, the
> above would disappear the feature.  However, if they did, then this
> expression should then be false.

Yes, I think this might be even better in code.  How about something
like

  /* On some versions of AIX O_CLOEXEC does not fit in int, so use a
     cast to force it.  */
  descriptor = open (filename, (int) (O_RDONLY | O_BINARY | O_CLOEXEC));

Does that work on AIX?
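A minimal sketch of the suggested call, assuming a POSIX system. The fallback definitions and the helper name open_readonly_cloexec are illustrative assumptions, not libbacktrace's code. Note that if O_CLOEXEC's value really does not fit in an int, the cast cannot preserve it; that is the case Mike's remark is concerned with, and the sketch does not try to detect it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Fallbacks so the sketch also builds where these flags are absent.  */
#ifndef O_BINARY
#define O_BINARY 0
#endif
#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif

/* Open FILENAME read-only with close-on-exec.  The (int) cast keeps the
   flags argument an int even if O_CLOEXEC has a wider type.  Returns
   the descriptor, or -1 on failure.  */
static int
open_readonly_cloexec (const char *filename)
{
  int descriptor = open (filename, (int) (O_RDONLY | O_BINARY | O_CLOEXEC));
  if (descriptor < 0)
    return -1;
  if (O_CLOEXEC == 0)
    /* No O_CLOEXEC: set the flag after the fact (not atomic with
       respect to a concurrent fork/exec).  */
    fcntl (descriptor, F_SETFD, FD_CLOEXEC);
  return descriptor;
}
```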

Ian

Re: libbacktrace patch committed: Graceful fallback if out of memory

2015-09-08 Thread Ian Lance Taylor

On Tue, Sep 8, 2015 at 5:00 PM, Hans-Peter Nilsson
 wrote:
>
> I've committed the following as obvious, following the pattern
> of the other files including internal.h, after observing
> all-libbacktrace (i.e. built for the host) complete.
>
> libbacktrace:
> * backtrace.c: #include .

Thanks.

Ian

Re: libbacktrace patch committed: Graceful fallback if out of memory

2015-09-08 Thread Hans-Peter Nilsson

> From: Ian Lance Taylor 
> Date: Tue, 8 Sep 2015 18:46:21 +0200


> PR other/67457
> * backtrace.c: #include "internal.h".
> (struct backtrace_data): Add can_alloc field.
> (unwind): If can_alloc is false, don't try to get file/line
> information.
> (backtrace_full): Set can_alloc field in bdata.
> * alloc.c (backtrace_alloc): Don't call error_callback if it is
> NULL.
> * mmap.c (backtrace_alloc): Likewise.
> * internal.h: Update comments for backtrace_alloc and
> backtrace_free.

> Index: backtrace.c
> ===
> --- backtrace.c   (revision 227528)
> +++ backtrace.c   (working copy)
> @@ -34,6 +34,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
>  
>  #include "unwind.h"
>  #include "backtrace.h"
> +#include "internal.h"
>  
>  /* The main backtrace_full routine.  */
>  


I don't know about your environment, but for me (cross from
x86_64-linux to cris-elf) that causes:

/bin/sh ./libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. 
-I/tmp/hpautotest-gcc0/gcc/libbacktrace  -I 
/tmp/hpautotest-gcc0/gcc/libbacktrace/../include -I 
/tmp/hpautotest-gcc0/gcc/libbacktrace/../libgcc -I ../libgcc  -funwind-tables 
-frandom-seed=backtrace.lo -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute 
-Wcast-qual  -g -O2 -c -o backtrace.lo 
/tmp/hpautotest-gcc0/gcc/libbacktrace/backtrace.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. 
-I/tmp/hpautotest-gcc0/gcc/libbacktrace -I 
/tmp/hpautotest-gcc0/gcc/libbacktrace/../include -I 
/tmp/hpautotest-gcc0/gcc/libbacktrace/../libgcc -I ../libgcc -funwind-tables 
-frandom-seed=backtrace.lo -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute 
-Wcast-qual -g -O2 -c /tmp/hpautotest-gcc0/gcc/libbacktrace/backtrace.c  -fPIC 
-DPIC -o .libs/backtrace.o
In file included from /tmp/hpautotest-gcc0/gcc/libbacktrace/backtrace.c:37:
/tmp/hpautotest-gcc0/gcc/libbacktrace/internal.h:182: error: expected declaration specifiers or '...' before 'off_t'
make[3]: *** [backtrace.lo] Error 1
make[3]: Leaving directory `/tmp/hpautotest-gcc0/cris-elf/gccobj/libbacktrace'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/tmp/hpautotest-gcc0/cris-elf/gccobj/libbacktrace'
make[1]: *** [all-libbacktrace] Error 2
make[1]: Leaving directory `/tmp/hpautotest-gcc0/cris-elf/gccobj'
make: *** [all] Error 2

I've committed the following as obvious, following the pattern
of the other files including internal.h, after observing
all-libbacktrace (i.e. built for the host) complete.

libbacktrace:
* backtrace.c: #include .

Index: backtrace.c
===
--- backtrace.c (revision 227567)
+++ backtrace.c (working copy)
@@ -32,6 +32,8 @@ POSSIBILITY OF SUCH DAMAGE.  */
 
 #include "config.h"
 
+#include 
+
 #include "unwind.h"
 #include "backtrace.h"
 #include "internal.h"
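The error is a header-ordering problem: a prototype that mentions off_t only compiles if an earlier include has already declared off_t. On some hosts other headers evidently pull the declaration in; on H-P's cris-elf cross build nothing did. A minimal C illustration, where the function region_end is hypothetical and only stands in for internal.h's off_t-using declarations:

```c
#include <sys/types.h>  /* declares off_t; must precede the prototype */

/* Hypothetical prototype in the style of an internal helper; without
   the include above, off_t would be undeclared at this point.  */
extern off_t region_end (off_t offset, off_t size);

/* Return the offset one past the end of a region.  */
off_t
region_end (off_t offset, off_t size)
{
  return offset + size;
}
```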

brgds, H-P

Re: [PATCH] Make tsan tests less picky about ansi escape codes in diagnostics.

2015-09-08 Thread Jonathan Roelofs




On 9/8/15 4:09 PM, Jonathan Roelofs wrote:

In a similar vein to [1], this does the same for the tsan tests.

These tests also suffer from being overly picky w.r.t. ANSI escape codes
in the runtime diagnostics (which can't be suppressed, as explained in [2]).
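A small C illustration of why relaxing the line terminator to .* helps, using POSIX regcomp/regexec as a stand-in for DejaGnu's Tcl regexp matching (an assumption; the two engines differ in details). The report text and the exact position of the escape sequence are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT contains a match for the POSIX extended regular
   expression PATTERN.  Without REG_NEWLINE, '.' also matches '\n',
   which is roughly how the concatenated dg-output patterns behave.  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* Two consecutive dg-output patterns, concatenated, before and after
   the change.  */
static const char old_pattern[] =
  "  Write of size 4 at .* by thread T1:(\n|\r\n|\r)    #0 foo1";
static const char new_pattern[] =
  "  Write of size 4 at .* by thread T1:.*    #0 foo1";

/* Hypothetical report text: plain, and with a color reset sequence
   ("\x1b[0m") between the colon and the newline.  */
static const char plain_report[] =
  "  Write of size 4 at 0x7f00 by thread T1:\n    #0 foo1 simple_stack.c:11";
static const char colored_report[] =
  "  Write of size 4 at 0x7f00 by thread T1:\x1b[0m\n    #0 foo1 simple_stack.c:11";
```

The old pattern matches only the plain text, while the new one matches both. The trade-off is that .* is looser than an explicit line terminator: it also accepts any other text before the next frame line.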


Tested on a remote x86_64-linux-gnu target.


2015-09-08  Jonathan Roelofs  

 * c-c++-common/tsan/race_on_mutex.c: Don't be picky about ansi escape
 codes in diagnostics.
 * c-c++-common/tsan/free_race2.c: Ditto.
 * c-c++-common/tsan/mutexset1.c: Ditto.
 * c-c++-common/tsan/fd_pipe_race.c: Ditto.
 * c-c++-common/tsan/atomic_stack.c: Ditto.
 * c-c++-common/tsan/write_in_reader_lock.c: Ditto.
 * c-c++-common/tsan/free_race.c: Ditto.
 * c-c++-common/tsan/simple_stack.c: Ditto.


Forgot to attach the patch itself.


Jon




Cheers,

Jon


1: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01755.html
2: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00292.html



--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded
Index: gcc/testsuite/c-c++-common/tsan/simple_stack.c
===
--- gcc/testsuite/c-c++-common/tsan/simple_stack.c  (revision 454039)
+++ gcc/testsuite/c-c++-common/tsan/simple_stack.c  (working copy)
@@ -52,17 +52,17 @@
 }
 
 /* { dg-output "WARNING: ThreadSanitizer: data race.*" } */
-/* { dg-output "  Write of size 4 at .* by thread T1:(\n|\r\n|\r)" } */
+/* { dg-output "  Write of size 4 at .* by thread T1:.*" } */
 /* { dg-output "#0 foo1.* .*(simple_stack.c:11|\\?{2}:0) (.*)" } */
 /* { dg-output "#1 bar1.* .*(simple_stack.c:16|\\?{2}:0) (.*)" } */
 /* { dg-output "#2 Thread1.* .*(simple_stack.c:30|\\?{2}:0) (.*)" } */
-/* { dg-output "  Previous read of size 4 at .* by thread T2:(\n|\r\n|\r)" } */
+/* { dg-output "  Previous read of size 4 at .* by thread T2:.*" } */
 /* { dg-output "#0 foo2.* .*(simple_stack.c:20|\\?{2}:0) (.*)" } */
 /* { dg-output "#1 bar2.* .*(simple_stack.c:25|\\?{2}:0) (.*)" } */
 /* { dg-output "#2 Thread2.* .*(simple_stack.c:35|\\?{2}:0) (.*)" } */
-/* { dg-output "  Thread T1 \\(tid=.*, running\\) created by main thread at:(\n|\r\n|\r)" } */
+/* { dg-output "  Thread T1 \\(tid=.*, running\\) created by main thread at:.*" } */
 /* { dg-output "#0 pthread_create .* (.*)" } */
 /* { dg-output "#1 StartThread.* .*(simple_stack.c:41|\\?{2}:0) (.*)" } */
-/* { dg-output "  Thread T2 (.*) created by main thread at:(\n|\r\n|\r)" } */
+/* { dg-output "  Thread T2 (.*) created by main thread at:.*" } */
 /* { dg-output "#0 pthread_create .* (.*)" } */
 /* { dg-output "#1 StartThread.* .*(simple_stack.c:41|\\?{2}:0) (.*)" } */
Index: gcc/testsuite/c-c++-common/tsan/race_on_mutex.c
===
--- gcc/testsuite/c-c++-common/tsan/race_on_mutex.c (revision 454039)
+++ gcc/testsuite/c-c++-common/tsan/race_on_mutex.c (working copy)
@@ -36,10 +36,10 @@
   return 0;
 }
 
-/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */
-/* { dg-output "  Atomic read of size 1 at .* by thread T2:(\n|\r\n|\r)" } */
+/* { dg-output "WARNING: ThreadSanitizer: data race.*" } */
+/* { dg-output "  Atomic read of size 1 at .* by thread T2:.*" } */
 /* { dg-output "#0 pthread_mutex_lock.*" } */
-/* { dg-output "#1 Thread2.* .*(race_on_mutex.c:22|\\?{2}:0) (.*)" } */
-/* { dg-output "  Previous write of size 1 at .* by thread T1:(\n|\r\n|\r)" } */
+/* { dg-output "#1 Thread2.* .*(race_on_mutex.c:22|\\?{2}:0) .*" } */
+/* { dg-output "  Previous write of size 1 at .* by thread T1:.*" } */
 /* { dg-output "#0 pthread_mutex_init .* (.)*" } */
 /* { dg-output "#1 Thread1.* .*(race_on_mutex.c:12|\\?{2}:0) .*" } */
Index: gcc/testsuite/c-c++-common/tsan/free_race2.c
===
--- gcc/testsuite/c-c++-common/tsan/free_race2.c(revision 454039)
+++ gcc/testsuite/c-c++-common/tsan/free_race2.c(working copy)
@@ -17,12 +17,12 @@
   return 0;
 }
 
-/* { dg-output "WARNING: ThreadSanitizer: heap-use-after-free.*(\n|\r\n|\r)" } */
-/* { dg-output "  Write of size 4.* by main thread:(\n|\r\n|\r)" } */
+/* { dg-output "WARNING: ThreadSanitizer: heap-use-after-free.*" } */
+/* { dg-output "  Write of size 4.* by main thread:.*" } */
 /* { dg-output "#0 bar.*" } */
 /* { dg-output "#1 main .*" } */
-/* { dg-output "  Previous write of size 8 at .* by main thread:(\n|\r\n|\r)" } */
+/* { dg-output "  Previous write of size 8 at .* by main thread:.*" } */
 /* { dg-output "#0 free .*" } */
-/* { dg-output "#\(1|2\) foo.*(\n|\r\n|\r)" } */
+/* { dg-output "#\(1|2\) foo.*" } */
 /* { dg-output "#\(2|3\) main .*" } */
 
Index: gcc/testsuite/c-c++-common/tsan/mutexset1.c
===
--- gcc/testsuite/c-c++-common/tsan/mutexset1.c (revision 454039)
+++ gcc/tests

Re: [PATCH] Make ubsan tests less picky about ansi escape codes in diagnostics.

2015-09-08 Thread Jonathan Roelofs




On 9/4/15 12:20 AM, Yury Gribov wrote:

On 09/03/2015 07:45 PM, Jonathan Roelofs wrote:



On 9/3/15 10:17 AM, Jakub Jelinek wrote:

On Thu, Sep 03, 2015 at 10:15:02AM -0600, Jonathan Roelofs wrote:

+kcc, mrs

Ping

On 8/27/15 4:44 PM, Jonathan Roelofs wrote:

The attached patch makes the ubsan tests agnostic to ANSI escape codes
in their diagnostic output.


It wouldn't hurt if you explained in detail what is the problem you are
trying to solve and why something that works for most people doesn't
work in your case.


Hi Jakub,

AFAICT, there are two ways to suppress the emission of color codes from
ubsan's diagnostics:

   1) Set an environment variable.
   2) Make the output stream not a tty.

#1 doesn't seem to be possible in DejaGnu without hacks.


AFAIR it can't be done for remote targets due to DejaGnu design
limitations.


#2 doesn't work in our environment because DejaGnu attempts to make
itself appear to the program under test as if it were a tty. This might
be an artifact of the fact that all of our testing is remote testing
(though that is just blind speculation on my part:


AFAIK that's indeed the case.
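The two escape hatches listed above correspond to the usual "auto" coloring logic. A minimal sketch, with a hypothetical function and setting names (the sanitizer runtimes' real option handling may differ): if the harness runs the program on a pseudo-terminal, isatty() reports a terminal and the "auto" branch chooses color, which is why option 2 does not help in this environment.

```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical model of an "auto"/"always"/"never" color setting.
   COLOR_SETTING may be NULL, meaning "auto".  */
static bool
should_colorize (const char *color_setting, int fd)
{
  if (color_setting != NULL && strcmp (color_setting, "never") == 0)
    return false;
  if (color_setting != NULL && strcmp (color_setting, "always") == 0)
    return true;
  /* "auto" (or unset): color only when writing to a terminal.  */
  return isatty (fd) != 0;
}
```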

Added Max.


I've created another patch to address the same issue in the tsan tests, 
here: [1].



Cheers,

Jon

1: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00556.html

--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded

[PATCH] Make tsan tests less picky about ansi escape codes in diagnostics.

2015-09-08 Thread Jonathan Roelofs


In a similar vein to [1], this does the same for the tsan tests.

These tests also suffer from being overly picky w.r.t. ANSI escape codes
in the runtime diagnostics (which can't be suppressed, as explained in [2]).



Tested on a remote x86_64-linux-gnu target.


2015-09-08  Jonathan Roelofs  

* c-c++-common/tsan/race_on_mutex.c: Don't be picky about ansi escape
codes in diagnostics.
* c-c++-common/tsan/free_race2.c: Ditto.
* c-c++-common/tsan/mutexset1.c: Ditto.
* c-c++-common/tsan/fd_pipe_race.c: Ditto.
* c-c++-common/tsan/atomic_stack.c: Ditto.
* c-c++-common/tsan/write_in_reader_lock.c: Ditto.
* c-c++-common/tsan/free_race.c: Ditto.
* c-c++-common/tsan/simple_stack.c: Ditto.


Cheers,

Jon


1: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01755.html
2: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00292.html

--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded

Re: [4/7] Use correct promoted mode sign for result of GIMPLE_CALL

2015-09-08 Thread Jim Wilson

On 09/08/2015 08:39 AM, Jeff Law wrote:
> Is this another instance of the PROMOTE_MODE issue that was raised by
> Jim Wilson a couple months ago?

It looks like a closely related problem.  The one I am looking at has
confusion with a function arg and a local variable as they have
different sign extension promotion rules.  Kugan's is with a function
return value and a local variable as they have different sign extension
promotion rules.

The bug report is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932

The gcc-patches thread spans a month end boundary, so it has multiple heads
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00112.html
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00524.html

Function args and function return values get the same sign extension
treatment when promoted, this is handled by
TARGET_PROMOTE_FUNCTION_MODE. Local variables are treated differently,
via PROMOTE_MODE. I think the function arg/return treatment is wrong,
but changing that is an ABI change which is undesirable.  I suppose we
could change local variables to match function args and return values,
but I think that is moving in the wrong direction.  Though Kugan's new
optimization pass will remove some of the extra unnecessary sign/zero
extensions added by the arm TARGET_PROMOTE_FUNCTION_MODE definition, so
maybe it won't matter enough to worry about any more.
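To make the mismatch concrete: a promoted 16-bit value occupies a 32-bit register, and the upper half depends on which extension was applied. If a value arrives as an argument or return value promoted one way (per TARGET_PROMOTE_FUNCTION_MODE) but is then treated as a local promoted the other way (per PROMOTE_MODE), the compiler may wrongly omit a needed re-extension. This plain C sketch only models the register contents, not GCC internals.

```c
#include <stdint.h>

/* Widen a 16-bit value to 32 bits by zero extension.  */
static uint32_t
zero_extend16 (uint16_t v)
{
  return (uint32_t) v;
}

/* Widen a 16-bit value to 32 bits by sign extension, written without
   implementation-defined signed conversions: flipping and then
   subtracting the sign bit propagates it into the upper half.  */
static uint32_t
sign_extend16 (uint16_t v)
{
  return ((uint32_t) v ^ 0x8000u) - 0x8000u;
}
```

For 0x1234 both widenings agree, but for 0x8000 they give 0x00008000 and 0xFFFF8000, so code that assumes one form while the register holds the other computes the wrong value.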

If we can't fix this in the arm backend, then we may need different
middle-end fixes for these two cases.  I was looking at ways to fix this in
the tree-out-of-ssa pass.  I don't know if this will work for Kugan's
testcase, I'd need time to look at it.

Jim

[PATCH] [graphite] Remove limit_scops

2015-09-08 Thread Aditya Kumar


This patch removes the graphite-scop-detection.c:limit_scops function and fixes
related issues arising from that. limit_scops was added as an intermediate step
to discard the loops which graphite could not handle. Removing limit_scops
required handling different cases of loops and surrounding code. The scop is now
larger, so most test cases required their 'number of scops detected' to be
fixed. By increasing the size of the scop we can now optimize loops which are
'siblings' of each other. This could enable loop fusion on a number of loops.
Since in the graphite framework we mostly want to optimize
loop-nests/adjacent-loops, we now discard scops with fewer than 2 loops. We
also discard scops without any data references.


Essentially:
 - Remove limit_scops.
 - Only select scops when there are at least two loops (loop nest or side by side).
 - Discard loops without data-refs.
 - Fix test cases.
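The new selection rule, as stated in the summary above, can be modeled in a few lines. This is only a sketch with made-up types (struct scop_summary, keep_scop); the real checks live in graphite-scop-detection.c and tree-data-ref.c and operate on GCC's loop and data-reference structures.

```c
#include <stdbool.h>

/* Hypothetical summary of a candidate region; not graphite's data
   structures.  */
struct scop_summary
{
  unsigned num_loops;      /* loops in the region, nested or side by side */
  unsigned num_data_refs;  /* memory references found in those loops */
};

/* Keep a candidate scop only if it has at least two loops and at
   least one data reference.  */
static bool
keep_scop (const struct scop_summary *s)
{
  return s->num_loops >= 2 && s->num_data_refs > 0;
}
```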


Passes bootstrap and reg-test.

gcc/ChangeLog:

2015-09-02  Aditya Kumar  
Sebastian Pop  

* graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_ast_expr_id):
Return the parameter if it was saved in corresponding
parameter_rename_map of the region.
(copy_def): Copy def from sese region to the newly created region.
(copy_internal_parameters): Copy all the internal parameters defined
within a region to the newly created region.
(graphite_regenerate_ast_isl): Copy parameters to the new region before
translating isl to gimple.
* graphite-scop-detection.c (graphite_can_represent_loop): Bail out if
  the loop-nest does not have any data-references.
(build_graphite_scops): Create a scop only when there is at least one
loop inside it.
(contains_only_close_phi_nodes): Deleted.
(print_graphite_scop_statistics): Deleted.
(print_graphite_statistics): Deleted.
(limit_scops): Deleted.
(build_scops): Removed call to limit_scops.
* sese.c (new_sese): Construct.
(free_sese): Destruct.
(sese_add_exit_phis_edge): update_stmt after exit phi edge has been
added.
(set_rename): Pass sese region so that parameters inside the region can
be added to its parameter_rename_map.
(rename_uses): Pass sese region.   
(graphite_copy_stmts_from_block): Do not copy parameters that have been
generated in the header of the scop. For each SSA_NAME in the
parameter_rename_map rename its usage.
(invariant_in_sese_p_rec): Return false if tree t is defined outside
sese region.
(scalar_evolution_in_region): If the tree t is invariant just return t.
* sese.h: Added a parameter rename map (parameter_rename_map_t) to
  struct sese to keep track of all the parameters which need renaming.
* tree-data-ref.c (loop_nest_has_data_refs): Check if a loop nest has
  any data-refs.
* tree-data-ref.h: Declaration of loop_nest_has_data_refs.


gcc/testsuite/ChangeLog:

2015-09-02  Aditya Kumar  
Sebastian Pop  

* gcc.dg/graphite/block-0.c: Modified test case to match current output.
* gcc.dg/graphite/block-1.c: Same.
* gcc.dg/graphite/block-5.c: Same.
* gcc.dg/graphite/block-6.c: Same.
* gcc.dg/graphite/interchange-1.c: Same.
* gcc.dg/graphite/interchange-10.c: Same.
* gcc.dg/graphite/interchange-11.c: Same.
* gcc.dg/graphite/interchange-13.c: Same.
* gcc.dg/graphite/interchange-14.c: Same.
* gcc.dg/graphite/interchange-3.c: Same.
* gcc.dg/graphite/interchange-4.c: Same.
* gcc.dg/graphite/interchange-7.c: Same.
* gcc.dg/graphite/interchange-8.c: Same.
* gcc.dg/graphite/interchange-9.c: Same.
* gcc.dg/graphite/isl-codegen-loop-dumping.c: Same.
* gcc.dg/graphite/pr35356-1.c (foo): Same.
* gcc.dg/graphite/pr37485.c: Same.
* gcc.dg/graphite/scop-0.c (int toto): Same.
* gcc.dg/graphite/scop-1.c: Same.
* gcc.dg/graphite/scop-10.c: Same.
* gcc.dg/graphite/scop-11.c: Same.
* gcc.dg/graphite/scop-12.c: Same.
* gcc.dg/graphite/scop-13.c: Same.
* gcc.dg/graphite/scop-16.c: Same.
* gcc.dg/graphite/scop-17.c: Same.
* gcc.dg/graphite/scop-18.c: Same.
* gcc.dg/graphite/scop-2.c: Same.
* gcc.dg/graphite/scop-21.c (int test): Same.
* gcc.dg/graphite/scop-22.c (void foo): Same.
* gcc.dg/graphite/scop-4.c: Same.
* gcc.dg/graphite/scop-5.c: Same.
* gcc.dg/graphite/scop-6.c: Same.
* gcc.dg/graphite/scop-7.c: Same.
* gcc.dg/graphite/scop-8.c: Same.
* gcc.dg/graphite/scop-9.c: Same.
* gcc.dg/graphite/scop-mvt.c (void mvt): Introduced dependency so that
  data-refs remain inside the inner loop.
* gcc.dg/graphite/uns-block-1.c: Modified test case to match output.
* gcc.dg/

Re: [PATCH] Prevent unnecessary recompilation for trivial params.def changes

2015-09-08 Thread Jeff Law


On 09/08/2015 07:21 AM, Tom de Vries wrote:

[ was: Re: [RFC] Prevent unnecessary recompilation for trivial
params.def changes ]

On 08/09/15 14:03, Andreas Schwab wrote:

Tom de Vries  writes:


After a subsequent rebuild I don't see anything being rebuilt. So I
don't observe 'continuous rebuilding'.


What happens when you just touch params-list.h or params.def?
move-if-change will leave the target untouched when unchanged (that's
the whole point of it), so it will remain older than the dependencies.
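To see why, here is a simplified C model of what move-if-change does. The real tool is a shell script; this sketch only mirrors its observable behavior as described here, and the function names are hypothetical: when the freshly generated file is identical, the old target is kept, with its old timestamp, and the new copy is discarded.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return true if files A and B can both be opened and have identical
   contents.  */
static bool
same_contents (const char *a, const char *b)
{
  FILE *fa = fopen (a, "rb");
  FILE *fb = fopen (b, "rb");
  bool same = fa != NULL && fb != NULL;
  while (same)
    {
      int ca = getc (fa);
      int cb = getc (fb);
      if (ca != cb)
        same = false;
      else if (ca == EOF)
        break;
    }
  if (fa != NULL)
    fclose (fa);
  if (fb != NULL)
    fclose (fb);
  return same;
}

/* Replace DEST with SRC only when the contents differ, so an unchanged
   DEST keeps its old timestamp.  Returns 1 if DEST was replaced, 0 if
   it was left alone (SRC is removed), -1 on error.  */
static int
move_if_change (const char *src, const char *dest)
{
  if (same_contents (src, dest))
    return remove (src) == 0 ? 0 : -1;
  return rename (src, dest) == 0 ? 1 : -1;
}
```

Because the kept target stays older than its inputs, make considers it out of date on every run; a separate stamp file, whose timestamp is refreshed each time the rule runs, is the usual way around this.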


I could reproduce the problem using these instructions, thanks.

I also found a bit "On the use of stamps" in gcc/Makefile.in, which
explains the problem and how to fix things.

Updated patch accordingly.

OK for trunk if bootstrap succeeds?

Yes.
jeff

Re: [Patch, fortran] PR66681 - Wrong result in assigning this_image() to a complex coarray

2015-09-08 Thread FX

> This is something of a corner case, where gfc_conv_expr comes back
> with a SAVE_EXPR, in the case of complex, scalar, coarray lvalues. The
> first field of the SAVE_EXPR is a perfectly viable expression to
> assign to, so I have taken that. If anybody out there has a better
> solution, please speak up!

If the SAVE_EXPR is a useless one, it should have SAVE_EXPR_RESOLVED_P(…)
be true. Then you can simply discard it as you’re doing.
If not, we need to create a temp variable, as otherwise simply removing the
SAVE_EXPR will in some cases lead to side effects being evaluated multiple times.

But I’m curious as to where the SAVE_EXPR is created. As far as I can tell,
all SAVE_EXPRs in our front-end are created by explicit calls to save_expr(),
of which there are very few. I don’t see which one is the culprit here :(  But
creating a SAVE_EXPR for a LHS is definitely not a good idea in the first place.
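The point about side effects can be illustrated outside the compiler. In this plain C analogy (hypothetical names; it does not model gfortran's trees), pick() plays the role of an lvalue expression with a side effect, and assigning the real and imaginary parts separately evaluates it either twice or, with a temporary, once:

```c
/* A complex value as two doubles.  */
struct cplx { double re, im; };

/* Counts evaluations of the lvalue expression.  */
static int lookups;

/* Stand-in for an lvalue expression with a side effect.  */
static struct cplx *
pick (struct cplx *array, int i)
{
  lookups++;
  return &array[i];
}

/* Re-evaluating the lvalue for each part: the side effect runs twice.  */
static void
assign_twice (struct cplx *array, int i, double re, double im)
{
  pick (array, i)->re = re;
  pick (array, i)->im = im;
}

/* Evaluating the lvalue once into a temporary, as a SAVE_EXPR or an
   explicit temp variable would: the side effect runs once.  */
static void
assign_once (struct cplx *array, int i, double re, double im)
{
  struct cplx *tmp = pick (array, i);
  tmp->re = re;
  tmp->im = im;
}
```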

FX

Re: debug mode symbols cleanup

2015-09-08 Thread François Dumont

On 07/09/2015 13:03, Jonathan Wakely wrote:
> On 05/09/15 22:53 +0200, François Dumont wrote:
>>I remember Paolo saying once that we were not guaranteeing any ABI
>> compatibility for debug mode. I haven't found any section for
>> unversioned symbols in gnu.ver so I simply uncomment the global export.
>
> There is no section, because all exported symbols are versioned.
>
> It's OK if objects compiled with Debug Mode using one version of GCC
> don't link to objects compiled with Debug Mode using a different
> version of GCC, but you can't change the exported symbols in the DSO.
>
>
> Your changelog doesn't include the changes to config/abi/pre/gnu.ver,
> but those changes are not OK anyway, they fail the abi-check:
>
> FAIL: libstdc++-abi/abi_check
>
>		=== libstdc++ Summary ===
>
> # of unexpected failures	1
>
>
Sorry, I thought the policy regarding debug mode symbols was even more relaxed.
It is not, so here is another patch that doesn't break the ABI checks.

In the end I deprecated all the methods that should not be used; they were
normally not used explicitly anyway. Their implementation is now empty. I
just needed to add a symbol for the non-const _M_message method, which has
the correct signature.
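The diff below turns _M_iterator, _M_integer, _M_string and the related helpers from const members returning const _Error_formatter& into non-const members returning _Error_formatter&, which better matches what they do: each call appends to a bounded parameter array and returns the object so calls can be chained. A rough C analogue of that pattern, with hypothetical names (the real code is C++ in libstdc++-v3/include/debug/formatter.h):

```c
#include <stddef.h>

enum { MAX_PARAMETERS = 3 };

/* One named integer parameter.  */
struct parameter
{
  const char *name;
  long value;
};

/* Hypothetical analogue of a formatter collecting a bounded number of
   parameters.  */
struct formatter
{
  size_t num_parameters;
  struct parameter parameters[MAX_PARAMETERS];
};

/* Record an integer parameter if there is room; extra parameters are
   silently dropped.  Returning the (non-const) formatter allows
   chained calls.  */
static struct formatter *
formatter_integer (struct formatter *f, long value, const char *name)
{
  if (f->num_parameters < MAX_PARAMETERS)
    {
      f->parameters[f->num_parameters].name = name;
      f->parameters[f->num_parameters].value = value;
      f->num_parameters++;
    }
  return f;
}
```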

François

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 8f9f99a..ac9a66b 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1879,6 +1879,8 @@ GLIBCXX_3.4.22 {
 
 _ZNSt6vectorIPSt12Catalog_info*;
 
+_ZN11__gnu_debug16_Error_formatter10_M_message*;
+
 } GLIBCXX_3.4.21;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h
index f0ac694..1826e94 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -133,6 +133,13 @@ namespace __gnu_debug
 
   class _Error_formatter
   {
+// Tags denoting the type of parameter for construction
+struct _Is_iterator { };
+struct _Is_iterator_value_type { };
+struct _Is_sequence { };
+struct _Is_instance { };
+
+  public:
 /// Whether an iterator is constant, mutable, or unknown
 enum _Constness
 {
@@ -154,13 +161,6 @@ namespace __gnu_debug
   __last_state
 };
 
-// Tags denoting the type of parameter for construction
-struct _Is_iterator { };
-struct _Is_iterator_value_type { };
-struct _Is_sequence { };
-struct _Is_instance { };
-
-  public:
 // A parameter that may be referenced by an error message
 struct _Parameter
 {
@@ -376,15 +376,16 @@ namespace __gnu_debug
 
   void
   _M_print_field(const _Error_formatter* __formatter,
-		 const char* __name) const;
+		 const char* __name) const _GLIBCXX_DEPRECATED;
 
   void
-  _M_print_description(const _Error_formatter* __formatter) const;
+  _M_print_description(const _Error_formatter* __formatter)
+	const _GLIBCXX_DEPRECATED;
 };
 
 template
-  const _Error_formatter&
-  _M_iterator(const _Iterator& __it, const char* __name = 0)  const
+  _Error_formatter&
+  _M_iterator(const _Iterator& __it, const char* __name = 0)
   {
 	if (_M_num_parameters < std::size_t(__max_parameters))
 	  _M_parameters[_M_num_parameters++] = _Parameter(__it, __name,
@@ -393,98 +394,99 @@ namespace __gnu_debug
   }
 
 template
-  const _Error_formatter&
+  _Error_formatter&
   _M_iterator_value_type(const _Iterator& __it,
-			 const char* __name = 0)  const
+			 const char* __name = 0)
   {
-	if (_M_num_parameters < std::size_t(__max_parameters))
+	if (_M_num_parameters < __max_parameters)
 	  _M_parameters[_M_num_parameters++] =
 	_Parameter(__it, __name, _Is_iterator_value_type());
 	return *this;
   }
 
-const _Error_formatter&
-_M_integer(long __value, const char* __name = 0) const
+_Error_formatter&
+_M_integer(long __value, const char* __name = 0)
 {
-  if (_M_num_parameters < std::size_t(__max_parameters))
+  if (_M_num_parameters < __max_parameters)
 	_M_parameters[_M_num_parameters++] = _Parameter(__value, __name);
   return *this;
 }
 
-const _Error_formatter&
-_M_string(const char* __value, const char* __name = 0) const
+_Error_formatter&
+_M_string(const char* __value, const char* __name = 0)
 {
-  if (_M_num_parameters < std::size_t(__max_parameters))
+  if (_M_num_parameters < __max_parameters)
 	_M_parameters[_M_num_parameters++] = _Parameter(__value, __name);
   return *this;
 }
 
 template
-  const _Error_formatter&
-  _M_sequence(const _Sequence& __seq, const char* __name = 0) const
+  _Error_formatter&
+  _M_sequence(const _Sequence& __seq, const char* __name = 0)
   {
-	if (_M_num_parameters < std::size_t(__max_parameters))
+	if (_M_num_parameters < __max_parameters)
 	  _M_parameters[_M_num_parameters++] = _Parameter

[Ping] [C++ Patch] PR 53184 ("Unnecessary anonymous namespace warnings")

2015-09-08 Thread Paolo Carlini


Hi,

On 08/24/2015 02:55 PM, Paolo Carlini wrote:

Hi,

today I spent more time on this issue (which is getting duplicates). 
In fact it boils down to two separate issues:

1- Give a name to the warning, to make possible disabling it.
2- Don't talk about "anonymous namespace" when no anonymous namespace 
is involved.


Issue 1- should be rather straightforward, besides maybe the name 
itself: the below has Wsubobject-linkage. Makes sense?


Issue 2- seems slightly less trivial: the below first uses 
no_linkage_check and then falls back to the usual "anonymous 
namespace" message.

Pinging this...

https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01411.html

Thanks,
Paolo.

[hsa] Add support for __builtin_mem{set,cpy}.

2015-09-08 Thread Martin Liška

Hello.

The following patch adds support for HSAIL emission for this pair
of builtins.

Martin
>From 4bb6a85f805dd7cab4f4a2ce7f148803214e5a9e Mon Sep 17 00:00:00 2001
From: mliska 
Date: Fri, 4 Sep 2015 16:29:21 +0200
Subject: [PATCH 4/7] HSA: add support for __builtin_mem{set,cpy}.

gcc/ChangeLog:

2015-09-04  Martin Liska  

	* hsa-gen.c (build_memset_value): New function.
	(gen_hsa_memory_set): New function.
	(gen_hsa_insns_for_call): Add support for __builtin_memcpy and
	__builtin_memset.
---
 gcc/hsa-gen.c | 105 ++
 1 file changed, 105 insertions(+)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index cf43189..7796895 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2123,6 +2123,59 @@ gen_hsa_memory_copy (hsa_bb *hbb, hsa_op_address *target, hsa_op_address *src,
 }
 }
 
+/* Create a memset mask that is created by copying a CONSTANT byte value
+   to an integer of BYTE_SIZE bytes.  */
+
+static unsigned HOST_WIDE_INT
+build_memset_value (unsigned HOST_WIDE_INT constant, unsigned byte_size)
+{
+  HOST_WIDE_INT v = constant;
+
+  for (unsigned i = 1; i < byte_size; i++)
+    v |= constant << (8 * i);
+
+  return v;
+}
+
+/* Generate memory set instructions that are going to be used
+   for setting a CONSTANT byte value to TARGET memory of SIZE bytes.  */
+
+static void
+gen_hsa_memory_set (hsa_bb *hbb, hsa_op_address *target,
+		    unsigned HOST_WIDE_INT constant,
+		    unsigned size)
+{
+  hsa_op_address *addr;
+  hsa_insn_mem *mem;
+
+  unsigned offset = 0;
+
+  while (size)
+    {
+      unsigned s;
+      if (size >= 8)
+	s = 8;
+      else if (size >= 4)
+	s = 4;
+      else if (size >= 2)
+	s = 2;
+      else
+	s = 1;
+
+      addr = new hsa_op_address (target->symbol, target->reg,
+				 target->imm_offset + offset);
+
+      BrigType16_t t = get_integer_type_by_bytes (s, false);
+      HOST_WIDE_INT c = build_memset_value (constant, s);
+
+      mem = new hsa_insn_mem (BRIG_OPCODE_ST, t, new hsa_op_immed (c, t),
+			      addr);
+      hbb->append_insn (mem);
+      offset += s;
+      size -= s;
+    }
+}
+
 /* Generate HSA instructions for a single assignment.  HBB is the basic block
they will be appended to.  SSA_MAP maps gimple SSA names to HSA pseudo
registers.  */
@@ -3642,6 +3695,58 @@ specialop:
 
 	break;
   }
+case BUILT_IN_MEMCPY:
+  {
+	tree byte_size = gimple_call_arg (stmt, 2);
+
+	if (TREE_CODE (byte_size) != INTEGER_CST)
+	  {
+	sorry ("Support for HSA does not implement __builtin_memcpy with "
+		   "a non constant size");
+	return;
+	  }
+
+	tree dst = gimple_call_arg (stmt, 0);
+	tree src = gimple_call_arg (stmt, 1);
+
+	hsa_op_address *dst_addr = gen_hsa_addr (dst, hbb, ssa_map);
+	hsa_op_address *src_addr = gen_hsa_addr (src, hbb, ssa_map);
+	unsigned n = tree_to_uhwi (byte_size);
+
+	gen_hsa_memory_copy (hbb, dst_addr, src_addr, n);
+
+	break;
+  }
+case BUILT_IN_MEMSET:
+  {
+	tree c = gimple_call_arg (stmt, 1);
+
+	if (TREE_CODE (c) != INTEGER_CST)
+	  {
+	sorry ("Support for HSA does not implement __builtin_memset with "
+		   "a non constant byte value that should be written");
+	return;
+	  }
+
+	tree byte_size = gimple_call_arg (stmt, 2);
+
+	if (TREE_CODE (byte_size) != INTEGER_CST)
+	  {
+	sorry ("Support for HSA does not implement __builtin_memset with "
+		   "a non constant size");
+	return;
+	  }
+
+	hsa_op_address *dst_addr = gen_hsa_addr (gimple_call_arg (stmt, 0),
+		 hbb, ssa_map);
+	unsigned n = tree_to_uhwi (byte_size);
+	unsigned HOST_WIDE_INT constant = tree_to_uhwi
+	  (fold_convert (unsigned_char_type_node, c));
+
+	gen_hsa_memory_set (hbb, dst_addr, constant, n);
+
+	break;
+  }
 default:
   sorry ("Support for HSA does not implement calls to builtin %D",
 	 gimple_call_fndecl (stmt));
-- 
2.4.6
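The lowering done by gen_hsa_memory_set can be mirrored on the host to check its arithmetic. Below is a minimal C sketch. It assumes a plain byte buffer in place of HSA store instructions, with each store written in little-endian byte order. `replicate_byte` and `lower_memset` are hypothetical names standing in for build_memset_value and the chunking loop; they are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the byte CONSTANT into every byte of a BYTE_SIZE-byte integer,
   like build_memset_value (e.g. 0xab with 4 bytes gives 0xabababab).  */
static uint64_t
replicate_byte (uint64_t constant, unsigned byte_size)
{
  assert (byte_size >= 1 && byte_size <= 8);
  uint64_t b = constant & 0xff;
  uint64_t v = b;

  for (unsigned i = 1; i < byte_size; i++)
    v |= b << (8 * i);

  return v;
}

/* Greedy lowering of memset (TARGET, CONSTANT, SIZE): repeatedly pick the
   widest store (8, 4, 2 or 1 bytes) that fits into the remaining SIZE.
   Each store writes its value in little-endian byte order.  Returns the
   number of stores emitted.  */
static unsigned
lower_memset (unsigned char *target, uint64_t constant, unsigned size)
{
  unsigned offset = 0, stores = 0;

  while (size)
    {
      unsigned s = size >= 8 ? 8 : size >= 4 ? 4 : size >= 2 ? 2 : 1;
      uint64_t c = replicate_byte (constant, s);

      for (unsigned j = 0; j < s; j++)
        target[offset + j] = (unsigned char) (c >> (8 * j));

      offset += s;
      size -= s;
      stores++;
    }

  return stores;
}
```

For example, a 13-byte memset is split into an 8-byte, a 4-byte and a 1-byte store. This matches the 8/4/2/1 sequence in the patch.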

[hsa] HSA: reuse get_integer_type_by_bytes

2015-09-08 Thread Martin Liška

Hello.

The following patch is a small clean-up that reuses an already existing function.

Martin
>From e592465b575c477f9311eba2e66e9bf1ec4c54fc Mon Sep 17 00:00:00 2001
From: mliska 
Date: Fri, 4 Sep 2015 14:38:13 +0200
Subject: [PATCH 3/7] HSA: reuse get_integer_type_by_bytes.

gcc/ChangeLog:

2015-09-04  Martin Liska  

	* hsa-gen.c (get_integer_type_by_bytes): Return 0 if a type is not
	an integer type.
	(hsa_type_for_scalar_tree_type): Reuse the function.
	(get_integer_tree_type_by_bytes): Remove unused function.
---
 gcc/hsa-gen.c | 145 +-
 1 file changed, 41 insertions(+), 104 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 39c0489..cf43189 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -373,6 +373,44 @@ hsa_get_segment_addr_type (BrigSegment8_t segment)
   gcc_unreachable ();
 }
 
+/* Return integer brig type according to provided SIZE in bytes.  If SIGN
+   is set to true, return signed integer type.  */
+
+static BrigType16_t
+get_integer_type_by_bytes (unsigned size, bool sign)
+{
+  if (sign)
+    switch (size)
+      {
+      case 1:
+        return BRIG_TYPE_S8;
+      case 2:
+        return BRIG_TYPE_S16;
+      case 4:
+        return BRIG_TYPE_S32;
+      case 8:
+        return BRIG_TYPE_S64;
+      default:
+        break;
+      }
+  else
+    switch (size)
+      {
+      case 1:
+        return BRIG_TYPE_U8;
+      case 2:
+        return BRIG_TYPE_U16;
+      case 4:
+        return BRIG_TYPE_U32;
+      case 8:
+        return BRIG_TYPE_U64;
+      default:
+        break;
+      }
+
+  return 0;
+}
+
 /* Return HSA type for tree TYPE, which has to fit into BrigType16_t.  Pointers
are assumed to use flat addressing.  If min32int is true, always expand
integer types to one that has at least 32 bits.  */
@@ -407,50 +445,10 @@ hsa_type_for_scalar_tree_type (const_tree type, bool min32int)
 }
 
   bsize = tree_to_uhwi (TYPE_SIZE (base));
+  unsigned byte_size = bsize / BITS_PER_UNIT;
   if (INTEGRAL_TYPE_P (base))
-{
-  if (TYPE_UNSIGNED (base))
-	{
-	  switch (bsize)
-	{
-	case 8:
-	  res = BRIG_TYPE_U8;
-	  break;
-	case 16:
-	  res = BRIG_TYPE_U16;
-	  break;
-	case 32:
-	  res = BRIG_TYPE_U32;
-	  break;
-	case 64:
-	  res = BRIG_TYPE_U64;
-	  break;
-	default:
-	  break;
-	}
-	}
-  else
-	{
-	  switch (bsize)
-	{
-	case 8:
-	  res = BRIG_TYPE_S8;
-	  break;
-	case 16:
-	  res = BRIG_TYPE_S16;
-	  break;
-	case 32:
-	  res = BRIG_TYPE_S32;
-	  break;
-	case 64:
-	  res = BRIG_TYPE_S64;
-	  break;
-	default:
-	  break;
-	}
-	}
-}
-  if (SCALAR_FLOAT_TYPE_P (base))
+    res = get_integer_type_by_bytes (byte_size, !TYPE_UNSIGNED (base));
+  else if (SCALAR_FLOAT_TYPE_P (base))
 {
   switch (bsize)
 	{
@@ -1961,67 +1959,6 @@ get_bitfield_size (unsigned bitpos, unsigned bitsize)
   return 0;
 }
 
-/* Return integer brig type according to provided SIZE in bytes.  If SIGN
-   is set to true, return signed integer type.  */
-
-static BrigType16_t
-get_integer_type_by_bytes (unsigned size, bool sign)
-{
-  if (sign)
-switch (size)
-  {
-  case 1:
-	return BRIG_TYPE_S8;
-  case 2:
-	return BRIG_TYPE_S16;
-  case 4:
-	return BRIG_TYPE_S32;
-  case 8:
-	return BRIG_TYPE_S64;
-  default:
-	break;
-  }
-  else
-switch (size)
-  {
-  case 1:
-	return BRIG_TYPE_U8;
-  case 2:
-	return BRIG_TYPE_U16;
-  case 4:
-	return BRIG_TYPE_U32;
-  case 8:
-	return BRIG_TYPE_U64;
-  default:
-	break;
-  }
-
-  gcc_unreachable ();
-  return 0;
-}
-
-/* Return unsigned integer tree type wite SIZE bytes.  */
-
-static tree
-get_integer_tree_type_by_bytes (unsigned size)
-{
-  switch (size)
-{
-case 1:
-  return char_type_node;
-case 2:
-  return uint16_type_node;
-case 4:
-  return uint32_type_node;
-case 8:
-  return uint64_type_node;
-default:
-  gcc_unreachable ();
-}
-
-  return NULL_TREE;
-}
-
 /* Generate HSAIL instructions storing into memory.  LHS is the destination of
the store, SRC is the source operand.  Add instructions to HBB, use SSA_MAP
for HSA SSA lookup.  */
-- 
2.4.6
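After this clean-up, hsa_type_for_scalar_tree_type converts the bit size to bytes and delegates to get_integer_type_by_bytes. For sizes that function cannot map, it now returns 0 instead of hitting gcc_unreachable. A minimal C sketch of that mapping follows. The `int_type` enumerators are hypothetical placeholders for the BRIG_TYPE_* codes, and their values are not the real BRIG encodings. The sketch also assumes 8 bits per byte for BITS_PER_UNIT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BRIG_TYPE_* codes; 0 means "no type".  */
enum int_type { TYPE_NONE = 0, TYPE_U8, TYPE_U16, TYPE_U32, TYPE_U64,
                TYPE_S8, TYPE_S16, TYPE_S32, TYPE_S64 };

/* Map SIZE in bytes and signedness to an integer type, or TYPE_NONE when
   SIZE is not 1, 2, 4 or 8 (mirroring the new "return 0" fallback).  */
static enum int_type
integer_type_by_bytes (unsigned size, bool sign)
{
  switch (size)
    {
    case 1: return sign ? TYPE_S8 : TYPE_U8;
    case 2: return sign ? TYPE_S16 : TYPE_U16;
    case 4: return sign ? TYPE_S32 : TYPE_U32;
    case 8: return sign ? TYPE_S64 : TYPE_U64;
    default: return TYPE_NONE;
    }
}

/* Caller side: convert bits to bytes first, as the patch does with
   bsize / BITS_PER_UNIT, and pass "signed" as the negation of unsigned.  */
static enum int_type
integer_type_by_bits (unsigned bits, bool is_unsigned)
{
  assert (bits % 8 == 0);
  return integer_type_by_bytes (bits / 8, !is_unsigned);
}
```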

[hsa] Use newly added hsa_op_immed ctor.

2015-09-08 Thread Martin Liška

Hello.

The following patch uses the new hsa_op_immed ctor. It is mainly used in the
HSAIL emission of instructions that dispatch a kernel from within a kernel.

Martin 
>From 7cf50d2831ec8b161858ecdbfb19b6287230384d Mon Sep 17 00:00:00 2001
From: mliska 
Date: Fri, 4 Sep 2015 14:19:25 +0200
Subject: [PATCH 2/7] HSA: use newly added hsa_op_immed ctor.

gcc/ChangeLog:

2015-09-04  Martin Liska  

	* hsa-gen.c (gen_hsa_insns_for_bitfield_load): Use newly added
	ctor for hsa_op_immed.
	(gen_hsa_insns_for_store): Likewise.
	(gen_hsa_binary_operation): Use newly added function set_type.
	(gen_hsa_insns_for_operation_assignment): Use newly added
	ctor for hsa_op_immed.
	(gen_hsa_insns_for_kernel_call): Likewise.
---
 gcc/hsa-gen.c | 57 +++--
 1 file changed, 23 insertions(+), 34 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index ed1e121..39c0489 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -1792,8 +1792,7 @@ gen_hsa_insns_for_bitfield_load (hsa_op_reg *dest, hsa_op_address *addr,
   if (left_shift)
 {
   hsa_op_reg *value_reg_2 = new hsa_op_reg (dest->type);
-  hsa_op_immed *c = new hsa_op_immed (build_int_cstu (unsigned_type_node,
-			  left_shift));
+  hsa_op_immed *c = new hsa_op_immed (left_shift, BRIG_TYPE_U32);
 
   hsa_insn_basic *lshift = new hsa_insn_basic
 	(3, BRIG_OPCODE_SHL, value_reg_2->type, value_reg_2, value_reg, c);
@@ -1806,8 +1805,7 @@ gen_hsa_insns_for_bitfield_load (hsa_op_reg *dest, hsa_op_address *addr,
   if (right_shift)
 {
   hsa_op_reg *value_reg_2 = new hsa_op_reg (dest->type);
-  hsa_op_immed *c = new hsa_op_immed (build_int_cstu (unsigned_type_node,
-			  right_shift));
+  hsa_op_immed *c = new hsa_op_immed (right_shift, BRIG_TYPE_U32);
 
   hsa_insn_basic *rshift = new hsa_insn_basic
 	(3, BRIG_OPCODE_SHR, value_reg_2->type, value_reg_2, value_reg, c);
@@ -2074,8 +2072,7 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb,
   hsa_op_reg *cleared_reg = new hsa_op_reg (mem_type);
 
   hsa_op_immed *c = new hsa_op_immed
-	(build_int_cstu (get_integer_tree_type_by_bytes
-			 (type_bitsize / BITS_PER_UNIT), mask));
+	(mask, get_integer_type_by_bytes (type_bitsize / BITS_PER_UNIT, false));
 
   hsa_insn_basic *clearing = new hsa_insn_basic
 	(3, BRIG_OPCODE_AND, mem_type, cleared_reg, value_reg, c);
@@ -2091,8 +2088,7 @@ gen_hsa_insns_for_store (tree lhs, hsa_op_base *src, hsa_bb *hbb,
   if (bitpos)
 	{
 	  hsa_op_reg *shifted_value_reg = new hsa_op_reg (mem_type);
-
-	  c = new hsa_op_immed (build_int_cstu (unsigned_type_node, bitpos));
+	  c = new hsa_op_immed (bitpos, BRIG_TYPE_U32);
 
 	  hsa_insn_basic *basic = new hsa_insn_basic
 	(3, BRIG_OPCODE_SHL, mem_type, shifted_value_reg, new_value_reg, c);
@@ -2402,8 +2398,7 @@ gen_hsa_binary_operation (int opcode, hsa_op_reg *dest,
   && is_a  (op2))
 {
   hsa_op_immed *i = dyn_cast  (op2);
-  op2 = new hsa_op_immed
-	(build_int_cstu (unsigned_type_node, TREE_INT_CST_LOW (i->tree_value)));
+  i->set_type (BRIG_TYPE_U32);
 }
 
   hsa_insn_basic *insn = new hsa_insn_basic (3, opcode, dest->type, dest,
@@ -2521,17 +2516,13 @@ gen_hsa_insns_for_operation_assignment (gimple assign, hsa_bb *hbb,
 
 	hsa_op_with_type *shift2 = NULL;
 	if (TREE_CODE (rhs2) == INTEGER_CST)
-	  {
-	shift2 = new hsa_op_immed
-	  (build_int_cstu (unsigned_type_node,
-			   bitsize - tree_to_uhwi (rhs2)));
-	  }
+	  shift2 = new hsa_op_immed (bitsize - tree_to_uhwi (rhs2),
+ BRIG_TYPE_U32);
 	else if (TREE_CODE (rhs2) == SSA_NAME)
 	  {
 	hsa_op_reg *s = hsa_reg_for_gimple_ssa (rhs2, ssa_map);
 	hsa_op_reg *d = new hsa_op_reg (s->type);
-	hsa_op_immed *size_imm = new hsa_op_immed
-	  (build_int_cstu (unsigned_type_node, bitsize));
+	hsa_op_immed *size_imm = new hsa_op_immed (bitsize, BRIG_TYPE_U32);
 
 	insn = new hsa_insn_basic (3, BRIG_OPCODE_SUB, d->type,
    d, s, size_imm);
@@ -2921,7 +2912,7 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   addr = new hsa_op_address (shadow_reg, offsetof (hsa_kernel_dispatch, debug));
 
   /* Create a magic number that is going to be printed by libgomp.  */
-  c = new hsa_op_immed (build_int_cstu (uint64_type_node, 1000 + index));
+  c = new hsa_op_immed (1000 + index, BRIG_TYPE_U64);
   mem = new hsa_insn_mem (BRIG_OPCODE_ST, BRIG_TYPE_U64, c, addr);
   hbb->append_insn (mem);
 
@@ -2968,7 +2959,7 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Store to synchronization signal.  */
   hbb->append_insn (new hsa_insn_comment ("store 1 to signal handle"));
 
-  c = new hsa_op_immed (build_int_cstu (uint64_type_node, 1));
+  c = new hsa_op_immed (1, BRIG_TYPE_U64);
 
   hsa_insn_signal *signal= new hsa_insn_signal (2, BRIG_OPCODE_SIGNALNORET,
 		BRIG_ATOMIC_ST, BRIG_TYPE_B64,
@@ -3002,7 +2993,7 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   /* Get a write index to

[hsa] Introduce a new ctor for hsa_op_immed.

2015-09-08 Thread Martin Liška

Hello.

The following patch adds a new ctor for the hsa_op_immed class.

Martin
>From dc72ba0001f5b98c97da9ea5c03fccf2340e893d Mon Sep 17 00:00:00 2001
From: mliska 
Date: Fri, 4 Sep 2015 13:33:08 +0200
Subject: [PATCH 1/7] HSA: introduce a new ctor for hsa_op_immed.

gcc/ChangeLog:

2015-09-04  Martin Liska  

	* hsa-brig.c (emit_immediate_scalar_to_buffer): Add a new argument.
	(emit_immediate_scalar_to_data_section): Likewise.
	(hsa_op_immed::emit_to_buffer): New function.
	(emit_immediate_operand): Emit prepared byte buffer.
	* hsa-dump.c (dump_hsa_immed): Handle integer values.
	* hsa-gen.c (hsa_deinit_data_for_cfun): Release allocated byte buffers.
	(hsa_op_immed::hsa_op_immed): New.
	(hsa_op_immed::~hsa_op_immed): Likewise.
	(hsa_op_immed::set_type): New function.
	* hsa.h (union hsa_bytes): New
---
 gcc/hsa-brig.c | 96 ++
 gcc/hsa-dump.c | 17 ++-
 gcc/hsa-gen.c  | 67 ++--
 gcc/hsa.h  | 24 ++-
 4 files changed, 154 insertions(+), 50 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 728430c..0951039 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -785,7 +785,7 @@ regtype_for_type (BrigType16_t t)
 /* Return the length of the BRIG type TYPE that is going to be streamed out as
an immediate constant (so it must not be B1).  */
 
-static unsigned
+unsigned
 hsa_get_imm_brig_type_len (BrigType16_t type)
 {
   BrigType16_t base_type = type & BRIG_TYPE_BASE_MASK;
@@ -833,20 +833,14 @@ hsa_get_imm_brig_type_len (BrigType16_t type)
 }
 }
 
-/* Emit one scalar VALUE to the data BRIG section.  If NEED_LEN is not equal to
-   zero, shrink or extend the value to NEED_LEN bytes.  Return how many bytes
-   were written.  */
+/* Emit one scalar VALUE to the buffer DATA intended for BRIG emission.
+   If NEED_LEN is not equal to zero, shrink or extend the value
+   to NEED_LEN bytes.  Return how many bytes were written.  */
 
 static int
-emit_immediate_scalar_to_data_section (tree value, unsigned need_len)
+emit_immediate_scalar_to_buffer (tree value, char *data, unsigned need_len)
 {
-  union
-  {
-uint8_t b8;
-uint16_t b16;
-uint32_t b32;
-uint64_t b64;
-  } bytes;
+  union hsa_bytes bytes;
 
   memset (&bytes, 0, sizeof (bytes));
   tree type = TREE_TYPE (value);
@@ -905,10 +899,51 @@ emit_immediate_scalar_to_data_section (tree value, unsigned need_len)
   else
 len = need_len;
 
-  brig_data.add (&bytes, len);
+  memcpy (data, &bytes, len);
   return len;
 }
 
+void
+hsa_op_immed::emit_to_buffer (tree value)
+{
+  unsigned total_len = brig_repr_size;
+  brig_repr = XNEWVEC (char, total_len);
+  char *p = brig_repr;
+
+  if (TREE_CODE (value) == VECTOR_CST)
+    {
+      int i, num = VECTOR_CST_NELTS (value);
+      for (i = 0; i < num; i++)
+        {
+          unsigned actual;
+          actual = emit_immediate_scalar_to_buffer
+            (VECTOR_CST_ELT (value, i), p, 0);
+          total_len -= actual;
+          p += actual;
+        }
+      /* Vectors should have the exact size.  */
+      gcc_assert (total_len == 0);
+    }
+  else if (TREE_CODE (value) == STRING_CST)
+    memcpy (brig_repr, TREE_STRING_POINTER (value), TREE_STRING_LENGTH (value));
+  else if (TREE_CODE (value) == COMPLEX_CST)
+    {
+      gcc_assert (total_len % 2 == 0);
+      unsigned actual;
+      actual = emit_immediate_scalar_to_buffer
+        (TREE_REALPART (value), p, total_len / 2);
+
+      gcc_assert (actual == total_len / 2);
+      p += actual;
+
+      actual = emit_immediate_scalar_to_buffer
+        (TREE_IMAGPART (value), p, total_len / 2);
+      gcc_assert (actual == total_len / 2);
+    }
+  else
+    emit_immediate_scalar_to_buffer (value, p, total_len);
+}
+
 /* Emit an immediate BRIG operand IMM.  The BRIG type of the immediate might
have been massaged to comply with various HSA/BRIG type requirements, so the
only important aspect of that is the length (because HSAIL might expect
@@ -919,46 +954,15 @@ static void
 emit_immediate_operand (hsa_op_immed *imm)
 {
   struct BrigOperandConstantBytes out;
-  unsigned total_len = hsa_get_imm_brig_type_len (imm->type);
-
-  if (TREE_CODE (imm->value) == STRING_CST)
-total_len = TREE_STRING_LENGTH (imm->value);
 
   memset (&out, 0, sizeof (out));
   out.base.byteCount = htole16 (sizeof (out));
   out.base.kind = htole16 (BRIG_KIND_OPERAND_CONSTANT_BYTES);
-  uint32_t byteCount = htole32 (total_len);
+  uint32_t byteCount = htole32 (imm->brig_repr_size);
   out.type = htole16 (imm->type);
   out.bytes = htole32 (brig_data.add (&byteCount, sizeof (byteCount)));
   brig_operand.add (&out, sizeof(out));
-
-  if (TREE_CODE (imm->value) == VECTOR_CST)
-{
-  int i, num = VECTOR_CST_NELTS (imm->value);
-  for (i = 0; i < num; i++)
-	{
-	  unsigned actual;
-	  actual = emit_immediate_scalar_to_data_section
-	(VECTOR_CST_ELT (imm->value, i), 0);
-	  total_len -= actual;
-	}
-  /* Vectors should have the exact size.  */
-  gcc_assert (total_len == 0);
-
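With the new constructor, an integer immediate carries its value and BRIG type directly. hsa_op_immed::emit_to_buffer then serializes it into brig_repr ahead of emission. The sketch below shows the integer part of that idea in plain C: it writes the low LEN bytes of a value in little-endian order, which is the byte order BRIG uses (hence the htole16/htole32 calls). `emit_uint_le` and `make_int_immediate` are hypothetical helpers, not the hsa-brig.c functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Write the low LEN bytes of VALUE into DATA in little-endian order,
   independent of host byte order.  Bytes past the eighth are zero-filled,
   i.e. the value is zero-extended.  Returns the number of bytes written.  */
static unsigned
emit_uint_le (uint64_t value, unsigned char *data, unsigned len)
{
  for (unsigned i = 0; i < len; i++)
    data[i] = i < 8 ? (unsigned char) (value >> (8 * i)) : 0;
  return len;
}

/* Hypothetical analogue of emit_to_buffer for an integer immediate:
   allocate a buffer of the type's length and fill it.  The caller
   frees the result.  */
static unsigned char *
make_int_immediate (uint64_t value, unsigned type_len)
{
  assert (type_len == 1 || type_len == 2 || type_len == 4 || type_len == 8);
  unsigned char *repr = malloc (type_len);
  if (repr)
    emit_uint_le (value, repr, type_len);
  return repr;
}
```

For instance, the kernel-call magic number 1000 + index with index 2 becomes the eight bytes ea 03 00 00 00 00 00 00. A 2-byte type keeps only the low two bytes of a wider value.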

[gomp4] force global locks for nvptx targets

2015-09-08 Thread Cesar Philippidis

This patch forces GOACC_LOCK to use locks in global memory regardless of
whether the lock is for a worker or a gang. We were using shared memory for
worker locks, but we ran into an issue that would sporadically cause
deadlocks in worker reductions. We're still investigating that
issue, but for the time being, global locks appear to work, albeit with a
lock contention penalty.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-09-08  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (force_global_locks): New global variable.
	(nvptx_expand_oacc_lock): Use it to work around a shared memory lock
	problem.
	(nvptx_xform_lock): Likewise.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 51f2893..c8f6f5c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -134,6 +134,9 @@ static const unsigned lock_level[] = {BARRIER_GLOBAL, BARRIER_SHARED};
 static GTY(()) rtx lock_syms[LOCK_MAX];
 static bool lock_used[LOCK_MAX];
 
+/* FIXME: Temporary workaround for worker locks.  */
+static bool force_global_locks = true;
+
 /* Size of buffer needed for worker reductions.  This has to be
disjoing from the worker broadcast array, as both may be live
concurrently.  */
@@ -1245,6 +1248,7 @@ nvptx_expand_oacc_lock (rtx src, int direction)
   rtx pat;
   
   kind = INTVAL (src) == GOMP_DIM_GANG ? LOCK_GLOBAL : LOCK_SHARED;
+  kind = force_global_locks ? LOCK_GLOBAL : kind;
   lock_used[kind] = true;
 
   rtx mem = gen_rtx_MEM (SImode, lock_syms[kind]);
@@ -3740,7 +3744,7 @@ nvptx_xform_lock (gimple stmt, const int *ARG_UNUSED (dims), unsigned ifn_code)
   return mode > GOMP_DIM_WORKER;
 
 case IFN_GOACC_LOCK_INIT:
-  return mode != GOMP_DIM_WORKER;
+  return force_global_locks || mode != GOMP_DIM_WORKER;
 
 default: gcc_unreachable();
 }
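The lock selection performed by the workaround in nvptx_expand_oacc_lock can be summarized in a small C sketch. The enums and `choose_lock_kind` are hypothetical stand-ins for the backend's LOCK_GLOBAL/LOCK_SHARED kinds and the GOMP_DIM_* levels. The real code works on RTL. The sketch also assumes that gangs map to separate CTAs and that workers share one CTA.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's lock kinds and the OpenACC
   gang/worker levels.  */
enum lock_kind { LOCK_GLOBAL, LOCK_SHARED };
enum par_level { LEVEL_GANG, LEVEL_WORKER };

/* A gang lock must be visible across CTAs, so it lives in global memory.
   A worker lock could use CTA-local shared memory, but when FORCE_GLOBAL
   is set (as force_global_locks is in the patch) it goes global too.  */
static enum lock_kind
choose_lock_kind (enum par_level level, bool force_global)
{
  enum lock_kind kind = level == LEVEL_GANG ? LOCK_GLOBAL : LOCK_SHARED;
  return force_global ? LOCK_GLOBAL : kind;
}
```

Because the flag is initialized to true, every lock currently takes the global path. That is the source of the lock contention penalty.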

[PATCH, committed] Trivial typo fix in pretty-print.h

2015-09-08 Thread David Malcolm

"pinter" -> "printer"

Committed to trunk as r227562, under the "obvious" rule.

gcc/ChangeLog:
* pretty-print.h (printer_fn): Fix typo in comment.
---
 gcc/pretty-print.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/pretty-print.h b/gcc/pretty-print.h
index 6e8a300..36d4e37 100644
--- a/gcc/pretty-print.h
+++ b/gcc/pretty-print.h
@@ -187,7 +187,7 @@ struct pp_wrapping_mode_t
 /* Get or set the wrapping mode as a single entity.  */
 #define pp_wrapping_mode(PP) (PP)->wrapping
 
-/* The type of a hook that formats client-specific data onto a pretty_pinter.
+/* The type of a hook that formats client-specific data onto a pretty_printer.
A client-supplied formatter returns true if everything goes well,
otherwise it returns false.  */
 typedef bool (*printer_fn) (pretty_printer *, text_info *, const char *,
-- 
1.8.5.3

Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-08 Thread François Dumont

On 07/09/2015 20:27, Jonathan Wakely wrote:
> This patch adds the "debug mode lite" we've been talking about, by
> changing __glibcxx_assert to be activated by _GLIBCXX_ASSERTIONS
> instead of _GLIBCXX_DEBUG (and making the latter imply the former).
>
> _GLIBCXX_ASSERTIONS is already used in Parallel Mode for enabling
> optional assertions (although some of them are O(n) and so we might
> want to change them to use another macro like _GLIBCXX_DEBUG or
> _GLIBCXX_PARALLEL_ASSERTIONS instead).
>
> With the change to define __glibcxx_assert() without Debug Mode we can
> change most uses of _GLIBCXX_DEBUG_ASSERT to simply __glibcxx_assert,
> so that the assertion is done when _GLIBCXX_ASSERTIONS is defined (not
> only in Debug Mode).
>
> I haven't added any new assertions yet, this just converts the
> lightweight Debug Mode checks, but the next step will be to add
> additional assertions to the (normal mode) containers. The google
> branches contain several good examples of checks to add.
>
> François, what do you think of this approach?
>
>
Very good approach. I will start moving the light checks from the
_GLIBCXX_DEBUG implementation to the normal one, then.

Thanks

François
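The quoted message describes a layered scheme: full Debug Mode implies the lightweight assertions, while cheap checks depend only on the lighter macro. That layering can be sketched with the preprocessor. The macro names MYLIB_DEBUG, MYLIB_ASSERTIONS and mylib_assert are hypothetical stand-ins for _GLIBCXX_DEBUG, _GLIBCXX_ASSERTIONS and __glibcxx_assert. This is not libstdc++'s actual definition.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Pretend the user compiled with full debug mode (hypothetical name
   standing in for _GLIBCXX_DEBUG).  */
#define MYLIB_DEBUG 1

/* Full debug mode implies the lightweight assertions.  */
#if defined (MYLIB_DEBUG) && !defined (MYLIB_ASSERTIONS)
# define MYLIB_ASSERTIONS 1
#endif

/* The cheap assertion depends only on MYLIB_ASSERTIONS, so it can be
   enabled on its own, without the rest of debug mode.  */
#ifdef MYLIB_ASSERTIONS
# define mylib_assert(cond) \
  ((cond) ? (void) 0 : (fprintf (stderr, "assertion failed: %s\n", #cond), abort ()))
#else
# define mylib_assert(cond) ((void) 0)
#endif

/* An O(1) bounds check of the kind suited to the lightweight mode.  */
static int
checked_get (const int *arr, size_t len, size_t i)
{
  mylib_assert (i < len);
  return arr[i];
}

/* Report whether the lightweight assertions ended up enabled.  */
static int
mylib_assertions_enabled (void)
{
#ifdef MYLIB_ASSERTIONS
  return 1;
#else
  return 0;
#endif
}
```

Defining only the lighter macro would enable the cheap checks without the rest of Debug Mode, which is the "debug mode lite" idea.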

[PATCH] Implement GOMP_OFFLOAD_unload_image in intelmic plugin

2015-09-08 Thread Ilya Verbin

Hi!

This patch supports unloading of target images from the device.
Unfortunately, __offload_unregister_image requires the whole descriptor for
unloading, which must contain the target code; for this reason the plugin
keeps the descriptors of all offloaded images in memory.
The patch also removes useless variable names that were intended for debug
purposes.
Regtested with make check-target-libgomp and using a dlopen/dlclose test.
OK for trunk?
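The plugin keeps each image descriptor alive because __offload_unregister_image needs the same descriptor, target code included, that was passed at registration. A minimal C sketch of allocating such a descriptor follows. The struct layout mirrors TargetImageDesc from the diff: it is packed (a GCC extension) and uses a C99 flexible array member for the image bytes. `make_image_desc` and the "libNN.so" naming scheme are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Layout modelled on TargetImageDesc: size, fixed-size name, image bytes.  */
struct image_desc
{
  int64_t size;
  char name[sizeof ("lib00.so")];
  char data[];
} __attribute__ ((packed));

/* Hypothetical helper: allocate one descriptor holding a copy of IMAGE,
   so the same object can later be passed to both the register and the
   unregister call.  Returns NULL on bad size or allocation failure.  */
static struct image_desc *
make_image_desc (const void *image, int64_t size, unsigned index)
{
  if (size < 0)
    return NULL;
  struct image_desc *d = malloc (sizeof (*d) + (size_t) size);
  if (!d)
    return NULL;
  d->size = size;
  snprintf (d->name, sizeof (d->name), "lib%02u.so", index % 100);
  memcpy (d->data, image, (size_t) size);
  return d;
}
```

In the plugin, such a descriptor would be stored in image_descriptors, keyed by the pointer obtained from libgomp. GOMP_OFFLOAD_unload_image would free it after the unregister call.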


liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (struct TargetImageDesc): New.
(ImgDescMap): New typedef.
(image_descriptors): New static var.
(init): Allocate image_descriptors.
(offload): Remove vars2 argument.  Pass NULL to __offload_offload1
instead of vars2.
(unregister_main_image): New static function.
(register_main_image): Call unregister_main_image at exit.
(GOMP_OFFLOAD_init_device): Print device number, fix offload args.
(GOMP_OFFLOAD_fini_device): Likewise.
(get_target_table): Remove vd1g and vd2g, don't pass them to offload.
(offload_image): Remove declaration of the struct TargetImage.
Free table.  Insert new descriptor into image_descriptors.
(GOMP_OFFLOAD_unload_image): Call __offload_unregister_image, free
the corresponding descriptor, and remove it from address_table and
image_descriptors.
(GOMP_OFFLOAD_alloc): Print device number, remove vd1g.
(GOMP_OFFLOAD_free): Likewise.
(GOMP_OFFLOAD_host2dev): Print device number, remove vd1g and vd2g.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_run): Print device number, remove vd1g.
* plugin/offload_target_main.cpp (__offload_target_table_p1): Remove
vd2, don't pass it to __offload_target_enter.
(__offload_target_table_p2): Likewise.
(__offload_target_alloc): Likewise.
(__offload_target_free): Likewise.
(__offload_target_host2tgt_p1): Likewise.
(__offload_target_host2tgt_p2): Likewise.
(__offload_target_tgt2host_p1): Likewise.
(__offload_target_tgt2host_p2): Likewise.
(__offload_target_run): Likewise.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index fde7d9e..a2260e5 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -64,6 +64,17 @@ typedef std::vector DevAddrVect;
 /* Addresses for all images and all devices.  */
 typedef std::map ImgDevAddrMap;
 
+/* Image descriptor needed by __offload_[un]register_image.  */
+struct TargetImageDesc {
+  int64_t size;
+  /* 10 characters is enough for max int value.  */
+  char name[sizeof ("lib00.so")];
+  char data[];
+} __attribute__ ((packed));
+
+/* Image descriptors, indexed by a pointer obtained from libgomp.  */
+typedef std::map ImgDescMap;
+
 
 /* Total number of available devices.  */
 static int num_devices;
@@ -75,6 +86,9 @@ static int num_images;
second key is number of device.  Contains a vector of pointer pairs.  */
 static ImgDevAddrMap *address_table;
 
+/* Descriptors of all images, registered in liboffloadmic.  */
+static ImgDescMap *image_descriptors;
+
 /* Thread-safe registration of the main image.  */
 static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
 
@@ -150,6 +164,7 @@ init (void)
 
 out:
   address_table = new ImgDevAddrMap;
+  image_descriptors = new ImgDescMap;
   num_devices = _Offload_number_of_devices ();
 }
 
@@ -186,11 +201,11 @@ GOMP_OFFLOAD_get_num_devices (void)
 
 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2)
+int num_vars, VarDesc *vars)
 {
   OFFLOAD ofld = __offload_target_acquire1 (&device, file, line);
   if (ofld)
-__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL);
+__offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, NULL);
   else
 {
   fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line);
@@ -199,9 +214,23 @@ offload (const char *file, uint64_t line, int device, const char *name,
 }
 
 static void
+unregister_main_image ()
+{
+  __offload_unregister_image (&main_target_image);
+}
+
+static void
 register_main_image ()
 {
+  /* Do not check the return value, because old versions of liboffloadmic did
+ not have return values.  */
   __offload_register_image (&main_target_image);
+
+  if (atexit (unregister_main_image) != 0)
+{
+  fprintf (stderr, "%s: atexit failed\n", __FILE__);
+  exit (1);
+}
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -209,16 +238,15 @@ register_main_image ()
 extern "C" void
 GOMP_OFFLOAD_init_device (int device)
 {
-  TRACE ("");
+  TRACE ("(device = %d)", device);
   pthread_once (&main_image_is_registered, register_main_image);
-  offload (__FILE__, __LINE__

C++ PATCH for c++/67041 (auto variable template and lambda)

2015-09-08 Thread Jason Merrill

If the lambda mangling scope is a variable, we don't want to call 
tsubst_copy on it, as that will call mark_used.  Use plain tsubst instead.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 60cf545572ad4214d70e99808f53312db592e341
Author: Jason Merrill 
Date:   Tue Sep 8 09:47:18 2015 -0400

	PR c++/67350
	* pt.c (tsubst_copy_and_build): Handle variables like functions.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index ec32c5a..16ed1b5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16320,15 +16320,14 @@ tsubst_copy_and_build (tree t,
 	LAMBDA_EXPR_MUTABLE_P (r) = LAMBDA_EXPR_MUTABLE_P (t);
 	LAMBDA_EXPR_DISCRIMINATOR (r)
 	  = (LAMBDA_EXPR_DISCRIMINATOR (t));
-	/* For a function scope, we want to use tsubst so that we don't
-	   complain about referring to an auto function before its return
-	   type has been deduced.  Otherwise, we want to use tsubst_copy so
-	   that we look up the existing field/parameter/variable rather
-	   than build a new one.  */
 	tree scope = LAMBDA_EXPR_EXTRA_SCOPE (t);
-	if (scope && TREE_CODE (scope) == FUNCTION_DECL)
+	if (!scope)
+	  /* No substitution needed.  */;
+	else if (VAR_OR_FUNCTION_DECL_P (scope))
+	  /* For a function or variable scope, we want to use tsubst so that we
+	 don't complain about referring to an auto before deduction.  */
 	  scope = tsubst (scope, args, complain, in_decl);
-	else if (scope && TREE_CODE (scope) == PARM_DECL)
+	else if (TREE_CODE (scope) == PARM_DECL)
 	  {
 	/* Look up the parameter we want directly, as tsubst_copy
 	   doesn't do what we need.  */
@@ -16341,8 +16340,12 @@ tsubst_copy_and_build (tree t,
 	if (DECL_CONTEXT (scope) == NULL_TREE)
 	  DECL_CONTEXT (scope) = fn;
 	  }
-	else
+	else if (TREE_CODE (scope) == FIELD_DECL)
+	  /* For a field, use tsubst_copy so that we look up the existing field
+	 rather than build a new one.  */
 	  scope = RECUR (scope);
+	else
+	  gcc_unreachable ();
 	LAMBDA_EXPR_EXTRA_SCOPE (r) = scope;
 	LAMBDA_EXPR_RETURN_TYPE (r)
 	  = tsubst (LAMBDA_EXPR_RETURN_TYPE (t), args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-var-templ1.C b/gcc/testsuite/g++.dg/cpp1y/lambda-var-templ1.C
new file mode 100644
index 000..adc1af1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-var-templ1.C
@@ -0,0 +1,11 @@
+// PR c++/67350
+// { dg-do compile { target c++14 } }
+
+template
+auto test = [](){
+return T{};
+};
+
+int main() {
+test();
+}

Re: [PATCH] Fix PR64078

2015-09-08 Thread Jeff Law


On 09/07/2015 07:46 AM, Bernd Edlinger wrote:

Hi,

On Mon, 7 Sep 2015 12:07:00, Marek Polacek wrote:


On Sun, Sep 06, 2015 at 07:21:13PM +0200, Bernd Edlinger wrote:

Hi,

we observed sporadic failures of the following two test cases (see PR64078):
c-c++-common/ubsan/object-size-9.c and c-c++-common/ubsan/object-size-10.c

For object-size-9.c this happens in a reproducible way when the -fpic option is used:
with that option it is slightly less desirable to inline the functions, but if an explicit
"inline" is added, the function is still inlined, even with -fpic.


So if we rely on the function being inlined I think it would be better to add
the always_inline attribute.




I tried to replace inline by __attribute__((always_inline)), but unfortunately it does not work:

FAIL: c-c++-common/ubsan/object-size-9.c   -O2  (test for excess errors)
Excess errors:
/home/ed/gnu/gcc-trunk/gcc/testsuite/c-c++-common/ubsan/object-size-9.c:47:1: warning: always_inline function might not be inlinable [-Wattributes]
/home/ed/gnu/gcc-trunk/gcc/testsuite/c-c++-common/ubsan/object-size-9.c:32:1: warning: always_inline function might not be inlinable [-Wattributes]
/home/ed/gnu/gcc-trunk/gcc/testsuite/c-c++-common/ubsan/object-size-9.c:47:1: error: inlining failed in call to always_inline 'C f3(int)': function body can be overwritten at link time
/home/ed/gnu/gcc-trunk/gcc/testsuite/c-c++-common/ubsan/object-size-9.c:94:10: error: called from here

The diagnostics are just a little different depending on whether the function is inlined or not.
Can't you attack this problem by making sure the function is not interposable?


Jeff
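Jeff's suggestion targets the root cause of the error above. With -fpic, an externally visible function can be interposed, meaning another definition can replace it at dynamic link time. GCC therefore cannot assume that the body it sees is the one that will run, hence "function body can be overwritten at link time". A function with internal linkage cannot be interposed. The sketch below is hypothetical and is not the actual object-size-9.c test. Hidden visibility and the -fno-semantic-interposition option are other known ways to get the same effect.

```c
/* Internal linkage: no other module can interpose this definition, so
   always_inline no longer conflicts with -fpic.  */
static inline __attribute__ ((always_inline)) int
scale (int x)
{
  return x * 3;
}

/* An exported caller; SCALE can be inlined into it even with -fpic.  */
int
use_scale (int y)
{
  return scale (y) + 1;
}
```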

Re: [wwwdocs] Document some gcc-6 changes

2015-09-08 Thread Manuel López-Ibáñez

On 8 September 2015 at 19:26, Martin Sebor  wrote:
> On 09/08/2015 11:14 AM, Manuel López-Ibáñez wrote:
>>  a negative value.
>> -A new command-line option -Wshift-overflow has been
>> -   added for the C and C++ compilers, which warns about left shift
>> +-Wshift-overflow warns about left shift
>>  overflows.  -Wshift-overflow=2 also warns about
>>  left-shifting 1 into the sign bit.  This warning is enabled by
>>  default.
>
>
> While unrelated to your change, this might be a good opportunity
> to also tweak the last sentence and clarify that "this warning"
> refers to -Wshift-overflow and not to -Wshift-overflow=2.

Implemented by transposing the last two sentences.

Cheers,

Manuel.

Re: [wwwdocs] Document some gcc-6 changes

2015-09-08 Thread Martin Sebor


On 09/08/2015 11:14 AM, Manuel López-Ibáñez wrote:

I also took the liberty of rewriting the list of new command-line
options to be less repetitive.


...

 a negative value.
-A new command-line option -Wshift-overflow has been
-   added for the C and C++ compilers, which warns about left shift
+-Wshift-overflow warns about left shift
 overflows.  -Wshift-overflow=2 also warns about
 left-shifting 1 into the sign bit.  This warning is enabled by
 default.


While unrelated to your change, this might be a good opportunity
to also tweak the last sentence and clarify that "this warning"
refers to -Wshift-overflow and not to -Wshift-overflow=2.

Martin

Re: [wwwdocs] Document some gcc-6 changes

2015-09-08 Thread Gerald Pfeifer

On Tue, 8 Sep 2015, Manuel López-Ibáñez wrote:
> I also took the liberty of rewriting the list of new command-line
> options to be less repetitive.

Nice!  (Aka "Thanks for doing this, please go ahead and commit.")

There is one question, but that is not introduced by your change:
Does the language now suggest that -Wshift-overflow=2 is enabled
by default, and hence shifting 1 into the sign bit?  Let's address
this separately, though, your change is fine as is, Manuel!

> +-Wshift-overflow warns about left shift
> overflows.  -Wshift-overflow=2 also warns about
> left-shifting 1 into the sign bit.  This warning is enabled by
> default.

David, what do you think?

Gerald

[wwwdocs] Document some gcc-6 changes

2015-09-08 Thread Manuel López-Ibáñez

I also took the liberty of rewriting the list of new command-line
options to be less repetitive.

OK?


Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.26
diff -u -r1.26 changes.html
--- htdocs/gcc-6/changes.html   4 Sep 2015 09:33:28 -   1.26
+++ htdocs/gcc-6/changes.html   8 Sep 2015 17:12:23 -
@@ -43,18 +43,26 @@
   oldval __attribute__ ((deprecated ("too old")))
 };
 
-A new command-line option -Wshift-negative-value has been
-   added for the C and C++ compilers, which warns about left shifting
+Initial support for precise diagnostic locations within strings:
+
+format-strings.c:3:14: warning:
field width specifier '*' expects a matching 'int'
argument [-Wformat=]
+   printf("%*d");
+^
+
+
+New command-line options have been added for the C and C++ compilers:
+
+-Wshift-negative-value  warns about left shifting
a negative value.
-A new command-line option -Wshift-overflow has been
-   added for the C and C++ compilers, which warns about left shift
+-Wshift-overflow warns about left shift
overflows.  -Wshift-overflow=2 also warns about
left-shifting 1 into the sign bit.  This warning is enabled by
default.
-A new command-line option -Wtautological-compare has been
-   added for the C and C++ compilers, which warns if a self-comparison
+-Wtautological-compare warns if a self-comparison
always evaluates to true or false.  This warning is enabled by
-Wall.
+-Wnull-dereference warns if the compiler detects
paths that trigger erroneous or undefined behavior due to
dereferencing a null pointer. This option is only active when
-fdelete-null-pointer-checks is active, which is enabled
by optimizations in most targets. The precision of the warnings
depends on the optimization options used.
+
   

 C

libbacktrace patch committed: Graceful fallback if out of memory

2015-09-08 Thread Ian Lance Taylor

I've committed this libbacktrace patch to mainline to do a graceful
fallback if no memory can be allocated.  In that case we print out the
PC addresses without trying to resolve file/line information.  This is
imperfect but better than the earlier behaviour of producing a series
of error messages.  Tested with libbacktrace and Go testsuites.
Committed to mainline.
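The fallback strategy (probe once for memory, and if that fails report bare PCs instead of file/line information) can be sketched in isolation. This is a simplified model with hypothetical names (`probe_can_alloc`, `format_frame`, `failing_alloc`); the actual patch uses `backtrace_alloc`/`backtrace_free` with a 4096-byte probe and passes a NULL filename to the user callback.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator hook standing in for backtrace_alloc.
   It must return memory that free() accepts.  */
typedef void *(*alloc_fn) (size_t);

/* Return 1 if a probe allocation of PROBE_SIZE bytes succeeds, else 0.
   The probe is released at once; it only answers "is there any
   memory at all?".  */
static int
probe_can_alloc (alloc_fn alloc, size_t probe_size)
{
  void *p = alloc (probe_size);
  if (p == NULL)
    return 0;
  free (p);
  return 1;
}

/* Format one frame into BUF.  Without memory (or without debug info)
   fall back to printing only the PC.  Returns snprintf's result.  */
static int
format_frame (char *buf, size_t len, uintptr_t pc, int can_alloc,
              const char *file, int line)
{
  if (!can_alloc || file == NULL)
    return snprintf (buf, len, "0x%lx", (unsigned long) pc);
  return snprintf (buf, len, "0x%lx %s:%d", (unsigned long) pc, file, line);
}

/* An allocator that always fails, for exercising the fallback path.  */
static void *
failing_alloc (size_t size)
{
  (void) size;
  return NULL;
}
```

The probe is done once per backtrace rather than per frame, so the cost of the check stays small.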

Ian

2015-09-08  Ian Lance Taylor  

PR other/67457
* backtrace.c: #include "internal.h".
(struct backtrace_data): Add can_alloc field.
(unwind): If can_alloc is false, don't try to get file/line
information.
(backtrace_full): Set can_alloc field in bdata.
* alloc.c (backtrace_alloc): Don't call error_callback if it is
NULL.
* mmap.c (backtrace_alloc): Likewise.
* internal.h: Update comments for backtrace_alloc and
backtrace_free.
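The alloc.c/mmap.c part of the change makes the error callback optional. A minimal sketch of that convention follows; `alloc_or_report` and `count_errors` are hypothetical names, and only the callback signature mirrors libbacktrace's `backtrace_error_callback`.

```c
#include <errno.h>
#include <stdlib.h>

/* Same shape as libbacktrace's error callback type.  */
typedef void (*error_cb) (void *data, const char *msg, int errnum);

/* Hypothetical stand-in for backtrace_alloc: allocate like malloc and
   report a failure only if ERROR_CALLBACK is non-NULL.  A NULL callback
   means "just return NULL quietly", which the probe relies on.  */
static void *
alloc_or_report (size_t size, error_cb error_callback, void *data)
{
  void *ret = malloc (size);
  if (ret == NULL && error_callback != NULL)
    error_callback (data, "malloc", errno);
  return ret;
}

/* Example callback: count reported errors in the int at DATA.  */
static void
count_errors (void *data, const char *msg, int errnum)
{
  (void) msg;
  (void) errnum;
  ++*(int *) data;
}
```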
Index: alloc.c
===
--- alloc.c (revision 227528)
+++ alloc.c (working copy)
@@ -44,7 +44,8 @@ POSSIBILITY OF SUCH DAMAGE.  */
backtrace functions may not be safely invoked from a signal
handler.  */
 
-/* Allocate memory like malloc.  */
+/* Allocate memory like malloc.  If ERROR_CALLBACK is NULL, don't
+   report an error.  */
 
 void *
 backtrace_alloc (struct backtrace_state *state ATTRIBUTE_UNUSED,
@@ -55,7 +56,10 @@ backtrace_alloc (struct backtrace_state
 
   ret = malloc (size);
   if (ret == NULL)
-    error_callback (data, "malloc", errno);
+    {
+      if (error_callback)
+        error_callback (data, "malloc", errno);
+    }
   return ret;
 }
 
Index: backtrace.c
===
--- backtrace.c (revision 227528)
+++ backtrace.c (working copy)
@@ -34,6 +34,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
 
 #include "unwind.h"
 #include "backtrace.h"
+#include "internal.h"
 
 /* The main backtrace_full routine.  */
 
@@ -53,6 +54,8 @@ struct backtrace_data
   void *data;
   /* Value to return from backtrace_full.  */
   int ret;
+  /* Whether there is any memory available.  */
+  int can_alloc;
 };
 
 /* Unwind library callback routine.  This is passed to
@@ -80,8 +83,11 @@ unwind (struct _Unwind_Context *context,
   if (!ip_before_insn)
 --pc;
 
-  bdata->ret = backtrace_pcinfo (bdata->state, pc, bdata->callback,
-                                 bdata->error_callback, bdata->data);
+  if (!bdata->can_alloc)
+    bdata->ret = bdata->callback (bdata->data, pc, NULL, 0, NULL);
+  else
+    bdata->ret = backtrace_pcinfo (bdata->state, pc, bdata->callback,
+                                   bdata->error_callback, bdata->data);
   if (bdata->ret != 0)
 return _URC_END_OF_STACK;
 
@@ -96,6 +102,7 @@ backtrace_full (struct backtrace_state *
backtrace_error_callback error_callback, void *data)
 {
   struct backtrace_data bdata;
+  void *p;
 
   bdata.skip = skip + 1;
   bdata.state = state;
@@ -103,6 +110,18 @@ backtrace_full (struct backtrace_state *
   bdata.error_callback = error_callback;
   bdata.data = data;
   bdata.ret = 0;
+
+  /* If we can't allocate any memory at all, don't try to produce
+     file/line information.  */
+  p = backtrace_alloc (state, 4096, NULL, NULL);
+  if (p == NULL)
+    bdata.can_alloc = 0;
+  else
+    {
+      backtrace_free (state, p, 4096, NULL, NULL);
+      bdata.can_alloc = 1;
+    }
+
   _Unwind_Backtrace (unwind, &bdata);
   return bdata.ret;
 }
Index: internal.h
===
--- internal.h  (revision 227528)
+++ internal.h  (working copy)
@@ -201,13 +201,15 @@ extern int backtrace_close (int descript
 extern void backtrace_qsort (void *base, size_t count, size_t size,
 int (*compar) (const void *, const void *));
 
-/* Allocate memory.  This is like malloc.  */
+/* Allocate memory.  This is like malloc.  If ERROR_CALLBACK is NULL,
+   this does not report an error, it just returns NULL.  */
 
 extern void *backtrace_alloc (struct backtrace_state *state, size_t size,
  backtrace_error_callback error_callback,
  void *data) ATTRIBUTE_MALLOC;
 
-/* Free memory allocated by backtrace_alloc.  */
+/* Free memory allocated by backtrace_alloc.  If ERROR_CALLBACK is
+   NULL, this does not report an error.  */
 
 extern void backtrace_free (struct backtrace_state *state, void *mem,
size_t size,
Index: mmap.c
===
--- mmap.c  (revision 227529)
+++ mmap.c  (working copy)
@@ -77,7 +77,8 @@ backtrace_free_locked (struct backtrace_
 }
 }
 
-/* Allocate memory like malloc.  */
+/* Allocate memory like malloc.  If ERROR_CALLBACK is NULL, don't
+   report an error.  */
 
 void *
 backtrace_alloc (struct backtrace_state *state,
@@ -140,7 +141,10 @@ backtrace_alloc (struct backtrace_state
   page = mmap (NULL, asksize, PROT_READ | PROT_WRITE,
   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Do not ifcvt complex blocks if the else block is empty

2015-09-08 Thread Kyrill Tkachov



On 08/09/15 10:26, Rainer Orth wrote:

Hi Kyrill,


PR rtl-optimization/67481 is a testsuite regression on sparc-solaris that
Rainer reported. I haven't tested
that this patch fixes that, but I suspect that the root cause is the
same. Rainer, could you please
check that this fixes the regression for you?

I've now checked that with your patch the regression went away indeed,
using a limited non-bootstrap build on sparc-sun-solaris2.10.  Next I'll
run a full bootstrap to check there are no other issues.


After some more benchmarking I've noticed that this patch is overly restrictive
in some cases. I have a prototype patch that fixes the regressions and does not
restrict if-conversion too much.

I need to do some more testing, and hope to post it in due time.

Sorry for the noise.

Kyrill



Thanks.
Rainer

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread FX

> Not a big issue, but slightly better if (O_CLOEXEC>>32) != 0 is also true.  
> See, if AIX should ever define this to a sensible value, the above would 
> disappear the feature.  However, if they did, then this expression should 
> then be false.

Sounds good.
This being a libbacktrace patch, you need Ian to approve it; it’s his area.

FX

Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-08 Thread Ilya Enkovich

On 21 Aug 10:38, Jeff Law wrote:
> On 08/21/2015 07:44 AM, Ilya Enkovich wrote:
> >>Out of curiosity, what does LLVM do here in terms of costing
> >>models?
> >
> >Unfortunately I have no idea where and how LLVM does this
> >optimization. Will try to find out. For now I just try to follow
> >common sense and avoid hurting the performance of any benchmark.
> Sounds wise.  No reason we can't look at the overall heuristics they're
> using for when this optimization ought to fire.
> 
> >>
> >>From a correctness standpoint, one of the interesting tests would
> >>be to turn off all tuning -- ie, always convert if it's supposed to
> >>be possible.  Then throw as much code as possible at it and see if
> >>anything breaks.  Also a good time to instrument so that you can
> >>then build testcases from real-world code.
> >
> >I did such testing previously for SPEC.
> Excellent to hear.
> 
> >  Now I also tried it for
> >bootstrap and found issue with EH edges.  Fixed it in a new version.
> 
> 
> When you track down the bootstrap failure, you might consider adding a
> test for whatever went wrong to the suite if it's feasible.

I added several tests including EH one.

> 
> 
> >
> >Thanks a lot for your review! Here is an updated version. Bootstrap
> >is OK. Regression testing shows a failure in gcc.dg/lower-subreg-1.c. It
> >happens because ior:DI is now subject to the new optimization and is not
> >lowered by the subreg pass. I see the test has already been modified
> >several times to disable it on various targets. Will it actually be
> >tested anywhere if I disable it for i386? Should I just remove the test?
> 
> I'd twiddle the test to turn off your new pass.  Which leads to the
> comment that your pass needs to be selectable via a -m argument.

I added an option to control the new pass, but it doesn't affect the regressed
test.  The test is compiled with -O, where the new pass doesn't even run.  The
regression is caused by my new patterns, which make lowering of 64-bit IOR in
the subreg pass unnecessary.  The test checks that the register is split by
subreg1 for a 64-bit IOR, and so it fails.

> 
> Jeff
> 

Here is an updated version with tests and an option added.

Thanks,
Ilya
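To see whether the new pass fires, one could compile a small DImode chain such as the one below with something like `gcc -m32 -O2 -msse2 -mstv -S` and inspect the assembly. The function is a hypothetical example, not one of the pr65105 tests; going by the discussion above, the pass does not run at -O, and whether it converts this particular chain depends on its gain estimate.

```c
#include <stddef.h>
#include <stdint.h>

/* A chain of 64-bit loads and XORs ending in a store.  On a 32-bit x86
   target each XOR is normally split into two 32-bit operations; the
   proposed pass may instead keep the whole chain in an SSE register.  */
static void
xor_reduce64 (const uint64_t *src, size_t n, uint64_t *dst)
{
  uint64_t acc = 0;
  for (size_t i = 0; i < n; ++i)
    acc ^= src[i];
  *dst = acc;
}
```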
--
gcc/

2015-09-08  Ilya Enkovich  

* config/i386/i386.c: Include dbgcnt.h.
(has_non_address_hard_reg): New.
(convertible_comparison_p): New.
(scalar_to_vector_candidate_p): New.
(remove_non_convertible_regs): New.
(scalar_chain): New.
(scalar_chain::scalar_chain): New.
(scalar_chain::~scalar_chain): New.
(scalar_chain::add_to_queue): New.
(scalar_chain::mark_dual_mode_def): New.
(scalar_chain::analyze_register_chain): New.
(scalar_chain::add_insn): New.
(scalar_chain::build): New.
(scalar_chain::compute_convert_gain): New.
(scalar_chain::replace_with_subreg): New.
(scalar_chain::replace_with_subreg_in_insn): New.
(scalar_chain::emit_conversion_insns): New.
(scalar_chain::make_vector_copies): New.
(scalar_chain::convert_reg): New.
(scalar_chain::convert_op): New.
(scalar_chain::convert_insn): New.
(scalar_chain::convert): New.
(convert_scalars_to_vector): New.
(pass_data_stv): New.
(pass_stv): New.
(make_pass_stv): New.
(ix86_option_override): Create and register stv pass.
(flag_opts): Add -mstv.
(ix86_option_override_internal): Likewise.
* config/i386/i386.md (SWIM1248x): New.
(*movdi_internal): Remove '*' modifier for xmm to mem alternative.
(and3): Use SWIM1248x iterator instead of SWIM.
(*anddi3_doubleword): New.
(*zext_doubleword): New.
(*zextqi_doubleword): New.
(3): Use SWIM1248x iterator instead of SWIM.
(*di3_doubleword): New.
* config/i386/i386.opt (mstv): New.
* dbgcnt.def (stv_conversion): New.

gcc/testsuite/

2015-09-08  Ilya Enkovich  

* gcc.target/i386/pr65105-1.c: New.
* gcc.target/i386/pr65105-2.c: New.
* gcc.target/i386/pr65105-3.c: New.
* gcc.target/i386/pr65105-4.C: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d78f4e7..ceb0b06 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "dbgcnt.h"

 /* This file should be included last.  */
 #include "target-def.h"
@@ -2600,6 +2601,908 @@ rest_of_handle_insert_vzeroupper (void)
   return 0;
 }

+/* Return true if INSN uses or defines a hard register.
+   Hard register uses in a memory address are ignored.
+   Clobbers and flags definitions are ignored.  */
+
+static bool
+has_non_address_hard_reg (rtx_insn *insn)
+{
+  df_ref ref;
+  FOR_EACH_INSN_DEF (ref, insn)
+    if (HARD_REGISTER_P (DF_REF_REAL_REG (ref))
+        && !DF_REF_FLAGS_IS_SET (ref, DF_REF_MUST_CLOBBER)
+        && DF_REF_REGNO (ref) != FLAGS_REG)
+

Re: [4/7] Use correct promoted mode sign for result of GIMPLE_CALL

2015-09-08 Thread Jeff Law


On 09/07/2015 03:27 PM, Kugan wrote:



On 07/09/15 23:10, Michael Matz wrote:

Hi,

On Mon, 7 Sep 2015, Kugan wrote:


For the following testcase (compiled with -O1; -O2 works fine), we have
a stmt with stmt_code SSA_NAME (_7 = _6), where _6 is defined by
a GIMPLE_CALL.  In this case, we use the wrong SUBREG promoted mode,
resulting in wrong code.


And why is that?


Simple SSA_NAME copies are generally optimized away,
but when they are not, we can end up using the wrong promoted mode.
The attached patch fixes the case where there is one copy.


I think this is the wrong place to fix it up.  Where does the wrong use come
from?  It should be fixed there, not after the fact.


   _6 = bar5 (-10);
   ...
   _7 = _6;
   _3 = (long unsigned int) _6;
   ...
   if (_3 != l5.0_4)


There is no use of '_7' in this snippet so I don't see the relevance of
SUBREG_PROMOTED_MODE on it.

But whatever you do, please make sure you include the testcase for the
problem as a regression test:



Thanks for the review.

This happens on ARM, where the definition of PROMOTE_MODE also changes the
sign.  I am attaching the cfgdump for the testcase.  This is part of an
existing testcase, which is why I didn't include it in this patch.
Is this another instance of the PROMOTE_MODE issue that was raised by 
Jim Wilson a couple months ago?


jeff
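Why the sign of the promotion matters can be shown in plain C. Assume, purely for illustration, that `bar5` returns a 16-bit signed value. Then the conversion `_3 = (long unsigned int) _6` in the GIMPLE above must sign-extend, and if the compiler wrongly believes the promoted register was zero-extended, the comparison sees a different value. The helper names are hypothetical.

```c
#include <stdint.h>

/* Correct widening of a 16-bit signed value: sign extension.  */
static unsigned long
widen_sign_extend (int16_t v)
{
  return (unsigned long) v;
}

/* What is effectively computed if the value is wrongly assumed to
   have been promoted with zero extension.  */
static unsigned long
widen_zero_extend (int16_t v)
{
  return (unsigned long) (uint16_t) v;
}
```

For non-negative values both agree, which is why a wrong promoted-mode sign can go unnoticed until a negative argument such as -10 is used.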

Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-08 Thread Jonathan Wakely


On 08/09/15 17:00 +0200, Florian Weimer wrote:

On 09/07/2015 09:59 PM, Jonathan Wakely wrote:

On 07/09/15 21:31 +0200, Florian Weimer wrote:

* Jonathan Wakely:


This patch adds the "debug mode lite" we've been talking about, by
changing __glibcxx_assert to be activated by _GLIBCXX_ASSERTIONS
instead of _GLIBCXX_DEBUG (and making the latter imply the former).


Interesting.  Is this mode ABI-compatible with the default mode?


Yes, that's the main reason I want to make this change.


Good.  Past discussions of similar proposals indicated that these
#ifdefs are still ODR violations.


Well technically even using assert() in an inline function or template
is an ODR violation unless every file including the function uses the
same value of NDEBUG.

I tend to ignore that technicality :-)


Should _FORTIFY_SOURCE imply _GLIBCXX_ASSERTIONS?


Yes, I think it should.

You can read my notes on these "debug mode lite" checks at
https://gcc.gnu.org/wiki/LibstdcxxDebugMode (including "This should be
discussed with Glibc and security teams" and I specifically had you in
mind when I wrote that :-)


I doubt we can achieve the complexity goals in all cases.  I expect that

 for (int i = 0; i < 1; ++i) {
   vector[i];
 }

is optimized away in default mode, but with _GLIBCXX_ASSERTIONS, it is not.

The last time I looked at this, GCC was unable to move bounds checks out
of loops.


Maybe we don't want to make _FORTIFY_SOURCE imply _GLIBCXX_ASSERTIONS
then, so they can be enabled independently. We don't have to make that
decision right away.
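The macro layering being discussed (the full debug mode implies the lightweight assertions, never the reverse) can be modelled in C. The `MYLIB_*` names, `mylib_assert` and `int_vec` are hypothetical; the point of the sketch is that the checks change only inline code, not data layout, which is why such a mode can stay ABI-compatible while still being an ODR concern when translation units disagree on the setting.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macros mirroring the proposal: the heavyweight debug
   mode implies the lightweight assertions.  */
#if defined (MYLIB_DEBUG) && !defined (MYLIB_ASSERTIONS)
# define MYLIB_ASSERTIONS 1
#endif

#ifdef MYLIB_ASSERTIONS
# define mylib_assert(cond)                                     \
  do                                                            \
    {                                                           \
      if (!(cond))                                              \
        {                                                       \
          fprintf (stderr, "%s:%d: assertion '%s' failed\n",    \
                   __FILE__, __LINE__, #cond);                  \
          abort ();                                             \
        }                                                       \
    }                                                           \
  while (0)
#else
# define mylib_assert(cond) do { } while (0)
#endif

/* The layout does not depend on the macros; only the inline code does.  */
struct int_vec
{
  int *data;
  size_t size;
};

/* Bounds-checked access; with assertions on, the check runs on every
   call, e.g. on every iteration of a loop unless the compiler can
   prove it redundant.  */
static inline int
int_vec_at (const struct int_vec *v, size_t i)
{
  mylib_assert (i < v->size);
  return v->data[i];
}
```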

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread Mike Stump

On Sep 8, 2015, at 6:53 AM, David Edelsohn  wrote:
> On Tue, Sep 8, 2015 at 9:51 AM, FX  wrote:
>>> #define _FCLOEXEC   0x0010L
>>> #define O_CLOEXEC   _FCLOEXEC   /* sets FD_CLOEXEC on open  */
>> 
>> That’s weird, and definitely an AIX bug: 
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
> 
> Welcome to AIX :-/
> 
>> How does that even work? open() takes int as second arg.
> 
> No one else uses it? ;-)
> 
> The following kluge works:
> 
> Index: posix.c
> ===
> --- posix.c (revision 227528)
> +++ posix.c (working copy)
> @@ -45,6 +45,10 @@
> #define O_BINARY 0
> #endif
> 
> +#ifdef _AIX
> +#undef O_CLOEXEC
> +#endif
> +

Not a big issue, but it would be slightly better to also require that
(O_CLOEXEC>>32) != 0.  If AIX should ever define this to a sensible value,
the above would needlessly disable the feature, whereas this expression would
then be false and the feature would be kept.
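A sketch of the suggested guard, combining the `_AIX` test with the width check. It assumes, as the existing `O_BINARY` fallback suggests, that the file defines `O_CLOEXEC` as 0 when it is missing; that fallback and the `with_cloexec` helper are included here only to make the example self-contained.

```c
#include <fcntl.h>

/* Drop O_CLOEXEC only on AIX and only if its value does not fit in the
   int flags argument of open(); a future AIX with a sane value keeps
   the feature.  #if arithmetic is done in intmax_t/uintmax_t, so the
   shift by 32 is well defined here.  */
#if defined (_AIX) && defined (O_CLOEXEC) && ((O_CLOEXEC >> 32) != 0)
# undef O_CLOEXEC
#endif

/* Fallback so the rest of the file can use O_CLOEXEC unconditionally.  */
#ifndef O_CLOEXEC
# define O_CLOEXEC 0
#endif

/* Add close-on-exec to a set of open() flags when it is available.  */
static int
with_cloexec (int flags)
{
  return flags | O_CLOEXEC;
}
```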

Re: [PATCH] Fix seq_cost prototype to use signed int

2015-09-08 Thread Jeff Law


On 09/08/2015 09:17 AM, Jiong Wang wrote:


Jeff Law writes:


On 09/08/2015 06:17 AM, Jiong Wang wrote:


All other cost helper functions are using signed int to hold cost
while seq_cost is using unsigned int.

This patch fixes that.  Bootstrap OK on x86.

OK for trunk?

2015-09-08  Jiong Wang  

gcc/
* rtl.h (seq_cost): Change return type from "unsigned" to "int".
* rtlanal.c (seq_cost): Likewise.

Why not go the other way and start making things unsigned -- for a cost
function like this, unsigned seems more natural to me.


I was using "(unsigned) -1" to represent the maximum cost, then later learned
that there is a MAX_COST macro and found that it is actually a signed type.  A
quick search shows that most of the code in gcc/rtlanal.c uses "int", so I am
changing seq_cost, which seems to be the only cost helper using unsigned.
Understood.  But the natural type should be unsigned as far as I can 
tell.  The fact that we're using signed types all over the place is 
probably a historical wart.


So I'd start by changing the MAX_COST macro to an unsigned type, then 
fix any fallout from that.  That should be a patch unto itself.  Then we 
can have additional follow-up patches to fix the types of costing 
related variables, parameters & return values.


Jeff

Re: [PATCH] Fix seq_cost prototype to use signed int

2015-09-08 Thread Jiong Wang

Jeff Law writes:

> On 09/08/2015 06:17 AM, Jiong Wang wrote:
>>
>> All other cost helper functions are using signed int to hold cost
>> while seq_cost is using unsigned int.
>>
>> This patch fixes that.  Bootstrap OK on x86.
>>
>> OK for trunk?
>>
>> 2015-09-08  Jiong Wang  
>>
>> gcc/
>>* rtl.h (seq_cost): Change return type from "unsigned" to "int".
>>* rtlanal.c (seq_cost): Likewise.
> Why not go the other way and start making things unsigned -- for a cost 
> function like this, unsigned seems more natural to me.

I was using "(unsigned) -1" to represent the maximum cost, then later learned
that there is a MAX_COST macro and found that it is actually a signed type.  A
quick search shows that most of the code in gcc/rtlanal.c uses "int", so I am
changing seq_cost, which seems to be the only cost helper using unsigned.

And it looks like there is no consistent type for holding cost values across
gcc: some code uses "unsigned" while other code uses "int".  I guess this
survives because gcc disables -Wconversion by default.

-- 
Regards,
Jiong
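A small C illustration of why mixing the two conventions is fragile. `SKETCH_MAX_COST` is a hypothetical stand-in for GCC's signed `MAX_COST`, and the helper names are made up for the example. The conversion in `naive_cheaper` is spelled out explicitly; it is exactly what the usual arithmetic conversions do implicitly for `a < b` with an `int` and an `unsigned` operand.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a signed MAX_COST sentinel.  */
#define SKETCH_MAX_COST INT_MAX

/* Compare an int cost with an unsigned cost the naive way: A is
   converted to unsigned, so a negative A (for example an unsigned
   "maximum" that round-tripped through int as -1) compares as huge.  */
static bool
naive_cheaper (int a, unsigned b)
{
  return (unsigned) a < b;
}

/* The two "maximum cost" sentinels are not the same value.  */
static bool
sentinels_agree (void)
{
  return (unsigned) SKETCH_MAX_COST == (unsigned) -1;
}
```

Whichever signedness is chosen, using a single type and a single sentinel everywhere avoids both traps.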

Re: [PATCH, Darwin] Some driver TLC (improve support for the '-arch' flag).

2015-09-08 Thread Mike Stump

On Sep 7, 2015, at 11:24 AM, Iain Sandoe  wrote:
> For some Darwin compilers, "-arch " can be used (a) in place of, but to 
> indicate the same as, a multilib flag like "-m32" and (b) multiple times to 
> indicate that the User wants a FAT object with multiple arch slices.

> OK for trunk?

Ok.

[AArch64] Handle const address in aarch64_print_operand

2015-09-08 Thread Jiong Wang


I came across this issue when I was cleaning up some relevant TLS code.

Currently, unlike other TLS modes, TLS local-exec addresses don't go
through the AArch64 address-splitting logic that forces the address into
base + offset form.  This is actually good.

The address assigned to a TLS variable may take a form like the
following:

  (const:DI (plus:DI (symbol_ref:DI ("a") [flags 0x2a]  )
(const_int 48 [0x30])))

in which case we can generate the following assembly code, which encodes
the extra constant "48" in the RELA relocation instead of needing an
extra add instruction:

add x0, x0, #:tprel_hi12:a+48, lsl #12
add x0, x0, #:tprel_lo12_nc:a+48

instead of

add x0, x0, #:tprel_hi12:a, lsl #12
add x0, x0, #:tprel_lo12_nc:a
add x0, x0, 48

But this is only supported when the assembly output part of the pattern
uses an operand modifier such as 'L', 'A' or 'G', as these invoke
"output_addr_const", which accepts an address wrapped in "const".  If you
use a plain placeholder instead, for example changing "%L1" into
":tprel_lo12:%1", this triggers an unreachable error, because
aarch64_print_operand does not support the "const" wrapper when there is
no output modifier.  That is wrong.

This problem does not exist on other backends such as arm and mips,
because they use a "default" case to handle all remaining situations,
including addresses wrapped in "const".
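The fix relies on a constant address such as `(const (plus (symbol_ref "a") (const_int 48)))` being printable as `a+48`, which the assembler folds into the relocation. A toy model of that printing step in C is below; the `op` types and `print_addr_const` are hypothetical and are not GCC's rtx API, they only show why treating the CONST wrapper as transparent is enough.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy operand kinds standing in for CONST, PLUS, SYMBOL_REF, CONST_INT.  */
enum op_kind { OP_CONST, OP_PLUS, OP_SYMBOL, OP_INT };

struct op
{
  enum op_kind kind;
  const char *name;          /* OP_SYMBOL */
  long value;                /* OP_INT */
  const struct op *a, *b;    /* OP_PLUS operands; OP_CONST body in A */
};

/* Print a constant address, loosely in the spirit of output_addr_const:
   the CONST wrapper is transparent and SYMBOL + INT prints as "sym+off".
   Returns the number of characters written, or -1 for shapes this toy
   does not model.  */
static int
print_addr_const (char *buf, size_t len, const struct op *x)
{
  switch (x->kind)
    {
    case OP_CONST:
      return print_addr_const (buf, len, x->a);
    case OP_SYMBOL:
      return snprintf (buf, len, "%s", x->name);
    case OP_INT:
      return snprintf (buf, len, "%ld", x->value);
    case OP_PLUS:
      if (x->a->kind == OP_SYMBOL && x->b->kind == OP_INT)
        return snprintf (buf, len, "%s%+ld", x->a->name, x->b->value);
      return -1;
    }
  return -1;
}
```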

ok for trunk?

2015-09-08  Jiong Wang  

gcc/
  * config/aarch64/aarch64.c (aarch64_print_operand): Add "CONST"
  support.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a76832f..5ba0215 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4567,6 +4567,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  output_address (XEXP (x, 0));
 	  break;

+	case CONST:
 	case LABEL_REF:
 	case SYMBOL_REF:
 	  output_addr_const (asm_out_file, x);

[AArch64] Delete aarch64_symbol_context which is not used

2015-09-08 Thread Jiong Wang


The concept of aarch64_symbol_context is not used in AArch64; this patch
removes it and all related code.

ok for trunk?

2015-09-08  Jiong Wang

gcc/
  * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Delete.
  * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Likewise.
  (aarch64_cannot_force_const_mem): Likewise.
  (aarch64_classify_address): Likewise.
  (aarch64_classify_symbolic_expression): Likewise.
  (aarch64_print_operand): Likewise.
  (aarch64_classify_symbol): Likewise.
  (aarch64_mov_operand_p): Likewise.
  * config/aarch64/predicates.md (aarch64_valid_symref): Likewise.
  (aarch64_mov_operand): Likewise.
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a834027..8035ae1 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -24,18 +24,6 @@

 #include "input.h"

-/*
-  SYMBOL_CONTEXT_ADR
-  The symbol is used in a load-address operation.
-  SYMBOL_CONTEXT_MEM
-  The symbol is used as the address in a MEM.
- */
-enum aarch64_symbol_context
-{
-  SYMBOL_CONTEXT_MEM,
-  SYMBOL_CONTEXT_ADR
-};
-
 /* SYMBOL_SMALL_ABSOLUTE: Generate symbol accesses through
high and lo relocs that calculate the base address using a PC
relative reloc.
@@ -262,8 +250,7 @@ HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
 int aarch64_branch_cost (bool, bool);
-enum aarch64_symbol_type
-aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context);
+enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
@@ -282,8 +269,7 @@ bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_modes_tieable_p (machine_mode mode1,
 			  machine_mode mode2);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
-bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context,
-			machine_mode);
+bool aarch64_mov_operand_p (rtx, machine_mode);
 int aarch64_simd_attr_length_rglist (enum machine_mode);
 rtx aarch64_reverse_mask (enum machine_mode);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
@@ -308,8 +294,7 @@ const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 const char *aarch64_rewrite_selected_cpu (const char *name);

-enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx,
-		  enum aarch64_symbol_context);
+enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3cd5196..a76832f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1735,7 +1735,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	 before we start classifying the symbol.  */
   split_const (imm, &base, &offset);

-  sty = aarch64_classify_symbol (base, offset, SYMBOL_CONTEXT_ADR);
+  sty = aarch64_classify_symbol (base, offset);
   switch (sty)
 	{
 	case SYMBOL_FORCE_TO_MEM:
@@ -3441,7 +3441,7 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   split_const (x, &base, &offset);
   if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
 {
-  if (aarch64_classify_symbol (base, offset, SYMBOL_CONTEXT_ADR)
+  if (aarch64_classify_symbol (base, offset)
 	  != SYMBOL_FORCE_TO_MEM)
 	return true;
   else
@@ -3886,8 +3886,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
 	  rtx sym, offs;
 	  split_const (info->offset, &sym, &offs);
 	  if (GET_CODE (sym) == SYMBOL_REF
-	  && (aarch64_classify_symbol (sym, offs, SYMBOL_CONTEXT_MEM)
-		  == SYMBOL_SMALL_ABSOLUTE))
+	  && (aarch64_classify_symbol (sym, offs) == SYMBOL_SMALL_ABSOLUTE))
 	{
 	  /* The symbol and offset must be aligned to the access size.  */
 	  unsigned int align;
@@ -3933,17 +3932,15 @@ aarch64_symbolic_address_p (rtx x)
   return GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF;
 }

-/* Classify the base of symbolic expression X, given that X appears in
-   context CONTEXT.  */
+/* Classify the base of symbolic expression X.  */

 enum aarch64_symbol_type
-aarch64_classify_symbolic_expression (rtx x,
-  enum aarch64_symbol_context context)
+aarch64_classify_symbolic_expression (rtx x)
 {
   rtx offset;

   split_const (x, &x, &offset);
-  return aarch64_classify_symbol (x, offset, context);
+  return aarch64_classify_symbol (x, offset);
 }


@@ -4631,7 +4628,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
   if (GET_CODE (x) == HIGH)
 	x = XEXP (x, 0);

-  switch (aarch64_classify_symbolic_expression

Re: [PATCH] Fix seq_cost prototype to use signed int

2015-09-08 Thread Jeff Law


On 09/08/2015 06:17 AM, Jiong Wang wrote:


All other cost helper functions are using signed int to hold cost
while seq_cost is using unsigned int.

This patch fixes that.  Bootstrap OK on x86.

OK for trunk?

2015-09-08  Jiong Wang  

gcc/
   * rtl.h (seq_cost): Change return type from "unsigned" to "int".
   * rtlanal.c (seq_cost): Likewise.
Why not go the other way and start making things unsigned -- for a cost 
function like this, unsigned seems more natural to me.


jeff

Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-08 Thread Florian Weimer

On 09/07/2015 09:59 PM, Jonathan Wakely wrote:
> On 07/09/15 21:31 +0200, Florian Weimer wrote:
>> * Jonathan Wakely:
>>
>>> This patch adds the "debug mode lite" we've been talking about, by
>>> changing __glibcxx_assert to be activated by _GLIBCXX_ASSERTIONS
>>> instead of _GLIBCXX_DEBUG (and making the latter imply the former).
>>
>> Interesting.  Is this mode ABI-compatible with the default mode?
> 
> Yes, that's the main reason I want to make this change.

Good.  Past discussions of similar proposals indicated that these
#ifdefs are still ODR violations.

>> Should _FORTIFY_SOURCE imply _GLIBCXX_ASSERTIONS?
> 
> Yes, I think it should.
> 
> You can read my notes on these "debug mode lite" checks at
> https://gcc.gnu.org/wiki/LibstdcxxDebugMode (including "This should be
> discussed with Glibc and security teams" and I specifically had you in
> mind when I wrote that :-)

I doubt we can achieve the complexity goals in all cases.  I expect that

  for (int i = 0; i < 1; ++i) {
vector[i];
  }

is optimized away in default mode, but with _GLIBCXX_ASSERTIONS, it is not.

The last time I looked at this, GCC was unable to move bounds checks out
of loops.

-- 
Florian Weimer / Red Hat Product Security

[Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-08 Thread James Greenhalgh


Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
    std::swap (d, e);
  [...]

Which is currently going to generate something like

  compare (d, e)
  branch.le L1
    tmp = d;
    d = e;
    e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better
with:

  compare (d, e)
  d1 = if_then_else (le, d, e)
  e1 = if_then_else (le, e, d)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.
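The transformation can be written out in C to check that the select form computes the same result as the branchy swap. Both new values are computed from the old ones before either variable is overwritten, which is why the temporaries are needed. The function names are illustrative only.

```c
/* Branchy form: the guarded block contains only SETs (a swap via TMP).  */
static void
swap_if_greater_branchy (int *d, int *e)
{
  if (*d > *e)
    {
      int tmp = *d;
      *d = *e;
      *e = tmp;
    }
}

/* If-converted form: one select per SET, all reading the old values and
   writing fresh temporaries, which are then copied back.  The final
   copies are the trailing assignments register allocation should
   remove.  */
static void
swap_if_greater_select (int *d, int *e)
{
  int le = *d <= *e;        /* the branch condition */
  int d1 = le ? *d : *e;    /* if_then_else (le, d, e) */
  int e1 = le ? *e : *d;    /* if_then_else (le, e, d) */
  *d = d1;
  *e = e1;
}
```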

This patch introduces this logic to the RTL if convert passes, catching
cases where a basic block does nothing other than multiple SETs. This
helps both with the std::swap idiom above, and with pathological cases
where tree passes create new basic blocks to resolve Phi nodes, which
contain only set instructions and end up unpredictable.

One big question I have with this patch is how I ought to write a meaningful
cost model.  The one I've used seems like yet another misuse of RTX costs, and
another bit of stuff for targets to carefully balance.  Now, if the
relative cost of branches and conditional move instructions is not
carefully managed, you may accidentally enable or disable these optimisations.
This is probably acceptable, but I dislike adding more and more gotchas to
target costs, as I get bitten by them hard enough as it is!

Elsewhere the ifcvt cost usage is pretty lacking, essentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2".  I could follow this lead and just count
the number of moves I would convert, then compare that to the branch cost,
but this feels... wrong.  This makes it pretty tough to choose a "good"
number for TARGET_BRANCH_COST.  This isn't helped now that higher branch
costs can mean pulling expensive instructions into the main execution
stream.

I've picked a fairly straightforward cost model for this patch, trying to
compare the cost of each conditional move, as calculated with rtx_costs,
against COSTS_N_INSNS (branch_cost). This essentially kills the
optimisation for any target with conditional-move cost > 1. Personally, I
consider that a pretty horrible bug in this patch - but I couldn't think of
anything better to try.
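One possible reading of the cost check described above, sketched in C: each conditional move must cost no more than `COSTS_N_INSNS (branch_cost)`. Whether the truncated patch compares each move individually or a running total is not visible here, so treat this as an assumption; `SKETCH_COSTS_N_INSNS` mirrors GCC's `COSTS_N_INSNS`, which scales by 4, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors GCC's COSTS_N_INSNS: one instruction is 4 cost units.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical per-move check: convert only if every conditional move
   is no more expensive than the branch it replaces.  */
static bool
multiple_sets_profitable (const unsigned *cmove_costs, size_t n,
                          unsigned branch_cost)
{
  for (size_t i = 0; i < n; ++i)
    if (cmove_costs[i] > SKETCH_COSTS_N_INSNS (branch_cost))
      return false;
  return true;
}
```

Under this reading, with a branch cost of 1 any conditional move costing more than one instruction is rejected, which matches the observation that the optimisation is effectively disabled for such targets.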

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build,
I saw 3.9% more CSEL operations with this patch and TARGET_BRANCH_COST set
to 4. Performance is also good on AArch64 on a range of microbenchmarks
and larger workloads (after playing with the branch costs). I didn't see
any performance regression on x86_64, as you would expect given that the
cost models preclude x86_64 targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to have some
confidence that we will continue to do the right thing if any targets do
up their branch costs and start using this code.

No testcase provided, as currently I don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh  

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
(noce_convert_multiple_sets): Likewise.
(noce_process_if_block): Call them.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..059bd89 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+ if (x > y)
+   { i = a; j = b; k = c; }
+
+   Make it:
+
+ tmp_i = (x > y) ? a : i;
+ tmp_j = (x > y) ? b : j;
+ tmp_k = (x > y) ? c : k;
+ i = tmp_i; <- Should be cleaned up
+ j = tmp_j; <- Likewise.
+ k = tmp_k; <- Likewise.
+
+   Look for special cases such as use of temporary registers (for
+   example in a swap idiom).
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  unsigned int cost = 0;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+ overlap.  */
+

Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.

2015-09-08 Thread David Edelsohn

On Thu, Sep 3, 2015 at 5:58 PM, Peter Bergner  wrote:
> While debugging a transaction lock elision issue, we noticed that the
> compiler was moving some loads/stores outside of the transaction body,
> because the HTM instructions were not marked as memory barriers, which
> is bad.  Looking deeper, I also noticed that neither Intel nor S390
> have their HTM instructions marked as memory barriers, although
> Andi did submit a patch last year:
>
> https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02999.html
>
> Richi and r~ both said the memory barrier should be part of the patterns:
>
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02235.html
>
> The following patch uses that suggestion by adding memory barriers to
> all of our HTM instructions, which fixes the issue.  I have also added
> a __TM_FENCE__ macro users can test to see whether the compiler treats
> the HTM instructions as memory barriers or not, in case they want to
> explicitly add memory barriers to their code when using older compilers.
>
> On a glibc thread discussing this issue, Torvald also asked that I add
> documentation describing the memory consistency semantics the HTM instructions
> should have, so I added a blurb about that.  Torvald, is the text below
> what you were looking for?
>
> This has passed bootstrap/regtesting on powerpc64le-linux.  Is this ok
> for mainline?
>
> Since this is a correctness issue, I'd like to eventually backport this to
> the release branches.  Is that ok once I've verified bootstrap/regtesting
> on them?
>
> Once this is committed, I can take a stab at fixing Intel and S390 similarly,
> unless someone beats me to it (hint hint :).  I'd need help testing it though,
> since I don't have access to Intel or S390 hardware that support HTM.
>
> Peter
>
> * config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
> (tabort, tabortc, tabortci, tbegin, tcheck, tend,
> trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
> (*tabort, *tabortc, *tabortci, *tbegin, *tcheck, *tend,
> *trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
> (tabort, tabortc, tabortci, tbegin, tcheck, tend,
> trechkpt, treclaim, tsr, ttest): New define_expands.
> * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
> __TM_FENCE__ for htm.
> * doc/extend.texi: Update documentation for htm builtins.

This is okay.

Torvald should comment on the descriptive text.

Thanks, David
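As a sketch of how user code might use the new __TM_FENCE__ macro with older compilers, one option is to fall back to an explicit compiler-level barrier around the transaction body. HTM_COMPILER_FENCE and counter_add_in_txn are hypothetical names. The sketch also assumes the `__HTM__` predefine that GCC emits for -mhtm on PowerPC. The HTM path is compiled only on such targets; everywhere else the plain fallback runs.

```c
/* Sketch only: __TM_FENCE__ comes from the patch above; the names
   HTM_COMPILER_FENCE and counter_add_in_txn are hypothetical.  */
#if defined(__TM_FENCE__)
/* The compiler already treats the HTM builtins as memory barriers.  */
# define HTM_COMPILER_FENCE() ((void) 0)
#else
/* Older compilers: stop GCC from moving loads/stores across this point.  */
# define HTM_COMPILER_FENCE() __asm__ __volatile__ ("" ::: "memory")
#endif

/* Add DELTA to *COUNTER inside a hardware transaction when one can be
   started.  Returns 1 if the update ran transactionally, else 0.  */
static int
counter_add_in_txn (int *counter, int delta)
{
#if defined(__powerpc__) && defined(__HTM__)
  if (__builtin_tbegin (0))
    {
      HTM_COMPILER_FENCE ();   /* keep the body after tbegin */
      *counter += delta;
      HTM_COMPILER_FENCE ();   /* keep the body before tend */
      __builtin_tend (0);
      return 1;
    }
#endif
  /* No HTM, or the transaction failed to start (or aborted).  A real
     lock-elision scheme would acquire a lock on this path.  */
  *counter += delta;
  return 0;
}
```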

Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-08 Thread Jonathan Wakely

On 08/09/15 15:14 +0200, Michael Matz wrote:

> Hi,
>
> On Mon, 7 Sep 2015, Jonathan Wakely wrote:
>
> > > Interesting.  Is this mode ABI-compatible with the default mode?
> >
> > Yes, that's the main reason I want to make this change.
> >
> > > Should _FORTIFY_SOURCE imply _GLIBCXX_ASSERTIONS?
> >
> > Yes, I think it should.
>
> Then at least those assertions that lie in a different big-O complexity
> class have to be moved away from _GLIBCXX_ASSERTIONS (as hinted in your
> initial mail).  Some distros build packages with _FORTIFY_SOURCE, and
> while additional asserts seem acceptable, going from constant to linear
> (or the like) seems not.

Agreed.

AFAIK no distros have anything that depends on the libstdc++ Parallel
Mode (which enables O(n) checks under _GLIBCXX_ASSERTIONS), but as I
suggested I think those should be moved to another macro anyway.

Anything currently enabled by _GLIBCXX_DEBUG that changes the big-O
complexity is not touched by my patch, so wouldn't be enabled by
_GLIBCXX_ASSERTIONS. That's by design.

The existing _GLIBCXX_DEBUG Debug Mode is very important, and far more
powerful than the checks enabled by _GLIBCXX_ASSERTIONS will ever be,
but the ABI impact and the violations of the standard's complexity
guarantees mean that there are places it can't be used. For
_GLIBCXX_ASSERTIONS to be worthwhile it has to be usable in almost any
situation. In particular it should be reasonable to build entire
distros with those checks enabled.
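The design rule in this exchange is that the lightweight mode may add only checks that do not change an operation's big-O complexity. Anything that does, such as an O(n) check inside an O(log n) operation, belongs behind a separate switch. The sketch below is a plain C analogue of that split, not libstdc++ code. CHEAP_CHECKS and EXPENSIVE_CHECKS are hypothetical macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros: CHEAP_CHECKS gates O(1) checks that are safe to
   enable everywhere; EXPENSIVE_CHECKS gates checks that change the
   complexity of the operation they guard.  */
#if defined(CHEAP_CHECKS)
# define CHEAP_ASSERT(e) assert (e)
#else
# define CHEAP_ASSERT(e) ((void) 0)
#endif

#if defined(EXPENSIVE_CHECKS)
# define EXPENSIVE_ASSERT(e) assert (e)
#else
# define EXPENSIVE_ASSERT(e) ((void) 0)
#endif

/* O(n) helper, intended only for EXPENSIVE_ASSERT.  */
static int
is_sorted_int (const int *a, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (a[i - 1] > a[i])
      return 0;
  return 1;
}

/* Element access: an O(1) bounds check fits the cheap mode.  */
static int
checked_at (const int *a, size_t n, size_t idx)
{
  CHEAP_ASSERT (idx < n);
  return a[idx];
}

/* Binary search is O(log n); checking sortedness first is O(n), so that
   check is confined to the expensive mode.  */
static int
sorted_contains (const int *a, size_t n, int key)
{
  EXPENSIVE_ASSERT (is_sorted_int (a, n));
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (a[mid] < key)
        lo = mid + 1;
      else
        hi = mid;
    }
  return lo < n && a[lo] == key;
}
```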

Re: [PATCH, libiberty] Fix PR63758 by using the _NSGetEnviron() API on Darwin.

2015-09-08 Thread Ian Lance Taylor

On Tue, Sep 8, 2015 at 7:20 AM, Iain Sandoe  wrote:
>
>> This seems likely to break cross-compilers to Darwin that do not have
>> the system libraries available.  I guess I don't care about that if
>> you don't.
>
> I do care about it, but I'm not visualising the case...
>
> AFAICS, when built as a host component for a cross to Darwin from non-Darwin, 
> environ would be declared as **environ as usual.
>
> If an implementation includes a compiler targetting Darwin that defines 
> __APPLE__ but doesn't provide _NSGetEnviron in its libc, then isn't it broken 
> anyway?

I'm talking about the case of building a cross-compiler where the
system libraries are not available.  This is sometimes done as a first
step toward building a full cross-compiler.

Ian

Re: [PATCH, libiberty] Fix PR63758 by using the _NSGetEnviron() API on Darwin.

2015-09-08 Thread Iain Sandoe

On 8 Sep 2015, at 15:00, Ian Lance Taylor wrote:

> On Mon, Sep 7, 2015 at 8:23 AM, Iain Sandoe  wrote:
>> 
>> include/
>> 
>>Roland McGrath  
>> 
>>PR other/63758
>>* environ.h: New file.
>> 
>> libiberty/
>> 
>>Roland McGrath  
>>Iain Sandoe  
>> 
>>PR other/63758
>>* pex-unix.c: Obtain the environment interface from settings in environ.h
>>rather than in-line code.  Update copyright date.
>>* setenv.c: Likewise.
>>* xmalloc.c: Likewise.
> 
> 
> This seems likely to break cross-compilers to Darwin that do not have
> the system libraries available.  I guess I don't care about that if
> you don't.

I do care about it, but I'm not visualising the case...

AFAICS, when built as a host component for a cross to Darwin from non-Darwin, 
environ would be declared as **environ as usual.

If an implementation includes a compiler targetting Darwin that defines 
__APPLE__ but doesn't provide _NSGetEnviron in its libc, then isn't it broken 
anyway?

Iain
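As background for this thread: Apple's environ(7) documentation says shared libraries and bundles do not have direct access to `environ`. It recommends `_NSGetEnviron()` from <crt_externs.h> instead. The sketch below illustrates the idea behind the new environ.h and is not its actual contents. The scan_environ helper is hypothetical and stands in for code such as pex-unix.c that walks the environment array directly.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the environ.h idea (not the real file).  */
#if defined(__APPLE__)
# include <crt_externs.h>
# define environ (*_NSGetEnviron ())
#else
extern char **environ;
#endif

/* Hypothetical helper: look NAME up by scanning the environment array
   directly.  Returns a pointer to the value, or NULL if unset.  */
static const char *
scan_environ (const char *name)
{
  size_t len = strlen (name);
  for (char **p = environ; p != NULL && *p != NULL; p++)
    if (strncmp (*p, name, len) == 0 && (*p)[len] == '=')
      return *p + len + 1;
  return NULL;
}
```

Ian's concern elsewhere in the thread is about a different case: a Darwin cross-compiler built before the target's system libraries are available.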

Re: [PATCH 15/15][ARM] Update sourcebuild.texi with testsuite/effective-target hooks

2015-09-08 Thread Alan Lawrence

Original message here: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02363.html

On 28/07/15 12:27, Alan Lawrence wrote:
> This documents the change to arm_neon_fp16_ok in the first patch; the addition
> of arm_neon_fp16_hw_ok in the last patch; and corrects a cross-reference.
>
> (I tried using an @ref instead of "Implies previous." but the page ref looked
> very out-of-place in PDF when I am referring to the previous item in the list!)

The change to arm_neon_fp16_ok was committed in r227033 (as per 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02470.html).

I've now updated this patch with the change to arm_neon_fp16_hw (rather than
 _hw_ok), as per https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00488.html.

OK for trunk?

Thanks, Alan

gcc/ChangeLog:

* doc/sourcebuild.texi (arm_neon_fp16): Correct cross-reference.
(arm_neon_fp16_ok): Document adding of -mfp16-format=ieee flag.
(arm_neon_fp16_hw): New.
---
 gcc/doc/sourcebuild.texi | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 7aa9c9d..5dc7c81 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1549,7 +1549,12 @@ options.  Some multilibs may be incompatible with these options.
 @item arm_neon_fp16_ok
 @anchor{arm_neon_fp16_ok}
 ARM Target supports @code{-mfpu=neon-fp16 -mfloat-abi=softfp} or compatible
-options.  Some multilibs may be incompatible with these options.
+options, including @code{-mfp16-format=ieee} if necessary to obtain the
+@code{__fp16} type.  Some multilibs may be incompatible with these options.
+
+@item arm_neon_fp16_hw
+Test system supports executing Neon half-precision float instructions.
+(Implies previous.)
 
 @item arm_thumb1_ok
 ARM target generates Thumb-1 code for @code{-mthumb}.
@@ -2035,7 +2040,7 @@ keyword}.
 @item arm_neon_fp16
 NEON and half-precision floating point support.  Only ARM targets
 support this feature, and only then in certain modes; see
-the @ref{arm_neon_ok,,arm_neon_fp16_ok effective target keyword}.
+the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 
 @item arm_vfp3
 arm vfp3 floating point support; see
-- 
1.9.1

Re: [PATCH 15/15][ARM] Update sourcebuild.texi with testsuite/effective-target hooks

2015-09-08 Thread Kyrill Tkachov



On 08/09/15 15:00, Alan Lawrence wrote:

Original message here: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02363.html

On 28/07/15 12:27, Alan Lawrence wrote:

This documents the change to arm_neon_fp16_ok in the first patch; the addition
of arm_neon_fp16_hw_ok in the last patch; and corrects a cross-reference.

(I tried using an @ref instead of "Implies previous." but the page ref looked
very out-of-place in PDF when I am referring to the previous item in the list!)

The change to arm_neon_fp16_ok was committed in r227033 (as per 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02470.html).

I've now updated this patch with the change to arm_neon_fp16_hw (rather than
  _hw_ok), as per https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00488.html.

OK for trunk?


Ok.
Thanks,
Kyrill



Thanks, Alan

gcc/ChangeLog:

* doc/sourcebuild.texi (arm_neon_fp16): Correct cross-reference.
(arm_neon_fp16_ok): Document adding of -mfp16-format=ieee flag.
(arm_neon_fp16_hw): New.
---
  gcc/doc/sourcebuild.texi | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 7aa9c9d..5dc7c81 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1549,7 +1549,12 @@ options.  Some multilibs may be incompatible with these options.
  @item arm_neon_fp16_ok
  @anchor{arm_neon_fp16_ok}
  ARM Target supports @code{-mfpu=neon-fp16 -mfloat-abi=softfp} or compatible
-options.  Some multilibs may be incompatible with these options.
+options, including @code{-mfp16-format=ieee} if necessary to obtain the
+@code{__fp16} type.  Some multilibs may be incompatible with these options.
+
+@item arm_neon_fp16_hw
+Test system supports executing Neon half-precision float instructions.
+(Implies previous.)
  
  @item arm_thumb1_ok
  ARM target generates Thumb-1 code for @code{-mthumb}.
@@ -2035,7 +2040,7 @@ keyword}.
  @item arm_neon_fp16
  NEON and half-precision floating point support.  Only ARM targets
  support this feature, and only then in certain modes; see
-the @ref{arm_neon_ok,,arm_neon_fp16_ok effective target keyword}.
+the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
  
  @item arm_vfp3
  arm vfp3 floating point support; see

Re: [PATCH, libiberty] Fix PR63758 by using the _NSGetEnviron() API on Darwin.

2015-09-08 Thread Ian Lance Taylor

On Mon, Sep 7, 2015 at 8:23 AM, Iain Sandoe  wrote:
>
> include/
>
> Roland McGrath  
>
> PR other/63758
> * environ.h: New file.
>
> libiberty/
>
> Roland McGrath  
> Iain Sandoe  
>
> PR other/63758
> * pex-unix.c: Obtain the environment interface from settings in environ.h
> rather than in-line code.  Update copyright date.
> * setenv.c: Likewise.
> * xmalloc.c: Likewise.

This seems likely to break cross-compilers to Darwin that do not have
the system libraries available.  I guess I don't care about that if
you don't.

This is OK for mainline.  Thanks.

Ian

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread David Edelsohn

On Tue, Sep 8, 2015 at 9:51 AM, FX  wrote:
>> #define _FCLOEXEC   0x0010L
>> #define O_CLOEXEC   _FCLOEXEC   /* sets FD_CLOEXEC on open  */
>
> That’s weird, and definitely an AIX bug: 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

Welcome to AIX :-/

> How does that even work? open() takes int as second arg.

No one else uses it? ;-)

The following kluge works:

Index: posix.c
===
--- posix.c (revision 227528)
+++ posix.c (working copy)
@@ -45,6 +45,10 @@
 #define O_BINARY 0
 #endif

+#ifdef _AIX
+#undef O_CLOEXEC
+#endif
+
 #ifndef O_CLOEXEC
 #define O_CLOEXEC 0
 #endif

Thanks, David
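A sketch that combines the AIX kluge above with a runtime fallback. When O_CLOEXEC ends up defined as 0, the descriptor is marked close-on-exec with fcntl after open(). This is not the libbacktrace code. The open_rdonly_cloexec helper is hypothetical. The fallback is not atomic: another thread could fork and exec between open() and fcntl().

```c
#include <fcntl.h>
#include <unistd.h>

#ifdef _AIX
/* AIX's O_CLOEXEC expands to a long constant (_FCLOEXEC) that triggers
   the overflow warning when passed as open()'s int flags argument.  */
# undef O_CLOEXEC
#endif

#ifndef O_CLOEXEC
# define O_CLOEXEC 0
#endif

/* Hypothetical helper: open FILENAME read-only with close-on-exec set.
   Returns the descriptor, or -1 on failure.  */
static int
open_rdonly_cloexec (const char *filename)
{
  int fd = open (filename, O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    return -1;
  if (O_CLOEXEC == 0)
    {
      /* O_CLOEXEC unavailable: set FD_CLOEXEC after the fact.  */
      int flags = fcntl (fd, F_GETFD);
      if (flags >= 0)
        fcntl (fd, F_SETFD, flags | FD_CLOEXEC);
    }
  return fd;
}
```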

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread FX

> #define _FCLOEXEC   0x0010L
> #define O_CLOEXEC   _FCLOEXEC   /* sets FD_CLOEXEC on open  */

That’s weird, and definitely an AIX bug: 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
How does that even work? open() takes int as second arg.

FX

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread FX

> /home/dje/src/src/libbacktrace/posix.c: In function 'backtrace_open':
> /home/dje/src/src/libbacktrace/posix.c:67:32: error: overflow in
> implicit constant conversion [-Werror=overflow]
>   descriptor = open (filename, O_RDONLY | O_BINARY | O_CLOEXEC);

?? I have a hard time understanding how the non-constant filename can give an 
overflow error. Or maybe the error is wrong, and it’s O_RDONLY | O_BINARY | 
O_CLOEXEC that have crazy values?

This looks like a valid POSIX construct to me.

FX

libbacktrace patch committed: fix test for mmap failure

2015-09-08 Thread Ian Lance Taylor

PR 67457 points out a crash in libbacktrace when there is no memory
available.  This is because the code testing the mmap result for
failure is broken.  This patch fixes it.  Bootstrapped and ran
libbacktrace tests.  Committed to mainline.

Ian


2015-09-08  Ian Lance Taylor  

PR other/67457
* mmap.c (backtrace_alloc): Correct test for mmap failure.
Index: mmap.c
===
--- mmap.c  (revision 227528)
+++ mmap.c  (working copy)
@@ -139,7 +139,7 @@ backtrace_alloc (struct backtrace_state
   asksize = (size + pagesize - 1) & ~ (pagesize - 1);
   page = mmap (NULL, asksize, PROT_READ | PROT_WRITE,
   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-  if (page == NULL)
+  if (page == MAP_FAILED)
error_callback (data, "mmap", errno);
   else
{

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread David Edelsohn

On Tue, Sep 8, 2015 at 9:40 AM, David Edelsohn  wrote:
> On Tue, Sep 8, 2015 at 9:15 AM, FX  wrote:
>>> libbacktrace is not supported on AIX.  This patch breaks bootstrap on AIX.
>>> It's okay if Fortran backtrace does not work on AIX, but not all
>>> targets support libbacktrace.
>>
>> libbacktrace is designed to be compiled on all targets. Some targets offer 
>> full support, some offer nothing, but libbacktrace is compiled and its 
>> headers provided in all cases.
>>
>> Can you please give us something to investigate? Like, the error message 
>> you’re seeing.
>
> /home/dje/src/src/libbacktrace/posix.c: In function 'backtrace_open':
> /home/dje/src/src/libbacktrace/posix.c:67:32: error: overflow in
> implicit constant conversion [-Werror=overflow]
>descriptor = open (filename, O_RDONLY | O_BINARY | O_CLOEXEC);
> ^

AIX system headers do not define O_BINARY, so posix.c defines it as 0.

#define _FCLOEXEC   0x0010L
#define O_CLOEXEC   _FCLOEXEC   /* sets FD_CLOEXEC on open  */

Thanks, David

Re: [PATCH] Import liboffloadmic from upstream

2015-09-08 Thread Jakub Jelinek

On Tue, Sep 08, 2015 at 04:40:22PM +0300, Ilya Verbin wrote:
> Looks like this is the only incompatible change.  Given that the library is used
> only by libgomp plugin, this isn't a big problem.  I will add a comment into
> plugin prohibiting the use of the return value.  In fact, if something goes
> wrong in __offload_register_image, it calls exit ().

Ack.

> > > > 2) the *.map changes look wrong, when adding symbols to a symbol versioned
> > > >shared library, new symbols shouldn't be added to existing symbol
> > > >version, but to a new symbol version (so e.g. MYO_1.0.1 (or MYO_1.1)
> > > >and COI_1.0.1 (or COI_1.1))
> > > 
> > > I agree, but this is what I can't change - these files are copied from real COI/
> > > MYO libraries, therefore the emulator (fake COI/MYO libs) must have the same
> > > versions as the real libs.
> > 
> > If that change is already cast into stone, there is nothing we can do, but
> > can you talk to the library maintainers that they add new symbols to
> > different symbol version next time at least?
> 
> Yes, I have told them.
> 
> So, is it OK for trunk?

Ok.

Jakub

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread David Edelsohn

On Tue, Sep 8, 2015 at 9:15 AM, FX  wrote:
>> libbacktrace is not supported on AIX.  This patch breaks bootstrap on AIX.
>> It's okay if Fortran backtrace does not work on AIX, but not all
>> targets support libbacktrace.
>
> libbacktrace is designed to be compiled on all targets. Some targets offer 
> full support, some offer nothing, but libbacktrace is compiled and its 
> headers provided in all cases.
>
> Can you please give us something to investigate? Like, the error message 
> you’re seeing.

/home/dje/src/src/libbacktrace/posix.c: In function 'backtrace_open':
/home/dje/src/src/libbacktrace/posix.c:67:32: error: overflow in
implicit constant conversion [-Werror=overflow]
   descriptor = open (filename, O_RDONLY | O_BINARY | O_CLOEXEC);
^

Thanks, David

Re: [PATCH] Import liboffloadmic from upstream

2015-09-08 Thread Ilya Verbin

On Mon, Aug 31, 2015 at 20:07:49 +0200, Jakub Jelinek wrote:
> On Mon, Aug 31, 2015 at 08:56:58PM +0300, Ilya Verbin wrote:
> > On Mon, Aug 31, 2015 at 16:49:53 +0200, Jakub Jelinek wrote:
> > > 1) Is the library backwards ABI compatible?  Can you run e.g.
> > >libabigail abidiff in between the unpatched and patched version?
> > 
> > It should be in theory, and I've successfully tested an old binary with old
> > libgomp plugin and with new liboffloadmic.  However, `abidiff --changed-fns
> > old/liboffloadmic_host.so new/liboffloadmic_host.so` prints:
> > 
> > Functions changes summary: 0 Removed (82 filtered out), 7 Changed (21 filtered out), 0 Added functions (1081 filtered out)
> > Variables changes summary: 0 Removed (25 filtered out), 1 Changed, 0 Added variables (7 filtered out)
> > Function symbols changes summary: 7 Removed, 76 Added function symbols not referenced by debug info
> > Variable symbols changes summary: 22 Removed, 4 Added variable symbols not referenced by debug info
> > 
> > 7 functions with some indirect sub-type change:
> > 
> > /* Unused functions skipped.  */
> > 
> >   [C]'function int __offload_offload(OFFLOAD, const char*, int, int, VarDesc*, VarDesc2*, int, int)' has some indirect sub-type changes:
> > parameter 1 of type 'typedef OFFLOAD' has sub-type changes:
> >   underlying type 'OffloadDescriptor*' changed:
> > in pointed to type 'struct OffloadDescriptor':
> >   type size changed from 2240 to 2368 bits
> >   9 data member insertions:
> > /* ...  */
> > 
> >   [C]'function void __offload_register_image()' has some indirect sub-type changes:
> > return type changed:
> >   type name changed from 'void' to 'bool'
> >   type size changed from 0 to 8 bits
> 
> E.g. this is an ABI change, if e.g. anything in the plugin will call
> __offload_register_image and check for return value (i.e. expect the new
> version), but actually at runtime link against the old one, it will
> misbehave.

Looks like this is the only incompatible change.  Given that the library is used
only by libgomp plugin, this isn't a big problem.  I will add a comment into
plugin prohibiting the use of the return value.  In fact, if something goes
wrong in __offload_register_image, it calls exit ().

> > > 2) the *.map changes look wrong, when adding symbols to a symbol versioned
> > >shared library, new symbols shouldn't be added to existing symbol
> > >version, but to a new symbol version (so e.g. MYO_1.0.1 (or MYO_1.1)
> > >and COI_1.0.1 (or COI_1.1))
> > 
> > I agree, but this is what I can't change - these files are copied from real COI/
> > MYO libraries, therefore the emulator (fake COI/MYO libs) must have the same
> > versions as the real libs.
> 
> If that change is already cast into stone, there is nothing we can do, but
> can you talk to the library maintainers that they add new symbols to
> different symbol version next time at least?

Yes, I have told them.

So, is it OK for trunk?

  -- Ilya

Re: [C++ Patch] PR 67369 ("[5/6 Regression] ICE (in tsubst_decl, at cp/pt.c:11302) with -std=c++14")

2015-09-08 Thread Jason Merrill


OK.

Jason

[PATCH] Prevent unnecessary recompilation for trivial params.def changes

2015-09-08 Thread Tom de Vries

[ was: Re: [RFC] Prevent unnecessary recompilation for trivial 
params.def changes ]


On 08/09/15 14:03, Andreas Schwab wrote:

> Tom de Vries  writes:
>
>> After a subsequent rebuild I don't see anything being rebuilt. So I don't
>> observe 'continuous rebuilding'.
>
> What happens when you just touch params-list.h or params.def?
> move-if-change will leave the target untouched when unchanged (that's
> the whole point of it), so it will remain older than the dependencies.


I could reproduce the problem using these instructions, thanks.

I also found a bit "On the use of stamps" in gcc/Makefile.in, which 
explains the problem and how to fix things.


Updated patch accordingly.

OK for trunk if bootstrap succeeds?

Thanks,
- Tom

Prevent unnecessary recompilation for trivial params.def changes

2015-09-08  Tom de Vries  

	* Makefile.in (generated_files): Add params.list.
	(params.list, s-params.list): Add rule.
	* params.h (enum compiler_param): Include params-list.h.  Move define
	DEFPARAM, include params.def and undef DEFPARAM ...
	* params-list.h: ... here.  New file.
---
 gcc/Makefile.in   | 8 +++-
 gcc/params-list.h | 4 
 gcc/params.h  | 5 +
 3 files changed, 12 insertions(+), 5 deletions(-)
 create mode 100644 gcc/params-list.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3d1c1e5..b495bd2 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2415,7 +2415,7 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_H) multilib.h \
$(ALL_GTFILES_H) gtype-desc.c gtype-desc.h gcov-iov.h \
options.h target-hooks-def.h insn-opinit.h \
common/common-target-hooks-def.h pass-instances.def \
-   c-family/c-target-hooks-def.h
+   c-family/c-target-hooks-def.h params.list
 
 #
 # How to compile object files to run on the build machine.
@@ -3236,6 +3236,12 @@ installdirs:
 	$(mkinstalldirs) $(DESTDIR)$(man1dir)
 	$(mkinstalldirs) $(DESTDIR)$(man7dir)
 
+params.list: s-params.list; @true
+s-params.list: $(srcdir)/params-list.h $(srcdir)/params.def
+	$(CPP) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
+	$(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
+	$(STAMP) s-params.list
+
 PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   toplev.h $(DIAGNOSTIC_CORE_H) $(BASIC_BLOCK_H) $(HASH_TABLE_H) \
   tree-ssa-alias.h $(INTERNAL_FN_H) gimple-fold.h tree-eh.h gimple-expr.h \
diff --git a/gcc/params-list.h b/gcc/params-list.h
new file mode 100644
index 000..49301d2
--- /dev/null
+++ b/gcc/params-list.h
@@ -0,0 +1,4 @@
+#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
+  enumerator,
+#include "params.def"
+#undef DEFPARAM
diff --git a/gcc/params.h b/gcc/params.h
index f53426d..9f7618a 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -81,10 +81,7 @@ extern void set_param_value (const char *name, int value,
 
 enum compiler_param
 {
-#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
-  enumerator,
-#include "params.def"
-#undef DEFPARAM
+#include "params.list"
   LAST_PARAM
 };
 
-- 
1.9.1
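As I read the patch, params-list.h expands DEFPARAM to nothing but the enumerator. The preprocessed result, params.list, therefore changes only when parameters are added, removed or renamed. Edits to help text, defaults or bounds produce an identical params.list, which move-if-change leaves untouched, so files that only need the enum are not rebuilt. The s-params.list stamp records that the rule has run. This addresses the "older than its dependencies" problem Andreas describes. The self-contained sketch below shows the X-macro pattern involved. SAMPLE_PARAMS and every parameter in it are hypothetical stand-ins for params.def.

```c
#include <stddef.h>

/* Hypothetical stand-in for params.def: each entry is
   DEFPARAM (enumerator, option, help text, default, min, max).  */
#define SAMPLE_PARAMS \
  DEFPARAM (PARAM_INLINE_GROWTH_SKETCH, "sketch-inline-growth", \
            "Help text that can change freely", 20, 0, 0) \
  DEFPARAM (PARAM_MAX_UNROLL_SKETCH, "sketch-max-unroll", \
            "More help text", 8, 0, 0) \
  DEFPARAM (PARAM_LOOP_DEPTH_SKETCH, "sketch-loop-depth", \
            "Yet more help text", 4, 1, 16)

/* Only the enumerators are extracted, as params-list.h does; editing
   help text or defaults does not change this expansion.  */
enum sketch_param
{
#define DEFPARAM(enumerator, option, nocmsgid, def, min, max) enumerator,
  SAMPLE_PARAMS
#undef DEFPARAM
  LAST_SKETCH_PARAM
};

/* The full descriptions are expanded separately, so only the code that
   builds this table depends on the help text and values.  */
struct sketch_param_info
{
  const char *option;
  const char *help;
  int default_value, min_value, max_value;
};

static const struct sketch_param_info sketch_params[] = {
#define DEFPARAM(enumerator, option, nocmsgid, def, min, max) \
  { option, nocmsgid, def, min, max },
  SAMPLE_PARAMS
#undef DEFPARAM
};
```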

Re: [PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread FX

> libbacktrace is not supported on AIX.  This patch breaks bootstrap on AIX.
> It's okay if Fortran backtrace does not work on AIX, but not all
> targets support libbacktrace.

libbacktrace is designed to be compiled on all targets. Some targets offer full 
support, some offer nothing, but libbacktrace is compiled and its headers 
provided in all cases.

Can you please give us something to investigate? Like, the error message you’re 
seeing.

FX

Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-08 Thread Michael Matz

Hi,

On Mon, 7 Sep 2015, Jonathan Wakely wrote:

> > Interesting.  Is this mode ABI-compatible with the default mode?
> 
> Yes, that's the main reason I want to make this change.
> 
> > Should _FORTIFY_SOURCE imply _GLIBCXX_ASSERTIONS?
> 
> Yes, I think it should.

Then at least those assertions that lie in a different big-O complexity 
class have to be moved away from _GLIBCXX_ASSERTIONS (as hinted in your 
initial mail).  Some distros build packages with _FORTIFY_SOURCE, and 
while additional asserts seem acceptable, going from constant to linear 
(or the like) seems not.

Ciao,
Michael.

[C++ Patch] PR 67369 ("[5/6 Regression] ICE (in tsubst_decl, at cp/pt.c:11302) with -std=c++14")

2015-09-08 Thread Paolo Carlini


Hi,

In this regression, an ICE is triggered in tsubst_decl, [case
FUNCTION_DECL], at:


/* Nobody should be tsubst'ing into non-template functions.  */
gcc_assert (DECL_TEMPLATE_INFO (t) != NULL_TREE);

Indeed, 't' is just the 'main' function. A simple way to avoid the ICE is
to tweak the code recently changed in tsubst_copy, [case FUNCTION_DECL],
which calls tsubst_decl via tsubst, so that it does not call the
latter at all when DECL_CONTEXT (t) isn't a template.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2015-09-08  Paolo Carlini  

PR c++/67369
* pt.c (tsubst_copy, [case FUNCTION_DECL]): Do not call tsubst
if the first argument isn't a template.

/testsuite
2015-09-08  Paolo Carlini  

PR c++/67369
* g++.dg/cpp1y/lambda-generic-ice4.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 227528)
+++ cp/pt.c (working copy)
@@ -13599,8 +13599,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t com
  if (r)
{
  /* Make sure that the one we found is the one we want.  */
- tree ctx = tsubst (DECL_CONTEXT (t), args,
-complain, in_decl);
+ tree ctx = DECL_CONTEXT (t);
+ if (DECL_LANG_SPECIFIC (ctx) && DECL_TEMPLATE_INFO (ctx))
+   ctx = tsubst (ctx, args, complain, in_decl);
  if (ctx != DECL_CONTEXT (r))
r = NULL_TREE;
}
Index: testsuite/g++.dg/cpp1y/lambda-generic-ice4.C
===
--- testsuite/g++.dg/cpp1y/lambda-generic-ice4.C(revision 0)
+++ testsuite/g++.dg/cpp1y/lambda-generic-ice4.C(working copy)
@@ -0,0 +1,10 @@
+// PR c++/67369
+// { dg-do compile { target c++14 } }
+
+int main() {
+  unsigned const nsz = 0;
+  auto repeat_conditional = [&](auto) {
+auto new_sz = nsz;
+  };
+  repeat_conditional(1);
+}

Re: [RS6000] Fix PowerPC ICE due to secondary_reload ignoring reload replacements

2015-09-08 Thread Ulrich Weigand

David Edelsohn wrote:
> On Mon, Sep 7, 2015 at 11:47 PM, Alan Modra  wrote:
> > In https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67378 analysis I show
> > the reason for this PR is that insns emitted by secondary reload
> > patterns are being generated without taking into account other reloads
> > that may have occurred.  We run into this problem when an insn has a
> > pseudo that doesn't get a hard reg, and the pseudo is used in a way
> > that requires a secondary reload.  In this case the secondary reload
> > is needed due to gcc generating a 64-bit gpr load from memory insn
> > with an address offset not a multiple of 4.
> >
> > Bootstrapped and regression tested powerpc64-linux.  OK to apply?
> > gcc-5 and gcc-4.9 branches too?
> >
> > I haven't included a testcase in this patch, because the testcase in
> > the PR is quite horrible, and testcases triggering reload misbehaviour
> > tend to be unreliable.  By unreliable, I mean a small change anywhere
> > in the compiler can result in the testcase passing even if this bug
> > was reintroduced at some future date.  The testcase doesn't fail on
> > gcc-5, even though I'm fairly sure the same bug lurks there.
> >
> > PR target/67378
> > * config/rs6000/rs6000.c (rs6000_secondary_reload_gpr): Find
> > reload replacement for PRE_MODIFY address reg.
> 
> I'm okay with this patch, but I'd like Uli to double-check it when he
> has a moment.

The patch looks OK to me.  We definitely need to check for replacements
in secondary reload in such cases.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [PATCH 2/5] completely_scalarize arrays as well as records.

2015-09-08 Thread Martin Jambor

Hi,

On Mon, Sep 07, 2015 at 02:15:45PM +0100, Alan Lawrence wrote:
> In-Reply-To: <55e0697d.2010...@arm.com>
> 
> On 28/08/15 16:08, Alan Lawrence wrote:
> > Alan Lawrence wrote:
> >>
> >> Right. I think VLA's are the problem with pr64312.C also. I'm testing a fix
> >> (that declares arrays with any of these properties as unscalarizable).
> > ... 
> > In the meantime I've reverted the patch pending further testing on x86, 
> > aarch64
> > and arm.
> 
> I've now tested g++ and fortran (+ bootstrap + check-gcc) on x86, AArch64 and
> ARM, and Ada on x86 and ARM.
> 
> So far the list of failures from the original patch seems to be:
> 
> * g++.dg/torture/pr64312.C on ARM and m68k-linux
> * Building Ada on x86
> * Ada ACATS c87b31a on ARM (where the Ada frontend builds fine)
> 
> Here's a new version, that fixes all the above, by adding a dose of
> paranoia in scalarizable_type_p...

I have only had a brief look at scalarizable_type_p then, considering
all of the rest unchanged, and the tests there seem natural to me.
(Note that I do not have the authority to approve the patch.)

> (I wonder about adding a comment
> in completely_scalarize that such cases have already been ruled
> out?)

The comment already references scalarizable_type_p which is enough at
least for me.

Thanks,

Martin

Re: [RFC] Try vector as a new representation for vector masks

2015-09-08 Thread Ilya Enkovich

2015-09-04 23:42 GMT+03:00 Jeff Law :
> On 09/01/2015 07:08 AM, Ilya Enkovich wrote:
>>
>> On 27 Aug 09:55, Richard Biener wrote:
>>>
>>> I suggest you try modifying those parts first according to this
>>> scheme that will most likely uncover issues we missed.
>>>
>>> Thanks, Richard.
>>>
>>
>> I tried to implement this scheme and apply it for MASK_LOAD and
>> MASK_STORE.  There were no major issues (for now).
>
> So do we have enough confidence in this representation that we want to go
> ahead and commit to it?

I think the new representation mostly fits well. There are some places
where I have to make exceptions for vectors of bools to make it
work. This is mostly to avoid target modifications: I'd like to avoid
having to change all targets currently supporting vec_cond. It
makes me add some special handling of vec in GIMPLE, e.g. I add
special code in vect_init_vector to build vec invariants with
proper casting to int. Otherwise I'd need to do it on the target side.

I made several fixes and the current patch (still allowing integer vector
results for vector comparisons and applying bool patterns) passes
bootstrap and regression testing on x86_64. Now I'll try to fully
switch to vec and see how it goes.

Thanks,
Ilya

>
>>
>> I had to introduce significant number of new patterns in i386 target
>> to support new optabs.  The reason was vector compare was never
>> expanded separately and always was a part of a vec_cond expansion.
>
> One could argue we should have fixed this already, so I don't see the new
> patterns as a bad thing, but instead they're addressing a long term
> mis-design.
>
>>
>>
>> For now I still don't disable bool patterns, thus new masks apply to
>> masked loads and stores only.  Patch is also not tested and tried on
>> several small tests only.  Could you please look at what I currently
>> have and say if it's in sync with your view on vector masking?
>
> I'm going to let Richi run with this for the most part -- but I will chime
> in with a thank you for being willing to bounce this around a bit while we
> figure out the representational issues.
>
>
> jeff
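The RFC proposes moving vector masks from the integer-vector form, where a comparison yields 0 or -1 in each lane, to a dedicated boolean vector whose mode the target chooses through the new TARGET_VECTORIZE_GET_MASK_MODE hook. The C sketch below uses GCC's generic vector extensions to illustrate only the source-level semantics. It shows the integer-vector mask, a vec_cond-style select, and a masked store. It says nothing about how GCC represents masks internally, and all function names are hypothetical.

```c
/* GCC generic vector type: four 32-bit ints (Clang accepts it too).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise comparison: each lane is -1 (all bits set) where the
   condition holds and 0 elsewhere, i.e. an integer-vector mask.  */
static v4si
lanes_greater (v4si a, v4si b)
{
  return a > b;
}

/* vec_cond-style select driven by an integer-vector mask.  */
static v4si
select_by_mask (v4si mask, v4si if_true, v4si if_false)
{
  return (mask & if_true) | (~mask & if_false);
}

/* Scalar emulation of a masked store: only lanes whose mask is
   nonzero are written to DST.  */
static void
masked_store (int *dst, v4si value, v4si mask)
{
  for (int lane = 0; lane < 4; lane++)
    if (mask[lane])
      dst[lane] = value[lane];
}
```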

gcc/

2015-09-08  Ilya Enkovich  

* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
(ix86_expand_int_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
op_true and op_false.
(ix86_int_cmp_code_to_pcmp_immediate): New.
(ix86_fp_cmp_code_to_pcmp_immediate): New.
(ix86_cmp_code_to_pcmp_immediate): New.
(ix86_expand_mask_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
(ix86_expand_int_sse_cmp): New.
(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
(ix86_expand_int_vec_cmp): New.
(ix86_get_mask_mode): New.
(TARGET_VECTORIZE_GET_MASK_MODE): New.
* config/i386/sse.md (avx512fmaskmodelower): New.
(vec_cmp): New.
(vec_cmp): New.
(vec_cmpv2div2di): New.
(vec_cmpu): New.
(vec_cmpu): New.
(vec_cmpuv2div2di): New.
(maskload): Rename to ...
(maskload): ... this.
(maskstore): Rename to ...
(maskstore): ... this.
(maskload): New.
(maskstore): New.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
(expand_MASK_STORE): Likewise.
* optabs.c (vector_compare_rtx): Add OPNO arg.
(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
(get_vec_cmp_icode): New.
(expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* optabs.def (vec_cmp_optab): New.
(vec_cmpu_optab): New.
(maskload_optab): Transform into convert optab.
(maskstore_optab): Likewise.
* optabs.h (expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* target.def (get_mask_mode): New.
* targhooks.c (default_vector_alignment): Use mode alignment
for vector masks.
(default_get_mask_mode): New.
* targhooks.h (default_get_mask_mode): New.
* tree-cfg.c (verify_gimple_comparison): Support vector mask.
* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
can_vec_mask_load_store_p signature change.
(predicate_mem_writes): Use boolean mask.
* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-generic.c (expand_vector_comparison): Use
expand_vec_cmp_expr_p for comparison availability.
(expand_vector_operations_1): Ignore mask statements with scalar mode.
* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
operations for VF.  Add mask type computation.
* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
(vect_get_vec_def_for_operand): Support mask constant.
(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
signature change.
(vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
(get_mask_type_for_scalar_type): New.
* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
(enum st

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-08 Thread Wilco Dijkstra

> Renlin Li wrote:
> Hi Andrew,
> 
> Previously, there is a discussion thread in binutils mailing list:
> 
> https://sourceware.org/ml/binutils/2015-04/msg00032.html
> 
> Nick proposed a fix; Richard Henderson holds a similar opinion to yours.

Both Nick and Richard H seem to think it is an issue with unaligned instructions
rather than an alignment bug in the debug code in the assembler (probably due to
the misleading error message). Although it would work, since we don't have or need
unaligned instructions, that proposed patch is not the right fix for this issue.

Anyway, aligning the debug tables correctly should be a safe and trivial fix.

Wilco

[PATCH, fortran] PR 53379 Backtrace on error termination

2015-09-08 Thread David Edelsohn

libbacktrace is not supported on AIX.  This patch breaks bootstrap on AIX.

It's okay if Fortran backtrace does not work on AIX, but not all
targets support libbacktrace.

Thanks, David

Re: [RS6000] Fix PowerPC ICE due to secondary_reload ignoring reload replacements

2015-09-08 Thread David Edelsohn

On Mon, Sep 7, 2015 at 11:47 PM, Alan Modra  wrote:
> In https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67378 analysis I show
> the reason for this PR is that insns emitted by secondary reload
> patterns are being generated without taking into account other reloads
> that may have occurred.  We run into this problem when an insn has a
> pseudo that doesn't get a hard reg, and the pseudo is used in a way
> that requires a secondary reload.  In this case the secondary reload
> is needed due to gcc generating a 64-bit gpr load from memory insn
> with an address offset not a multiple of 4.
>
> Bootstrapped and regression tested powerpc64-linux.  OK to apply?
> gcc-5 and gcc-4.9 branches too?
>
> I haven't included a testcase in this patch, because the testcase in
> the PR is quite horrible, and testcases triggering reload misbehaviour
> tend to be unreliable.  By unreliable, I mean a small change anywhere
> in the compiler can result in the testcase passing even if this bug
> was reintroduced at some future date.  The testcase doesn't fail on
> gcc-5, even though I'm fairly sure the same bug lurks there.
>
> PR target/67378
> * config/rs6000/rs6000.c (rs6000_secondary_reload_gpr): Find
> reload replacement for PRE_MODIFY address reg.

I'm okay with this patch, but I'd like Uli to double-check it when he
has a moment.

Thanks, David

[PATCH] Fix seq_cost prototype to use signed int

2015-09-08 Thread Jiong Wang


All other cost helper functions use signed int to hold costs,
while seq_cost uses unsigned int.

This patch fixes that. Bootstrapped OK on x86.

OK for trunk?

2015-09-08  Jiong Wang  

gcc/
  * rtl.h (seq_cost): Change return type from "unsigned" to "int".
  * rtlanal.c (seq_cost): Likewise.
  
-- 
Regards,
Jiong

diff --git a/gcc/rtl.h b/gcc/rtl.h
index ac56133..ded054c 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3050,7 +3050,7 @@ extern rtx_insn *find_first_parameter_load (rtx_insn *, rtx_insn *);
 extern bool keep_with_call_p (const rtx_insn *);
 extern bool label_is_jump_target_p (const_rtx, const rtx_insn *);
 extern int insn_rtx_cost (rtx, bool);
-extern unsigned seq_cost (const rtx_insn *, bool);
+extern int seq_cost (const rtx_insn *, bool);

 /* Given an insn and condition, return a canonical description of
the test being made.  */
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index ef98f4b..2f14b93 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5160,10 +5160,10 @@ insn_rtx_cost (rtx pat, bool speed)

 /* Returns estimate on cost of computing SEQ.  */

-unsigned
+int
 seq_cost (const rtx_insn *seq, bool speed)
 {
-  unsigned cost = 0;
+  int cost = 0;
   rtx set;

   for (; seq; seq = NEXT_INSN (seq))

Re: [RFC] Prevent unnecessary recompilation for trivial params.def changes

2015-09-08 Thread Andreas Schwab

Tom de Vries  writes:

> After a subsequent rebuild I don't see anything being rebuilt. So I don't
> observe 'continuous rebuilding'.

What happens when you just touch params-list.h or params.def?
move-if-change will leave the target untouched when unchanged (that's
the whole point of it), so it will remain older than the dependencies.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH][AArch64] Improve code generation for float16 vector code

2015-09-08 Thread Alan Lawrence


On 08/09/15 09:26, James Greenhalgh wrote:

On Tue, Sep 08, 2015 at 09:21:08AM +0100, James Greenhalgh wrote:

On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:

On 04/09/15 13:32, James Greenhalgh wrote:

In that case, these should be implemented as inline assembly blocks. As it
stands, the code generation for these intrinsics will be very poor with this
patch applied.

I'm going to hold off OKing this until I see a follow-up to fix the code
generation, either replacing those particular intrinsics with inline asm,
or doing the more comprehensive fix in the back-end.

Thanks,
James


In that case, here is the follow-up now ;). This fixes each of the following
functions to generate a single instruction followed by ret:
   * vld1_dup_f16, vld1q_dup_f16
   * vset_lane_f16, vsetq_lane_f16
   * vget_lane_f16, vgetq_lane_f16
   * For IN of type either float16x4_t or float16x8_t, and constant C:
return (float16x4_t) {in[C], in[C], in[C], in[C]};
   * Similarly,
return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
(These correspond intuitively to what one might expect for "vdup_lane_f16",
"vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
although such intrinsics do not actually exist.)

This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
that load immediates, rather than using elements of pre-existing vectors.


What is code generation like for these, then? If I remember correctly, it
was the vdup_n_f16 implementation that looked most objectionable before.


Ah, I see what you are saying here. You mean: if there were intrinsics
equivalent to vdup_n_s16 (which there are not), then this patch would not
handle them. I was confused as vld1_dup_f16 does not use an element of a
pre-existing vector, and may well load an immediate, but is handled by
your patch.


To be clear: the *immediate* case of this, we do not use at all yet, as HFmode 
constants are disabled in aarch64_float_const_representable_p - we need to do 
some mangling to express the floating point value as a binary constant in the 
assembler output. (See the ARM backend.) That is, we cannot output (say) an 
HFmode load of 16.0 as the assembler would express 16.0 as a 32-bit float 
constant; we would instead need to output a load of immediate 0x4c00. Instead,
we will push the constant out to the constant pool and use a load instruction 
taking an address.


--Alan

Re: [PATCH] New attribute to create target clones

2015-09-08 Thread Evgeny Stupachenko

Ping.

On Thu, Aug 27, 2015 at 2:18 PM, Evgeny Stupachenko  wrote:
> Hi All,
>
> Based on RFC:
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01322.html
>
> The patch implements an extension to Function Multiversioning that
> allows cloning a function for multiple targets.
> __attribute__((target_clones("avx","arch=slm","default")))
> int foo ()
> ...
>
> This will create 3 clones of foo(): one optimized with -mavx, one
> optimized with -march=slm, and one with default optimizations.
> It will also create an ifunc resolver that calls the appropriate clone
> (as in Function Multiversioning).
>
> Bootstrap and make check for x86 passed.
>
> Is it ok for trunk?
>
> 2015-08-27  Evgeny Stupachenko  
>
> gcc/
> * Makefile.in (OBJS): Add multiple_target.o.
> * multiple_target.c (make_attribute): New.
> (create_dispatcher_calls): Ditto.
> (expand_target_clones): Ditto.
> (ipa_target_clone): Ditto.
> * passes.def (pass_target_clone): New ipa pass.
> * tree-pass.h (make_pass_target_clone): Ditto.
>
> gcc/c-family
> * c-common.c (handle_target_clones_attribute): New.
> (c_common_attribute_table): Add handle_target_clones_attribute.
> (handle_always_inline_attribute): Add check on target_clones
> attribute.
> (handle_target_attribute): Ditto.
>
> gcc/testsuite
> * gcc.dg/mvc1.c: New test for multiple targets cloning.
> * gcc.dg/mvc2.c: Ditto.
> * gcc.dg/mvc3.c: Ditto.
> * gcc.dg/mvc4.c: Ditto.
> * gcc.dg/mvc5.c: Ditto.
> * gcc.dg/mvc6.c: Ditto.
> * gcc.dg/mvc7.c: Ditto.
> * g++.dg/ext/mvc1.C: Ditto.
> * g++.dg/ext/mvc2.C: Ditto.
> * g++.dg/ext/mvc3.C: Ditto.
>
> gcc/doc
> * doc/extend.texi (target_clones): New attribute description.

Re: [RFC] Prevent unnecessary recompilation for trivial params.def changes

2015-09-08 Thread Tom de Vries


On 08/09/15 13:00, Andreas Schwab wrote:

Tom de Vries  writes:


@@ -3236,6 +3236,10 @@ installdirs:
$(mkinstalldirs) $(DESTDIR)$(man1dir)
$(mkinstalldirs) $(DESTDIR)$(man7dir)

+params.list: $(srcdir)/params-list.h $(srcdir)/params.def
+   $(CPP) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
+   $(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
+


You need a stamp file to avoid continuous rebuilding, don't you?



After a trivial change and a rebuild, I see the files being rebuilt:
...
/usr/bin/gcc-4.6 -E src/gcc/params-list.h | sed 's/^#.*//;/^$/d' > tmp-params.list

/bin/bash src/gcc/../move-if-change tmp-params.list params.list
...

After a subsequent rebuild I don't see anything being rebuilt. So I
don't observe 'continuous rebuilding'.


Thanks,
- Tom

[patch match.pd c c++]: Ignore results of 'shorten_compare' and move missing patterns in match.pd

2015-09-08 Thread Kai Tietz

Hi,

This patch is the first part of making the 'shorten_compare' function
obsolete for folding.  It adjusts the uses of 'shorten_compare' to ignore
the folding it returns, and adds the missing patterns to match.pd needed
for a full bootstrap of C/C++ without regressions.
Because we use 'shorten_compare' for some diagnostics, we can't simply
remove it.  So if this patch gets approved, the next step will be to
rename the function to something like 'check_compare', and adjust its
arguments and inner logic to reflect that we no longer modify the
arguments/expression within that function.

Bootstrap shows just 2 regressions in the gcc.dg testsuite, because the
matched patterns are now folded earlier, by forward propagation.  I
adjusted those testcases and added them to the patch, too.

I did regression-testing for x86_64-unknown-linux-gnu.

ChangeLog

2015-09-08  Kai Tietz  

* match.pd: Add missing patterns from shorten_compare.
* c/c-typeck.c (build_binary_op): Discard foldings of shorten_compare.
* cp/typeck.c (cp_build_binary_op): Likewise.

2015-09-08  Kai Tietz  

* gcc.dg/tree-ssa/vrp23.c: Adjust testcase to reflect that
pattern is matching now already within forward-propagation pass.
* gcc.dg/tree-ssa/vrp24.c: Likewise.

Index: match.pd
===
--- match.pd (revision 227528)
+++ match.pd (working copy)
@@ -1786,6 +1786,45 @@ along with GCC; see the file COPYING3.  If not see
   (op (abs @0) zerop@1)
   (op @0 @1)))

+/* Simplify '((type) X) cmp ((type) Y)' to the shortest possible type of
+   X and Y, if type's precision is wider than the precision of X's and
+   Y's types.  Logic taken from the shorten_compare function.  */
+(for op (tcc_comparison)
+  (simplify
+    (op (convert@0 @1) (convert@3 @2))
+    (if ((TREE_CODE (TREE_TYPE (@1)) == REAL_TYPE)
+           == (TREE_CODE (TREE_TYPE (@2)) == REAL_TYPE)
+         && (TREE_CODE (TREE_TYPE (@1)) == REAL_TYPE)
+           == (TREE_CODE (TREE_TYPE (@0)) == REAL_TYPE)
+         && single_use (@1)
+         && single_use (@3)
+         && TYPE_UNSIGNED (TREE_TYPE (@1)) == TYPE_UNSIGNED (TREE_TYPE (@2))
+         && !((TREE_CODE (TREE_TYPE (@1)) == REAL_TYPE
+               && DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (@1))))
+              || (TREE_CODE (TREE_TYPE (@2)) == REAL_TYPE
+                  && DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (@2)))))
+         && TYPE_PRECISION (TREE_TYPE (@1)) < TYPE_PRECISION (TREE_TYPE (@0))
+         && TYPE_PRECISION (TREE_TYPE (@2)) < TYPE_PRECISION (TREE_TYPE (@0)))
+      (with {
+          tree comtype = TYPE_PRECISION (TREE_TYPE (@1))
+                         < TYPE_PRECISION (TREE_TYPE (@2))
+                         ? TREE_TYPE (@2) : TREE_TYPE (@1);
+          if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+            {
+              if (TYPE_UNSIGNED (TREE_TYPE (@1))
+                  || TYPE_UNSIGNED (TREE_TYPE (@0)))
+                comtype = unsigned_type_for (comtype);
+              else
+                comtype = signed_type_for (comtype);
+            }
+        }
+        (op (convert:comtype @1) (convert:comtype @2))
+      )
+    )
+  )
+)
+
+
 /* From fold_sign_changed_comparison and fold_widened_comparison.  */
 (for cmp (simple_comparison)
  (simplify
@@ -2046,7 +2085,43 @@ along with GCC; see the file COPYING3.  If not see
 (if (cmp == LE_EXPR)
  (ge (convert:st @0) { build_zero_cst (st); })
  (lt (convert:st @0) { build_zero_cst (st); }))
-
+
+/* Simplify '(type) X cmp CST' to 'X cmp (type-of-X) CST', if
+   CST fits into the type of X.  */
+(for cmp (simple_comparison)
+  (simplify
+(cmp (convert@2 @0) INTEGER_CST@1)
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0))
+ && single_use (@2)
+ && (TYPE_UNSIGNED (TREE_TYPE (@0))
+ || TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+ || cmp == NE_EXPR || cmp == EQ_EXPR)
+ && !POINTER_TYPE_P (TREE_TYPE (@0))
+ && int_fits_type_p (@1, TREE_TYPE (@0)))
+  (with { tree optype = TREE_TYPE (@0); }
+(cmp @0 (convert:optype @1))
+  )
+)
+  )
+)
+
+/* See if '(type) X ==/!= CST' represents a condition that is always
+   true or always false because of CST's value.  */
+(for cmp (ne eq)
+  (simplify
+(cmp (convert@2 @0) INTEGER_CST@1)
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && TYPE_PRECISION (TREE_TYPE (@1)) >= TYPE_PRECISION (TREE_TYPE (@0))
+ && single_use (@2)
+ && !POINTER_TYPE_P (TREE_TYPE (@0))
+ && !int_fits_type_p (@1, TREE_TYPE (@0))
+ && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1)))
+  { constant_boolean_node (cmp == NE_EXPR, type); }
+)
+  )
+)
+
 (for cmp (unordered ordered unlt unle ungt unge uneq ltgt)
  /* If the second operand is NaN, the result is constant.  */
  (simplify
Index: cp/typeck.c
==

Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_i{f32_f16,f16_f32}

2015-09-08 Thread Kyrill Tkachov



On 08/09/15 11:56, Alan Lawrence wrote:

Ping. (Thanks, Christophe!)

Correct version here: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01501.html

Cheers, Alan

On 25/08/15 15:21, Christophe Lyon wrote:

On 25 August 2015 at 15:57, Alan Lawrence  wrote:

Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
moved to the previous patch! This version also fixes some whitespace issues.


This looks OK to me now, thanks.


gcc/testsuite/ChangeLog:

  * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
  * lib/target-supports.exp
  (check_effective_target_arm_neon_fp16_hw_ok): New.
---
   .../aarch64/advsimd-intrinsics/vcvt_f16.c  | 98 ++
   gcc/testsuite/lib/target-supports.exp  | 15
   2 files changed, 113 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index 000..a2cfd38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,98 @@
+/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include 
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x41800000, 0x41700000,
+   0x41600000, 0x41500000 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
+0xc1200000, 0xc1100000 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+  clean_results ();
+
+#define TEST_MSG "vcvt_f32_f16"
+  {
+VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+DECL_VARIABLE (vector_src, float, 16, 4);
+
+VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+DECL_VARIABLE (vector_res, float, 32, 4) =
+   vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+  VECT_VAR (vector_res, float, 32, 4));
+
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG "vcvt_f16_f32"
+  {
+VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+DECL_VARIABLE (vector_src, float, 32, 4);
+
+VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+DECL_VARIABLE (vector_res, float, 16, 4) =
+  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+vst1_f16 (VECT_VAR (result, float, 16, 4),
+ VECT_VAR (vector_res, float, 16, 4));
+
+CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  }
+#undef TEST_MSG
+
+#if defined (__aarch64__)
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f32_f16"
+  {
+DECL_VARIABLE (vector_src, float, 16, 8);
+VLOAD (vector_src, buffer, q, float, f, 16, 8);
+DECL_VARIABLE (vector_res, float, 32, 4);
+VECT_VAR (vector_res, float, 32, 4) =
+  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+  VECT_VAR (vector_res, float, 32, 4));
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f16_f32"
+  {
+DECL_VARIABLE (vector_low, float, 16, 4);
+VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+DECL_VARIABLE (vector_src, float, 32, 4);
+VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+DECL_VARIABLE (vector_res, float, 16, 8) =
+  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
+VECT_VAR (vector_src, float, 32, 4));
+vst1q_f16 (VECT_VAR (result, float, 16, 8),
+  VECT_VAR (vector_res, float, 16, 8));
+
+CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, "");
+  }
+#endif
+}
+
+int
+main (void)
+{
+  exec_vcvt ();
+  return 0;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9aec02d..0a22c95 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2730,6 +2730,21 @@ proc check_effective_target_arm_neon_fp16_ok { } {
  check_effective_target_arm_neon_fp16_ok_nocache]
   }

+proc check_effective_target_arm_neon_fp16_hw_ok { } {


I see we're not using the *hw_ok naming anywhere else in target-supports.exp.
For example, we have check_effective_target_arm_neon_hw, without the _ok.
So I'd just call this check_effective_target_arm_neon_fp16_hw.

Ok wit

Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-09-08 Thread Kyrill Tkachov



On 08/09/15 11:55, Kyrill Tkachov wrote:

Hi all,

On 08/09/15 11:52, Alan Lawrence wrote:

Ping. (Thanks, Christophe!).

Original message: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02366.html

On 25/08/15 14:28, Alan Lawrence wrote:

Christophe Lyon wrote:

On 28 July 2015 at 13:26, Alan Lawrence  wrote:

This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the
testsuite for float16 vectors. Relative to the previous version, most of the
additions to the tests are now within #if..#endif such that they are only
compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since
this is actually an integer type, we can define and use it without any
compiler fp16 support). Also we try to use add_options_for_arm_neon_fp16
for all tests (on ARM targets), falling back to add_options_for_arm_neon if
the former fails.

Cross-tested on many multilibs, including -march=armv6,
-march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon,
-march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon},
-march=armv7-a/-mfp16-format=alternative.


Hi Alan,

It looks OK.
Did you also run the tests on AArch64?

arm-wise it looks ok to me since Christophe is ok with the changes.


To be clear, this is ok for trunk.


Kyrill


Sorry, yes, I did - aarch64-none-linux-gnu, and aarch64_be-none-elf also.

Thanks, Alan

Re: [RFC] Prevent unnecessary recompilation for trivial params.def changes

2015-09-08 Thread Andreas Schwab

Tom de Vries  writes:

> @@ -3236,6 +3236,10 @@ installdirs:
>   $(mkinstalldirs) $(DESTDIR)$(man1dir)
>   $(mkinstalldirs) $(DESTDIR)$(man7dir)
>  
> +params.list: $(srcdir)/params-list.h $(srcdir)/params.def
> + $(CPP) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
> + $(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
> +

You need a stamp file to avoid continuous rebuilding, don't you?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_i{f32_f16,f16_f32}

2015-09-08 Thread Alan Lawrence


Ping. (Thanks, Christophe!)

Correct version here: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01501.html

Cheers, Alan

On 25/08/15 15:21, Christophe Lyon wrote:

On 25 August 2015 at 15:57, Alan Lawrence  wrote:

Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
moved to the previous patch! This version also fixes some whitespace issues.



This looks OK to me now, thanks.


gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
 * lib/target-supports.exp
 (check_effective_target_arm_neon_fp16_hw_ok): New.
---
  .../aarch64/advsimd-intrinsics/vcvt_f16.c  | 98 ++
  gcc/testsuite/lib/target-supports.exp  | 15 
  2 files changed, 113 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index 000..a2cfd38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,98 @@
+/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include 
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x41800000, 0x41700000,
+   0x41600000, 0x41500000 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
+0xc1200000, 0xc1100000 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+  clean_results ();
+
+#define TEST_MSG "vcvt_f32_f16"
+  {
+VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+DECL_VARIABLE (vector_src, float, 16, 4);
+
+VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+DECL_VARIABLE (vector_res, float, 32, 4) =
+   vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+  VECT_VAR (vector_res, float, 32, 4));
+
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG "vcvt_f16_f32"
+  {
+VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+DECL_VARIABLE (vector_src, float, 32, 4);
+
+VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+DECL_VARIABLE (vector_res, float, 16, 4) =
+  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+vst1_f16 (VECT_VAR (result, float, 16, 4),
+ VECT_VAR (vector_res, float, 16, 4));
+
+CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  }
+#undef TEST_MSG
+
+#if defined (__aarch64__)
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f32_f16"
+  {
+DECL_VARIABLE (vector_src, float, 16, 8);
+VLOAD (vector_src, buffer, q, float, f, 16, 8);
+DECL_VARIABLE (vector_res, float, 32, 4);
+VECT_VAR (vector_res, float, 32, 4) =
+  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+  VECT_VAR (vector_res, float, 32, 4));
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f16_f32"
+  {
+DECL_VARIABLE (vector_low, float, 16, 4);
+VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+DECL_VARIABLE (vector_src, float, 32, 4);
+VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+DECL_VARIABLE (vector_res, float, 16, 8) =
+  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
+VECT_VAR (vector_src, float, 32, 4));
+vst1q_f16 (VECT_VAR (result, float, 16, 8),
+  VECT_VAR (vector_res, float, 16, 8));
+
+CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, "");
+  }
+#endif
+}
+
+int
+main (void)
+{
+  exec_vcvt ();
+  return 0;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9aec02d..0a22c95 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2730,6 +2730,21 @@ proc check_effective_target_arm_neon_fp16_ok { } {
 check_effective_target_arm_neon_fp16_ok_nocache]
  }

+proc check_effective_target_arm_neon_fp16_hw_ok { } {
+if {! [check_effective_target_arm_neon_fp16_ok] } {
+   return 0
+}
+global et_arm_neon_fp16_flags
+check_runtime_nocache arm_neon_fp16_hw_ok {
+   int
+   main (int argc, char **argv)
+   {
+ asm ("vcvt.f32.f16 q1, d0");
+ re

Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-09-08 Thread Kyrill Tkachov


Hi all,

On 08/09/15 11:52, Alan Lawrence wrote:

Ping. (Thanks, Christophe!).

Original message: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02366.html

On 25/08/15 14:28, Alan Lawrence wrote:

Christophe Lyon wrote:

On 28 July 2015 at 13:26, Alan Lawrence  wrote:

This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the
testsuite for float16 vectors. Relative to the previous version, most of the
additions to the tests are now within #if..#endif such that they are only
compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since
this is actually an integer type, we can define and use it without any
compiler fp16 support). Also we try to use add_options_for_arm_neon_fp16
for all tests (on ARM targets), falling back to add_options_for_arm_neon if
the former fails.

Cross-tested on many multilibs, including -march=armv6,
-march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon,
-march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon},
-march=armv7-a/-mfp16-format=alternative.


Hi Alan,

It looks OK.
Did you also run the tests on AArch64?


arm-wise it looks ok to me since Christophe is ok with the changes.
Kyrill


Sorry, yes, I did - aarch64-none-linux-gnu, and aarch64_be-none-elf also.

Thanks, Alan

Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-09-08 Thread Alan Lawrence


Ping. (Thanks, Christophe!).

Original message: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02366.html

On 25/08/15 14:28, Alan Lawrence wrote:

Christophe Lyon wrote:

On 28 July 2015 at 13:26, Alan Lawrence  wrote:

This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the
testsuite for float16 vectors. Relative to the previous version, most of the
additions to the tests are now within #if..#endif such that they are only
compiled if we have a scalar __fp16 type (the exception is hfloat16_t: since
this is actually an integer type, we can define and use it without any
compiler fp16 support). Also we try to use add_options_for_arm_neon_fp16
for all tests (on ARM targets), falling back to add_options_for_arm_neon if
the former fails.

Cross-tested on many multilibs, including -march=armv6,
-march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon,
-march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon},
-march=armv7-a/-mfp16-format=alternative.


Hi Alan,

It looks OK.
Did you also run the tests on AArch64?


Sorry, yes, I did - aarch64-none-linux-gnu, and aarch64_be-none-elf also.

Thanks, Alan

[RFC] Prevent unnecessary recompilation for trivial params.def changes

2015-09-08 Thread Tom de Vries


Hi,

this patch adds generation of params.list, a file containing a list of 
names of all the parameters in params.def.


By including params.list in params.h, rather than params.def itself, we 
prevent recompiling the 118 C files that include params.h for trivial
changes in params.def, such as changing the default value of a
parameter.


I did a minimal build with the patch, and tested the behaviour by doing 
both trivial and non-trivial changes in params.def, and rebuilding.


Any comments?

Thanks,
- Tom
Add params.list

---
 gcc/Makefile.in   | 6 +-
 gcc/params-list.h | 4 
 gcc/params.h  | 5 +
 3 files changed, 10 insertions(+), 5 deletions(-)
 create mode 100644 gcc/params-list.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3d1c1e5..f1ce154 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2415,7 +2415,7 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_H) multilib.h \
$(ALL_GTFILES_H) gtype-desc.c gtype-desc.h gcov-iov.h \
options.h target-hooks-def.h insn-opinit.h \
common/common-target-hooks-def.h pass-instances.def \
-   c-family/c-target-hooks-def.h
+   c-family/c-target-hooks-def.h params.list
 
 #
 # How to compile object files to run on the build machine.
@@ -3236,6 +3236,10 @@ installdirs:
 	$(mkinstalldirs) $(DESTDIR)$(man1dir)
 	$(mkinstalldirs) $(DESTDIR)$(man7dir)
 
+params.list: $(srcdir)/params-list.h $(srcdir)/params.def
+	$(CPP) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
+	$(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
+
 PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   toplev.h $(DIAGNOSTIC_CORE_H) $(BASIC_BLOCK_H) $(HASH_TABLE_H) \
   tree-ssa-alias.h $(INTERNAL_FN_H) gimple-fold.h tree-eh.h gimple-expr.h \
diff --git a/gcc/params-list.h b/gcc/params-list.h
new file mode 100644
index 000..49301d2
--- /dev/null
+++ b/gcc/params-list.h
@@ -0,0 +1,4 @@
+#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
+  enumerator,
+#include "params.def"
+#undef DEFPARAM
diff --git a/gcc/params.h b/gcc/params.h
index f53426d..9f7618a 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -81,10 +81,7 @@ extern void set_param_value (const char *name, int value,
 
 enum compiler_param
 {
-#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
-  enumerator,
-#include "params.def"
-#undef DEFPARAM
+#include "params.list"
   LAST_PARAM
 };
 
-- 
1.9.1

Re: [PATCH PR66388]Add sizetype cand for BIV of smaller type if it's used as index of memory ref

2015-09-08 Thread Bin.Cheng

On Tue, Sep 8, 2015 at 6:06 PM, Bin.Cheng  wrote:
> On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
>  wrote:
>> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng  wrote:
>>> Hi,
>>> This patch is a new approach to fix PR66388.  IVO today computes iv_use with
>>> iv_cand which has at least same type precision as the use.  On 64bit
>>> platforms like AArch64, this results in different iv_cand created for each
>>> address type iv_use, and register pressure increased.  As a matter of fact,
>>> the BIV should be used for all iv_uses in some of these cases.  It is a
>>> latent bug but recently getting worse because of overflow changes.
>>>
>>> The original approach at
>>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue
>>> except it conflicts with IV elimination.  Seems to me it is impossible to
>>> mitigate the contradiction.
>>>
>>> This new approach fixes the issue by adding sizetype iv_cand for BIVs
>>> directly.  In cases if the original BIV is preferred, the sizetype iv_cand
>>> will be chosen.  As for code generation, the sizetype iv_cand has the same
>>> effect as the original BIV.  Actually, it's better because BIV needs to be
>>> explicitly extended to sizetype to be used in address expression on most
>>> targets.
>>>
>>> One shortage of this approach is it may introduce more iv candidates.  To
>>> minimize the impact, this patch does sophisticated code analysis and adds
>>> sizetype candidate for BIV only if it is used as index.  Moreover, it avoids
>>> to add candidate of the original type if the BIV is only used as index.
>>> Statistics for compiling spec2k6 shows increase of candidate number is
>>> modest and can be ignored.
>>>
>>> There are two more patches following to fix corner cases revealed by this
>>> one.  Together they bring an obvious perf improvement for spec2k6/int on
>>> aarch64.
>>> Spec2k6/int
>>> 400.perlbench   3.44%
>>> 445.gobmk   -0.86%
>>> 456.hmmer   14.83%
>>> 458.sjeng   2.49%
>>> 462.libquantum  -0.79%
>>> GEOMEAN 1.68%
>>>
>>> There is also about 0.36% improvement for spec2k6/fp, mostly because of case
>>> 436.cactusADM.  I believe it can be further improved, but that should be
>>> another patch.
>>>
>>> I also collected benchmark data for x86_64.  Spec2k6/fp is not affected.  As
>>> for spec2k6/int, though the geomean is improved slightly, 400.perlbench is
>>> regressed by ~3%.  I can see BIVs are chosen for some loops instead of
>>> address candidates.  Generally, the loop header will be simplified because
>>> iv elimination with BIV is simpler; the number of instructions in loop body
>>> isn't changed.  I suspect the regression comes from different addressing
>>> modes.  With BIV, complex addressing mode like [base + index << scale +
>>> disp] is used, rather than [base + disp].  I guess the former has more
>>> micro-ops, thus more expensive.  This guess can be confirmed by manually
>>> suppressing the complex addressing mode with higher address cost.
>>> Now the problem becomes why overall cost of BIV is computed lower while the
>>> actual cost is higher.  I noticed for most affected loops, loop header is
>>> bloated because of iv elimination using the old address candidate.  The
>>> bloated loop header results in much higher cost than BIV.  As a result, BIV
>>> is preferred.  I also noticed the bloated loop header generally can be
>>> simplified (I have a following patch for this).  After applying the local
>>> patch, the old address candidate is chosen, and most of regression is
>>> recovered.
>>> Conclusion is I think loop header bloated issue should be blamed for the
>>> regression, and it can be resolved.
>>>
>>> Bootstrap and test on x86_64 and aarch64.  It fixes failure of
>>> gcc.target/i386/pr49781-1.c, without new breakage.
>>>
>>> So what do you think?
>>
>> The data above looks ok to me.
>>
>> +static struct iv *
>> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
>> +{
>> +  aff_tree aff;
>> +  struct expand_data exp_data;
>> +
>> +  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
>> +    return iv;
>> +
>> +  /* Expand IV's ssa_name till the deriving biv is found.  */
>> +  exp_data.data = data;
>> +  exp_data.biv = NULL;
>> +  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
>> +                                  &aff, &data->name_expansion_cache,
>> +                                  stop_expand, &exp_data);
>> +  return exp_data.biv;
>>
>> that's actually "abusing" tree_to_aff_combination_expand for simply walking
>> SSA uses and their defs uses recursively until you hit "stop".  ISTR past
>> discussion to add a generic walk_ssa_use interface for that.  Not sure if it
>> materialized with a name I can't remember or whether it didn't.
> Thanks for reviewing.  I didn't find an existing interface to walk up
> definition chains of ssa vars.  In this updated patch, I implemented a
> simple function which meets the minimal requirement of walking up
> definition chains of

Re: [PATCH PR66388]Add sizetype cand for BIV of smaller type if it's used as index of memory ref

2015-09-08 Thread Bin.Cheng

On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
 wrote:
> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng  wrote:
>> Hi,
>> This patch is a new approach to fix PR66388.  IVO today computes iv_use with
>> iv_cand which has at least same type precision as the use.  On 64bit
>> platforms like AArch64, this results in different iv_cand created for each
>> address type iv_use, and register pressure increased.  As a matter of fact,
>> the BIV should be used for all iv_uses in some of these cases.  It is a
>> latent bug but recently getting worse because of overflow changes.
>>
>> The original approach at
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue
>> except it conflicts with IV elimination.  Seems to me it is impossible to
>> mitigate the contradiction.
>>
>> This new approach fixes the issue by adding sizetype iv_cand for BIVs
>> directly.  In cases if the original BIV is preferred, the sizetype iv_cand
>> will be chosen.  As for code generation, the sizetype iv_cand has the same
>> effect as the original BIV.  Actually, it's better because BIV needs to be
>> explicitly extended to sizetype to be used in address expression on most
>> targets.
>>
>> One shortage of this approach is it may introduce more iv candidates.  To
>> minimize the impact, this patch does sophisticated code analysis and adds
>> sizetype candidate for BIV only if it is used as index.  Moreover, it avoids
>> to add candidate of the original type if the BIV is only used as index.
>> Statistics for compiling spec2k6 shows increase of candidate number is
>> modest and can be ignored.
>>
>> There are two more patches following to fix corner cases revealed by this
>> one.  Together they bring an obvious perf improvement for spec2k6/int on
>> aarch64.
>> Spec2k6/int
>> 400.perlbench   3.44%
>> 445.gobmk   -0.86%
>> 456.hmmer   14.83%
>> 458.sjeng   2.49%
>> 462.libquantum  -0.79%
>> GEOMEAN 1.68%
>>
>> There is also about 0.36% improvement for spec2k6/fp, mostly because of case
>> 436.cactusADM.  I believe it can be further improved, but that should be
>> another patch.
>>
>> I also collected benchmark data for x86_64.  Spec2k6/fp is not affected.  As
>> for spec2k6/int, though the geomean is improved slightly, 400.perlbench is
>> regressed by ~3%.  I can see BIVs are chosen for some loops instead of
>> address candidates.  Generally, the loop header will be simplified because
>> iv elimination with BIV is simpler; the number of instructions in loop body
>> isn't changed.  I suspect the regression comes from different addressing
>> modes.  With BIV, complex addressing mode like [base + index << scale +
>> disp] is used, rather than [base + disp].  I guess the former has more
>> micro-ops, thus more expensive.  This guess can be confirmed by manually
>> suppressing the complex addressing mode with higher address cost.
>> Now the problem becomes why overall cost of BIV is computed lower while the
>> actual cost is higher.  I noticed for most affected loops, loop header is
>> bloated because of iv elimination using the old address candidate.  The
>> bloated loop header results in much higher cost than BIV.  As a result, BIV
>> is preferred.  I also noticed the bloated loop header generally can be
>> simplified (I have a following patch for this).  After applying the local
>> patch, the old address candidate is chosen, and most of regression is
>> recovered.
>> Conclusion is I think loop header bloated issue should be blamed for the
>> regression, and it can be resolved.
>>
>> Bootstrap and test on x86_64 and aarch64.  It fixes failure of
>> gcc.target/i386/pr49781-1.c, without new breakage.
>>
>> So what do you think?
>
> The data above looks ok to me.
>
> +static struct iv *
> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
> +{
> +  aff_tree aff;
> +  struct expand_data exp_data;
> +
> +  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
> +    return iv;
> +
> +  /* Expand IV's ssa_name till the deriving biv is found.  */
> +  exp_data.data = data;
> +  exp_data.biv = NULL;
> +  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
> +                                  &aff, &data->name_expansion_cache,
> +                                  stop_expand, &exp_data);
> +  return exp_data.biv;
>
> that's actually "abusing" tree_to_aff_combination_expand for simply walking
> SSA uses and their defs uses recursively until you hit "stop".  ISTR past
> discussion to add a generic walk_ssa_use interface for that.  Not sure if it
> materialized with a name I can't remember or whether it didn't.
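Richard's point is that finding the deriving BIV only needs a walk up SSA definition chains that stops at the first BIV, not a full affine expansion. A minimal sketch of such a walk on a hypothetical toy representation follows (struct toy_ssa and find_deriving_biv are illustrative names, not GCC's tree/gimple API). Real def chains loop back through PHI nodes, so the sketch bounds the recursion depth; a visited set would be the more thorough choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy SSA name: either a BIV, or defined by a statement
   with up to two SSA operands (NULL for constants or missing operands).  */
struct toy_ssa
{
  const char *name;
  bool is_biv;
  struct toy_ssa *ops[2];
};

/* Walk up NAME's definition chain depth first and return the first BIV
   reached, or NULL if none is found within DEPTH_LIMIT steps.  The
   limit stands in for cycle detection.  */
static struct toy_ssa *
find_deriving_biv (struct toy_ssa *name, int depth_limit)
{
  if (name == NULL || depth_limit < 0)
    return NULL;
  if (name->is_biv)
    return name;

  for (int i = 0; i < 2; i++)
    {
      struct toy_ssa *biv = find_deriving_biv (name->ops[i], depth_limit - 1);
      if (biv != NULL)
        return biv;
    }
  return NULL;
}
```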
Thanks for reviewing.  I didn't find an existing interface to walk up
definition chains of ssa vars.  In this updated patch, I implemented a
simple function which meets the minimal requirement of walking up
definition chains of BIV variables.  I also counted number of
no_overflow BIVs that are not used in address type use.  Since
generally there are only two BIVs in a loop, this

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Do not ifcvt complex blocks if the else block is empty

2015-09-08 Thread Rainer Orth

Hi Kyrill,

> PR rtl-optimization/67481 is a testsuite regression on sparc-solaris that
> Rainer reported. I haven't tested that this patch fixes that, but I suspect
> that the root cause is the same. Rainer, could you please check that this
> fixes the regression for you?

I've now checked that with your patch the regression went away indeed,
using a limited non-bootstrap build on sparc-sun-solaris2.10.  Next I'll
run a full bootstrap to check there are no other issues.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[gomp4.1] depend nowait support for target {update,{enter,exit} data}

2015-09-08 Thread Jakub Jelinek

Hi!

This patch does two things:
1) removes fatal error from #pragma omp target update if object is not
mapped (at all, partial mapping is still a fatal error); the 4.1 draft spec
says that nothing is copied if the object is not mapped (first hunk)

2) implements nowait support for #pragma omp target {update,{enter,exit} data}
- if depend clause is not present, nowait is ignored, similarly if there is
no team (not inside of a parallel), or if the encountering task is final,
or if no children of the current task had depend clauses yet.  Otherwise,
a task is created, and when the dependencies are resolved and the task
scheduler will schedule it, it will perform the required update/enter/exit
action(s).  If there are depend clauses, the "target task" is not really
executed "immediately" as the spec says, but the spec is broken and I
believe is going to change (the question is when and to what wording).
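A small usage sketch of the case described in 2): a target update with both depend and nowait, issued after a sibling task with a depend clause, so the encountering task already tracks dependencies and the update can become a deferred target task. It uses OpenMP 4.5 syntax (the 4.1 draft was published as 4.5); target_update_nowait_demo is a hypothetical name. Without an offload device the target constructs fall back to the host, and without -fopenmp the pragmas are ignored; either way the function should return 2.

```c
/* Returns the final host value of X.  */
static int
target_update_nowait_demo (void)
{
  int x = 1;

  #pragma omp parallel
  #pragma omp single
  {
    /* Map X on the device with its initial value.  */
    #pragma omp target enter data map(to: x)

    /* A child task with a depend clause: afterwards the encountering
       task has dependency bookkeeping.  */
    #pragma omp task depend(out: x)
    x = 2;

    /* Deferred until the task above finishes, then refreshes the
       device copy from the host.  */
    #pragma omp target update to(x) depend(in: x) nowait

    /* Wait for both child tasks, including the target task.  */
    #pragma omp taskwait

    #pragma omp target exit data map(from: x)
  }
  return x;
}
```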

nowait support for #pragma omp target is not implemented yet, supposedly we
need to mark those somehow (some flag) already in the struct gomp_task
structure, essentially it will need either 2 or 3 callbacks
(the current one, executed when the dependencies are resolved (it actually
waits until some thread schedules it after that point, I think it is
undesirable to run it with the tasking lock held), which would perform
the gomp_map_vars and initiate the running of the region, and then some
query routine which would poll the plugin whether the task is done or not,
and either perform the finalization (unmap_vars) if it is done (and in any
case return bool whether it should be polled again or not), and if the
finalization is not done there, also another callback for the finalization.
Also, there is the issue that if we are waiting for a task that needs to be
polled, and we don't have any further tasks to run, we shouldn't really
attempt to sleep on some semaphore (e.g. in taskwait, end of
taskgroup, etc.) or barrier, but rather either need to keep polling it, or
call the query hook with some argument that it should sleep in there until
the work is done by the offloading device.
Also, there needs to be a way for the target nowait first callback to say
that it is using host fallback and thus acts as a normal task, therefore
once the task fn finishes, the task is done.

2015-09-08  Jakub Jelinek  

* target.c (gomp_update): Remove fatal error if object is not mapped.

* target.c (GOMP_target_update_41): Handle nowait update with
dependencies.  Don't call gomp_update if parallel or taskgroup has
been cancelled.
(GOMP_target_enter_exit_data): Likewise.
(gomp_target_task_fn): New function.
* task.c (gomp_task_handle_depend): New function, copied from...
(GOMP_task): ... here.  Use gomp_task_handle_depend.
(gomp_create_target_task): New function.
* libgomp.h (struct gomp_target_task): New type.
(gomp_create_target_task, gomp_target_task_fn): New prototypes.
* testsuite/libgomp.c/target-27.c: New test.

--- libgomp/target.c.jj	2015-09-03 16:51:06.0 +0200
+++ libgomp/target.c	2015-09-08 09:55:24.591484158 +0200
@@ -899,13 +899,6 @@ gomp_update (struct gomp_device_descr *d
- n->host_start),
  cur_node.host_end - cur_node.host_start);
  }
-   else
- {
-   gomp_mutex_unlock (&devicep->lock);
-   gomp_fatal ("Trying to update [%p..%p) object that is not mapped",
-   (void *) cur_node.host_start,
-   (void *) cur_node.host_end);
- }
   }
   gomp_mutex_unlock (&devicep->lock);
 }
@@ -1460,18 +1453,50 @@ GOMP_target_update_41 (int device, size_
   /* If there are depend clauses, but nowait is not present,
  block the parent task until the dependencies are resolved
  and then just continue with the rest of the function as if it
- is a merged task.  */
+ is a merged task.  Until we are able to schedule task during
+ variable mapping or unmapping, ignore nowait if depend clauses
+ are not present.  */
   if (depend != NULL)
 {
   struct gomp_thread *thr = gomp_thread ();
   if (thr->task && thr->task->depend_hash)
-   gomp_task_maybe_wait_for_dependencies (depend);
+   {
+ if ((flags & GOMP_TARGET_FLAG_NOWAIT)
+ && thr->ts.team
+ && !thr->task->final_task)
+   {
+ gomp_create_target_task (devicep, (void (*) (void *)) NULL,
+  mapnum, hostaddrs, sizes, kinds,
+  flags | GOMP_TARGET_FLAG_UPDATE,
+  depend);
+ return;
+   }
+
+ struct gomp_team *team = thr->ts.team;
+ /* If parallel or taskgroup has been cancelled, don't start new
+tasks.  */
+ if (team
+ && (gomp_team_barrier_cancelled (&team->ba

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Do not ifcvt complex blocks if the else block is empty

2015-09-08 Thread Kyrill Tkachov



On 07/09/15 20:14, H.J. Lu wrote:

On Mon, Sep 7, 2015 at 9:29 AM, Kyrill Tkachov  wrote:

Hi all,

This patch fixes the PRs in the ChangeLog that have been reported against my
if-conversion patch. The problem occurs when the 'then' block is complex but
the 'else' block is empty. In this case the calling code in
noce_process_if_block takes the 'else' move (x := b) from the test block.
However, we have not checked whether the test block is valid for
complex-block if-conversion with bb_valid_for_noce_process_p. Also, that's a
case I wasn't particularly targeting when writing the initial patch.

This patch bails out of noce_try_cmove_arith when one of the blocks is
complex and the other is empty. I've checked that if-conversion still happens
in the cases of interest from the original patch.
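At the source level, the shape involved looks roughly like branchy below: the 'else' value b is assigned before the branch (in the test block) and the 'then' block is more than one simple move. The sketch is illustrative only (function names are hypothetical, and ifcvt.c works on RTL, not C): if_converted shows the branchless form a conditional-move conversion aims for, evaluating the 'then' value unconditionally and selecting. That rewrite is only safe when the speculated 'then' code cannot trap and has no side effects.

```c
/* Source shape: the "else" value is assigned before the branch and the
   "then" block is more than one simple move.  */
static int
branchy (int cond, int a, int b)
{
  int x = b;
  if (cond)
    {
      int t = a * 3;
      x = t + 1;
    }
  return x;
}

/* Branchless form: evaluate the "then" value unconditionally, then
   select.  Valid here only because the "then" code cannot trap and
   has no side effects.  */
static int
if_converted (int cond, int a, int b)
{
  int then_val = a * 3 + 1;
  return cond ? then_val : b;
}
```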

I've added the testcase from PR 67465 since that one uses __builtin_abort
and triggers the problem nicely. The others show the miscompilation through
printf output, and it seems to go away if I replace the printf with an abort.
I have confirmed manually that the miscompilation goes away on those
testcases.

PR rtl-optimization/67481 is a testsuite regression on sparc-solaris that
Rainer reported. I haven't tested that this patch fixes that, but I suspect
that the root cause is the same. Rainer, could you please check that this
fixes the regression for you?

Bootstrapped and tested on aarch64 and x86_64.

Ok for trunk if sparc testing comes ok?

Thanks,
Kyrill

2015-09-07  Kyrylo Tkachov  

 PR rtl-optimization/67456
 PR rtl-optimization/67464
 PR rtl-optimization/67465
 PR rtl-optimization/67481
 * ifcvt.c (noce_try_cmove_arith): Bail out if one of the blocks
 is complex and the other is empty.

2015-09-07  Kyrylo Tkachov  

 * gcc.dg/pr67465.c: New test.

Does it fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462


No, PR 67462 is a testism. I've added a comment to the issue with my thoughts.

Kyrill

Re: [PATCH] PR target/67480: AVX512 bitwise logic insns pattern is incorrect

2015-09-08 Thread Kirill Yukhin

Hi,
On 07 Sep 19:07, Alexander Fomin wrote:
>  (define_insn "3"
> -  [(set (match_operand:VI 0 "register_operand" "=x,v")
> - (any_logic:VI
> -   (match_operand:VI 1 "nonimmediate_operand" "%0,v")
> -   (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
> +  [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,v")
> + (any_logic:VI48_AVX_AVX512F
> +   (match_operand:VI48_AVX_AVX512F 1 "nonimmediate_operand" "%0,v")
> +   (match_operand:VI48_AVX_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
>"TARGET_SSE && 
> && ix86_binary_operator_ok (, mode, operands)"
>  {
> @@ -11109,13 +7,104 @@
>  case V4DImode:
>  case V4SImode:
>  case V2DImode:
> -  if (TARGET_AVX512VL)
> +  tmp = TARGET_AVX512VL ? "p" : "p";
Suppose masking is applied and the 1st alternative is chosen...
> +  break;
> +default:
> +  gcc_unreachable ();
> +  }
> +  break;
> +
> +   case MODE_V16SF:
> +  gcc_assert (TARGET_AVX512F);
> +   case MODE_V8SF:
> +  gcc_assert (TARGET_AVX);
> +   case MODE_V4SF:
> +  gcc_assert (TARGET_SSE);
> +
> +  tmp = "ps";
> +  break;
> +
> +   default:
> +  gcc_unreachable ();
> +   }
> +
> +  switch (which_alternative)
> +{
> +case 0:
> +  ops = "%s\t{%%2, %%0|%%0, %%2}";
We'll reach here having p %xmm17, %xmm18 without even a mention of the mask
register. I think we need to check here if masking is needed and emit the EVEX
version (3 args + mask).
> +  break;
> +case 1:
> +  ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  snprintf (buf, sizeof (buf), ops, tmp);
> +  return buf;
> +}
...
> +
> +(define_insn "*3"
> +  [(set (match_operand:VI12_AVX_AVX512F 0 "register_operand" "=x,v")
> + (any_logic: VI12_AVX_AVX512F
> +   (match_operand:VI12_AVX_AVX512F 1 "nonimmediate_operand" "%0,v")
> +   (match_operand:VI12_AVX_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
> +  "TARGET_SSE && ix86_binary_operator_ok (, mode, operands)"
> +{
> +  static char buf[64];
> +  const char *ops;
> +  const char *tmp;
> +
> +  switch (get_attr_mode (insn))
> +{
> +case MODE_XI:
> +  gcc_assert (TARGET_AVX512F);
> +case MODE_OI:
> +  gcc_assert (TARGET_AVX2 || TARGET_AVX512VL);
> +case MODE_TI:
> +  gcc_assert (TARGET_SSE2 || TARGET_AVX512VL);
> +  switch (mode)
> +  {
> +case V64QImode:
> +case V32HImode:
> +  if (TARGET_AVX512F)
>{
> -tmp = "p";
> +tmp = "pq";
>  break;
>}
> -default:
> +case V32QImode:
> +case V16HImode:
> +case V16QImode:
> +case V8HImode:
>tmp = TARGET_AVX512VL ? "pq" : "p";
Regardless of the alternative chosen, you force the insn to be pq (when
compiled with -mavx512vl).
> +  break;
> +default:
> +  gcc_unreachable ();
>}
>break;
>  
> @@ -11139,7 +11238,7 @@
>ops = "%s\t{%%2, %%0|%%0, %%2}";
So, here you'll emit, e.g. "pandq %xmm16, %xmm17".
I think it would be better to attach the AVX-512VL-related suffix while
discriminating alternatives.
>break;
>  case 1:
> -  ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, 
> %%2}";
> +  ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
>break;
>  default:
>gcc_unreachable ();
...


--
Thanks, K
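One way to read the suggestion to attach the AVX-512VL suffix while discriminating alternatives is sketched below as standalone C (logic_template and its parameters are hypothetical, not the i386 backend's code). It builds the same kind of snprintf template the pattern uses, but only appends the 'd'/'q' element-size suffix in the three-operand 'v' alternative, since legacy SSE has no PANDD/PANDQ encodings. Masking is not modelled.

```c
#include <stdio.h>

/* Build the output template for a vector logic insn into BUF.
   LOGIC is "and", "andn", "or" or "xor".  Alternative 0 is the
   two-operand legacy SSE form, alternative 1 the three-operand
   VEX/EVEX form.  EVEX_SUFFIX is 'd', 'q' or '\0' and is only
   honoured for alternative 1.  */
static const char *
logic_template (char *buf, size_t len, const char *logic,
                int alternative, char evex_suffix)
{
  char mnem[16];

  if (alternative == 1 && evex_suffix != '\0')
    snprintf (mnem, sizeof mnem, "vp%s%c", logic, evex_suffix);
  else if (alternative == 1)
    snprintf (mnem, sizeof mnem, "vp%s", logic);
  else
    snprintf (mnem, sizeof mnem, "p%s", logic);

  /* "%%" becomes a literal '%' operand marker in the template.  */
  if (alternative == 0)
    snprintf (buf, len, "%s\t{%%2, %%0|%%0, %%2}", mnem);
  else
    snprintf (buf, len, "%s\t{%%2, %%1, %%0|%%0, %%1, %%2}", mnem);
  return buf;
}
```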

Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG

2015-09-08 Thread James Greenhalgh

On Wed, Sep 02, 2015 at 02:00:23PM +0100, Kyrill Tkachov wrote:
> 
> On 01/09/15 11:40, Kyrill Tkachov wrote:
> > Hi James,
> >
> > On 01/09/15 10:25, James Greenhalgh wrote:
> >> On Thu, Aug 13, 2015 at 01:36:50PM +0100, Kyrill Tkachov wrote:
> >>
> >> Some comments below.
> > Thanks, I'll incorporate them, with one clarification inline.
> 
> And here's the updated patch.

OK.

Thanks,
James

> 2015-09-02  Kyrylo Tkachov  
> 
>   * config/aarch64/aarch64.md (mod3): New define_expand.
>   (*neg2_compare0): Rename to...
>   (neg2_compare0): ... This.
>   * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case):
>   Move check for speed inside the if-then-elses.  Reflect
>   CSNEG sequence in MOD by power of 2 case.
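The arithmetic behind expanding signed mod by a power of two with a conditional negate can be checked in plain C. This is a hedged sketch (smod_pow2 is a hypothetical name): negate x, mask both x and -x with n - 1, then pick either x & (n - 1) or -((-x) & (n - 1)) depending on the sign of the negated value. On AArch64, CSNEG performs that final select-or-negate in one instruction; the exact sequence the patch emits is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Signed X % N for a power-of-two N > 1.  Unsigned arithmetic models
   the wrap-around of the hardware negate, so INT32_MIN is handled
   without undefined behaviour.  */
static int32_t
smod_pow2 (int32_t x, int32_t n)
{
  assert (n > 1 && (n & (n - 1)) == 0);

  uint32_t mask = (uint32_t) n - 1;
  uint32_t neg = 0u - (uint32_t) x;        /* negate, setting flags */
  uint32_t pos_part = (uint32_t) x & mask; /* x & (n - 1) */
  uint32_t neg_part = neg & mask;          /* (-x) & (n - 1) */

  /* Negated value non-negative ("pl"): result is -((-x) & (n - 1)).
     Otherwise ("mi"): result is x & (n - 1).  */
  if ((neg & UINT32_C (0x80000000)) == 0)
    return -(int32_t) neg_part;
  return (int32_t) pos_part;
}
```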

Re: [PATCH][ARM][3/3] Expand mod by power of 2

2015-09-08 Thread Kyrill Tkachov



On 07/09/15 10:27, Ramana Radhakrishnan wrote:


On 24/07/15 11:55, Kyrill Tkachov wrote:

commit d562629e36ba013b8f77956a74139330d191bc30
Author: Kyrylo Tkachov 
Date:   Fri Jul 17 16:30:01 2015 +0100

 [ARM][3/3] Expand mod by power of 2

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e1bc727..6ade07c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9556,6 +9556,22 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
  
  case MOD:
  case UMOD:
+  /* MOD by a power of 2 can be expanded as:
+	 rsbs	r1, r0, #0
+	 and	r0, r0, #(n - 1)
+	 and	r1, r1, #(n - 1)
+	 rsbpl	r0, r1, #0.  */
+  if (code == MOD
+ && CONST_INT_P (XEXP (x, 1))
+ && exact_log2 (INTVAL (XEXP (x, 1))) > 0
+ && mode == SImode)
+   {
+ *cost += COSTS_N_INSNS (3)
+  + 2 * extra_cost->alu.logical
+  + extra_cost->alu.arith;
+ return true;
+   }
+
*cost = LIBCALL_COST (2);
return false;   /* All arguments must be in registers.  */
  
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f341109..8301648 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1229,7 +1229,7 @@ (define_peephole2
""
  )
  
-(define_insn "*subsi3_compare0"
+(define_insn "subsi3_compare0"
[(set (reg:CC_NOOV CC_REGNUM)
(compare:CC_NOOV
 (minus:SI (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
@@ -2158,7 +2158,7 @@ (define_expand "andsi3"
  )
  
  ; ??? Check split length for Thumb-2
-(define_insn_and_split "*arm_andsi3_insn"
+(define_insn_and_split "arm_andsi3_insn"
[(set (match_operand:SI 0 "s_register_operand" "=r,l,r,r,r")
(and:SI (match_operand:SI 1 "s_register_operand" "%r,0,r,r,r")
(match_operand:SI 2 "reg_or_int_operand" "I,l,K,r,?n")))]
@@ -11105,6 +11105,78 @@ (define_expand "thumb_legacy_rev"
""
  )

This shouldn't be necessary - you are just adding another interface to produce an and insn.


I see what you mean. Ok, I'll drop this.



  
+;; ARM-specific expansion of signed mod by power of 2
+;; using conditional negate.
+;; For r0 % n where n is a power of 2 produce:
+;; rsbs    r1, r0, #0
+;; and r0, r0, #(n - 1)
+;; and r1, r1, #(n - 1)
+;; rsbpl   r0, r1, #0
+
+(define_expand "modsi3"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "const_int_operand" "")]
+  "TARGET_32BIT"
+  {
+HOST_WIDE_INT val = INTVAL (operands[2]);
+
+if (val <= 0
+   || exact_log2 (INTVAL (operands[2])) <= 0
+   || !const_ok_for_arm (INTVAL (operands[2]) - 1))
+  FAIL;
+
+rtx mask = GEN_INT (val - 1);
+
+/* In the special case of x0 % 2 we can do the even shorter:
+   cmp r0, #0
+   and r0, r0, #1
+   rsblt   r0, r0, #0.  */
+
+if (val == 2)
+  {
+   rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+   rtx cond = gen_rtx_LT (SImode, cc_reg, const0_rtx);
+
+   emit_insn (gen_rtx_SET (cc_reg,
+   gen_rtx_COMPARE (CCmode, operands[1], const0_rtx)));
+
+   rtx masked = gen_reg_rtx (SImode);
+   emit_insn (gen_arm_andsi3_insn (masked, operands[1], mask));

Use emit_insn (gen_andsi3 (masked, operands[1], mask)) instead, and likewise
below.


Ok, done that. A side effect of this is that, since the andsi3 expander
handles any reg_or_int, we can catch more masks this way. Also, due to the way
the andsi3 expander is written, for the mask 255 it will generate a zero_extend
instead of an and. This may or may not be optimal on some cores, but perhaps we
should look at the andsi3 expander to make it more robust as a separate task.
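For n == 2 the expander uses a shorter sequence: compare x with zero, mask x with 1, and negate the masked value when x is negative, so it never needs -x itself. A small C sketch of that identity (smod2 is a hypothetical name; the unsigned cast avoids relying on signed bitwise behaviour):

```c
#include <stdint.h>

/* Signed X % 2 via mask-then-conditionally-negate.  */
static int32_t
smod2 (int32_t x)
{
  int32_t masked = (int32_t) ((uint32_t) x & 1u); /* and   r0, r0, #1 */
  return x < 0 ? -masked : masked;                /* rsblt r0, r0, #0 */
}
```

For example, -5 has its low bit set, so the result is -1, matching C's -5 % 2.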





+   emit_move_insn (operands[0],
+   gen_rtx_IF_THEN_ELSE (SImode, cond,
+ gen_rtx_NEG (SImode,
+  masked),
+ masked));
+   DONE;
+  }
+
+rtx neg_op = gen_reg_rtx (SImode);
+rtx_insn *insn = emit_insn (gen_subsi3_compare0 (neg_op, const0_rtx,
+ operands[1]));
+
+/* Extract the condition register and mode.  */
+rtx cmp = XVECEXP (PATTERN (insn), 0, 0);
+rtx cc_reg = SET_DEST (cmp);
+rtx cond = gen_rtx_GE (SImode, cc_reg, const0_rtx);
+
+emit_insn (gen_arm_andsi3_insn (operands[0], operands[1], mask));
+
+rtx masked_neg = gen_reg_rtx (SImode);
+emit_insn (gen_arm_andsi3_insn (masked_neg, neg_op, mask));
+
+/* We want a conditional negate here, but emitting COND_EXEC rtxes
+   during expand does not always work.  Do an IF_THEN_ELSE instead.  */
+emit_move_insn (operands[0],
+   gen_rtx_IF_THEN_ELSE (SImode, cond,
+ gen_rtx_NEG (SImode, masked_neg),
+

Re: [PATCH 11/15][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

2015-09-08 Thread James Greenhalgh

On Tue, Jul 28, 2015 at 12:26:22PM +0100, Alan Lawrence wrote:
> gcc/ChangeLog:
> 
>   * config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
>   vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
>   vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
>   vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
>   vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
>   vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
>   vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
>   vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
>   vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
>   vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
>   vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
>   vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
>   vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16,
>   vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
>   vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
>   vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
>   vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16,
>   vld1q_dup_f16): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vget_high_1.c: Add float16x8->float16x4 case.
>   * gcc.target/aarch64/vget_low_1.c: Likewise.


OK,

Thanks,
James

Re: [PATCH][AArch64] Improve code generation for float16 vector code

2015-09-08 Thread James Greenhalgh

On Tue, Sep 08, 2015 at 09:21:08AM +0100, James Greenhalgh wrote:
> On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> > On 04/09/15 13:32, James Greenhalgh wrote:
> > > In that case, these should be implemented as inline assembly blocks. As it
> > > stands, the code generation for these intrinsics will be very poor with 
> > > this
> > > patch applied.
> > >
> > > I'm going to hold off OKing this until I see a follow-up to fix the code
> > > generation, either replacing those particular intrinsics with inline asm,
> > > or doing the more comprehensive fix in the back-end.
> > >
> > > Thanks,
> > > James
> > 
> > In that case, here is the follow-up now ;). This fixes each of the following
> > functions to generate a single instruction followed by ret:
> >   * vld1_dup_f16, vld1q_dup_f16
> >   * vset_lane_f16, vsetq_lane_f16
> >   * vget_lane_f16, vgetq_lane_f16
> >   * For IN of type either float16x4_t or float16x8_t, and constant C:
> > return (float16x4_t) {in[C], in[C], in[C], in[C]};
> >   * Similarly,
> > return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], 
> > in[C]};
> > (These correspond intuitively to what one might expect for "vdup_lane_f16",
> > "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
> > although such intrinsics do not actually exist.)
> > 
> > This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> > that load immediates, rather than using elements of pre-existing vectors.
> 
> What is code generation like for these then? If I remember correctly it
> was the vdup_n_f16 implementation that looked most objectionable before.

Ah, I see what you are saying here. You mean: if there were intrinsics
equivalent to vdup_n_s16 (which there are not), then this patch would not
handle them. I was confused as vld1_dup_f16 does not use an element of a
pre-existing vector, and may well load an immediate, but is handled by
your patch.

Sorry for the noise.

James

Re: [PATCH][AArch64] Improve code generation for float16 vector code

2015-09-08 Thread James Greenhalgh

On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
> On 04/09/15 13:32, James Greenhalgh wrote:
> > In that case, these should be implemented as inline assembly blocks. As it
> > stands, the code generation for these intrinsics will be very poor with this
> > patch applied.
> >
> > I'm going to hold off OKing this until I see a follow-up to fix the code
> > generation, either replacing those particular intrinsics with inline asm,
> > or doing the more comprehensive fix in the back-end.
> >
> > Thanks,
> > James
> 
> In that case, here is the follow-up now ;). This fixes each of the following
> functions to generate a single instruction followed by ret:
>   * vld1_dup_f16, vld1q_dup_f16
>   * vset_lane_f16, vsetq_lane_f16
>   * vget_lane_f16, vgetq_lane_f16
>   * For IN of type either float16x4_t or float16x8_t, and constant C:
> return (float16x4_t) {in[C], in[C], in[C], in[C]};
>   * Similarly,
> return (float16x8_t) {in[C], in[C], in[C], in[C], in[C], in[C], in[C], in[C]};
> (These correspond intuitively to what one might expect for "vdup_lane_f16",
> "vdup_laneq_f16", "vdupq_lane_f16" and "vdupq_laneq_f16" intrinsics,
> although such intrinsics do not actually exist.)
> 
> This patch does not deal with equivalents to vdup_n_s16 and other intrinsics
> that load immediates, rather than using elements of pre-existing vectors.

What is code generation like for these then? If I remember correctly it
was the vdup_n_f16 implementation that looked most objectionable before.

> I'd welcome thoughts/opinions on what testcase would be appropriate.
> Correctness of all the intrinsics is already tested by the advsimd-intrinsics
> testsuite, and the only way I can see to verify code generation is to
> scan-assembler looking for particular instructions; do we wish to see more
> scan-assembler tests?

I think these are fine without a test case; as you say, correctness is
already handled elsewhere.

> Bootstrapped + check-gcc on aarch64-none-linux-gnu.

OK,

Thanks,
James

> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md (aarch64_simd_dup,
>   aarch64_dup_lane, aarch64_dup_lane_,
>   aarch64_simd_vec_set, vec_set, vec_perm_const,
>   vec_init, *aarch64_simd_ld1r, vec_extract): Add
>   V4HF and V8HF variants to iterator.
> 
>   * config/aarch64/aarch64.c (aarch64_evpc_dup): Add V4HF and V8HF cases.
> 
>   * config/aarch64/iterators.md (VDQF_F16): New.
>   (VSWAP_WIDTH, vswap_width_name): Add V4HF and V8HF cases.
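The lane-broadcast idiom Alan lists above (what "vdup_lane_f16" and friends would do, if they existed) can be illustrated without an AArch64 toolchain. This is an assumption-laden stand-in, not code from the patch: it uses GCC's generic vector extensions, 16-bit integer lanes in place of float16, hypothetical type and function names (v4hi, v8hi, dup_lane2_4, dup_lane2_8), and a lane index fixed at the constant 2.

```c
#include <stdint.h>

/* Hypothetical stand-in types: GCC generic vectors with 16-bit integer
   lanes, used here because float16x4_t/float16x8_t need an AArch64 target. */
typedef int16_t v4hi __attribute__((vector_size(8)));   /* 4 lanes */
typedef int16_t v8hi __attribute__((vector_size(16)));  /* 8 lanes */

/* Broadcast the constant lane 2 of a 4-lane vector into all 4 lanes
   (the shape of a would-be "vdup_lane" intrinsic). */
static inline v4hi dup_lane2_4(v4hi in)
{
    return (v4hi){in[2], in[2], in[2], in[2]};
}

/* Broadcast the constant lane 2 of a 4-lane vector into all 8 lanes of a
   wider result (the shape of a would-be "vdupq_lane" intrinsic). */
static inline v8hi dup_lane2_8(v4hi in)
{
    return (v8hi){in[2], in[2], in[2], in[2], in[2], in[2], in[2], in[2]};
}
```

On AArch64 with the real float16 vector types, Alan reports that expressions of this shape now compile to a single instruction followed by ret; the host code generated for this portable sketch will of course differ.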

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-09-08 Thread Renlin Li


Hi Andrew,

Previously, there was a discussion thread on the binutils mailing list:

https://sourceware.org/ml/binutils/2015-04/msg00032.html

Nick proposed a fix; Richard Henderson held a similar opinion to yours.

Regards,
Renlin

On 07/09/15 12:45, pins...@gmail.com wrote:
> On Sep 7, 2015, at 7:22 PM, Kugan  wrote:
>
>> On 07/09/15 20:46, Wilco Dijkstra wrote:
>>
>>> Kugan wrote:
>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>>> -O3 -g: Error: unaligned opcodes detected in executable segment. It works
>>>> fine if I remove the -g. I am looking into it and it needs to be fixed as well.
>>>
>>> This is a known assembler bug I found a while back; Renlin is looking into it.
>>> Basically, when debug tables are inserted at the end of a code section, the
>>> assembler doesn't align to the alignment required by the debug tables.
>>
>> This is precisely what seems to be happening. Renlin, could you please
>> let me know when you have a patch (even if it is a prototype or a hack)?
>
> I had noticed that, but I read through the assembler code and it sounded very
> much like it was designed this way, and that the compiler was not supposed to
> emit assembly like this and fix up the alignment.
>
> Thanks,
> Andrew
>
>> Thanks,
>> Kugan
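The alignment rule Wilco refers to is ordinary power-of-two rounding: whatever follows the code in a section must start at an offset that is a multiple of its required alignment. A minimal sketch of that arithmetic, assuming power-of-two alignments; align_up and padding_needed are hypothetical helpers for illustration, not functions from GNU as.

```c
#include <stdint.h>

/* Hypothetical helper: round OFFSET up to the next multiple of ALIGN.
   ALIGN must be a power of two (e.g. 4 for AArch64 instruction words). */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: number of padding bytes needed before placing
   something that requires ALIGN-byte alignment at OFFSET. */
static uint64_t padding_needed(uint64_t offset, uint64_t align)
{
    return align_up(offset, align) - offset;
}
```

For example, content ending at offset 13 would need 3 bytes of padding before anything requiring 4-byte alignment, and none at offset 16.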

Re: [Patch, avr] Fix PR65210

2015-09-08 Thread Senthil Kumar Selvaraj

On Tue, Sep 08, 2015 at 09:16:35AM +0200, Georg-Johann Lay wrote:
> Senthil Kumar Selvaraj wrote:
> >  This (rather trivial) patch fixes PR65210. The ICE happens because the code
> >  wasn't handling the io_low attribute where it was supposed to.
> 
> Hi Senthil, could you outline what these new attributes are good for?
> The compiler just maps the argument to a compile-time constant, so the
> attributes do the same as a cast to a volatile address, except that the
> user must know in advance which I/O region is the right one.

IIRC, Joern Rennecke added these so that you could have extern declarations
(without actual addresses) specifying that the variables will be in I/O
address space, so that the compiler can generate IN/OUT or the
bit-addressable instructions (SBI/CBI, etc.).

For example,

extern char volatile SOMEPORT __attribute__((io_low));
int main()
{
  SOMEPORT |= 1;
}

$ ~/avr/install/bin/avr-gcc -mmcu=avr5 useio.c -Os -S
$ cat useio.s

sbi SOMEPORT-32,0
ldi r24,0
ldi r25,0
ret

whereas the version without io_low generates an lds, ori, sts sequence.
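The "SOMEPORT-32" operand in the listing above reflects the offset between data-space and I/O-space addresses. A sketch of which instructions can reach a given register, assuming a classic core such as avr5 where the 64 I/O registers sit at data addresses 0x20..0x5F (other core families use different mappings); the helper names are hypothetical. The lower 32 I/O registers, the region io_low promises, are the only ones SBI/CBI can address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: classic AVR core (e.g. avr5) whose 64 I/O registers are
   mapped into data space starting at 0x20. */
#define IO_DATA_OFFSET 0x20u

/* Hypothetical helper: convert a data-space address to an I/O address. */
static uint16_t data_to_io(uint16_t data_addr)
{
    return (uint16_t)(data_addr - IO_DATA_OFFSET);
}

/* IN/OUT encode a 6-bit I/O address: I/O registers 0..63 are reachable. */
static bool in_out_reachable(uint16_t data_addr)
{
    return data_addr >= IO_DATA_OFFSET && data_to_io(data_addr) < 64;
}

/* SBI/CBI/SBIS/SBIC encode a 5-bit I/O address: only registers 0..31,
   the lower half of the I/O area. */
static bool sbi_cbi_reachable(uint16_t data_addr)
{
    return data_addr >= IO_DATA_OFFSET && data_to_io(data_addr) < 32;
}
```

For example, a register at data address 0x38 is I/O address 0x18 and is within SBI/CBI range, while one at 0x50 (I/O address 0x30) is reachable only with IN/OUT.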
> 
> Supporting similar RELOCs for io* hardly makes any sense; or are
> there plans to add respective RELOCs (and ones for bitfields)?

binutils already has those - R_AVR_PORT6 and R_AVR_PORT5 do that. This was added
back in 2014 (https://sourceware.org/ml/binutils/2014-06/msg00122.html).
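To see what R_AVR_PORT5 and R_AVR_PORT6 have to fill in, here is a sketch of the relevant encodings from the AVR instruction set: SBI/CBI carry a 5-bit I/O address, while IN carries a 6-bit one split across two fields. Mapping each relocation to these fields is our reading of their names, not something stated in the thread; the encode_* helpers are hypothetical, not binutils code.

```c
#include <stdint.h>

/* SBI A,b = 1001 1010 AAAA Abbb: set bit b (0..7) of I/O register A (0..31).
   Assumption: R_AVR_PORT5 patches this 5-bit A field. */
static uint16_t encode_sbi(unsigned io_addr, unsigned bit)
{
    return (uint16_t)(0x9A00u | ((io_addr & 0x1Fu) << 3) | (bit & 0x7u));
}

/* CBI A,b = 1001 1000 AAAA Abbb: clear bit b of I/O register A (0..31). */
static uint16_t encode_cbi(unsigned io_addr, unsigned bit)
{
    return (uint16_t)(0x9800u | ((io_addr & 0x1Fu) << 3) | (bit & 0x7u));
}

/* IN Rd,A = 1011 0AAd dddd AAAA: the 6-bit A (0..63) is split in two.
   Assumption: R_AVR_PORT6 patches these split A fields. */
static uint16_t encode_in(unsigned rd, unsigned io_addr)
{
    return (uint16_t)(0xB000u
                      | ((io_addr & 0x30u) << 5)  /* A[5:4] -> bits 10..9 */
                      | ((rd & 0x1Fu) << 4)       /* Rd     -> bits 8..4  */
                      | (io_addr & 0x0Fu));       /* A[3:0] -> bits 3..0  */
}
```

The masks here silently truncate out-of-range values; a real assembler or linker would instead report that the port address does not fit the field.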

Re: [Patch, avr] Fix PR65210

2015-09-08 Thread Georg-Johann Lay


Senthil Kumar Selvaraj wrote:
>  This (rather trivial) patch fixes PR65210. The ICE happens because the code
>  wasn't handling the io_low attribute where it was supposed to.

Hi Senthil, could you outline what these new attributes are good for?
The compiler just maps the argument to a compile-time constant, so the
attributes do the same as a cast to a volatile address, except that the
user must know in advance which I/O region is the right one.

Supporting similar RELOCs for io* hardly makes any sense; or are there
plans to add respective RELOCs (and ones for bitfields)?

Thanks, Johann
