Re: [www-patch] Document new -Wshadow= variants in gcc-7/changes.html

2016-11-22 Thread Gerald Pfeifer
A few markup fixes on top of the committed patch that I just
applied.

Essentially  must not be within  and  was
missing in one case.  

Thanks again for providing this nice documentation!

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.27
retrieving revision 1.30
diff -u -r1.27 -r1.30
--- changes.html22 Nov 2016 10:14:25 -  1.27
+++ changes.html23 Nov 2016 07:38:58 -  1.30
@@ -131,7 +131,9 @@
 shadowing variable can be converted to that of the shadowed variable).
 
 The following example shows the different kinds of shadow
-warnings:
+warnings:
+
+
 enum operation { add, count };
 struct container { int nr; };
 
@@ -145,34 +147,41 @@
   r += count.nr;
 }
   return r;
-}
+}
 
 -Wshadow=compatible-local will warn for the parameter being
-shadowed with the same type:
+shadowed with the same type:
+
+
 warn-test.c:8:12: warning: declaration 
of 'count' shadows a parameter [-Wshadow=compatible-local]
for (int count = 0; count > 0; count--)
 ^
 warn-test.c:5:42: note: shadowed 
declaration is here
  container_count (struct container c, int count)
-  ^
+  ^
 
 -Wshadow=local will warn for the above and for the shadowed
-declaration with incompatible type:
+declaration with incompatible type:
+
+
 warn-test.c:10:24: warning: 
declaration of 'count' shadows a previous local [-Wshadow=local]
struct container count = c;
 ^
 warn-test.c:8:12: note: shadowed 
declaration is here
for (int count = 0; count > 0; count--)
-^
+^
 
 -Wshadow=global will warn for all of the above and the 
shadowing
-of the global declaration: 
+of the global declaration:
+
+
 warn-test.c:5:42: warning: declaration 
of 'count' shadows a global declaration [-Wshadow]
  container_count (struct container c, int count)
   ^
 warn-test.c:1:23: note: shadowed 
declaration is here
  enum operation { add, count };
-   ^
+   ^
+
 
 
 C
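
For reference, the example that these diagnostics point at can be pieced
together from the quoted line and column numbers.  This is only a sketch:
the hunks above do not show every line, so the declaration of r and the
exact layout are assumptions.

```c
enum operation { add, count };
struct container { int nr; };

int
container_count (struct container c, int count)
{
  int r = 0;  /* assumed: this line is not visible in the hunks */
  for (int count = 0; count > 0; count--)
    {
      struct container count = c;
      r += count.nr;
    }
  return r;
}
```

The parameter shadows the enumerator, the loop variable shadows the
parameter with the same type, and the struct shadows the loop variable
with an incompatible type, which matches the three kinds of warning.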


Re: [PATCH] Fix PR78230

2016-11-22 Thread Kito Cheng
Hi Jeff:
Thanks for your review and approval. However, I don't have commit rights yet;
 could you help me commit it? :)

thanks

On Wed, Nov 23, 2016 at 1:04 AM, Jeff Law  wrote:
>
> On 11/08/2016 07:43 PM, Kito Cheng wrote:
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-11-09  Kito Cheng 
>>
>> PR target/78230
>> * gcc.dg/torture/pr66178.c (test): Use uintptr_t instead of int.
>> (test2) Ditto.
>
> OK.
> jeff


[PATCH] remove invalid "tail +16c"

2016-11-22 Thread ma . jiang
Hi all,
  In "config/acx.m4", there are still some uses of "tail +16c", which is
invalid on POSIX systems.
  In my opinion, every "tail +16c" should be replaced with the "tail -c"
form directly, as most systems accept the latter.
  And, to skip the first 16 bytes, we should use "tail -c +17" instead of
"tail -c +16", since the position given to "-c +N" is 1-based.
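
The off-by-one can be modelled in a few lines of C.  This is a sketch
only; both helper names are hypothetical and merely mimic what
"tail -c +K" does to an in-memory buffer.

```c
#include <stddef.h>

/* Hypothetical helper: the 1-based start position K to pass to
   "tail -c +K" so that the first `skip` bytes are dropped.  */
static size_t
tail_start_for_skip (size_t skip)
{
  return skip + 1;
}

/* Hypothetical model of "tail -c +K": return a pointer to the K-th
   byte (1-based) of buf and store the number of remaining bytes in
   *out_len.  If K is past the end, nothing remains.  */
static const char *
tail_from (const char *buf, size_t len, size_t k, size_t *out_len)
{
  size_t start = k > 0 ? k - 1 : 0;
  if (start > len)
    start = len;
  *out_len = len - start;
  return buf + start;
}
```

With a 20-byte buffer, K = 17 leaves the last 4 bytes, while K = 16
leaves 5, i.e. it skips only 15.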

 * config/acx.m4: Change "tail +16c" to "tail -c +17".
 * configure: Regenerate.
--- gcc-6.2.0/config/acx.m4 2011-12-18 17:58:37.0 +0800
+++ gcc-6.2.0-bak/config/acx.m4 2016-11-23 10:56:21.065817691 +0800
@@ -404,7 +404,7 @@ AC_DEFUN([ACX_PROG_CMP_IGNORE_INITIAL],
 [AC_CACHE_CHECK([how to compare bootstrapped objects], 
gcc_cv_prog_cmp_skip,
 [ echo abfoo >t1
   echo cdfoo >t2
-  gcc_cv_prog_cmp_skip='tail +16c $$f1 > tmp-foo1; tail +16c $$f2 > 
tmp-foo2; cmp tmp-foo1 tmp-foo2'
+  gcc_cv_prog_cmp_skip='tail -c +17 $$f1 > tmp-foo1; tail -c +17 $$f2 > 
tmp-foo2; cmp tmp-foo1 tmp-foo2'
   if cmp t1 t2 2 2 > /dev/null 2>&1; then
 if cmp t1 t2 1 1 > /dev/null 2>&1; then
   :


Re: [PING] [PATCH] Fix PR31096

2016-11-22 Thread Marc Glisse

On Wed, 23 Nov 2016, Hurugalawadi, Naveen wrote:


Please consider this a personal reminder to review the patch
at the following link and let me know your comments.

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01049.html


Hi,

the gcc part looks good to me (I didn't look at the testsuite), but you'll 
need a real reviewer to approve the patch...


--
Marc Glisse


Re: GCC patch committed: Fix -fdump-go-spec to not align last field to type alignment

2016-11-22 Thread Ian Lance Taylor
On Tue, Nov 22, 2016 at 7:01 PM, Martin Sebor  wrote:
> On 11/22/2016 04:25 PM, Ian Lance Taylor wrote:
>>
>> The code that handles -fdump-go-spec was incorrectly trying to pad the
>> last field of a struct/union to the alignment of the overall
>> struct/union.  That is unnecessary and incorrect, as the alignment is
>> handled by go_force_record_alignment anyhow.  It caused a compiler
>> crash on x32 and various other 32-bit targets when generating the Go
>> version of the libffi ffi_closure type, which is explicitly aligned to
>> an 8 byte boundary but does not necessarily have that size.  This
>> caused PRs 78431 and 78432.  Ran bootstrap and Go testsuite on
>> x86_64-pc-linux-gnu, with x32 multilib enabled.  Committed to
>> mainline.
>
>
> I'm seeing failures in the gcc.misc-tests/godump-1.c test that may
> be related to this change.  The same failures are also reported on
> gcc-testresults:
>
>   https://gcc.gnu.org/ml/gcc-testresults/2016-11/msg02556.html

Thanks.  Fixed like so.  Committed to mainline.

Ian

2016-11-22  Ian Lance Taylor  

* gcc.misc-tests/godump-1.c: Update expected output for recent
changes.
Index: gcc.misc-tests/godump-1.c
===
--- gcc.misc-tests/godump-1.c   (revision 242737)
+++ gcc.misc-tests/godump-1.c   (revision 242738)
@@ -425,22 +425,22 @@ struct { int8_t e1; void *e2; } s2el;
 /* { dg-final { scan-file godump-1.out "(?n)^var _s2el struct \{ e1 int8; e2 
\\*byte; \}$" } } */
 
 typedef struct { void *e1; int8_t e2; } ts2eg;
-/* { dg-final { scan-file godump-1.out "(?n)^type _ts2eg struct \{ e1 \\*byte; 
e2 int8; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^type _ts2eg struct \{ e1 \\*byte; 
e2 int8; Godump_0_pad \\\[.\\\]byte; \}$" } } */
 
 struct { void *e1; int8_t e2; } s2eg;
-/* { dg-final { scan-file godump-1.out "(?n)^var _s2eg struct \{ e1 \\*byte; 
e2 int8; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^var _s2eg struct \{ e1 \\*byte; 
e2 int8; Godump_0_pad \\\[.\\\]byte; \}$" } } */
 
 typedef struct { int64_t l; int8_t c; int32_t i; int16_t s; } tsme;
-/* { dg-final { scan-file godump-1.out "(?n)^type _tsme struct \{ l int64; c 
int8; i int32; s int16; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^type _tsme struct \{ l int64; c 
int8; i int32; s int16; Godump_0_pad \\\[.\\\]byte; \}$" } } */
 
 struct { int64_t l; int8_t c; int32_t i; int16_t s; } sme;
-/* { dg-final { scan-file godump-1.out "(?n)^var _sme struct \{ l int64; c 
int\8; i int32; s int16; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^var _sme struct \{ l int64; c 
int\8; i int32; s int16; Godump_0_pad \\\[.\\\]byte; \}$" } } */
 
 typedef struct { int16_t sa[3]; int8_t ca[3]; } tsae;
-/* { dg-final { scan-file godump-1.out "(?n)^type _tsae struct \{ sa 
\\\[2\\+1\\\]int16; ca \\\[2\\+1\\\]int8; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^type _tsae struct \{ sa 
\\\[2\\+1\\\]int16; ca \\\[2\\+1\\\]int8; Godump_0_pad \\\[.\\\]byte; \}$" } } 
*/
 
 struct { int16_t sa[3]; int8_t ca[3]; } sae;
-/* { dg-final { scan-file godump-1.out "(?n)^var _sae struct \{ sa 
\\\[2\\+1\\\]int16; ca \\\[2\\+1\\\]int8; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^var _sae struct \{ sa 
\\\[2\\+1\\\]int16; ca \\\[2\\+1\\\]int8; Godump_0_pad \\\[.\\\]byte; \}$" } } 
*/
 
 typedef struct { float f; } tsf_equiv;
 /* { dg-final { scan-file godump-1.out "(?n)^type _tsf_equiv struct \{ f 
float\[0-9\]*; \}$" } } */
@@ -477,10 +477,10 @@ struct { struct { uint8_t ca[3]; } s; ui
 /* { dg-final { scan-file godump-1.out "(?n)^var _sn struct \{ s struct \{ ca 
\\\[2\\+1\\\]uint8; \}; i uint32; \}$" } } */
 
 typedef struct { struct { uint8_t a; uint16_t s; }; uint8_t b; } tsn_anon;
-/* { dg-final { scan-file godump-1.out "(?n)^type _tsn_anon struct \{ a uint8; 
s uint16; b uint8; Godump_0_align \\\[0\\\]int16; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^type _tsn_anon struct \{ a uint8; 
s uint16; b uint8; Godump_0_pad \\\[.\\\]byte; Godump_1_align \\\[0\\\]int16; 
\}$" } } */
 
 struct { struct { uint8_t a; uint16_t s; }; uint8_t b; } sn_anon;
-/* { dg-final { scan-file godump-1.out "(?n)^var _sn_anon struct \{ a uint8; s 
uint16; b uint8; Godump_0_align \\\[0\\\]int16; \}$" } } */
+/* { dg-final { scan-file godump-1.out "(?n)^var _sn_anon struct \{ a uint8; s 
uint16; b uint8; Godump_0_pad \\\[.\\\]byte; Godump_1_align \\\[0\\\]int16; 
\}$" } } */
 
 
 /*** structs with bitfields ***/
@@ -557,52 +557,52 @@ struct { uint16_t bf : 1; uint8_t c; } s
 /* { dg-final { scan-file godump-1.out "(?n)^var _sbf_pad16_1 struct \{ 
Godump_0_pad \\\[1\\\]byte; c uint8; Godump_1_align \\\[0\\\]int16; \}$" } } */
 
 typedef struct { uint16_t bf : 15; uint8_t c; } tsbf_pad16_2;
-/* { dg-final { scan-file godump-1.out "(?n)^type _tsbf_pad16_2 struct \{ 
Godump_0_pad \\\[2\\\]byte; c uint8; Godump_1_align \\\[0\\\]int16; \}$" } } */
+/* { dg-final { scan-file 
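
The Godump_0_pad field in the updated patterns appears to stand for the
bytes between the end of the last field and the end of the struct.  A
minimal C sketch of how large that gap is for a type shaped like ts2eg,
assuming a typical ABI where a pointer is aligned to its own size; the
helper name is hypothetical and not part of the testsuite.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the ts2eg type in godump-1.c.  */
typedef struct { void *e1; int8_t e2; } ts2eg;

/* Bytes between the end of the last field and the end of the struct.  */
static size_t
ts2eg_trailing_pad (void)
{
  return sizeof (ts2eg) - (offsetof (ts2eg, e2) + sizeof (int8_t));
}
```

The patterns match the array length with "." rather than a fixed
number, presumably because the amount differs between targets.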

[PING] [PATCH] Fix PR71727

2016-11-22 Thread Hurugalawadi, Naveen
Hi,

Please consider this a personal reminder to review the patch
at the following link and let me know your comments.

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00697.html

Thanks,
Naveen



[PATCH] [AArch64] Fix PR71112

2016-11-22 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that fixes PR71112.

The current implementation that handles SYMBOL_SMALL_GOT_28K in
aarch64_load_symref_appropriately accesses the high part of the RTX in
big-endian mode, which results in an ICE for ILP32.

The attached patch fixes the issue by accessing the low part for both
endiannesses.
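
Why byte offset 0 is the wrong choice on big-endian can be shown with a
small model in plain C.  This is a simplified sketch, not GCC code: all
three helpers are hypothetical and only mimic how the low part of a
wider value is located in memory order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: byte offset of the low outer_size bytes inside an
   inner_size-byte value.  */
static unsigned
lowpart_byte_offset (unsigned outer_size, unsigned inner_size,
                     bool big_endian)
{
  return big_endian ? inner_size - outer_size : 0;
}

/* Store v into bytes[0..7] in the requested byte order.  */
static void
store_u64 (uint8_t *bytes, uint64_t v, bool big_endian)
{
  for (unsigned i = 0; i < 8; i++)
    {
      unsigned shift = big_endian ? 8 * (7 - i) : 8 * i;
      bytes[i] = (uint8_t) (v >> shift);
    }
}

/* Read a 4-byte value starting at byte offset off, same byte order.  */
static uint32_t
load_u32 (const uint8_t *bytes, unsigned off, bool big_endian)
{
  uint32_t v = 0;
  for (unsigned i = 0; i < 4; i++)
    {
      unsigned shift = big_endian ? 8 * (3 - i) : 8 * i;
      v |= (uint32_t) bytes[off + i] << shift;
    }
  return v;
}
```

In this model, reading at offset 0 of a big-endian 64-bit value yields
the high half, while the offset from lowpart_byte_offset yields the low
half for either byte order.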

Please review the patch and let me know if it's okay.


2016-11-23  Andrew Pinski

gcc
* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
Access the lower part of RTX appropriately.

gcc/testsuite
* gcc.target/aarch64/pr71112.c: New testcase.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index efcba83..4d87953 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1298,7 +1298,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	emit_move_insn (gp_rtx, gen_rtx_HIGH (Pmode, s));
 
 	if (mode != GET_MODE (gp_rtx))
-	  gp_rtx = simplify_gen_subreg (mode, gp_rtx, GET_MODE (gp_rtx), 0);
+ gp_rtx = gen_lowpart (mode, gp_rtx);
+
 	  }
 
 	if (mode == ptr_mode)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr71112.c b/gcc/testsuite/gcc.target/aarch64/pr71112.c
new file mode 100644
index 000..5bb9dee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr71112.c
@@ -0,0 +1,12 @@
+/* PR target/71112 */
+/* { dg-do compile } */
+/* { dg-options "-mabi=ilp32 -mbig-endian -fpie" } */
+
+extern int dbs[100];
+void f (int *);
+int
+nscd_init (void)
+{
+  f (dbs);
+  return 0;
+}


[PING] [PATCH] Fix PR31096

2016-11-22 Thread Hurugalawadi, Naveen
Hi,

Please consider this a personal reminder to review the patch
at the following link and let me know your comments.

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01049.html

Thanks,
Naveen

[PATCH] [AArch64] Fix PR77635

2016-11-22 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that fixes PR77635.

Some load-pair testcases fail when GCC is configured with
"--with-cpu=thunderx" because they do not pass -mcpu=generic.
The attached patch modifies the testcases to use -mcpu=generic.

Please review the patch and let me know if it's okay.

2016-11-23  Naveen H.S  

* gcc.target/aarch64/ldp_stp_1.c : Add -mcpu=generic.
* gcc.target/aarch64/store-pair-1.c : Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
index 9de4e77..89550e0 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mcpu=generic" } */
 
 int arr[4][4];
 
diff --git a/gcc/testsuite/gcc.target/aarch64/store-pair-1.c b/gcc/testsuite/gcc.target/aarch64/store-pair-1.c
index a90fc61..b8e762b 100644
--- a/gcc/testsuite/gcc.target/aarch64/store-pair-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/store-pair-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mcpu=generic" } */
 
 int f(int *a, int b)
 {


[PATCH] [AArch64] Fix PR77634

2016-11-22 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that fixes PR77634.

Some testcases do not use -fno-vect-cost-model and hence fail when GCC is
configured with "--with-cpu=thunderx".
The attached patch modifies the testcases to use -fno-vect-cost-model.

Please review the patch and let me know if it's okay.


2016-11-23  Naveen H.S  

* gcc.target/aarch64/fmaxmin.c : Add -fno-vect-cost-model.
* gcc.target/aarch64/fmul_fcvt_2.c : Likewise.
* gcc.target/aarch64/vect-abs-compile.c : Likewise.
* gcc.target/aarch64/vect-clz.c : Likewise.
* gcc.target/aarch64/vect-fcm-eq-d.c : Likewise.
* gcc.target/aarch64/vect-fcm-ge-d.c : Likewise.
* gcc.target/aarch64/vect-fcm-gt-d.c : Likewise.
* gcc.target/aarch64/vect-fmovd-zero.c : Likewise.
* gcc.target/aarch64/vect-fmovd.c : Likewise.
* gcc.target/aarch64/vect-fmovf-zero.c : Likewise.
* gcc.target/aarch64/vect-fmovf.c : Likewise.
* gcc.target/aarch64/vect_ctz_1.c : Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/fmaxmin.c b/gcc/testsuite/gcc.target/aarch64/fmaxmin.c
index 7654955..4447e33 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmaxmin.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmaxmin.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -ftree-vectorize -fno-inline -save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-inline -fno-vect-cost-model -save-temps" } */
 
 
 extern void abort (void);
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
index d8a9335..4ac3ab7 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-save-temps -O2 -ftree-vectorize -fno-inline" } */
+/* { dg-options "-save-temps -O2 -ftree-vectorize -fno-inline -fno-vect-cost-model" } */
 
 #define N 1024
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-abs-compile.c b/gcc/testsuite/gcc.target/aarch64/vect-abs-compile.c
index 27146b8..19082d7 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-abs-compile.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-abs-compile.c
@@ -1,6 +1,6 @@
 
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fno-vect-cost-model" } */
 
 #define N 16
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-clz.c b/gcc/testsuite/gcc.target/aarch64/vect-clz.c
index 4c7321f..044fa9e 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-clz.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-clz.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps -fno-inline" } */
+/* { dg-options "-O3 -save-temps -fno-inline -fno-vect-cost-model" } */
 
 extern void abort ();
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c b/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c
index d91cca2..4640f57 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-unroll-loops --save-temps -fno-inline" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-unroll-loops --save-temps -fno-inline -fno-vect-cost-model" } */
 
 #define FTYPE double
 #define ITYPE long
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fcm-ge-d.c b/gcc/testsuite/gcc.target/aarch64/vect-fcm-ge-d.c
index c3c4fb3..f5b6329 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fcm-ge-d.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fcm-ge-d.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-unroll-loops --save-temps -fno-inline" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-unroll-loops --save-temps -fno-inline -fno-vect-cost-model" } */
 
 #define FTYPE double
 #define ITYPE long
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fcm-gt-d.c b/gcc/testsuite/gcc.target/aarch64/vect-fcm-gt-d.c
index 9ef5f1c..28d7ab6 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fcm-gt-d.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fcm-gt-d.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-unroll-loops --save-temps -fno-inline" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-unroll-loops --save-temps -fno-inline -fno-vect-cost-model" } */
 
 #define FTYPE double
 #define ITYPE long
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c b/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c
index f8ef3ac..bfd327c 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-vect-cost-model" } */
 
 #define N 32
 
diff --git 

[PATCH 7/9] Add patterns and predicates foutline-msabi-xlogues

2016-11-22 Thread Daniel Santos
Adds the predicates save_multiple and restore_multiple to predicates.md,
which are used by the following patterns in sse.md:

* save_multiple - insn that calls a save stub
* save_multiple_hfp - insn that calls a save stub when a hard frame
  pointer is used.
* restore_multiple - call_insn that calls a restore stub and returns to the
  function to allow a sibling call (which should typically offer better
  optimization than using the restore stub as the tail call)
* restore_multiple_and_return - a jump_insn that returns from the
  function as a tail-call.
* restore_multiple_leave_return - like the above, but restores the frame
  pointer before returning.
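
The checks made by the save_multiple predicate in the diff below can be
modelled in plain C.  This is a rough sketch only: the toy_* types and
names are hypothetical stand-ins for RTL and do not exist in GCC, and
"base == TOY_AX" covers both the "address in RAX" and the "offset of
RAX" cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL; every type and name here is hypothetical.  */
enum toy_code { TOY_USE_SYMBOL, TOY_CONST, TOY_SET_REG_TO_MEM, TOY_OTHER };
enum toy_reg { TOY_AX, TOY_SI, TOY_SP, TOY_OTHER_REG };

struct toy_elt
{
  enum toy_code code;
  enum toy_reg base;  /* base register of the store address */
};

/* Mirrors the predicate's checks: element 0 must be a use of a symbol,
   there must be at least 13 elements, and every element from index 2
   onwards must store a register at an address based on RAX.  */
static bool
toy_save_multiple_p (const struct toy_elt *v, size_t n)
{
  if (n == 0 || v[0].code != TOY_USE_SYMBOL)
    return false;
  if (n < 13)
    return false;
  for (size_t i = 2; i < n; i++)
    if (v[i].code != TOY_SET_REG_TO_MEM || v[i].base != TOY_AX)
      return false;
  return true;
}
```

As in the predicate itself, element 1 is not inspected by the loop.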
---
 gcc/config/i386/predicates.md | 155 ++
 gcc/config/i386/sse.md|  46 +
 2 files changed, 201 insertions(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 219674e..ebe735ad 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1663,3 +1663,158 @@
   (ior (match_operand 0 "register_operand")
(and (match_code "const_int")
(match_test "op == constm1_rtx"
+
+;; Return true if:
+;; 1. first op is a symbol reference,
+;; 2. >= 13 operands, and
+;; 3. operands 2 to end is one of:
+;;   a. save a register to a memory location, or
+;;   b. restore stack pointer.
+(define_predicate "save_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  if (GET_CODE (head) != USE)
+return false;
+  else
+{
+  rtx op0 = XEXP (head, 0);
+  if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+   return false;
+}
+
+  if (nregs < 13)
+return false;
+
+  for (i = 2; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* storing a register to memory.  */
+   if (GET_CODE (src) == REG && GET_CODE (dest) == MEM)
+ {
+   rtx addr = XEXP (dest, 0);
+
+   /* Good if dest address is in RAX.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == AX_REG)
+ continue;
+
+   /* Good if dest address is offset of RAX.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == AX_REG)
+ continue;
+ }
+   break;
+
+ default:
+   break;
+   }
+   return false;
+}
+  return true;
+})
+
+;; Return true if:
+;; * first op is (return) or a use (symbol reference),
+;; * >= 14 operands, and
+;; * operands 2 to end are one of:
+;;   - restoring a register from a memory location that's an offset of RSI.
+;;   - clobbering a reg
+;;   - adjusting SP
+(define_predicate "restore_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  switch (GET_CODE (head))
+{
+  case RETURN:
+   i = 3;
+   break;
+
+  case USE:
+  {
+   rtx op0 = XEXP (head, 0);
+
+   if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+ return false;
+
+   i = 1;
+   break;
+  }
+
+  default:
+   return false;
+}
+
+  if (nregs < i + 12)
+return false;
+
+  for (; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case CLOBBER:
+   continue;
+
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* Restoring a register from memory.  */
+   if (GET_CODE (src) == MEM && GET_CODE (dest) == REG)
+ {
+   rtx addr = XEXP (src, 0);
+
+   /* Good if src address is in RSI.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == SI_REG)
+ continue;
+
+   /* Good if src address is offset of RSI.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == SI_REG)
+ continue;
+
+   /* Good if adjusting stack pointer.  */
+   if (GET_CODE (dest) == REG
+   && REGNO (dest) == SP_REG
+   && GET_CODE (src) == PLUS
+   && GET_CODE (XEXP (src, 0)) == REG
+   && REGNO (XEXP (src, 0)) == SP_REG)
+ continue;
+ }
+
+   /* Restoring stack pointer from another register. */
+   if (GET_CODE (dest) == REG && REGNO (dest) == SP_REG
+   && GET_CODE (src) == REG)
+ continue;
+   break;
+
+ default:
+   break;
+   }
+   return 

[PATCH 9/9] Add remainder of moutline-msabi-xlogues implementation

2016-11-22 Thread Daniel Santos
Adds functions emit_msabi_outlined_save and emit_msabi_outlined_restore,
which are called from ix86_expand_prologue and ix86_expand_epilogue,
respectively. Also adds the code to ix86_expand_call that enables the
optimization (setting the machine_function's outline_ms_sysv field).
---
 gcc/config/i386/i386.c | 298 +
 1 file changed, 279 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1dc244e..6345c61 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13882,6 +13882,103 @@ ix86_elim_entry_set_got (rtx reg)
 }
 }
 
+static rtx
+gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store)
+{
+  rtx addr, mem;
+
+  if (offset)
+addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
+  mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg);
+  return gen_rtx_SET (store ? mem : reg, store ? reg : mem);
+}
+
+static inline rtx
+gen_frame_load (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, false);
+}
+
+static inline rtx
+gen_frame_store (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, true);
+}
+
+static void
+emit_msabi_outlined_save (const struct ix86_frame &frame)
+{
+  struct machine_function *m = cfun->machine;
+  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
+ + m->outline_ms_sysv_extra_regs;
+  rtvec v = rtvec_alloc (ncregs - 1 + 3);
+  rtx insn, sym, tmp;
+  rtx rax = gen_rtx_REG (word_mode, AX_REG);
+  unsigned i = 0;
+  unsigned j;
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+  HOST_WIDE_INT stack_used = xlogue.get_stack_space_used ();
+  HOST_WIDE_INT stack_alloc_size = stack_used;
+  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset ();
+
+  /* Verify that the incoming stack 16-byte alignment offset matches the
+ layout we're using.  */
+  gcc_assert ((m->fs.sp_offset & 15) == xlogue.get_stack_align_off_in ());
+
+  sym = xlogue.get_stub_rtx (frame_pointer_needed ? XLOGUE_STUB_SAVE_HFP
+ : XLOGUE_STUB_SAVE);
+  RTVEC_ELT (v, i++) = gen_rtx_USE (VOIDmode, sym);
+
+  /* Combine as many other allocations as possible.  */
+  if (frame.nregs == 0)
+{
+  if (frame.nsseregs == 0)
+   /* If no other GP or SSE regs, we allocate the whole stack frame.  */
+   stack_alloc_size = frame.stack_pointer_offset - m->fs.sp_offset;
+  else
+   stack_alloc_size = frame.reg_save_offset - m->fs.sp_offset;
+
+  gcc_assert (stack_alloc_size >= stack_used);
+}
+
+  if (crtl->stack_realign_needed)
+{
+  int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
+
+  gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
+  insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-align_bytes)));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  RTVEC_ELT (v, i++) = const1_rtx;
+}
+  else
+  RTVEC_ELT (v, i++) = const0_rtx;
+
+  tmp = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (-rax_offset));
+  insn = emit_insn (gen_rtx_SET (rax, tmp));
+
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+GEN_INT (-stack_alloc_size), -1,
+m->fs.cfa_reg == stack_pointer_rtx);
+
+  for (j = 0; j < ncregs; ++j)
+{
+  const xlogue_layout::reginfo &r = xlogue.get_reginfo (j);
+  rtx store;
+  rtx reg;
+
+  reg = gen_rtx_REG (SSE_REGNO_P (r.regno) ? V4SFmode : word_mode,
+r.regno);
+  store = gen_frame_store (reg, rax, -r.offset);
+  RTVEC_ELT (v, i++) = store;
+}
+
+  gcc_assert (i == (unsigned)GET_NUM_ELEM (v));
+
+  insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+  RTX_FRAME_RELATED_P (insn) = true;
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -14095,6 +14192,11 @@ ix86_expand_prologue (void)
}
 }
 
+  /* Call to outlining stub occurs after pushing frame pointer (if it was
+ needed).  */
+  if (m->outline_ms_sysv)
+  emit_msabi_outlined_save (frame);
+
   if (!int_registers_saved)
 {
   /* If saving registers via PUSH, do so now.  */
@@ -14123,20 +14225,24 @@ ix86_expand_prologue (void)
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
-  /* The computation of the size of the re-aligned stack frame means
-that we must allocate the size of the register save area before
-performing the actual alignment.  Otherwise we cannot guarantee
-that there's enough storage above the realignment point.  */
-  if (m->fs.sp_offset != frame.sse_reg_save_offset)
-pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-  GEN_INT (m->fs.sp_offset
-   

[PATCH 8/9] Add msabi pro/epilogue stubs to libgcc

2016-11-22 Thread Daniel Santos
Adds libgcc/config/i386/i386-asm.h to manage common cpp and gas macros.
Stubs use the following naming convention:

  (sav|res)ms64[f][x]

sav|res   Save or restore
ms64      Avoid possible name collisions with future stubs
          (specific to 64-bit msabi --> sysv scenario)
[f]       Variant for hard frame pointer (and stack realignment)
[x]       Tail-call variant (is the return from function)
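
The naming convention can be written down as a tiny helper.  This is a
sketch for illustration; msabi_stub_name is a hypothetical function and
is not part of the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that composes a stub name following the
   (sav|res)ms64[f][x] convention.  Returns the length written, or -1
   if buf is too small.  */
static int
msabi_stub_name (char *buf, size_t size, bool restore, bool hfp,
                 bool tail_call)
{
  int n = snprintf (buf, size, "%sms64%s%s",
                    restore ? "res" : "sav",
                    hfp ? "f" : "",
                    tail_call ? "x" : "");
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The helper does not reject combinations; note that the diffstat below
only lists the [x] variant for the restore stubs.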
---
 libgcc/config.host |  2 +-
 libgcc/config/i386/i386-asm.h  | 82 ++
 libgcc/config/i386/resms64.S   | 63 
 libgcc/config/i386/resms64f.S  | 59 ++
 libgcc/config/i386/resms64fx.S | 61 +++
 libgcc/config/i386/resms64x.S  | 65 +
 libgcc/config/i386/savms64.S   | 63 
 libgcc/config/i386/savms64f.S  | 64 +
 libgcc/config/i386/t-msabi |  7 
 9 files changed, 465 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/i386/i386-asm.h
 create mode 100644 libgcc/config/i386/resms64.S
 create mode 100644 libgcc/config/i386/resms64f.S
 create mode 100644 libgcc/config/i386/resms64fx.S
 create mode 100644 libgcc/config/i386/resms64x.S
 create mode 100644 libgcc/config/i386/savms64.S
 create mode 100644 libgcc/config/i386/savms64f.S
 create mode 100644 libgcc/config/i386/t-msabi

diff --git a/libgcc/config.host b/libgcc/config.host
index 64beb21..07bb269 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1335,7 +1335,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \
   i[34567]86-*-gnu*)
-   tmake_file="${tmake_file} t-tls i386/t-linux t-slibgcc-libgcc"
+   tmake_file="${tmake_file} t-tls i386/t-linux i386/t-msabi 
t-slibgcc-libgcc"
if test "$libgcc_cv_cfi" = "yes"; then
tmake_file="${tmake_file} t-stack i386/t-stack-i386"
fi
diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h
new file mode 100644
index 000..73acf5c
--- /dev/null
+++ b/libgcc/config/i386/i386-asm.h
@@ -0,0 +1,82 @@
+/* Defines common preprocessor and assembly macros for use by various stubs.
+ *
+ *   Copyright (C) 2016 Free Software Foundation, Inc.
+ *   Written By Daniel Santos 
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 3, or (at your option) any
+ * later version.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Under Section 7 of GPL version 3, you are granted additional
+ * permissions described in the GCC Runtime Library Exception, version
+ * 3.1, as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License and
+ * a copy of the GCC Runtime Library Exception along with this program;
+ * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+ * .
+ */
+
+#ifndef I386_ASM_H
+#define I386_ASM_H
+
+#ifdef __ELF__
+# define ELFFN(fn) .type fn,@function
+#else
+# define ELFFN(fn)
+#endif
+
+#define FUNC_START(fn) \
+   .global fn; \
+   ELFFN (fn); \
+fn:
+
+#define HIDDEN_FUNC(fn)\
+   FUNC_START (fn) \
+   .hidden fn; \
+
+#define FUNC_END(fn) .size fn,.-fn
+
+#ifdef __SSE2__
+# ifdef __AVX__
+#  define MOVAPS vmovaps
+# else
+#  define MOVAPS movaps
+# endif
+
+/* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
+.macro SSE_SAVE off=0
+   MOVAPS %xmm15,(\off - 0x90)(%rax)
+   MOVAPS %xmm14,(\off - 0x80)(%rax)
+   MOVAPS %xmm13,(\off - 0x70)(%rax)
+   MOVAPS %xmm12,(\off - 0x60)(%rax)
+   MOVAPS %xmm11,(\off - 0x50)(%rax)
+   MOVAPS %xmm10,(\off - 0x40)(%rax)
+   MOVAPS %xmm9, (\off - 0x30)(%rax)
+   MOVAPS %xmm8, (\off - 0x20)(%rax)
+   MOVAPS %xmm7, (\off - 0x10)(%rax)
+   MOVAPS %xmm6, \off(%rax)
+.endm
+
+/* Restore SSE registers 6-15. off is the offset of rsi to get to xmm6.  */
+.macro SSE_RESTORE off=0
+   MOVAPS (\off - 0x90)(%rsi), %xmm15
+   MOVAPS (\off - 0x80)(%rsi), %xmm14
+   MOVAPS (\off - 0x70)(%rsi), %xmm13
+   MOVAPS (\off - 0x60)(%rsi), %xmm12
+   MOVAPS (\off - 0x50)(%rsi), %xmm11
+   MOVAPS (\off - 0x40)(%rsi), %xmm10
+   MOVAPS (\off - 0x30)(%rsi), %xmm9
+   MOVAPS (\off - 0x20)(%rsi), %xmm8
+   MOVAPS (\off - 0x10)(%rsi), %xmm7
+   MOVAPS \off(%rsi), %xmm6
+.endm
+
+#endif /* __SSE2__ */
+#endif /* I386_ASM_H */
diff --git a/libgcc/config/i386/resms64.S b/libgcc/config/i386/resms64.S
new file 

[PATCH 6/9] Modify ix86_compute_frame_layout for foutline-msabi-xlogues

2016-11-22 Thread Daniel Santos
ix86_compute_frame_layout will now populate the fields added to structs
machine_function and ix86_frame and modify the frame layout specifically to
facilitate the use of save & restore stubs.
---
 gcc/config/i386/i386.c | 138 ++---
 1 file changed, 131 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f661b3f..1dc244e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2664,13 +2664,37 @@ struct GTY(()) stack_local_entry {
 
saved frame pointer if frame_pointer_needed
<- HARD_FRAME_POINTER
-   [saved regs]
-   <- regs_save_offset
-   [padding0]
-
-   [saved SSE regs]
+   [Normal case:
+
+ [saved regs]
+   <- reg_save_offset
+ [padding0]
+
+ [saved SSE regs]
+
+   ][ms x64 --> sysv with -foutline-msabi-xlogues:
+ [padding0]
+   <- Start of out-of-line, stub-saved/restored regs
+  (see libgcc/config/i386/msabi.S)
+ [XMM6-15]
+ [RSI]
+ [RDI]
+ [?RBX]only if RBX is clobbered
+ [?RBP]only if RBP and RBX are clobbered
+ [?R12]only if R12 and all previous regs are clobbered
+ [?R13]only if R13 and all previous regs are clobbered
+ [?R14]only if R14 and all previous regs are clobbered
+ [?R15]only if R15 and all previous regs are clobbered
+   <- end of stub-saved/restored regs
+ [padding1]
+   <- outlined_save_offset
+ [saved regs]  Any remaning regs are saved in-line
+   <- reg_save_offset
+ [saved SSE regs]  not yet verified, but I *think* that there should be no
+   other SSE regs to save here.
+   ]
<- sse_regs_save_offset
-   [padding1]  |
+   [padding2]
   |<- FRAME_POINTER
[va_arg registers]  |
   |
@@ -2692,6 +2716,7 @@ struct ix86_frame
   HOST_WIDE_INT hard_frame_pointer_offset;
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
@@ -12489,6 +12514,8 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  m->outline_ms_sysv_pad_in = 0;
+  m->outline_ms_sysv_pad_out = 0;
   CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
@@ -12504,6 +12531,45 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   crtl->stack_alignment_needed = 128;
 }
 
+  /* m->outline_ms_sysv is initially enabled in ix86_expand_call for all
+ 64-bit ms_abi functions that call a sysv function.  So this is where
+ we prune away cases where we actually don't want to out-of-line the
+ pro/epilogues.  */
+  if (m->outline_ms_sysv)
+  {
+gcc_assert (TARGET_64BIT_MS_ABI);
+gcc_assert (TARGET_OUTLINE_MSABI_XLOGUES);
+
+/* Do we need to handle SEH and disable the optimization? */
+gcc_assert (!TARGET_SEH);
+
+if (!TARGET_SSE)
+  m->outline_ms_sysv = false;
+
+/* Don't break hot-patched functions.  */
+else if (ix86_function_ms_hook_prologue (current_function_decl))
+  m->outline_ms_sysv = false;
+
+/* TODO: Cases that have not yet been examined.  */
+else if (crtl->calls_eh_return
+|| crtl->need_drap
+|| m->static_chain_on_stack
+|| ix86_using_red_zone ()
+|| flag_split_stack)
+  {
+   static bool warned = false;
+   if (!warned)
+ {
+   warned = true;
+   warning (OPT_moutline_msabi_xlogues,
+"not currently supported with the following: SEH, "
+"DRAP, static call chains on the stack, red zones or "
+"split stack.");
+ }
+   m->outline_ms_sysv = false;
+  }
+  }
+
   stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
   preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
 
@@ -12572,6 +12638,60 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   /* The traditional frame pointer location is at the top of the frame.  */
   frame->hard_frame_pointer_offset = offset;
 
+  if (m->outline_ms_sysv)
+{
+  unsigned i;
+  HOST_WIDE_INT offset_after_int_regs;
+
+  gcc_assert (!(offset & 7));
+
+  /* Select an appropriate layout for incoming stack offset.  */
+  m->outline_ms_sysv_pad_in = (!crtl->stack_realign_needed && (offset & 8));
+  const struct xlogue_layout  = xlogue_layout::get_instance ();
+
+  gcc_assert (frame->nregs >= 2);
+  gcc_assert (frame->nsseregs >= 10);
+
+

[PATCH 5/9] Modify ix86_save_reg to optionally omit stub-managed registers

2016-11-22 Thread Daniel Santos
Adds static HARD_REG_SET stub_managed_regs to track registers that will
be managed by the pro/epilogue stubs for the function.

Adds a third parameter bool ignore_outlined to ix86_save_reg to specify
whether or not the count should include registers marked in
stub_managed_regs.
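
To illustrate the intent of the new parameter, here is a minimal sketch in
plain C.  It is not GCC code: the 64-bit mask reg_set stands in for
HARD_REG_SET, and the names in_set, save_reg and count_saved are hypothetical.
It only models the idea that a stub-managed register is skipped when the
caller asks to ignore outlined registers and outlining is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HARD_REG_SET: one bit per register number.  */
typedef uint64_t reg_set;

static bool in_set(reg_set s, unsigned regno)
{
  assert(regno < 64);
  return ((s >> regno) & 1u) != 0;
}

/* Models the decision described above: a live register needs an in-line
   save unless the caller ignores outlined registers, outlining is enabled,
   and the register is handled by a stub.  */
static bool save_reg(reg_set live, reg_set stub_managed, bool outline_enabled,
                     unsigned regno, bool ignore_outlined)
{
  if (!in_set(live, regno))
    return false;
  if (ignore_outlined && outline_enabled && in_set(stub_managed, regno))
    return false;
  return true;
}

/* Counts the registers that would be saved in-line.  */
static unsigned count_saved(reg_set live, reg_set stub_managed,
                            bool outline_enabled, bool ignore_outlined)
{
  unsigned regno, n = 0;
  for (regno = 0; regno < 64; regno++)
    if (save_reg(live, stub_managed, outline_enabled, regno, ignore_outlined))
      n++;
  return n;
}
```

Counting with ignore_outlined set to false gives the total number of live
registers, while setting it to true gives only those that still need in-line
saves, which mirrors how the callers in the diff below pass the flag.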
---
 gcc/config/i386/i386.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 40f9acf..f661b3f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12294,10 +12294,14 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
  && df_regs_ever_live_p (regno)));
 }
 
+/* Registers whose save & restore will be managed by stubs called from
+   pro/epilogue (inited in ix86_compute_frame_layout).  */
+static HARD_REG_SET GTY(()) stub_managed_regs;
+
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
-ix86_save_reg (unsigned int regno, bool maybe_eh_return)
+ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 {
   /* If there are no caller-saved registers, we preserve all registers,
  except for MMX and x87 registers which aren't supported when saving
@@ -12365,6 +12369,10 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_return)
}
 }
 
+  if (ignore_outlined && cfun->machine->outline_ms_sysv
+  && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+return false;
+
   if (crtl->drap_reg
   && regno == REGNO (crtl->drap_reg)
   && !cfun->machine->no_drap_save_restore)
@@ -12385,7 +12393,7 @@ ix86_nsaved_regs (void)
   int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, false))
   nregs ++;
   return nregs;
 }
@@ -12401,7 +12409,7 @@ ix86_nsaved_sseregs (void)
   if (!TARGET_64BIT_MS_ABI)
 return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, false))
   nregs ++;
   return nregs;
 }
@@ -12481,6 +12489,7 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
  except for function prologues, leaf functions and when the defult
@@ -12792,7 +12801,7 @@ ix86_emit_save_regs (void)
   rtx_insn *insn;
 
   for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
RTX_FRAME_RELATED_P (insn) = 1;
@@ -12874,7 +12883,7 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
 ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
cfa_offset -= UNITS_PER_WORD;
@@ -12889,7 +12898,7 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
cfa_offset -= GET_MODE_SIZE (V4SFmode);
@@ -13269,13 +13278,13 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
   && !static_chain_p
   && drap_regno != CX_REG)
regno = CX_REG;
-  else if (ix86_save_reg (BX_REG, true))
+  else if (ix86_save_reg (BX_REG, true, false))
regno = BX_REG;
   /* esi is the static chain register.  */
   else if (!(regparm == 3 && static_chain_p)
-  && ix86_save_reg (SI_REG, true))
+  && ix86_save_reg (SI_REG, true, false))
regno = SI_REG;
-  else if (ix86_save_reg (DI_REG, true))
+  else if (ix86_save_reg (DI_REG, true, false))
regno = DI_REG;
   else
{
@@ -14376,7 +14385,7 @@ ix86_emit_restore_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, maybe_eh_return))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, maybe_eh_return, true))
   {
rtx reg = gen_rtx_REG (word_mode, regno);
rtx mem;
@@ -14415,7 +14424,7 @@ ix86_emit_restore_sse_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)

[PATCH 3/9] Add option -moutline-msabi-xlogues

2016-11-22 Thread Daniel Santos
Adds the option to i386.opt and i386.c and adds documentation to
invoke.texi.
---
 gcc/config/i386/i386.c   |  1 +
 gcc/config/i386/i386.opt |  5 +
 gcc/doc/invoke.texi  | 11 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5ed8fb6..0e1d871 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4290,6 +4290,7 @@ ix86_target_string (HOST_WIDE_INT isa, int flags, int ix86_flags,
 { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD},
 { "-mavx256-split-unaligned-store",
MASK_AVX256_SPLIT_UNALIGNED_STORE},
 { "-mprefer-avx128",   MASK_PREFER_AVX128},
+{ "-moutline-msabi-xlogues",   MASK_OUTLINE_MSABI_XLOGUES},
   };
 
   /* Additional flag options.  */
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 9eef558..f556978 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -528,6 +528,11 @@ Enum(calling_abi) String(sysv) Value(SYSV_ABI)
 EnumValue
 Enum(calling_abi) String(ms) Value(MS_ABI)
 
+moutline-msabi-xlogues
+Target Report Mask(OUTLINE_MSABI_XLOGUES) Save
+Reduces function size by using out-of-line stubs to save & restore registers
+clobbered by differences in Microsoft and System V ABIs.
+
 mveclibabi=
 Target RejectNegative Joined Var(ix86_veclibabi_type) Enum(ix86_veclibabi) Init(ix86_veclibabi_type_none)
 Vector library ABI to use.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8e2f466..4706085 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1188,7 +1188,7 @@ See RS/6000 and PowerPC Options.
 -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
 -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop -mgeneral-regs-only}
+-mmitigate-rop -mgeneral-regs-only -moutline-msabi-xlogues}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -25004,6 +25004,15 @@ You can control this behavior for specific functions by
 using the function attributes @code{ms_abi} and @code{sysv_abi}.
 @xref{Function Attributes}.
 
+@item -moutline-msabi-xlogues
+@itemx -mno-outline-msabi-xlogues
+@opindex moutline-msabi-xlogues
+Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a
+SysV ABI function must consider RSI, RDI and XMM6-15 as clobbered, emitting
+fairly lengthy prologues & epilogues.  This option generates prologues &
+epilogues that instead call stubs in libgcc to perform these saves & restores,
+thus reducing function size at the cost of a few extra instructions.
+
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
 Generate code to access thread-local storage using the @samp{gnu} or
-- 
2.9.0



[PATCH 1/9] Change type of x86_64_ms_sysv_extra_clobbered_registers

2016-11-22 Thread Daniel Santos
This will need to be unsigned for a subsequent patch. Also adds the
constant NUM_X86_64_MS_CLOBBERED_REGS for brevity.
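
As a rough illustration of the pattern (not the GCC sources themselves), the
sketch below defines an ARRAY_SIZE-based count macro for an unsigned array and
iterates over it with an unsigned index, which avoids mixing signed and
unsigned operands in the loop comparison.  The array contents and the names
extra_clobbered, NUM_EXTRA_CLOBBERED and count_at_least are hypothetical
placeholders, not real register numbers.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical placeholder values; the real table holds register numbers
   for SI, DI and XMM6-15.  */
static const unsigned extra_clobbered[12] = {
  4, 5,
  26, 27, 28, 29, 30, 31, 32, 33, 34, 35
};

/* Named constant for the element count, in the spirit of
   NUM_X86_64_MS_CLOBBERED_REGS.  */
#define NUM_EXTRA_CLOBBERED (ARRAY_SIZE (extra_clobbered))

/* Counts entries that are >= threshold, using an unsigned loop index so
   the comparison against the (unsigned) size needs no cast.  */
static unsigned count_at_least(unsigned threshold)
{
  unsigned i, n = 0;

  assert(NUM_EXTRA_CLOBBERED == 12);
  for (i = 0; i < NUM_EXTRA_CLOBBERED; i++)
    if (extra_clobbered[i] >= threshold)
      n++;
  return n;
}
```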
---
 gcc/config/i386/i386.c | 8 +++-
 gcc/config/i386/i386.h | 4 +++-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..56cc67d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2421,7 +2421,7 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-int const x86_64_ms_sysv_extra_clobbered_registers[12] =
+unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -28209,11 +28209,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   else if (TARGET_64BIT_MS_ABI
   && (!callarg2 || INTVAL (callarg2) != -2))
 {
-  int const cregs_size
-   = ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers);
-  int i;
+  unsigned i;
 
-  for (i = 0; i < cregs_size; i++)
+  for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; i++)
{
  int regno = x86_64_ms_sysv_extra_clobbered_registers[i];
  machine_mode mode = SSE_REGNO_P (regno) ? TImode : DImode;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..a45b66a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2172,7 +2172,9 @@ extern int const dbx_register_map[FIRST_PSEUDO_REGISTER];
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern int const x86_64_ms_sysv_extra_clobbered_registers[12];
+extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
+#define NUM_X86_64_MS_CLOBBERED_REGS \
+  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-- 
2.9.0



[PATCH 2/9] Minor refactor in ix86_compute_frame_layout

2016-11-22 Thread Daniel Santos
This refactor is separated from a future patch that actually alters
ix86_compute_frame_layout.
---
 gcc/config/i386/i386.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 56cc67d..5ed8fb6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12256,6 +12256,7 @@ ix86_builtin_setjmp_frame_value (void)
 static void
 ix86_compute_frame_layout (struct ix86_frame *frame)
 {
+  struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
   unsigned HOST_WIDE_INT preferred_alignment;
@@ -12290,19 +12291,19 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
  scheduling that can be done, which means that there's very little point
  in doing anything except PUSHs.  */
   if (TARGET_SEH)
-cfun->machine->use_fast_prologue_epilogue = false;
+m->use_fast_prologue_epilogue = false;
 
   /* During reload iteration the amount of registers saved can change.
  Recompute the value as needed.  Do not recompute when amount of registers
  didn't change as reload does multiple calls to the function and does not
  expect the decision to change within single iteration.  */
   else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-   && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs)
+  && m->use_fast_prologue_epilogue_nregs != frame->nregs)
 {
   int count = frame->nregs;
   struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-  cfun->machine->use_fast_prologue_epilogue_nregs = count;
+  m->use_fast_prologue_epilogue_nregs = count;
 
   /* The fast prologue uses move instead of push to save registers.  This
  is significantly longer, but also executes faster as modern hardware
@@ -12319,14 +12320,14 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (node->frequency < NODE_FREQUENCY_NORMAL
  || (flag_branch_probabilities
  && node->frequency < NODE_FREQUENCY_HOT))
-cfun->machine->use_fast_prologue_epilogue = false;
+   m->use_fast_prologue_epilogue = false;
   else
-cfun->machine->use_fast_prologue_epilogue
+   m->use_fast_prologue_epilogue
   = !expensive_function_p (count);
 }
 
   frame->save_regs_using_mov
-= (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
+= (TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
&& flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
-- 
2.9.0



[PATCH 4/9] Adds class xlogue_layout and new fields to struct machine_function

2016-11-22 Thread Daniel Santos
Of the new fields added to struct machine_function, outline_ms_sysv is
initially set in ix86_expand_call, but may later be cleared when
ix86_compute_frame_layout is called (both of these are in a subsequent
patch).  If it is not cleared, then the remaining new fields will be
set.

The new class xlogue_layout manages the layout of the stack area used by
the out-of-line save & restore stubs as well as any padding needed
before and after the save area.  It also provides the proper symbol rtx
for the requested stub based upon values of the new fields in struct
machine_function.

xlogue_layout cannot be used until stack realign flags are finalized and
ix86_compute_frame_layout is called, at which point
xlogue_layout::get_instance may be used to retrieve the appropriate
(constant) instance of xlogue_layout.
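
As a rough sketch of how a stub symbol name could be derived from the stub
kind and the number of extra registers, consider the plain C below.  The base
names and the MIN_REGS, MAX_REGS and STUB_NAME_MAX_LEN values come from the
patch; the "__<base>_<count>" format is an assumption inferred from symbols
such as __savms64_15 in the cover letter's disassembly, and format_stub_name
is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { MIN_REGS = 12, MAX_REGS = 18, STUB_NAME_MAX_LEN = 16, STUB_COUNT = 6 };

/* Base names as listed in STUB_BASE_NAMES in the patch.  */
static const char *const stub_base_names[STUB_COUNT] = {
  "savms64", "resms64", "resms64x", "savms64f", "resms64f", "resms64fx"
};

/* Hypothetical helper: writes "__<base>_<MIN_REGS + extra_regs>" into buf.
   Returns the name length, or -1 on bad arguments or truncation.  */
static int format_stub_name(char *buf, size_t len, unsigned stub,
                            unsigned extra_regs)
{
  int n;

  assert(buf != NULL);
  if (stub >= STUB_COUNT || extra_regs > MAX_REGS - MIN_REGS)
    return -1;
  n = snprintf(buf, len, "__%s_%u", stub_base_names[stub],
               MIN_REGS + extra_regs);
  return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

Under that assumed naming scheme, the longest name ("__resms64fx_18") is 14
characters, so it fits in a STUB_NAME_MAX_LEN buffer of 16 with room for the
terminating NUL.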
---
 gcc/config/i386/i386.c | 215 +
 gcc/config/i386/i386.h |  18 +
 2 files changed, 233 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0e1d871..40f9acf 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2429,6 +2429,221 @@ unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
   XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
 };
 
+enum xlogue_stub {
+  XLOGUE_STUB_SAVE,
+  XLOGUE_STUB_RESTORE,
+  XLOGUE_STUB_RESTORE_TAIL,
+  XLOGUE_STUB_SAVE_HFP,
+  XLOGUE_STUB_RESTORE_HFP,
+  XLOGUE_STUB_RESTORE_HFP_TAIL,
+
+  XLOGUE_STUB_COUNT
+};
+
+enum xlogue_stub_sets {
+  XLOGUE_SET_ALIGNED,
+  XLOGUE_SET_ALIGNED_PLUS_8,
+  XLOGUE_SET_HFP_OR_UNALIGNED,
+
+  XLOGUE_SET_COUNT
+};
+
+/* Register save/restore layout used by out-of-line stubs.  */
+class xlogue_layout {
+public:
+  struct reginfo {
+unsigned regno;
+HOST_WIDE_INT offset;  /* Offset used by stub base pointer (rax or
+  rsi) to where each register is stored.  */
+  };
+
+  unsigned get_nregs () const  {return m_nregs;}
+  HOST_WIDE_INT get_stack_align_off_in () const{return m_stack_align_off_in;}
+
+  const reginfo &get_reginfo (unsigned reg) const
+{
+  gcc_assert (reg < m_nregs);
+  return m_regs[reg];
+}
+
+  /* Returns an rtx for the stub's symbol based upon
+   1.) the specified stub (save, restore or restore_ret) and
+   2.) the value of cfun->machine->outline_ms_sysv_extra_regs and
+   3.) whether or not stack alignment is being performed.  */
+  rtx get_stub_rtx (enum xlogue_stub stub) const;
+
+  /* Returns the amount of stack space (including padding) that the stub
+ needs to store registers based upon data in the machine_function.  */
+  HOST_WIDE_INT get_stack_space_used () const
+{
+  const struct machine_function &m = *cfun->machine;
+  unsigned last_reg = m.outline_ms_sysv_extra_regs + MIN_REGS;
+
+  gcc_assert (m.outline_ms_sysv_extra_regs <= MAX_EXTRA_REGS);
+  return m_regs[last_reg - 1].offset
++ (m.outline_ms_sysv_pad_out ? 8 : 0)
++ STUB_INDEX_OFFSET;
+}
+
+  /* Returns the offset for the base pointer used by the stub.  */
+  HOST_WIDE_INT get_stub_ptr_offset () const
+{
+  return STUB_INDEX_OFFSET + m_stack_align_off_in;
+}
+
+  static const struct xlogue_layout &get_instance ();
+
+  static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
+  static const unsigned MIN_REGS = 12;
+  static const unsigned MAX_REGS = 18;
+  static const unsigned MAX_EXTRA_REGS = MAX_REGS - MIN_REGS;
+  static const unsigned VARIANT_COUNT = MAX_EXTRA_REGS + 1;
+  static const unsigned STUB_NAME_MAX_LEN = 16;
+  static const char * const STUB_BASE_NAMES[XLOGUE_STUB_COUNT];
+  static const unsigned REG_ORDER[MAX_REGS];
+  static const unsigned REG_ORDER_REALIGN[MAX_REGS];
+
+private:
+  xlogue_layout ();
+  xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp);
+  xlogue_layout (const xlogue_layout &);
+
+  /* True if hard frame pointer is used.  */
+  bool m_hfp;
+
+  /* Max number of registers this layout manages.  */
+  unsigned m_nregs;
+
+  /* Incoming offset from 16-byte alignment.  */
+  HOST_WIDE_INT m_stack_align_off_in;
+  struct reginfo m_regs[MAX_REGS];
+  rtx m_syms[XLOGUE_STUB_COUNT][VARIANT_COUNT];
+  char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
+
+  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+};
+
+const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
+  "savms64",
+  "resms64",
+  "resms64x",
+  "savms64f",
+  "resms64f",
+  "resms64fx"
+};
+
+const unsigned xlogue_layout::REG_ORDER[xlogue_layout::MAX_REGS] = {
+/* The below offset values are where each register is stored for the layout
+   relative to incoming stack pointer.  The value of each m_regs[].offset will
+   be relative to the incoming base pointer (rax or rsi) used by the stub.
+
+ FP offset FP offset
+Register  aligned  aligned + 8 realigned*/
+XMM15_REG, /* 0x10 0x18

[PATCH v2 0/9] Add optimization -moutline-msabi-xlogues (for Wine 64)

2016-11-22 Thread Daniel Santos
Due to ABI differences, when a 64-bit Microsoft function calls a
System V function, it must consider RSI, RDI and XMM6-15 as clobbered.
Saving these registers can cost as much as 109 bytes and a similar
amount for restoring. This patch set targets 64-bit Wine and aims to
mitigate some of these costs by adding ms-->sysv save & restore stubs to
libgcc, which are called from pro/epilogues rather than emitting the
code inline.  And since we're already tinkering with stubs, they will
also manage the save/restore of up to 6 additional registers. Analysis
of building Wine 64 demonstrates a reduction of .text by around 20%.
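
For readers unfamiliar with the situation described above, here is a minimal
sketch (not part of the patch set) of a Microsoft-ABI function calling a
System V function, using GCC's ms_abi and sysv_abi function attributes.  The
function names are hypothetical, and the attributes are only applied on
x86-64; elsewhere the macros expand to nothing so the code still builds.

```c
#include <assert.h>

/* Apply the ABI attributes only where they are meaningful.  */
#if defined(__x86_64__) && defined(__GNUC__)
#define MS_ABI   __attribute__((ms_abi))
#define SYSV_ABI __attribute__((sysv_abi))
#else
#define MS_ABI
#define SYSV_ABI
#endif

/* Callee using the System V calling convention.  */
SYSV_ABI static long sysv_add(long a, long b)
{
  return a + b;
}

/* Caller using the Microsoft calling convention.  Because it calls a
   System V function, the compiler must treat RSI, RDI and XMM6-15 as
   clobbered across the call and preserve them for its own caller.  */
MS_ABI static long ms_entry(long a, long b)
{
  return sysv_add(a, b) * 2;
}

/* Small self-check of the call chain.  */
static void self_check(void)
{
  assert(ms_entry(3, 4) == 14);
}
```

Compiling a function like ms_entry for x86-64 and inspecting the assembly
should show the extra saves and restores that this patch set aims to move
into the libgcc stubs; whether a given build actually emits them depends on
inlining and optimization level.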


The basic theory is that a reduction of I-cache misses will offset the 
extra instructions required for implementation. And since there are only 
a handful of stubs that will be in memory, I'm using the larger mov 
instructions instead of push/pop to facilitate better parallelization. I 
have not yet produced actual performance data.


Here is a sample of some generated code:

Prologue:
   23c20:   48 8d 44 24 88  lea -0x78(%rsp),%rax
   23c25:   48 81 ec 08 01 00 00sub$0x108,%rsp
   23c2c:   e8 1a 4b 03 00  callq  5874b <__savms64_15>

Epilogue (r10 stores the value to restore the stack pointer to):
   23c7c:   48 8d b4 24 90 00 00lea 0x90(%rsp),%rsi
   23c83:   00
   23c84:   4c 8d 56 78 lea 0x78(%rsi),%r10
   23c88:   e9 5e 4b 03 00  jmpq   587eb <__resms64x_15>

It would appear that forced stack realignment has become the new normal 
for Wine 64, since there are many Windows programs that violate the 
16-byte alignment requirement, but just so *happen* to not crash on 
Windows (and therefore claim that Wine should work as Windows happens to 
behave given the UB).


Prologue, stack realignment case:
   23c20:   55  push   %rbp
   23c21:   48 89 e5mov%rsp,%rbp
   23c24:   48 83 e4 f0 and $0xfff0,%rsp
   23c28:   48 8d 44 24 90  lea -0x70(%rsp),%rax
   23c2d:   48 81 ec 00 01 00 00sub$0x100,%rsp
   23c34:   e8 8e 43 03 00  callq  57fc7 <__savms64f_15>

Epilogue, stack realignment case:
   23c86:   48 8d b4 24 90 00 00lea 0x90(%rsp),%rsi
   23c8d:   00
   23c8e:   e9 80 43 03 00  jmpq   58013 <__resms64fx_15>

No additional regression tests fail with this patch set. I have tested
about 12 builds of Wine (with varying optimizations & options) and no
additional tests fail for that either. (Actually, there appears to be
some type of regression prior to this patch set because it magically
fixes about 30 failed Wine tests that don't fail when building
Wine with gcc-5.4.0.)


Outstanding issues:

1. My x86 assembly expertise is limited, so I would appreciate
   examination of my stubs & emitted code!
2. Regression tests only run on my old Phenom. Have not yet tested on
   AVX cpu (which should use vmovaps instead of movaps).
3. My test program is inadequate (and is not included in this patch
   set) and needs a lot of cleanup.  During development it failed to
   produce many optimization errors that I got when building Wine.
   I've been building 64-bit Wine and running Wine's tests in the
   meantime.
4. It would help to write a benchmarking program/script.
5. I haven't yet figured out how to get Wine building with -flto and I
   thus haven't tested how these changes affect it yet.
6. I'm not 100% certain yet, but the stubs __resms64f* (restore with
   hard frame pointer, but return to the function) don't appear to
   ever be used because enabling hard frame pointers disables sibling
   calls, which is what it's intended to facilitate.


 gcc/config/i386/i386.c | 704 ++---
 gcc/config/i386/i386.h |  22 +-
 gcc/config/i386/i386.opt   |   5 +
 gcc/config/i386/predicates.md  | 155 +
 gcc/config/i386/sse.md |  46 +++
 gcc/doc/invoke.texi|  11 +-
 libgcc/config.host |   2 +-
 libgcc/config/i386/i386-asm.h  |  82 +
 libgcc/config/i386/resms64.S   |  63 
 libgcc/config/i386/resms64f.S  |  59 
 libgcc/config/i386/resms64fx.S |  61 
 libgcc/config/i386/resms64x.S  |  65 
 libgcc/config/i386/savms64.S   |  63 
 libgcc/config/i386/savms64f.S  |  64 
 libgcc/config/i386/t-msabi |   7 +
 15 files changed, 1358 insertions(+), 51 deletions(-)


Changes in Version 2:

 * Added ChangeLogs (attached).
 * Changed option from -f to -m and moved from gcc/common.opt to
   gcc/config/i386/i386.opt.
 * Solved problem with uncombined SP modifications.
 * Optimization now works when hard frame pointers are used and stack
   realignment is not needed.
 * Added documentation to gcc/doc/invoke.texi

Feedback and comments would be most appreciated!

Thanks,
Daniel





* config/i386/i386.opt: Add option -moutline-msabi-xlogues.

* config/i386/i386.h

[PATCH, committed] TILEPro/TILE-Gx: add trap patterns

2016-11-22 Thread Walter Lee
This patch adds a trap pattern to TILEPro/TILE-Gx.  The pattern emits
an instruction bundle that causes a SIGABRT.

Bootstrapped and tested on TILEPro/TILE-Gx hardware, also backported
to GCC 6.

* config/tilegx/tilegx.md (trap): New pattern.
* config/tilepro/tilepro.md (trap): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8e2bbdf..259eb02 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2016-11-22  Walter Lee  

+   * config/tilegx/tilegx.md (trap): New pattern.
+   * config/tilepro/tilepro.md (trap): Likewise.
+
+2016-11-22  Walter Lee  
+
* config/tilegx/tilegx.md (*zero_extract): Use
define_insn_and_split instead of define_insn; Handle pos + size >
64.
diff --git a/gcc/config/tilegx/tilegx.md b/gcc/config/tilegx/tilegx.md
index 3ad5a87..eccdd28 100644
--- a/gcc/config/tilegx/tilegx.md
+++ b/gcc/config/tilegx/tilegx.md
@@ -2773,6 +2773,12 @@
   "nop"
   [(set_attr "type" "Y01")])

+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "raise; moveli zero, 6"
+  [(set_attr "type" "cannot_bundle")])
+
 ^L
 ;;
 ;; Conditional branches
diff --git a/gcc/config/tilepro/tilepro.md b/gcc/config/tilepro/tilepro.md
index 6493b06..d1536ed 100644
--- a/gcc/config/tilepro/tilepro.md
+++ b/gcc/config/tilepro/tilepro.md
@@ -1578,6 +1578,12 @@
   "nop"
   [(set_attr "type" "Y01")])

+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "raise; moveli zero, 6"
+  [(set_attr "type" "cannot_bundle")])
+
 ^L
 ;;
 ;; Conditional branches


[PATCH, committed] TILE-Gx: fix zero_extract/sign_extract patterns

2016-11-22 Thread Walter Lee
This patch fixes the zero_extract/sign_extract patterns so that they
properly handle the case when pos + size > number of bits in a word.
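
The clamping rule can be modelled in plain C as follows.  This is only a
sketch of the arithmetic, not the machine description: clamp_width and
extract_unsigned are hypothetical names, the word size is fixed at 64 bits,
and the offset is assumed to be below 64 (as a 6-bit operand would be).

```c
#include <assert.h>
#include <stdint.h>

/* If the field would run past bit 63, shrink it to end at bit 63.  */
static unsigned clamp_width(unsigned offset, unsigned width)
{
  assert(offset < 64);
  return (offset + width > 64) ? 64 - offset : width;
}

/* Zero-extending extraction of WIDTH bits starting at bit OFFSET,
   after clamping the width as above.  */
static uint64_t extract_unsigned(uint64_t word, unsigned offset,
                                 unsigned width)
{
  uint64_t field;

  width = clamp_width(offset, width);
  field = word >> offset;
  if (width < 64)
    field &= (UINT64_C(1) << width) - 1;
  return field;
}
```

For example, asking for 8 bits at offset 60 is treated as a 4-bit field at
offset 60, which matches the "64 - bit_offset" adjustment in the split code
below.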

Bootstrapped and tested on TILE-Gx hardware, also backported to GCC 6.

* config/tilegx/tilegx.md (*zero_extract): Use
define_insn_and_split instead of define_insn; Handle pos +
size > 64.
(*sign_extract): Likewise.

diff --git a/gcc/config/tilegx/tilegx.md b/gcc/config/tilegx/tilegx.md
index 55c345c..3ad5a87 100644
--- a/gcc/config/tilegx/tilegx.md
+++ b/gcc/config/tilegx/tilegx.md
@@ -1237,7 +1237,7 @@
   "ld_tls\t%0, %1, tls_ie_load(%2)"
   [(set_attr "type" "X1_2cycle")])

-(define_insn "*zero_extract"
+(define_insn_and_split "*zero_extract"
   [(set (match_operand:I48MODE 0 "register_operand" "=r")
(zero_extract:I48MODE
  (match_operand:I48MODE 1 "reg_or_0_operand" "r")
@@ -1245,6 +1245,18 @@
  (match_operand:I48MODE 3 "u6bit_cint_operand" "n")))]
   ""
   "bfextu\t%0, %r1, %3, %3+%2-1"
+  "&& reload_completed"
+  [(set (match_dup 0) (zero_extract:I48MODE
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)))]
+{
+  HOST_WIDE_INT bit_width = INTVAL (operands[2]);
+  HOST_WIDE_INT bit_offset = INTVAL (operands[3]);
+
+  if (bit_offset + bit_width > 64)
+operands[2] = GEN_INT (64 - bit_offset);
+}
   [(set_attr "type" "X0")])

 (define_insn "*sign_extract_low32"
@@ -1256,7 +1268,7 @@
   "INTVAL (operands[3]) == 0 && INTVAL (operands[2]) == 32"
   "addxi\t%0, %r1, 0")

-(define_insn "*sign_extract"
+(define_insn_and_split "*sign_extract"
   [(set (match_operand:I48MODE 0 "register_operand" "=r")
(sign_extract:I48MODE
  (match_operand:I48MODE 1 "reg_or_0_operand" "r")
@@ -1264,6 +1276,18 @@
  (match_operand:I48MODE 3 "u6bit_cint_operand" "n")))]
   ""
   "bfexts\t%0, %r1, %3, %3+%2-1"
+  "&& reload_completed"
+  [(set (match_dup 0) (sign_extract:I48MODE
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)))]
+{
+  HOST_WIDE_INT bit_width = INTVAL (operands[2]);
+  HOST_WIDE_INT bit_offset = INTVAL (operands[3]);
+
+  if (bit_offset + bit_width > 64)
+operands[2] = GEN_INT (64 - bit_offset);
+}
   [(set_attr "type" "X0")])



[PATCH] eliminate calls to snprintf(0, 0, ...) with known return value (pr78476)

2016-11-22 Thread Martin Sebor

Calls to bounded functions like snprintf with a zero-size buffer
are special requests to compute the size of output without actually
writing any.  For example:

  int n = snprintf(0, 0, "%08x", rand ());

is a request to compute the number of bytes that the function would
format if it were passed a buffer of non-zero size.  In the example
above since the return value is known to be exactly 8, not only can
the snprintf return value be folded into a constant but the whole
call to snprintf can be eliminated.
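
For reference, here is a small standalone sketch of the size-query idiom
itself, independent of the optimization.  The helper names hex8_length and
format_hex8 are hypothetical, and the sketch assumes unsigned int is 32 bits
wide, so that "%08x" always produces exactly 8 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Size query: a null buffer with size 0 writes nothing and returns the
   number of characters that would have been written (excluding the NUL).  */
static int hex8_length(unsigned value)
{
  return snprintf(NULL, 0, "%08x", value);
}

/* Typical use of the idiom: query the length, allocate, then format.
   Returns a malloc'd string the caller must free, or NULL on failure.  */
static char *format_hex8(unsigned value)
{
  int n = hex8_length(value);
  char *buf;

  if (n < 0)
    return NULL;
  buf = malloc((size_t) n + 1);
  if (buf != NULL)
    snprintf(buf, (size_t) n + 1, "%08x", value);
  return buf;
}
```

Because hex8_length returns the same value for every argument under the
stated assumption, a compiler that knows this can replace the call with the
constant, which is the transformation the patch enables.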

The attached patch enables this optimization under the
-fprintf-return-value option.  The patch depends on the one for bug
78461 (posted earlier today) during the testing of which I noticed
that this optimization was missing from the gimple-ssa-sprintf pass.

Thanks
Martin
PR tree-optimization/78476 - snprintf(0, 0, ...) with known arguments not optimized away

gcc/testsuite/ChangeLog:

	PR tree-optimization/78476
	* gcc.dg/tree-ssa/builtin-sprintf-5.c: New test.

gcc/ChangeLog:

	PR tree-optimization/78476
	* gimple-ssa-sprintf.c (struct pass_sprintf_length::call_info):
	Add a member.
	(handle_gimple_call): Adjust signature.
	(try_substitute_return_value): Remove calls to bounded functions
	with zero buffer size whose result is known.
	(pass_sprintf_length::execute): Adjust call to handle_gimple_call.


Index: gcc/gimple-ssa-sprintf.c
===
--- gcc/gimple-ssa-sprintf.c	(revision 242703)
+++ gcc/gimple-ssa-sprintf.c	(working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-object-size.h"
 #include "params.h"
 #include "tree-cfg.h"
+#include "tree-ssa-propagate.h"
 #include "calls.h"
 #include "cfgloop.h"
 #include "intl.h"
@@ -122,7 +123,7 @@ class pass_sprintf_length : public gimple_opt_pass
   fold_return_value = param;
 }
 
-  void handle_gimple_call (gimple_stmt_iterator);
+  void handle_gimple_call (gimple_stmt_iterator*);
 
   struct call_info;
   void compute_format_length (const call_info &, format_result *);
@@ -712,6 +713,11 @@ struct pass_sprintf_length::call_info
   /* True for functions like snprintf that specify the size of
  the destination, false for others like sprintf that don't.  */
   bool bounded;
+
+  /* True for bounded functions like snprintf that specify a zero-size
+ buffer as a request to compute the size of output without actually
+ writing any.  */
+  bool nowrite;
 };
 
 /* Return the result of formatting the '%%' directive.  */
@@ -2484,7 +2490,7 @@ get_destination_size (tree dest)
have its range set to the range of return values, if that is known.  */
 
 static void
-try_substitute_return_value (gimple_stmt_iterator gsi,
+try_substitute_return_value (gimple_stmt_iterator *gsi,
 			 const pass_sprintf_length::call_info &info,
 			 const format_result &res)
 {
@@ -2500,15 +2506,30 @@ static void
   && (info.bounded || res.number_chars <= info.objsize)
   && res.number_chars - 1 <= target_int_max ())
 {
-  /* Replace the left-hand side of the call with the constant
-	 result of the formatted function minus 1 for the terminating
-	 NUL which the functions' return value does not include.  */
-  gimple_call_set_lhs (info.callstmt, NULL_TREE);
   tree cst = build_int_cst (integer_type_node, res.number_chars - 1);
-  gimple *g = gimple_build_assign (lhs, cst);
-  gsi_insert_after (&gsi, g, GSI_NEW_STMT);
-  update_stmt (info.callstmt);
 
+  if (info.nowrite)
+	{
+	  /* Replace the call to the bounded function with a zero size
+	 (e.g., snprintf(0, 0, "%i", 123) with the constant result
+	 of the function minus 1 for the terminating NUL which
+	 the function's  return value does not include.  */
+	  if (!update_call_from_tree (gsi, cst))
+	gimplify_and_update_call_from_tree (gsi, cst);
+	  gimple *callstmt = gsi_stmt (*gsi);
+	  update_stmt (callstmt);
+	}
+  else
+	{
+	  /* Replace the left-hand side of the call with the constant
+	 result of the formatted function minus 1 for the terminating
+	 NUL which the function's return value does not include.  */
+	  gimple_call_set_lhs (info.callstmt, NULL_TREE);
+	  gimple *g = gimple_build_assign (lhs, cst);
+	  gsi_insert_after (gsi, g, GSI_NEW_STMT);
+	  update_stmt (info.callstmt);
+	}
+
   if (dump_file)
 	{
 	  location_t callloc = gimple_location (info.callstmt);
@@ -2517,7 +2538,8 @@ static void
 	  print_generic_expr (dump_file, cst, dump_flags);
 	  fprintf (dump_file, " for ");
 	  print_generic_expr (dump_file, info.func, dump_flags);
-	  fprintf (dump_file, " return value (output %s).\n",
+	  fprintf (dump_file, " %s (output %s).\n",
+		   info.nowrite ? "call" : "return value",
 		   res.constant ? "constant" : "variable");
 	}
 }
@@ -2582,11 +2604,11 @@ static void
functions and if so, handle it.  */
 
 void
-pass_sprintf_length::handle_gimple_call (gimple_stmt_iterator gsi)
+pass_sprintf_length::handle_gimple_call 

Re: GCC patch committed: Fix -fdump-go-spec to not align last field to type alignment

2016-11-22 Thread Martin Sebor

On 11/22/2016 04:25 PM, Ian Lance Taylor wrote:

> The code that handles -fdump-go-spec was incorrectly trying to pad the
> last field of a struct/union to the alignment of the overall
> struct/union.  That is unnecessary and incorrect, as the alignment is
> handled by go_force_record_alignment anyhow.  It caused a compiler
> crash on x32 and various other 32-bit targets when generating the Go
> version of the libffi ffi_closure type, which is explicitly aligned to
> an 8 byte boundary but does not necessarily have that size.  This
> caused PRs 78431 and 78432.  Ran bootstrap and Go testsuite on
> x86_64-pc-linux-gnu, with x32 multilib enabled.  Committed to
> mainline.


I'm seeing failures in the gcc.misc-tests/godump-1.c test that may
be related to this change.  The same failures are also reported on
gcc-testresults:

  https://gcc.gnu.org/ml/gcc-testresults/2016-11/msg02556.html

Martin


Re: [PATCH v3] cpp/c: Add -Wexpansion-to-defined

2016-11-22 Thread Joseph Myers
On Tue, 22 Nov 2016, Paolo Bonzini wrote:

> > It's not obvious to me whether this belongs in -Wextra.  After all, this
> > is a perfectly reasonable and useful GNU C feature, or at least some cases
> > of it are (like "#define FOO (BAR || defined something)").  Is the
> > argument that there are too many details of it that differ between
> > implementations, as discussed in section 3.2 of
> > ?
> 
> Yes, and in general it fits the group of "often annoying warnings, that
> people may nevertheless appreciate" that are already in -Wextra, for
> example -Wunused-parameter, -Wmissing-field-initializers or
> -Wshift-negative-value.

Thanks.  The patch is OK, but we'll need to see how disruptive it is and 
consider whether evidence from large-scale builds indicates moving it out 
of -Wextra.
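
For illustration only, here is a minimal sketch (not taken from the patch) of the kind of macro under discussion, together with one portable way to restructure it so that "defined" is never produced by macro expansion.  HAVE_A, HAVE_B, HAVE_B_DEFINED, USE_FEATURE and feature_enabled are hypothetical names chosen for this example.

```c
#include <assert.h>

#define HAVE_A 0
#define HAVE_B 1

/* The construct the warning is aimed at looks like this:

     #define USE_FEATURE (HAVE_A || defined HAVE_B)
     #if USE_FEATURE

   Here "defined" only appears after USE_FEATURE is expanded inside #if,
   which ISO C leaves undefined and GNU C accepts as an extension.  */

/* Portable rewrite: evaluate "defined" directly in a directive and
   record the answer in an ordinary 0/1 macro.  */
#if defined HAVE_B
# define HAVE_B_DEFINED 1
#else
# define HAVE_B_DEFINED 0
#endif

#define USE_FEATURE (HAVE_A || HAVE_B_DEFINED)

/* With the definitions above the condition holds because HAVE_B is defined.  */
static_assert (USE_FEATURE == 1, "HAVE_B is defined, so the feature is on");

int
feature_enabled (void)
{
#if USE_FEATURE
  return 1;
#else
  return 0;
#endif
}
```

The rewrite costs a few extra lines per tested macro, which is presumably part of why the GNU C form is considered useful in the first place.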

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix up handle_pragma_target (PR target/78451)

2016-11-22 Thread Joseph Myers
On Tue, 22 Nov 2016, Jakub Jelinek wrote:

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2016-11-22  Jakub Jelinek  
> 
>   PR target/78451
>   * c-pragma.c (handle_pragma_target): Don't replace
>   current_target_pragma, but chainon the new args to the current one.
> 
>   * gcc.target/i386/pr78451.c: New test.
>   * gcc.target/i386/pr69255-1.c: Use #pragma GCC push_options
>   and #pragma GCC pop_options around the first #pragma GCC target.
>   * gcc.target/i386/pr69255-2.c: Likewise.
>   * gcc.target/i386/pr69255-3.c: Likewise.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-11-22 Thread Doug Gilmore
On 11/22/2016 08:07 AM, Bin.Cheng wrote:
> On Mon, Nov 21, 2016 at 9:34 PM, Doug Gilmore  wrote:
>> I haven't seen any followups to this discussion of Bin's patch to
>> PR68030 and PR69710, the patch submission:
>> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02000.html
>>
>> Discussion:
>> http://gcc.gnu.org/ml/gcc-patches/2016-07/msg00761.html
>> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg01551.html
>> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg00372.html
>> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg01550.html
>> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02162.html
>> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02155.html
>> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02154.html
>>
>>
>> so I did some investigation to get a better understanding of the
>> issues involved.
> Hi Doug,
> Thanks for looking into this problem.
>>
>> On 07/13/2016 01:59 PM, Jeff Law wrote:
>>> On 05/25/2016 05:22 AM, Bin Cheng wrote:
>>>> Hi, As analyzed in PR68030 and PR69710, vectorizer generates
>>>> duplicated computations in loop's pre-header basic block when
>>>> creating base address for vector reference to the same memory object.
>>> Not a huge surprise.  Loop optimizations generally have a tendency
>>> to create and/or expose CSE opportunities.  Unrolling is a common
>>> culprit, there's certainly the possibility for header duplication,
>>> code motions and IV rewriting to also expose/create redundant code.
>>>
>>> ...
>>>
>>>> But, 1) It
>>>> doesn't fix all the problem on x86_64.  Root cause is computation for
>>>> base address of the first reference is somehow moved outside of
>>>> loop's pre-header, local CSE can't help in this case.
>>> That's a bit odd -- have you investigated why this is outside the loop
>>> header?
>>> ...
>> I didn't look at this issue per se, but I did try running DOM between
>> autovectorization and IVOPTS.  Just running DOM had little effect, what
>> was crucial was adding the change Bin mentioned in his original
>> message:
>>
>> Besides CSE issue, this patch also re-associates address
>> expressions in vect_create_addr_base_for_vector_ref, specifically,
>> it splits constant offset and adds it back near the expression
>> root in IR.  This is necessary because GCC only handles
>> re-association for commutative operators in CSE.
>>
>> I attached a patch for these changes only.  These are the important
>> modifications that address some of the IVOPTS-related issues exposed
>> by PR68030. I found that adding the CSE change (or calling DOM between
>> autovectorization and IVOPTS) is not needed, and from what I have
> I checked the code again.  As you said, re-association part is important
> to enable CSE opportunities, no matter when and which pass handles it.
> After re-association, the computation of base addresses are like:
> 
> //preheader
> b_1 = g_Input + var_offset_1;
> vectp_1 = b_1 + cst_offset_1;
> b_2 = g_Input + var_offset_2;
> vectp_2 = b_2 + cst_offset_2;
> ...
> b_n = g_Input + var_offset_n;
> vectp_n = b_n + cst_offset_n;
> 
> //loop
> MEM[vectp_1];
> MEM[vectp_2];
> ...
> MEM[vectp_n];
> 
> In fact, var_offset_1, var_offset_2, ..., var_offset_n are all equal,
> so the addresses are in the form of "g_Input + var_offset + cst_offset_x",
> differing from each other only in the constant offset.  The purpose of CSE
> is to propagate all parts of this address to IVOPTS; otherwise IVOPTS only
> knows the IVs as below:
> 
> iv_use_1: {b_1 + cst_offset_1, step}_loop
> iv_use_2: {b_2 + cst_offset_2, step}_loop
> ...
> iv_use_n: {b_n + cst_offset_n, step}_loop
> 
>> seen, actually makes the code worse.
>>
>> Applying only the modifications to
>> vect_create_addr_base_for_vector_ref, additional simplifications will
>> be done when induction variables are found (function
>> find_induction_variables).  These simplifications are indicated by the
>> appearance of lines:
>>
>> Applying pattern match.pd:1056, generic-match.c:11865
> This doesn't look related to this problem to me.  The simplification needed
> here is CSE, which is not what match.pd does.
> 
>>
>> in the IVOPTS dump file.  Now IVOPTS transforms the code so that
>> constants now appear in the computation of the effective addresses for
>> the memory OPs.  However the code generated by IVOPTS still uses a
>> separate base register for each memory reference.  Later DOM3
>> transforms the code to use just one base register, which is the form
> 
> Indeed, CSE now looks unnecessary for fixing the problem; we can rely on the
> DOM pass to explore the equality among the new bases (b_1, b_2, ..., b_n).  This
> actually echoes my humble opinion: we shouldn't rely on IVOPTS to fix all bad
> code issues.  On the other hand, for cases in which these bases
> (b_1, b_2, ..., b_n) are not equal to each other, there is not much to lose
> in this way either.
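
As a rough source-level analogy of the re-association discussed above (the real transformation happens on GIMPLE inside vect_create_addr_base_for_vector_ref, not in user code), the two C functions below compute the same address in two shapes.  The function names and the use of int elements are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Constant folded into the offset first: every reference gets its own
   (var_offset + cst_offset) term, so the per-reference bases look
   unrelated to a pass that only compares whole expressions.  */
const int *
address_unassociated (const int *g_input, ptrdiff_t var_offset,
                      ptrdiff_t cst_offset)
{
  assert (g_input != NULL);
  ptrdiff_t offset = var_offset + cst_offset;
  return g_input + offset;
}

/* Constant split off and added back near the expression root: the
   sub-expression g_input + var_offset is identical for all references
   and can be shared, leaving only the constant part different.  */
const int *
address_reassociated (const int *g_input, ptrdiff_t var_offset,
                      ptrdiff_t cst_offset)
{
  assert (g_input != NULL);
  const int *b = g_input + var_offset;   /* common base, like b_1 ... b_n */
  return b + cst_offset;                 /* like vectp_x = b_x + cst_offset_x */
}
```

Both functions return the same pointer; the second form simply exposes the common base, so two references then differ only by their constant offsets.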
> 
>> the code needs to be in for the preliminary phase of IVOPTs where
>> 

GCC patch committed: Fix -fdump-go-spec to not align last field to type alignment

2016-11-22 Thread Ian Lance Taylor
The code that handles -fdump-go-spec was incorrectly trying to pad the
last field of a struct/union to the alignment of the overall
struct/union.  That is unnecessary and incorrect, as the alignment is
handled by go_force_record_alignment anyhow.  It caused a compiler
crash on x32 and various other 32-bit targets when generating the Go
version of the libffi ffi_closure type, which is explicitly aligned to
an 8 byte boundary but does not necessarily have that size.  This
caused PRs 78431 and 78432.  Ran bootstrap and Go testsuite on
x86_64-pc-linux-gnu, with x32 multilib enabled.  Committed to
mainline.
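
To illustrate the situation described above, here is a small sketch of a type whose alignment is larger than what its size is rounded to.  It assumes GCC or a compatible compiler (the aligned attribute on a typedef is a GNU extension); blob12 and padding_to_alignment are hypothetical names and this is not the libffi definition.

```c
#include <assert.h>
#include <stddef.h>

/* Twelve bytes of payload.  The typedef is explicitly aligned to an
   8 byte boundary, but the underlying struct is not re-laid out, so
   its size is not a multiple of that alignment.  */
typedef struct { char bytes[12]; } blob12 __attribute__ ((aligned (8)));

static_assert (_Alignof (blob12) == 8, "typedef carries the requested alignment");

/* Number of bytes needed to round SIZE up to a multiple of ALIGN.  */
size_t
padding_to_alignment (size_t size, size_t align)
{
  assert (align != 0);
  return (align - size % align) % align;
}
```

For such a type, padding the last field out to the type's alignment would describe more bytes than the type actually occupies, which is the kind of mismatch the message above refers to.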

Ian

PR go/78431
PR go/78432
* godump.c (go_format_type): Always pass alignment as 1 when
calling go_append_padding at end of struct/union.
Index: gcc/godump.c
===
--- gcc/godump.c(revision 242724)
+++ gcc/godump.c(working copy)
@@ -1006,14 +1006,9 @@ go_format_type (struct godump_container
  }
  }
/* Padding.  */
-   {
- unsigned int align_unit;
-
- align_unit = (is_anon_record_or_union) ? 1 : TYPE_ALIGN_UNIT (type);
- *p_art_i = go_append_padding
-   (ob, prev_field_end, TREE_INT_CST_LOW (TYPE_SIZE_UNIT (type)),
-align_unit, *p_art_i, &prev_field_end);
-   }
+   *p_art_i = go_append_padding (ob, prev_field_end,
+ TREE_INT_CST_LOW (TYPE_SIZE_UNIT (type)),
+ 1, *p_art_i, &prev_field_end);
/* Alignment.  */
if (!is_anon_record_or_union
&& known_alignment < TYPE_ALIGN_UNIT (type))


Re: [PING][PATCH][2/2] Early LTO debug -- main part

2016-11-22 Thread Jason Merrill

On 11/11/2016 03:06 AM, Richard Biener wrote:

+/* ???  In some cases the C++ FE (at least) fails to
+   set DECL_CONTEXT properly.  Simply globalize stuff
+   in this case.  For example
+   __dso_handle created via iostream line 74 col 25.  */


The comment for DECL_CONTEXT says that a VAR_DECL can have 'NULL_TREE or 
a TRANSLATION_UNIT_DECL if the given decl has "file scope"'


So this doesn't seem like a FE bug.


+ /* ???  We cannot unconditionally output die_offset if
+non-zero - at least -feliminate-dwarf2-dups will
+create references to those DIEs via symbols.  And we
+do not clear its DIE offset after outputting it
+(and the label refers to the actual DIEs, not the
+DWARF CU unit header which is when using label + offset
+would be the correct thing to do).


I'd be happy to remove or disable -feliminate-dwarf2-dups at this point, 
since it's already useless for C++ without reimplementation.



+  /* "Unwrap" the decls DIE which we put in the imported unit context.
+  ???  If we finish dwarf2out_function_decl refactoring we can
+ do this in a better way from the start and only lazily emit
+ the early DIE references.  */


Can you elaborate more on the refactoring?  dwarf2out_function_decl is 
already very small, I'm guessing you mean gen_subprogram_die?



+  /* ???  We can't annotate types late, but for LTO we may not
+generate a location early either (gfortran.dg/save_5.f90).
+The proper way is to handle it like VLAs though it is told
+that DW_AT_string_length does not support this.  */


I think go ahead and handle it like VLAs, this is an obvious 
generalization and should go into the spec soon enough.  This can happen 
later.



+ /* ???  This all (and above) should probably be simply
+a ! early_dwarf check somehow.  */
+  && ((DECL_ARTIFICIAL (decl) || in_lto_p)
   || (get_AT_file (old_die, DW_AT_decl_file) == file_index
   && (get_AT_unsigned (old_die, DW_AT_decl_line)
   == (unsigned) s.line


Why doesn't the existing source position check handle the LTO case? 
Also the extra parens aren't necessary.



   /* If we're emitting an out-of-line copy of an inline function,
 emit info for the abstract instance and set up to refer to it.  */
+  /* ???  We have output an abstract instance early already and
+ could just re-use that.  This is how LTO treats all functions
+for example.  */


Isn't this what you do now?


+  /* Avoid generating stray type DIEs during late dwarf dumping.
+ All types have been dumped early.  */
+  if (! (decl ? lookup_decl_die (decl) : NULL)


Why do you still want to gen_type_die if decl_or_origin is origin?


+init_sections_and_labels (bool early_lto_debug)


You're changing this function to do the same thing in four slightly 
different ways rather than two.  I'd rather control each piece as 
appropriate; we ought to make SECTION_DEBUG or 
SECTION_DEBUG|SECTION_EXCLUDE a local variable, and select between 
*_SECTION and the DWO variant at each statement rather than in different 
blocks.



+  /* Remove DW_AT_macro from the early output.  */
+  if (have_macinfo)
+   remove_AT (comp_unit_die (),
+  dwarf_strict ? DW_AT_macro_info : DW_AT_GNU_macros);


This will need adjustment for Jakub's DWARF 5 work.  Please make the 
choice of AT value a macro.



+  /* ???  Mostly duplicated from dwarf2out_finish.  */


:(

Jason



Go patch committed: Move encoding utilities from gcc/go to gofrontend proper

2016-11-22 Thread Ian Lance Taylor
This patch by Than McIntosh moves the name encoding utilities from
gcc/go/go-gcc.cc to the gofrontend proper, where they are available
for other backend implementations.  Bootstrapped and ran Go testsuite
on x86_64-pc-linux-gnu.  Committed to mainline.
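
For readers who have not looked at these helpers, the following is a standalone C sketch modeled on the fetch_utf8_char routine that the patch removes from go-gcc.cc.  It is an illustration rather than the moved code itself: it assumes well-formed UTF-8 input and performs no validation, and the added assert is not part of the original.

```c
#include <assert.h>
#include <stddef.h>

/* Pull the next UTF-8 character out of P and store its code point in
   *PC.  Return the number of bytes read.  Assumes well-formed input.  */
size_t
fetch_utf8_char (const char *p, unsigned int *pc)
{
  assert (p != NULL && pc != NULL);

  unsigned char c = (unsigned char) *p;
  if ((c & 0x80) == 0)
    {
      /* Plain ASCII: one byte, value is the code point.  */
      *pc = c;
      return 1;
    }

  /* The number of leading 1 bits in the first byte is the length of
     the whole sequence in bytes.  */
  size_t len = 0;
  while ((c & 0x80) != 0)
    {
      ++len;
      c <<= 1;
    }

  /* The remaining low bits of the first byte are the top bits of the
     code point; each continuation byte contributes six more.  */
  unsigned int rc = (unsigned char) *p & ((1u << (7 - len)) - 1);
  for (size_t i = 1; i < len; i++)
    {
      unsigned int u = (unsigned char) p[i];
      rc <<= 6;
      rc |= u & 0x3f;
    }
  *pc = rc;
  return len;
}
```

A caller would advance its pointer by the returned length after each call, which is how the encode_id loop in the removed code walks an identifier.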

Ian
Index: gcc/go/Make-lang.in
===
--- gcc/go/Make-lang.in (revision 242724)
+++ gcc/go/Make-lang.in (working copy)
@@ -55,6 +55,7 @@ GO_OBJS = \
go/expressions.o \
go/go-backend.o \
go/go-diagnostics.o \
+   go/go-encode-id.o \
go/go-dump.o \
go/go-gcc.o \
go/go-gcc-diagnostics.o \
@@ -230,6 +231,7 @@ CFLAGS-go/go-gcc.o += $(GOINCLUDES)
 CFLAGS-go/go-linemap.o += $(GOINCLUDES)
 CFLAGS-go/go-sha1.o += $(GOINCLUDES)
 CFLAGS-go/go-gcc-diagnostics.o += $(GOINCLUDES)
+CFLAGS-go/go-encode-id.o += $(GOINCLUDES)
 
 go/%.o: go/gofrontend/%.cc
$(COMPILE) $(GOINCLUDES) $<
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 242724)
+++ gcc/go/go-gcc.cc(working copy)
@@ -412,9 +412,8 @@ class Gcc_backend : public Backend
   { return new Bvariable(error_mark_node); }
 
   Bvariable*
-  global_variable(const std::string& package_name,
- const std::string& pkgpath,
- const std::string& name,
+  global_variable(const std::string& var_name,
+ const std::string& asm_name,
  Btype* btype,
  bool is_external,
  bool is_hidden,
@@ -440,25 +439,27 @@ class Gcc_backend : public Backend
 Location, Bstatement**);
 
   Bvariable*
-  implicit_variable(const std::string&, Btype*, bool, bool, bool,
-   int64_t);
+  implicit_variable(const std::string&, const std::string&, Btype*,
+bool, bool, bool, int64_t);
 
   void
   implicit_variable_set_init(Bvariable*, const std::string&, Btype*,
 bool, bool, bool, Bexpression*);
 
   Bvariable*
-  implicit_variable_reference(const std::string&, Btype*);
+  implicit_variable_reference(const std::string&, const std::string&, Btype*);
 
   Bvariable*
-  immutable_struct(const std::string&, bool, bool, Btype*, Location);
+  immutable_struct(const std::string&, const std::string&,
+   bool, bool, Btype*, Location);
 
   void
   immutable_struct_set_init(Bvariable*, const std::string&, bool, bool, Btype*,
Location, Bexpression*);
 
   Bvariable*
-  immutable_struct_reference(const std::string&, Btype*, Location);
+  immutable_struct_reference(const std::string&, const std::string&,
+ Btype*, Location);
 
   // Labels.
 
@@ -550,102 +551,6 @@ get_identifier_from_string(const std::st
   return get_identifier_with_length(str.data(), str.length());
 }
 
-// Return whether the character c is OK to use in the assembler.
-
-static bool
-char_needs_encoding(char c)
-{
-  switch (c)
-{
-case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
-case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
-case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
-case 'Y': case 'Z':
-case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
-case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
-case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
-case 's': case 't': case 'u': case 'v': case 'w': case 'x':
-case 'y': case 'z':
-case '0': case '1': case '2': case '3': case '4':
-case '5': case '6': case '7': case '8': case '9':
-case '_': case '.': case '$': case '/':
-  return false;
-default:
-  return true;
-}
-}
-
-// Return whether the identifier needs to be translated because it
-// contains non-ASCII characters.
-
-static bool
-needs_encoding(const std::string& str)
-{
-  for (std::string::const_iterator p = str.begin();
-   p != str.end();
-   ++p)
-if (char_needs_encoding(*p))
-  return true;
-  return false;
-}
-
-// Pull the next UTF-8 character out of P and store it in *PC.  Return
-// the number of bytes read.
-
-static size_t
-fetch_utf8_char(const char* p, unsigned int* pc)
-{
-  unsigned char c = *p;
-  if ((c & 0x80) == 0)
-{
-  *pc = c;
-  return 1;
-}
-  size_t len = 0;
-  while ((c & 0x80) != 0)
-{
-  ++len;
-  c <<= 1;
-}
-  unsigned int rc = *p & ((1 << (7 - len)) - 1);
-  for (size_t i = 1; i < len; i++)
-{
-  unsigned int u = p[i];
-  rc <<= 6;
-  rc |= u & 0x3f;
-}
-  *pc = rc;
-  return len;
-}
-
-// Encode an identifier using ASCII characters.
-
-static std::string
-encode_id(const std::string id)
-{
-  std::string ret;
-  const char* p = id.c_str();
-  const char* pend = p + id.length();
-  while (p < pend)
-{
-  unsigned int c;
-  size_t len = 

[PATCH] 78461 - [7 Regression] ICE: in operator+=, at gimple-ssa-sprintf.c:214

2016-11-22 Thread Martin Sebor

With r242674 having enabled the -fprintf-return-value option by
default, when warnings are disabled the gimple-ssa-sprintf pass
is now exercised in ways it was not being tested.  One of these
untested use cases exposed a bug in the logic used to compute
the minimum number of bytes output by a %.*s directive with
a known precision and a string of unknown length.  The bug
manifested itself by triggering an ICE. The attached patch
corrects this problem.
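
As a small illustration of the directive involved (a sketch using the standard snprintf rather than the pass itself), the functions below show why the number of bytes produced by "%.*s" with a known precision and an unknown string can only be bounded, from zero up to the precision.  The function names are hypothetical; the format strings mirror the ones used in the new test.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes that "%.*s%08x" would produce with precision 1: zero or one
   byte from S, followed by eight hex digits.  Passing a null buffer
   and a zero size asks snprintf only to compute the length.  */
int
byte_plus_hex_length (const char *s)
{
  assert (s != NULL);
  return snprintf (NULL, 0, "%.*s%08x", 1, s, 1u);
}

/* Bytes that "%.*s" would produce with precision 2: anywhere from
   zero to two, depending on the length of S.  */
int
short_prefix_length (const char *s)
{
  assert (s != NULL);
  return snprintf (NULL, 0, "%.*s", 2, s);
}
```

Since an empty string contributes nothing, the lower bound for the string part has to be zero regardless of the precision, which matches the ranges asserted in the test added by the patch.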

Thanks
Martin
PR middle-end/78461 - [7 Regression] ICE: in operator+=

gcc/testsuite/ChangeLog:

	PR middle-end/78461
	* gcc.dg/tree-ssa/builtin-sprintf-4.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Adjust warning text.

gcc/ChangeLog:

	PR middle-end/78461
	* gimple-ssa-sprintf.c (format_string): Correct the maxima and
	set the minimum number of bytes for an unknown string to zero.

Index: gcc/gimple-ssa-sprintf.c
===
--- gcc/gimple-ssa-sprintf.c	(revision 242703)
+++ gcc/gimple-ssa-sprintf.c	(working copy)
@@ -1533,18 +1533,15 @@ format_string (const conversion_spec &spec, tree a
   fmtresult res;
 
   /* The maximum number of bytes for an unknown wide character argument
- to a "%lc" directive adjusted for precision but not field width.  */
+ to a "%lc" directive adjusted for precision but not field width.
+ 6 is the longest UTF-8 sequence for a single wide character.  */
   const unsigned HOST_WIDE_INT max_bytes_for_unknown_wc
-= (1 == warn_format_length ? 0 <= prec ? prec : 0
-   : 2 == warn_format_length ? 0 <= prec ? prec : 1
-   : 0 <= prec ? prec : 6 /* Longest UTF-8 sequence.  */);
+= (0 <= prec ? prec : 1 < warn_format_length ? 6 : 1);
 
   /* The maximum number of bytes for an unknown string argument to either
  a "%s" or "%ls" directive adjusted for precision but not field width.  */
   const unsigned HOST_WIDE_INT max_bytes_for_unknown_str
-= (1 == warn_format_length ? 0 <= prec ? prec : 0
-   : 2 == warn_format_length ? 0 <= prec ? prec : 1
-   : HOST_WIDE_INT_MAX);
+= (0 <= prec ? prec : 1 < warn_format_length);
 
   /* The result is bounded unless overriddden for a non-constant string
  of an unknown length.  */
@@ -1648,7 +1645,7 @@ format_string (const conversion_spec &spec, tree a
 	  if (0 <= prec)
 	{
 	  if (slen.range.min >= target_int_max ())
-		slen.range.min = max_bytes_for_unknown_str;
+		slen.range.min = 0;
 	  else if ((unsigned)prec < slen.range.min)
 		slen.range.min = prec;
 
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-4.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-4.c	(working copy)
@@ -0,0 +1,69 @@
+/* PR middle-end/78461 - [7 Regression] ICE: in operator+=, at
+   gimple-ssa-sprintf.c:214
+   Disable warnings to exercise code paths through the pass that may
+   not be exercised when the -Wformat-length option is in effect.  */
+/* { dg-compile }
+   { dg-options "-O2 -fdump-tree-optimized -w" } */
+
+
+#define CAT(s, n)   s ## n
+#define FAIL(line)  CAT (failure_on_line_, line)
+
+/* Emit a call to a function named failure_on_line_NNN when EXPR is false.  */
+#define ASSERT(expr)\
+  do {		\
+extern void FAIL (__LINE__)(void);		\
+if (!(expr)) FAIL (__LINE__)();		\
+  } while (0)
+
+#define KEEP(line)  CAT (keep_call_on_line_, line)
+
+/* Emit a call to a function named keep_call_on_line_NNN when EXPR is true.
+   Used to verify that the expression need not be the only one that holds.  */
+#define ASSERT_MAYBE(expr)			\
+  do {		\
+extern void KEEP (__LINE__)(void);		\
+if (expr) KEEP (__LINE__)();		\
+  } while (0)
+
+int f0 (const char *s)
+{
+  int n = __builtin_snprintf (0, 0, "%.*s%08x", 1, s, 1);
+
+  ASSERT (7 < n && n < 10);
+
+  ASSERT_MAYBE (8 == n);
+  ASSERT_MAYBE (9 == n);
+
+  return n;
+}
+
+char buf[64];
+
+int f1 (const char *s)
+{
+  int n = __builtin_snprintf (buf, 64, "%.*s%08x", 1, s, 1);
+
+  ASSERT (7 < n && n < 10);
+
+  ASSERT_MAYBE (8 == n);
+  ASSERT_MAYBE (9 == n);
+
+  return n;
+}
+
+int f2 (const char *s)
+{
+  int n = __builtin_snprintf (0, 0, "%.*s", 2, s);
+
+  ASSERT (0 <= n && n <= 2);
+
+  ASSERT_MAYBE (0 == n);
+  ASSERT_MAYBE (1 == n);
+  ASSERT_MAYBE (2 == n);
+
+  return n;
+}
+
+/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized"} }
+   { dg-final { scan-tree-dump-times "keep_call_on_line" 7 "optimized"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c	(revision 242703)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-2.c	(working copy)
@@ -93,10 +93,10 @@ void test_s_nonconst (const char *s, const wchar_t
   T (1, "%s",   s); /* { dg-warning "nul past the end" "sprintf transformed into strcpy" { xfail 

Re: [PATCH] PR fortran/78479 -- allocate a charlen

2016-11-22 Thread Steve Kargl
On Tue, Nov 22, 2016 at 10:21:52PM +0100, Janus Weil wrote:
> 
> > The patch and ChangeLog should be sufficient to explain the change.
> > Regression tested on x86_64-*-freebsd.  OK to commit?
> 
> the patch itself looks good.
> 

Thanks.

> For the test case, I'd prefer a somewhat more meaningful name (e.g.
> char_component_initializer_3.f90 or similar) and a mention of the PR
> number in a comment inside the test case.

At one time, I also preferred a meaningful name, but have
changed over time to using the PR number as the name.  There
are various pros and cons for using the PR number as the name.
In this case, as char*_[1,2].f90 already exist, I'll rename the testcase.

-- 
Steve


Re: [PATCH 7/9] Add RTL-error-handling to host

2016-11-22 Thread Richard Sandiford
David Malcolm  writes:
> +inline file_location::file_location (const char *filename_in, int lineno_in, 
> int colno_in)
> +: filename (filename_in), lineno (lineno_in), colno (colno_in) {}
> +

Long line (a pre-existing problem, since you're just moving the code).

I'm happy with this FWIW, but it'd be a stretch to say the whole thing
comes under the gen* umbrella.

Thanks,
Richard


Re: [PATCH 6/9] Split class rtx_reader into md_reader vs rtx_reader

2016-11-22 Thread Richard Sandiford
Sorry, only just realised that this one hadn't been approved as
part of the earlier series.

David Malcolm  writes:
> gcc/ChangeLog:
>   * genpreds.c (write_tm_constrs_h): Update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (write_tm_preds_h): Likewise.
>   (write_insn_preds_c): Likewise.
>   * read-md.c (rtx_reader_ptr): Rename to...
>   (md_reader_ptr): ...this, and convert from an
>   rtx_reader * to a md_reader *.
>   (rtx_reader::set_md_ptr_loc): Rename to...
>   (md_reader::set_md_ptr_loc): ...this.
>   (rtx_reader::get_md_ptr_loc): Rename to...
>   (md_reader::get_md_ptr_loc): ...this.
>   (rtx_reader::copy_md_ptr_loc): Rename to...
>   (md_reader::copy_md_ptr_loc): ...this.
>   (rtx_reader::fprint_md_ptr_loc): Rename to...
>   (md_reader::fprint_md_ptr_loc): ...this.
>   (rtx_reader::print_md_ptr_loc): Rename to...
>   (md_reader::print_md_ptr_loc): ...this.
>   (rtx_reader::join_c_conditions): Rename to...
>   (md_reader::join_c_conditions): ...this.
>   (rtx_reader::fprint_c_condition): Rename to...
>   (md_reader::fprint_c_condition): ...this.
>   (rtx_reader::print_c_condition): Rename to...
>   (md_reader::print_c_condition): ...this.
>   (fatal_with_file_and_line):  Update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (rtx_reader::require_char): Rename to...
>   (md_reader::require_char): ...this.
>   (rtx_reader::require_char_ws): Rename to...
>   (md_reader::require_char_ws): ...this.
>   (rtx_reader::require_word_ws): Rename to...
>   (md_reader::require_word_ws): ...this.
>   (rtx_reader::read_char): Rename to...
>   (md_reader::read_char): ...this.
>   (rtx_reader::unread_char): Rename to...
>   (md_reader::unread_char): ...this.
>   (rtx_reader::peek_char): Rename to...
>   (md_reader::peek_char): ...this.
>   (rtx_reader::read_name): Rename to...
>   (md_reader::read_name): ...this.
>   (rtx_reader::read_escape): Rename to...
>   (md_reader::read_escape): ...this.
>   (rtx_reader::read_quoted_string): Rename to...
>   (md_reader::read_quoted_string): ...this.
>   (rtx_reader::read_braced_string): Rename to...
>   (md_reader::read_braced_string): ...this.
>   (rtx_reader::read_string): Rename to...
>   (md_reader::read_string): ...this.
>   (rtx_reader::read_skip_construct): Rename to...
>   (md_reader::read_skip_construct): ...this.
>   (rtx_reader::handle_constants): Rename to...
>   (md_reader::handle_constants): ...this.
>   (rtx_reader::traverse_md_constants): Rename to...
>   (md_reader::traverse_md_constants): ...this.
>   (rtx_reader::handle_enum): Rename to...
>   (md_reader::handle_enum): ...this.
>   (rtx_reader::lookup_enum_type): Rename to...
>   (md_reader::lookup_enum_type): ...this.
>   (rtx_reader::traverse_enum_types): Rename to...
>   (md_reader::traverse_enum_types): ...this.
>   (rtx_reader::rtx_reader): Rename to...
>   (md_reader::md_reader): ...this, and update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (rtx_reader::~rtx_reader): Rename to...
>   (md_reader::~md_reader): ...this, and update for renaming of
>   rtx_reader_ptr to md_reader_ptr.
>   (rtx_reader::handle_include): Rename to...
>   (md_reader::handle_include): ...this.
>   (rtx_reader::handle_file): Rename to...
>   (md_reader::handle_file): ...this.
>   (rtx_reader::handle_toplevel_file): Rename to...
>   (md_reader::handle_toplevel_file): ...this.
>   (rtx_reader::get_current_location): Rename to...
>   (md_reader::get_current_location): ...this.
>   (rtx_reader::add_include_path): Rename to...
>   (md_reader::add_include_path): ...this.
>   (rtx_reader::read_md_files): Rename to...
>   (md_reader::read_md_files): ...this.
>   * read-md.h (class rtx_reader): Split into...
>   (class md_reader): ...new class.
>   (rtx_reader_ptr): Rename to...
>   (md_reader_ptr): ...this, and convert to a md_reader *.
>   (class noop_reader): Update base class to be md_reader.
>   (class rtx_reader): Reintroduce as a subclass of md_reader.
>   (rtx_reader_ptr): Reintroduce as a rtx_reader *.
>   (read_char): Update for renaming of rtx_reader_ptr to
>   md_reader_ptr.
>   (unread_char): Likewise.
>   * read-rtl.c (rtx_reader_ptr): New global.
>   (rtx_reader::apply_iterator_to_string): Rename to...
>   (md_reader::apply_iterator_to_string): ...this.
>   (rtx_reader::copy_rtx_for_iterators): Rename to...
>   (md_reader::copy_rtx_for_iterators): ...this.
>   (rtx_reader::read_conditions): Rename to...
>   (md_reader::read_conditions): ...this.
>   (rtx_reader::record_potential_iterator_use): Rename to...
>   (md_reader::record_potential_iterator_use): ...this.
>   (rtx_reader::read_mapping): Rename to...
>   

Re: [PATCH] PR fortran/78479 -- allocate a charlen

2016-11-22 Thread Janus Weil
Hi Steve,

> The patch and ChangeLog should be sufficient to explain the change.
> Regression tested on x86_64-*-freebsd.  OK to commit?

the patch itself looks good.

For the test case, I'd prefer a somewhat more meaningful name (e.g.
char_component_initializer_3.f90 or similar) and a mention of the PR
number in a comment inside the test case.

Thanks,
Janus



> 2016-11-22  Steven G. Kargl  
>
> PR fortran/78479
> * expr.c (gfc_apply_init):  Allocate a charlen if needed.
>
> 2016-11-22  Steven G. Kargl  
>
> PR fortran/78479
> * gfortran.dg/pr78479.f90: New test.
>
> --
> Steve


libgo patch committed: Don't check standard packages in go tool with gccgo

2016-11-22 Thread Ian Lance Taylor
When using the go tool with gccgo, we can't check for whether the
standard packages are up to date, because we can't assume that the
source code is available.  And we can't read
runtime/internal/sys/zversion.go, because that too is not generally
available.  This was fixed in the gc repository with
https://golang.org/cl/33295.  This patch simply brings that change
over to gccgo.  This fixes GCC PR 77910.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242715)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-7593cc83a03999331c5e2dc65a9306c5fe57dfd0
+e66f30e862cb5d02b9d55bf44ac439bb8fc4ea19
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/pkg.go
===
--- libgo/go/cmd/go/pkg.go  (revision 242581)
+++ libgo/go/cmd/go/pkg.go  (working copy)
@@ -523,6 +523,11 @@ func disallowInternal(srcDir string, p *
return p
}
 
+   // We can't check standard packages with gccgo.
+   if buildContext.Compiler == "gccgo" && p.Standard {
+   return p
+   }
+
// The stack includes p.ImportPath.
// If that's the only thing on the stack, we started
// with a name given on the command line, not an
@@ -1588,7 +1593,7 @@ func computeBuildID(p *Package) {
// Include the content of runtime/internal/sys/zversion.go in the hash
// for package runtime. This will give package runtime a
// different build ID in each Go release.
-   if p.Standard && p.ImportPath == "runtime/internal/sys" {
+   if p.Standard && p.ImportPath == "runtime/internal/sys" && 
buildContext.Compiler != "gccgo" {
data, err := ioutil.ReadFile(filepath.Join(p.Dir, 
"zversion.go"))
if err != nil {
fatalf("go: %s", err)


[PATCH] PR fortran/78479 -- allocate a charlen

2016-11-22 Thread Steve Kargl
The patch and ChangeLog should be sufficient to explain the change.
Regression tested on x86_64-*-freebsd.  OK to commit?

2016-11-22  Steven G. Kargl  

PR fortran/78479
* expr.c (gfc_apply_init):  Allocate a charlen if needed.

2016-11-22  Steven G. Kargl  

PR fortran/78479
* gfortran.dg/pr78479.f90: New test.

-- 
Steve
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c	(revision 242638)
+++ gcc/fortran/expr.c	(working copy)
@@ -4132,7 +4132,12 @@ gfc_apply_init (gfc_typespec *ts, symbol
 {
   gfc_set_constant_character_len (len, ctor->expr,
   has_ts ? -1 : first_len);
-  ctor->expr->ts.u.cl->length = gfc_copy_expr (ts->u.cl->length);
+		  if (!ctor->expr->ts.u.cl)
+		ctor->expr->ts.u.cl
+		  = gfc_new_charlen (gfc_current_ns, ts->u.cl);
+		  else
+ctor->expr->ts.u.cl->length
+		  = gfc_copy_expr (ts->u.cl->length);
 }
 }
 }
Index: gcc/testsuite/gfortran.dg/pr78479.f90
===
--- gcc/testsuite/gfortran.dg/pr78479.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78479.f90	(working copy)
@@ -0,0 +1,6 @@
+! { dg-do compile }
+program p
+   type t
+  character(3) :: c(1) = 'a' // ['b']
+   end type
+end


[C++ PATCH] Fix ICE during VEC_INIT_EXPR gimplification (PR c++/77739)

2016-11-22 Thread Jakub Jelinek
Hi!

As mentioned in the PR, we ICE because part of the body is genericized
twice and each time it wraps is_invisiref_parm RESULT_DECL (in this case,
it could also be a PARM_DECL) into REFERENCE_REF_P INDIRECT_REF.
The first time it is desirable, but when done again during VEC_INIT_EXPR
gimplification which calls cp_genericize_tree again, it is undesirable.

The following patch fixes it by only wrapping the invisiref parms/result
during the first cp_genericize_tree when the whole function is genericized.
I'd expect that any references to invisiref parms/result should be only
present in the VEC_INIT_EXPR arguments (which should be genericized already)
and that build_vec_init shouldn't create new ones out of the air.
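
To make the double-wrapping problem concrete, here is a toy C++ model. It is not GCC code: `Node`, `deref_wraps` and this `genericize` are invented for the sketch, and only the flag name is borrowed from the patch.

```cpp
// Toy model only: a node that remembers how many times it has been
// wrapped in a dereference.  None of these names are GCC internals,
// except the flag name, which is taken from the patch.
struct Node {
  bool is_invisiref_parm;  // parm/result passed by invisible reference
  int deref_wraps;         // number of dereference wrappers applied so far
};

// One genericization pass.  The wrapper is added only when the caller
// asks for invisiref handling, mirroring the new guard in cp_genericize_r.
constexpr Node genericize (Node n, bool handle_invisiref_parm_p)
{
  if (handle_invisiref_parm_p && n.is_invisiref_parm)
    ++n.deref_wraps;
  return n;
}
```

In this model, a whole-function pass with `true` followed by a second pass with `false` leaves exactly one wrapper, whereas running the second pass with `true` produces two, which corresponds to the situation behind the ICE.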

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

As mentioned in the PR, another option I see is to special-case
REFERENCE_REF_P INDIRECT_REFs and the MEM_REFs into which they are gimplified
in cp_genericize_r, by not changing is_invisiref_parm decls if they are
already wrapped in those.

2016-11-22  Jakub Jelinek  

PR c++/77739
* cp-gimplify.c (cp_gimplify_tree) : Pass
false as handle_invisiref_parm_p to cp_genericize_tree.
(struct cp_genericize_data): Add handle_invisiref_parm_p field.
(cp_genericize_r): Don't wrap is_invisiref_parm into references
if !wtd->handle_invisiref_parm_p.
(cp_genericize_tree): Add handle_invisiref_parm_p argument,
set wtd.handle_invisiref_parm_p to it.
(cp_genericize): Pass true as handle_invisiref_parm_p to
cp_genericize_tree.  Formatting fix.

* g++.dg/cpp1y/pr77739.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2016-11-15 16:18:49.0 +0100
+++ gcc/cp/cp-gimplify.c2016-11-22 19:12:07.606813783 +0100
@@ -38,7 +38,7 @@ along with GCC; see the file COPYING3.
 
 static tree cp_genericize_r (tree *, int *, void *);
 static tree cp_fold_r (tree *, int *, void *);
-static void cp_genericize_tree (tree*);
+static void cp_genericize_tree (tree*, bool);
 static tree cp_fold (tree);
 
 /* Local declarations.  */
@@ -623,7 +623,7 @@ cp_gimplify_expr (tree *expr_p, gimple_s
  tf_warning_or_error);
hash_set<tree> pset;
cp_walk_tree (expr_p, cp_fold_r, &pset, NULL);
-   cp_genericize_tree (expr_p);
+   cp_genericize_tree (expr_p, false);
ret = GS_OK;
input_location = loc;
   }
@@ -995,6 +995,7 @@ struct cp_genericize_data
   struct cp_genericize_omp_taskreg *omp_ctx;
   tree try_block;
   bool no_sanitize_p;
+  bool handle_invisiref_parm_p;
 };
 
 /* Perform any pre-gimplification folding of C++ front end trees to
@@ -,7 +1112,7 @@ cp_genericize_r (tree *stmt_p, int *walk
 }
 
   /* Otherwise, do dereference invisible reference parms.  */
-  if (is_invisiref_parm (stmt))
+  if (wtd->handle_invisiref_parm_p && is_invisiref_parm (stmt))
 {
   *stmt_p = convert_from_reference (stmt);
   *walk_subtrees = 0;
@@ -1511,7 +1512,7 @@ cp_genericize_r (tree *stmt_p, int *walk
 /* Lower C++ front end trees to GENERIC in T_P.  */
 
 static void
-cp_genericize_tree (tree* t_p)
+cp_genericize_tree (tree* t_p, bool handle_invisiref_parm_p)
 {
   struct cp_genericize_data wtd;
 
@@ -1520,6 +1521,7 @@ cp_genericize_tree (tree* t_p)
   wtd.omp_ctx = NULL;
   wtd.try_block = NULL_TREE;
   wtd.no_sanitize_p = false;
+  wtd.handle_invisiref_parm_p = handle_invisiref_parm_p;
   cp_walk_tree (t_p, cp_genericize_r, &wtd, NULL);
   delete wtd.p_set;
   wtd.bind_expr_stack.release ();
@@ -1639,12 +1641,12 @@ cp_genericize (tree fndecl)
   /* Expand all the array notations here.  */
   if (flag_cilkplus 
   && contains_array_notation_expr (DECL_SAVED_TREE (fndecl)))
-DECL_SAVED_TREE (fndecl) = 
-  expand_array_notation_exprs (DECL_SAVED_TREE (fndecl));
+DECL_SAVED_TREE (fndecl)
+  = expand_array_notation_exprs (DECL_SAVED_TREE (fndecl));
 
   /* We do want to see every occurrence of the parms, so we can't just use
  walk_tree's hash functionality.  */
-  cp_genericize_tree (&DECL_SAVED_TREE (fndecl));
+  cp_genericize_tree (&DECL_SAVED_TREE (fndecl), true);
 
   if (flag_sanitize & SANITIZE_RETURN
   && do_ubsan_in_current_function ())
--- gcc/testsuite/g++.dg/cpp1y/pr77739.C.jj 2016-11-22 19:15:02.182659407 
+0100
+++ gcc/testsuite/g++.dg/cpp1y/pr77739.C2016-11-22 19:13:37.0 
+0100
@@ -0,0 +1,15 @@
+// PR c++/77739
+// { dg-do compile { target c++14 } }
+
+struct A {
+  A();
+  A(const A &);
+};
+struct B {
+  B();
+  template <typename... Args> auto g(Args &&... p1) {
+return [=] { f(p1...); };
+  }
+  void f(A, const char *);
+};
+B::B() { g(A(), ""); }

Jakub


[patch] boehm-gc removal and libobjc changes to build with an external bdw-gc

2016-11-22 Thread Matthias Klose
Re-posting this at top level; discussion and review happened in the GCJ removal
thread:

 - https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02069.html (last patch
   review).
 - https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00387.html (first patch
   sent)
 - https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00290.html (OK by Jeff
   Law to remove boehm-gc)

AFAIU, it needs an OK from a global reviewer, and maybe a libobjc maintainer (?).

Matthias



2016-11-19  Matthias Klose  

* Makefile.def: Remove reference to boehm-gc target module.
* configure.ac: Include pkg.m4, check for --with-target-bdw-gc
options and for the bdw-gc pkg-config module.
* configure: Regenerate.
* Makefile.in: Regenerate.

gcc/

2016-11-19  Matthias Klose  

* doc/install.texi: Document configure options --enable-objc-gc
and --with-target-bdw-gc.

config/

2016-11-19  Matthias Klose  

* pkg.m4: New file.

libobjc/

2016-11-19  Matthias Klose  

* configure.ac (--enable-objc-gc): Allow to configure with a
system provided boehm-gc.
* configure: Regenerate.
* Makefile.in (OBJC_BOEHM_GC_LIBS): Get value from configure.
* gc.c: Include system bdw-gc headers.
* memory.c: Likewise
* objects.c: Likewise

boehm-gc/

2016-11-19  Matthias Klose  

Remove




Index: Makefile.def
===
--- Makefile.def	(revision 242721)
+++ Makefile.def	(working copy)
@@ -166,7 +166,6 @@
 target_modules = { module= libgloss; no_check=true; };
 target_modules = { module= libffi; no_install=true; };
 target_modules = { module= zlib; };
-target_modules = { module= boehm-gc; };
 target_modules = { module= rda; };
 target_modules = { module= libada; };
 target_modules = { module= libgomp; bootstrap= true; lib_path=.libs; };
@@ -543,7 +542,6 @@
 // a dependency on libgcc for native targets to configure.
 lang_env_dependencies = { module=libiberty; no_c=true; };
 
-dependencies = { module=configure-target-boehm-gc; on=all-target-libstdc++-v3; };
 dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
 dependencies = { module=all-target-fastjar; on=all-target-zlib; };
 dependencies = { module=configure-target-libgo; on=configure-target-libffi; };
@@ -551,8 +549,6 @@
 dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
 dependencies = { module=all-target-libgo; on=all-target-libffi; };
 dependencies = { module=all-target-libgo; on=all-target-libatomic; };
-dependencies = { module=configure-target-libobjc; on=configure-target-boehm-gc; };
-dependencies = { module=all-target-libobjc; on=all-target-boehm-gc; };
 dependencies = { module=configure-target-libstdc++-v3; on=configure-target-libgomp; };
 dependencies = { module=configure-target-liboffloadmic; on=configure-target-libgomp; };
 dependencies = { module=configure-target-libsanitizer; on=all-target-libstdc++-v3; };
Index: config/pkg.m4
===
--- config/pkg.m4	(nonexistent)
+++ config/pkg.m4	(working copy)
@@ -0,0 +1,825 @@
+dnl pkg.m4 - Macros to locate and utilise pkg-config.   -*- Autoconf -*-
+dnl serial 11 (pkg-config-0.29)
+dnl
+dnl Copyright © 2004 Scott James Remnant .
+dnl Copyright © 2012-2015 Dan Nicholson 
+dnl
+dnl This program is free software; you can redistribute it and/or modify
+dnl it under the terms of the GNU General Public License as published by
+dnl the Free Software Foundation; either version 2 of the License, or
+dnl (at your option) any later version.
+dnl
+dnl This program is distributed in the hope that it will be useful, but
+dnl WITHOUT ANY WARRANTY; without even the implied warranty of
+dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+dnl General Public License for more details.
+dnl
+dnl You should have received a copy of the GNU General Public License
+dnl along with this program; if 

Re: [PATCH] Replace _mm_setzero_[hd]i with _mm_setzero_si128 (PR target/78451)

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 05:36:38PM +0100, Uros Bizjak wrote:
> > Note that there is still _mm512_setzero_qi and _mm512_setzero_hi,
> > shall those be replaced with _mm512_setzero_si512 too?
> > Even those 2 aren't mentioned in ICC headers nor AVX512 manuals.
> 
> Yes, please also remove these two.
> 
> Patch to replace them with _mm512_setzero_si512 is pre-approved.

Ok, here is what I've committed after another bootstrap/regtest on
x86_64-linux and i686-linux:

2016-11-22  Jakub Jelinek  

PR target/78451
* config/i386/avx512bwintrin.h (_mm512_setzero_qi,
_mm512_setzero_hi): Removed.
(_mm512_maskz_mov_epi16, _mm512_maskz_loadu_epi16,
_mm512_maskz_mov_epi8, _mm512_maskz_loadu_epi8,
_mm512_maskz_broadcastb_epi8, _mm512_maskz_set1_epi8,
_mm512_maskz_broadcastw_epi16, _mm512_maskz_set1_epi16,
_mm512_mulhrs_epi16, _mm512_maskz_mulhrs_epi16, _mm512_mulhi_epi16,
_mm512_maskz_mulhi_epi16, _mm512_mulhi_epu16,
_mm512_maskz_mulhi_epu16, _mm512_maskz_mullo_epi16,
_mm512_cvtepi8_epi16, _mm512_maskz_cvtepi8_epi16, _mm512_cvtepu8_epi16,
_mm512_maskz_cvtepu8_epi16, _mm512_permutexvar_epi16,
_mm512_maskz_permutexvar_epi16, _mm512_avg_epu8, _mm512_maskz_avg_epu8,
_mm512_maskz_add_epi8, _mm512_maskz_sub_epi8, _mm512_avg_epu16,
_mm512_maskz_avg_epu16, _mm512_subs_epi8, _mm512_maskz_subs_epi8,
_mm512_subs_epu8, _mm512_maskz_subs_epu8, _mm512_adds_epi8,
_mm512_maskz_adds_epi8, _mm512_adds_epu8, _mm512_maskz_adds_epu8,
_mm512_maskz_sub_epi16, _mm512_subs_epi16, _mm512_maskz_subs_epi16,
_mm512_subs_epu16, _mm512_maskz_subs_epu16, _mm512_maskz_add_epi16,
_mm512_adds_epi16, _mm512_maskz_adds_epi16, _mm512_adds_epu16,
_mm512_maskz_adds_epu16, _mm512_srl_epi16, _mm512_maskz_srl_epi16,
_mm512_packs_epi16, _mm512_sll_epi16, _mm512_maskz_sll_epi16,
_mm512_maddubs_epi16, _mm512_maskz_maddubs_epi16, _mm512_unpackhi_epi8,
_mm512_maskz_unpackhi_epi8, _mm512_unpackhi_epi16,
_mm512_maskz_unpackhi_epi16, _mm512_unpacklo_epi8,
_mm512_maskz_unpacklo_epi8, _mm512_unpacklo_epi16,
_mm512_maskz_unpacklo_epi16, _mm512_shuffle_epi8,
_mm512_maskz_shuffle_epi8, _mm512_min_epu16, _mm512_maskz_min_epu16,
_mm512_min_epi16, _mm512_maskz_min_epi16, _mm512_max_epu8,
_mm512_maskz_max_epu8, _mm512_max_epi8, _mm512_maskz_max_epi8,
_mm512_min_epu8, _mm512_maskz_min_epu8, _mm512_min_epi8,
_mm512_maskz_min_epi8, _mm512_max_epi16, _mm512_maskz_max_epi16,
_mm512_max_epu16, _mm512_maskz_max_epu16, _mm512_sra_epi16,
_mm512_maskz_sra_epi16, _mm512_srav_epi16, _mm512_maskz_srav_epi16,
_mm512_srlv_epi16, _mm512_maskz_srlv_epi16, _mm512_sllv_epi16,
_mm512_maskz_sllv_epi16, _mm512_maskz_packs_epi16, _mm512_packus_epi16,
_mm512_maskz_packus_epi16, _mm512_abs_epi8, _mm512_maskz_abs_epi8,
_mm512_abs_epi16, _mm512_maskz_abs_epi16, _mm512_dbsad_epu8,
_mm512_maskz_dbsad_epu8, _mm512_srli_epi16, _mm512_maskz_srli_epi16,
_mm512_slli_epi16, _mm512_maskz_slli_epi16, _mm512_shufflehi_epi16,
_mm512_maskz_shufflehi_epi16, _mm512_shufflelo_epi16,
_mm512_maskz_shufflelo_epi16, _mm512_srai_epi16,
_mm512_maskz_srai_epi16, _mm512_packs_epi32,
_mm512_maskz_packs_epi32, _mm512_packus_epi32,
_mm512_maskz_packus_epi32): Use _mm512_setzero_si512 instead of
_mm512_setzero_qi or _mm512_setzero_hi.
(_mm512_maskz_alignr_epi8, _mm512_dbsad_epu8,
_mm512_maskz_dbsad_epu8): Formatting fixes.
(_mm512_srli_epi16, _mm512_maskz_srli_epi16, _mm512_slli_epi16,
_mm512_maskz_slli_epi16, _mm512_shufflehi_epi16,
_mm512_maskz_shufflehi_epi16, _mm512_shufflelo_epi16,
_mm512_maskz_shufflelo_epi16, _mm512_srai_epi16,
_mm512_maskz_srai_epi16): Use _mm512_setzero_si512 instead of
_mm512_setzero_qi or _mm512_setzero_hi.

--- gcc/config/i386/avx512bwintrin.h.jj 2016-08-15 10:13:27.0 +0200
+++ gcc/config/i386/avx512bwintrin.h2016-11-22 18:18:04.664913960 +0100
@@ -42,30 +42,6 @@ typedef unsigned long long __mmask64;
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_setzero_qi (void)
-{
-  return __extension__ (__m512i)(__v64qi){ 0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0,
-  0, 0, 0, 0, 0, 0, 0, 0 };
-}
-
-extern __inline __m512i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))

Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-22 Thread Georg-Johann Lay

Denis Chertykov schrieb:

Do you have any objections, George ?


No, the last delta rev3 from 2016-11-10 looks fine to me.



2016-11-22 8:05 GMT+03:00 Pitchumani Sivanupandi
:

Ping!

On Monday 14 November 2016 07:03 PM, Pitchumani Sivanupandi wrote:

Ping!

On Thursday 10 November 2016 01:53 PM, Pitchumani Sivanupandi wrote:

On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote:

On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:

On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:

On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:

I have updated patch to include the flash size as well. Took that
info from
device headers (it was fed into crt's device information note section
also).


The new option would render -mn-flash superfluous, but we should
keep it for
backward compatibility.

Ok.

Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
LINK_RELAX_SPEC?  And what happens if relaxation is off?

Yes. Removed link_pmem_wrap from link_relax.
Disabling relaxation doesn't change -mpmem-wrap-around behavior.

flashsize-and-wrap-around.patch



diff --git a/gcc/config/avr/avr-mcus.def
b/gcc/config/avr/avr-mcus.def
index 6bcc6ff..9d4aa1a 100644



 /*



 /* Classic, > 8K, <= 64K.  */
-AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1)
-AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT43USB355__",0x0060, 0x0, 1)
-AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT76C711__",  0x0060, 0x0, 1)
+AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000)
+AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT76C711__",  0x0060, 0x0, 1, 0x4000)
+AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE,
"__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1)
 /* Classic, == 128K.  */
-AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2)
-AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2)
-AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
"__AVR_AT43USB320__",   0x0060, 0x0, 2)
+AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
0x0060, 0x0, 2, 0x2)
+AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
"__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
 /* Classic + MOVW + JMP/CALL.  */


If at43usb320 is in the wrong multilib, then this should be handled as
separate issue / patch together with its own PR. Sorry for the confusion.  I
just noticed that some fields don't match...

It is not even clear to me from the data sheet if avr3 is the correct
multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it also has
MUL) as there is no reference to the exact instruction set -- Atmochip will
know.

Moreover, such a change should be sync'ed with avr-libc as all multilib
stuff is hand-wired there: no use of --print-foo meta information retrieval
by avr-libc :-((

I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 for
this one.


That's better. I've attached the updated patch. If OK, could someone
commit please?

I'll see whether I can find some more info for AT43USB320.

Regards,
Pitchumani







Re: [PATCH] OpenACC routines -- c++ front end

2016-11-22 Thread Cesar Philippidis
On 11/11/2016 03:43 PM, Cesar Philippidis wrote:
> Like its C FE counterpart, this contains the following changes:
> 
>  * Updates c_parser_oacc_shape_clause to accept a location_t
>argument in order to make the diagnostics more precise.
> 
>  * Adds support for the bind and nohost clauses.
> 
>  * Adds more diagnostics for invalid acc routines.
> 
> Is this patch OK for trunk?

Here is the updated version of the c++ OpenACC routine patch. It's
mostly the same as before, but now cp_parser_oacc_shape_clause no longer has
a dummy cp_parser argument, like its C FE counterpart.

Is this patch ok for trunk?

Cesar

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/cp/
	* cp-tree.h (bind_decls_match): Declare.
	* decl.c (bind_decls_match): New function.
	* parser.c (cp_parser_omp_clause_name): 
	(cp_parser_oacc_simple_clause): Remove unused cp_parser argument.
	(cp_parser_oacc_shape_clause): New location_t loc argument.  Use it
	to report more accurate diagnostics.  Remove parser argument.
	(cp_parser_oacc_clause_bind): New function.
	(cp_parser_oacc_all_clauses): Handle OpenACC bind and nohost clauses.
	Update calls to c_parser_oacc_{simple,shape}_clause.
	(OACC_ROUTINE_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{BIND,NOHOST}.
	(cp_parser_oacc_routine): Update diagnostics.
	(cp_parser_late_parsing_oacc_routine): Likewise.
	(cp_finalize_oacc_routine): Likewise.
	* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_{BIND,NOHOST}.


diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5674886..c9dbc4f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5785,6 +5785,7 @@ extern void finish_scope			(void);
 extern void push_switch(tree);
 extern void pop_switch(void);
 extern tree make_lambda_name			(void);
+extern int bind_decls_match			(tree, tree);
 extern int decls_match(tree, tree);
 extern tree duplicate_decls			(tree, tree, bool);
 extern tree declare_local_label			(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6893eae..09f9ffc 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -1198,6 +1198,138 @@ decls_match (tree newdecl, tree olddecl)
   return types_match;
 }
 
+/* Similiar to decls_match, but only applies to FUNCTION_DECLS.  Functions
+   in separate namespaces may match.
+*/
+
+int
+bind_decls_match (tree newdecl, tree olddecl)
+{
+  int types_match;
+
+  if (newdecl == olddecl)
+return 1;
+
+  if (TREE_CODE (newdecl) != TREE_CODE (olddecl))
+/* If the two DECLs are not even the same kind of thing, we're not
+   interested in their types.  */
+return 0;
+
+  gcc_assert (DECL_P (newdecl));
+  gcc_assert (TREE_CODE (newdecl) == FUNCTION_DECL);
+
+  tree f1 = TREE_TYPE (newdecl);
+  tree f2 = TREE_TYPE (olddecl);
+  tree p1 = TYPE_ARG_TYPES (f1);
+  tree p2 = TYPE_ARG_TYPES (f2);
+  tree r2;
+
+  /* Specializations of different templates are different functions
+ even if they have the same type.  */
+  tree t1 = (DECL_USE_TEMPLATE (newdecl)
+	 ? DECL_TI_TEMPLATE (newdecl)
+	 : NULL_TREE);
+  tree t2 = (DECL_USE_TEMPLATE (olddecl)
+	 ? DECL_TI_TEMPLATE (olddecl)
+	 : NULL_TREE);
+  if (t1 != t2)
+return 0;
+
+  if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
+  && TREE_CODE (CP_DECL_CONTEXT (newdecl)) != NAMESPACE_DECL
+  && TREE_CODE (CP_DECL_CONTEXT (olddecl)) != NAMESPACE_DECL
+  && ! (DECL_EXTERN_C_P (newdecl)
+	&& DECL_EXTERN_C_P (olddecl)))
+return 0;
+
+  /* A new declaration doesn't match a built-in one unless it
+ is also extern "C".  */
+  if (DECL_IS_BUILTIN (olddecl)
+  && DECL_EXTERN_C_P (olddecl) && !DECL_EXTERN_C_P (newdecl))
+return 0;
+
+  if (TREE_CODE (f1) != TREE_CODE (f2))
+return 0;
+
+  /* A declaration with deduced return type should use its pre-deduction
+ type for declaration matching.  */
+  r2 = fndecl_declared_return_type (olddecl);
+
+  if (same_type_p (TREE_TYPE (f1), r2))
+{
+  if (!prototype_p (f2) && DECL_EXTERN_C_P (olddecl)
+	  && (DECL_BUILT_IN (olddecl)
+#ifndef NO_IMPLICIT_EXTERN_C
+	  || (DECL_IN_SYSTEM_HEADER (newdecl) && !DECL_CLASS_SCOPE_P (newdecl))
+	  || (DECL_IN_SYSTEM_HEADER (olddecl) && !DECL_CLASS_SCOPE_P (olddecl))
+#endif
+	  ))
+	{
+	  types_match = self_promoting_args_p (p1);
+	  if (p1 == void_list_node)
+	TREE_TYPE (newdecl) = TREE_TYPE (olddecl);
+	}
+#ifndef NO_IMPLICIT_EXTERN_C
+  else if (!prototype_p (f1)
+	   && (DECL_EXTERN_C_P (olddecl)
+		   && DECL_IN_SYSTEM_HEADER (olddecl)
+		   && !DECL_CLASS_SCOPE_P (olddecl))
+	   && (DECL_EXTERN_C_P (newdecl)
+		   && DECL_IN_SYSTEM_HEADER (newdecl)
+		   && !DECL_CLASS_SCOPE_P (newdecl)))
+	{
+	  types_match = self_promoting_args_p (p2);
+	  TREE_TYPE (newdecl) = TREE_TYPE (olddecl);
+	}
+#endif
+  else
+	types_match =
+	  compparms (p1, p2)
+	  && type_memfn_rqual (f1) == type_memfn_rqual (f2)
+	  && (TYPE_ATTRIBUTES (TREE_TYPE (newdecl)) == NULL_TREE
+	  || comp_type_attributes 

Re: formatting cleanups

2016-11-22 Thread Nathan Sidwell

On 11/22/2016 01:48 PM, Jakub Jelinek wrote:


When you are already changing this, the = should be on the next line.


done


--
Nathan Sidwell
2016-11-22  Nathan Sidwell  

	* array-notation-common.c (cilkplus_extract_an_triplets): Fix
	indentation and formatting.

Index: c-family/array-notation-common.c
===
--- c-family/array-notation-common.c	(revision 242719)
+++ c-family/array-notation-common.c	(working copy)
@@ -629,12 +629,12 @@ cilkplus_extract_an_triplets (vec

Re: [PATCH] OpenACC routines -- middle end

2016-11-22 Thread Cesar Philippidis
On 11/22/2016 11:58 AM, Jakub Jelinek wrote:
> On Tue, Nov 22, 2016 at 11:53:50AM -0800, Cesar Philippidis wrote:
>> I've incorporated those changes in this patch. Is it ok for trunk?
> 
> The ChangeLog mentions omp-low.[ch] changes, but the patch doesn't include
> them.
> Have they been dropped, or moved to another patch?

No, sorry I forgot to include them in the diff. This patch should
contain all of the middle end changes.

Cesar

>> 2016-11-22  Cesar Philippidis  
>>  Thomas Schwinge  
>>
>>  gcc/c-family/
>>  * c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
>>  * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
>>  and PRAGMA_OACC_CLAUSE_NOHOST.
>>
>>  gcc/
>>  * gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
>>  OMP_CLAUSE_NOHOST.
>>  (gimplify_adjust_omp_clauses): Likewise.
>>  * omp-low.c (scan_sharing_clauses): Likewise.
>>  (verify_oacc_routine_clauses): New function.
>>  (maybe_discard_oacc_function): New function.
>>  (execute_oacc_device_lower): Don't generate code for NOHOST.
>>  * omp-low.h (verify_oacc_routine_clauses): Declare.
>>  * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
>>  OMP_CLAUSE_NOHOST.
>>  * tree-pretty-print.c (dump_omp_clause): Likewise.
>>  * tree.c (omp_clause_num_ops): Likewise.
>>  (omp_clause_code_name): Likewise.
>>  (walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
>>  * tree.h (OMP_CLAUSE_BIND_NAME): Define.
> 
>   Jakub
> 

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/c-family/
	* c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
	* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
	and PRAGMA_OACC_CLAUSE_NOHOST.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	(verify_oacc_routine_clauses): New function.
	(maybe_discard_oacc_function): New function.
	(execute_oacc_device_lower): Don't generate code for NOHOST.
	* omp-low.h (verify_oacc_routine_clauses): Declare.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (omp_clause_num_ops): Likewise.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
	* tree.h (OMP_CLAUSE_BIND_NAME): Define.


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 964efe9..4b8 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -322,7 +322,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_omp_declare_simd_attribute, false },
   { "simd",		  0, 1, true,  false, false,
 			  handle_simd_attribute, false },
-  { "omp declare target", 0, 0, true, false, false,
+  { "omp declare target", 0, -1, true, false, false,
 			  handle_omp_declare_target_attribute, false },
   { "omp declare target link", 0, 0, true, false, false,
 			  handle_omp_declare_target_attribute, false },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6d9cb08..dd2722a 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -149,6 +149,7 @@ enum pragma_omp_clause {
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC = PRAGMA_CILK_CLAUSE_VECTORLENGTH + 1,
   PRAGMA_OACC_CLAUSE_AUTO,
+  PRAGMA_OACC_CLAUSE_BIND,
   PRAGMA_OACC_CLAUSE_COPY,
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
@@ -158,6 +159,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
+  PRAGMA_OACC_CLAUSE_NOHOST,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
   PRAGMA_OACC_CLAUSE_PRESENT,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8611060..04b591e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8373,6 +8373,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	  ctx->default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
 	}
@@ -9112,6 +9114,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  remove = true;
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	default:
 	  gcc_unreachable ();
 	}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7c58c03..b8a414b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2201,6 +2201,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	install_var_local (decl, ctx);
 	  break;
 
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
 	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE__CACHE_:
 	default:
@@ -2365,6 +2367,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	case 

Re: [PATCH] OpenACC routines -- fortran front end

2016-11-22 Thread Cesar Philippidis
On 11/18/2016 04:29 AM, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 03:44:07PM -0800, Cesar Philippidis wrote:
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -314,6 +314,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
> 
> Please add a newline after {.
> 
>>if (clauses)
>>  {
>>unsigned mask = 0;
>>  
>>if (clauses->gang)
>> -level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_GANG;
>> +}
>>if (clauses->worker)
>> -level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_WORKER;
>> +}
>>if (clauses->vector)
>> -level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_VECTOR;
>> +}
> 
> As you have {}s around, please use
>   level = GOMP_DIM_*;
>   mask |= GOMP_DIM_MASK (level);
>   ret = OACC_FUNCTION_*;
> 
>>if (clauses->seq)
>>  level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>>  
>>if (mask != (mask & -mask))
>> -gfc_error ("Multiple loop axes specified for routine");
>> +ret = OACC_FUNCTION_NONE;
>>  }
>>  
>> -  if (level < 0)
>> -level = GOMP_DIM_MAX;
>> -
>> -  return level;
>> +  return ret;
>>  }
>>  
>>  match
>>  gfc_match_oacc_routine (void)
>>  {
>>locus old_loc;
>> -  gfc_symbol *sym = NULL;
>>match m;
>> +  gfc_intrinsic_sym *isym = NULL;
>> +  gfc_symbol *sym = NULL;
>>gfc_omp_clauses *c = NULL;
>>gfc_oacc_routine_name *n = NULL;
>> +  oacc_function dims = OACC_FUNCTION_NONE;
>> +  bool seen_error = false;
>>  
>>old_loc = gfc_current_locus;
>>  
>> @@ -2287,45 +2314,52 @@ gfc_match_oacc_routine (void)
>>if (m == MATCH_YES)
>>  {
>>char buffer[GFC_MAX_SYMBOL_LEN + 1];
>> -  gfc_symtree *st;
>> +  gfc_symtree *st = NULL;
>>  
>>m = gfc_match_name (buffer);
>>if (m == MATCH_YES)
>>  {
>> -  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
>> +  if ((isym = gfc_find_function (buffer)) == NULL
>> +  && (isym = gfc_find_subroutine (buffer)) == NULL)
>> +{
>> +  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
>> +  if (st == NULL && gfc_current_ns->proc_name->attr.contained
> 
> Please add a newline before &&.
> 
>> +  && gfc_current_ns->parent)
>> +st = gfc_find_symtree (gfc_current_ns->parent->sym_root,
>> +   buffer);
>> +}
> 
>> @@ -5934,6 +6033,21 @@ gfc_resolve_oacc_blocks (gfc_code *code, 
>> gfc_namespace *ns)
>>ctx.private_iterators = new hash_set;
>>ctx.previous = omp_current_ctx;
>>ctx.is_openmp = false;
>> +
>> +  if (code->ext.omp_clauses->gang)
>> +dims = OACC_FUNCTION_GANG;
>> +  if (code->ext.omp_clauses->worker)
>> +dims = OACC_FUNCTION_WORKER;
>> +  if (code->ext.omp_clauses->vector)
>> +dims = OACC_FUNCTION_VECTOR;
>> +  if (code->ext.omp_clauses->seq)
>> +dims = OACC_FUNCTION_SEQ;
> 
> Shouldn't these be else if ?
>> +
>> +  if (dims == OACC_FUNCTION_NONE && ctx.previous != NULL
> 
> Again, as the whole condition doesn't fit on one line, please
> put && on a new line.
>> +  && !ctx.previous->is_openmp)
>> +dims = ctx.previous->dims;

I've addressed those issues in this patch. Is it ok for trunk?
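
For readers following the review, two details of the quoted code can be checked in isolation. The sketch below is plain C++ rather than gfortran code; `has_multiple_axes`, `Dims` and the two `pick_*` helpers are hypothetical names. It shows the `mask != (mask & -mask)` test, which is true exactly when more than one bit is set, and the difference between a run of independent `if`s (last match wins) and an `else if` style chain (first match wins).

```cpp
// mask & -mask isolates the lowest set bit.  If that differs from
// mask itself, at least two bits (two loop axes) were requested.
constexpr bool has_multiple_axes (unsigned mask)
{
  return mask != (mask & -mask);
}

// Hypothetical stand-in for the gang/worker/vector/seq states.
enum Dims { NONE, GANG, WORKER, VECTOR, SEQ };

// Independent ifs: every test runs, so the last matching clause wins.
constexpr Dims pick_last_match (bool gang, bool worker, bool vector, bool seq)
{
  Dims d = NONE;
  if (gang)
    d = GANG;
  if (worker)
    d = WORKER;
  if (vector)
    d = VECTOR;
  if (seq)
    d = SEQ;
  return d;
}

// Early returns behave like an else-if chain: the first matching clause wins.
constexpr Dims pick_first_match (bool gang, bool worker, bool vector, bool seq)
{
  if (gang)
    return GANG;
  if (worker)
    return WORKER;
  if (vector)
    return VECTOR;
  if (seq)
    return SEQ;
  return NONE;
}
```

When both `gang` and `vector` are present, `pick_last_match` yields `VECTOR` while `pick_first_match` yields `GANG`, which is the distinction behind the "else if" question in the review.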

Cesar

2016-11-22  Cesar Philippidis  

	gcc/fortran/
	* gfortran.h (enum oacc_function): Make OACC_FUNCTION_SEQ the last
	entry in the enum.
	(oacc_function_types): Declare.
	(symbol_attribute): Add oacc_function, oacc_function_nohost members.
	(gfc_omp_clauses): Add routine_bind, nohost, bind members.
	(gfc_oacc_routine_name): Add loc.
	(gfc_resolve_oacc_routine_call): Declare.
	(gfc_resolve_oacc_routines): Declare.
	* module.c (oacc_function): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set the oacc_function attribute.
	* openmp.c (enum omp_mask2): Add OMP_CLAUSE_BIND and OMP_CLAUSE_NOHOST.
	(gfc_match_omp_clauses): Likewise.
	(OACC_ROUTINE_CLAUSES): Add OMP_CLAUSE_BIND and OMP_CLAUSE_NOHOST.
	(gfc_oacc_routine_dims): Change the type of oacc_function from unsigned
	to an ENUM_BITFIELD.  Move gfc_error to gfc_match_oacc_routine.  Return
	OACC_FUNCTION_NONE on error.
	(gfc_match_oacc_routine): Make error reporting more
	precise.  Defer rejection of non-function and subroutine symbols
	until gfc_resolve_oacc_routines.
	(struct fortran_omp_context): Add a dims member.
	(gfc_resolve_oacc_blocks): Update ctx->dims.
	(gfc_resolve_oacc_routine_call): New function.
	(gfc_resolve_oacc_routines): New function.
	* resolve.c (resolve_function): Call 

Re: [PATCH] OpenACC routines -- middle end

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 11:53:50AM -0800, Cesar Philippidis wrote:
> I've incorporated those changes in this patch. Is it ok for trunk?

The ChangeLog mentions omp-low.[ch] changes, but the patch doesn't include
them.
Have they been dropped, or moved to another patch?

> 2016-11-22  Cesar Philippidis  
>   Thomas Schwinge  
> 
>   gcc/c-family/
>   * c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
>   * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
>   and PRAGMA_OACC_CLAUSE_NOHOST.
> 
>   gcc/
>   * gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
>   OMP_CLAUSE_NOHOST.
>   (gimplify_adjust_omp_clauses): Likewise.
>   * omp-low.c (scan_sharing_clauses): Likewise.
>   (verify_oacc_routine_clauses): New function.
>   (maybe_discard_oacc_function): New function.
>   (execute_oacc_device_lower): Don't generate code for NOHOST.
>   * omp-low.h (verify_oacc_routine_clauses): Declare.
>   * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
>   OMP_CLAUSE_NOHOST.
>   * tree-pretty-print.c (dump_omp_clause): Likewise.
>   * tree.c (omp_clause_num_ops): Likewise.
>   (omp_clause_code_name): Likewise.
>   (walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
>   * tree.h (OMP_CLAUSE_BIND_NAME): Define.

Jakub


Re: [PATCH] OpenACC routines -- c front end

2016-11-22 Thread Cesar Philippidis
On 11/18/2016 04:21 AM, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 03:43:23PM -0800, Cesar Philippidis wrote:
>> @@ -11801,12 +11807,11 @@ c_parser_oacc_shape_clause (c_parser *parser, 
>> omp_clause_code kind,
>>  }
>>  
>>location_t expr_loc = c_parser_peek_token (parser)->location;
>> -  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
>> -  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
>> -  tree expr = cexpr.value;
>> +  tree expr = c_parser_expr_no_commas (parser, NULL).value;
>>if (expr == error_mark_node)
>>  goto cleanup_error;
>>  
>> +  mark_exp_read (expr);
>>expr = c_fully_fold (expr, false, NULL);
>>  
>>/* Attempt to statically determine when the number isn't a
> 
> Why?  Are the arguments of the clauses lvalues?

The spec is unclear if those args must be constants or not. The only
time it explicitly mentions constant int-expr is for the tile clause,
which was added late. Gang, worker and vector were added early in the
1.0 spec, where things were defined somewhat loosely.
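
To make the point about clause arguments concrete, here is a minimal sketch,
assuming an OpenACC kernels region in which gang and vector take ordinary
run-time variables as arguments. The function and parameter names (scale,
ngangs, vlen) are invented for the example. Built without -fopenacc, the
pragma is simply ignored and the loop runs serially.

```c
#include <stddef.h>

/* Scale an array in place.  NGANGS and VLEN are plain run-time variables,
   not constants; they are hypothetical tuning parameters for this sketch.  */
void
scale (float *a, size_t n, float factor, int ngangs, int vlen)
{
  /* The gang and vector arguments here are non-constant expressions.  */
#pragma acc kernels loop gang(num: ngangs) vector(length: vlen) copy(a[0:n])
  for (size_t i = 0; i < n; i++)
    a[i] *= factor;
}
```

The result does not depend on the clause arguments; they only suggest how the
iterations are distributed.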

>> @@ -11867,12 +11872,12 @@ c_parser_oacc_shape_clause (c_parser *parser, 
>> omp_clause_code kind,
>> seq */
>>  
>>  static tree
>> -c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
>> - tree list)
>> +c_parser_oacc_simple_clause (c_parser * /* parser */, location_t loc,
> 
> Just leave it as c_parser *, or better yet remove the argument if you don't
> need it.

I removed that argument.

>> +  else
>> +{
>> +  //TODO? TREE_USED (decl) = 1;
> 
> This would be /* FIXME: TREE_USED (decl) = 1;  */
> but wouldn't it be better to figure out if you want to do that or not?

Thomas has more state on that, but it seems unneeded. The c++ FE doesn't
do that either, so I removed that comment.

Is this patch ok for trunk?

Cesar

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Handle OpenACC bind and
	nohost.
	(c_parser_oacc_shape_clause): New location_t loc argument.  Use it
	to report more accurate diagnostics.
	(c_parser_oacc_simple_clause): Likewise.
	(c_parser_oacc_clause_bind): New function.
	(c_parser_oacc_all_clauses): Handle OpenACC bind and nohost clauses.
	Update calls to c_parser_oacc_{simple,shape}_clause.
	(OACC_ROUTINE_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{BIND,NOHOST}.
	(c_parser_oacc_routine): Update diagnostics.
	(c_finish_oacc_routine): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_{BIND,NOHOST}.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 00fe731..fd87b54 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10408,6 +10408,10 @@ c_parser_omp_clause_name (c_parser *parser)
 	  else if (!strcmp ("async", p))
 	result = PRAGMA_OACC_CLAUSE_ASYNC;
 	  break;
+	case 'b':
+	  if (!strcmp ("bind", p))
+	result = PRAGMA_OACC_CLAUSE_BIND;
+	  break;
 	case 'c':
 	  if (!strcmp ("collapse", p))
 	result = PRAGMA_OMP_CLAUSE_COLLAPSE;
@@ -10489,6 +10493,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	result = PRAGMA_OMP_CLAUSE_NOTINBRANCH;
 	  else if (!strcmp ("nowait", p))
 	result = PRAGMA_OMP_CLAUSE_NOWAIT;
+	  else if (!strcmp ("nohost", p))
+	result = PRAGMA_OACC_CLAUSE_NOHOST;
 	  else if (!strcmp ("num_gangs", p))
 	result = PRAGMA_OACC_CLAUSE_NUM_GANGS;
 	  else if (!strcmp ("num_tasks", p))
@@ -11676,12 +11682,12 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 */
 
 static tree
-c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+c_parser_oacc_shape_clause (c_parser *parser, location_t loc,
+			omp_clause_code kind,
 			const char *str, tree list)
 {
   const char *id = "num";
   tree ops[2] = { NULL_TREE, NULL_TREE }, c;
-  location_t loc = c_parser_peek_token (parser)->location;
 
   if (kind == OMP_CLAUSE_VECTOR)
 id = "length";
@@ -11746,12 +11752,11 @@ c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
 	}
 
 	  location_t expr_loc = c_parser_peek_token (parser)->location;
-	  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
-	  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
-	  tree expr = cexpr.value;
+	  tree expr = c_parser_expr_no_commas (parser, NULL).value;
 	  if (expr == error_mark_node)
 	goto cleanup_error;
 
+	  mark_exp_read (expr);
 	  expr = c_fully_fold (expr, false, NULL);
 
 	  /* Attempt to statically determine when the number isn't a
@@ -11812,12 +11817,12 @@ c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
seq */
 
 static tree
-c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
+c_parser_oacc_simple_clause (location_t loc, enum omp_clause_code code,
 			 tree list)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
 
-  tree c = build_omp_clause (c_parser_peek_token (parser)->location, code);
+ 

Re: [PATCH] OpenACC routines -- middle end

2016-11-22 Thread Cesar Philippidis
On 11/18/2016 04:14 AM, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 03:43:02PM -0800, Cesar Philippidis wrote:
>> +error_at (OMP_CLAUSE_LOCATION (c),
>> +  "%qs specifies a conflicting level of parallelism",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
>> +inform (OMP_CLAUSE_LOCATION (c_level),
>> +"... to the previous %qs clause here",
> 
> I think the '... ' part is unnecessary.
> Perhaps word it better like we word errors/warnings for mismatched
> attributes etc.?
> 
>> +incompatible:
>> +  if (c_diag != NULL_TREE)
>> +error_at (OMP_CLAUSE_LOCATION (c_diag),
>> +  "incompatible %qs clause when applying"
>> +  " %<%s%> to %qD, which has already been"
>> +  " marked as an accelerator routine",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c_diag)],
>> +  routine_str, fndecl);
>> +  else if (c_diag_p != NULL_TREE)
>> +error_at (loc,
>> +  "missing %qs clause when applying"
>> +  " %<%s%> to %qD, which has already been"
>> +  " marked as an accelerator routine",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c_diag_p)],
>> +  routine_str, fndecl);
>> +  else
>> +gcc_unreachable ();
>> +  if (c_diag_p != NULL_TREE)
>> +inform (OMP_CLAUSE_LOCATION (c_diag_p),
>> +"... with %qs clause here",
>> +omp_clause_code_name[OMP_CLAUSE_CODE (c_diag_p)]);
> 
> Again, I think this usually would be something like "previous %qs clause"
> or similar in the inform.  Generally, I think the error message should
> be self-contained and infom should be just extra information, rather than
> error containing first half of the diagnostic message and inform the second
> one.  E.g. for translations, while such a sentence crossing the two
> diagnostic routines might make sense in english, it might look terrible in
> other languages.
> 
>> +  else
>> +{
>> +  /* In the front ends, we don't preserve location information for the
>> + OpenACC routine directive itself.  However, that of c_level_p
>> + should be close.  */
>> +  location_t loc_routine = OMP_CLAUSE_LOCATION (c_level_p);
>> +  inform (loc_routine, "... without %qs clause near to here",
>> +  omp_clause_code_name[OMP_CLAUSE_CODE (c_diag)]);
>> +}
>> +  /* Incompatible.  */
>> +  return -1;
>> +}
>> +
>> +  return 0;

I've incorporated those changes in this patch. Is it ok for trunk?
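
The review comment about self-contained diagnostics can be sketched as
follows. This does not use GCC's error_at or inform; report_conflict is a
hypothetical helper that formats an error line and a separate note line into
a buffer, so that each line reads as a complete message on its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a self-contained error line followed by a
   self-contained note line into OUT.  Returns 0 on success and -1 if the
   buffer is too small or formatting fails.  */
int
report_conflict (char *out, size_t len, const char *clause,
		 const char *previous)
{
  int n = snprintf (out, len,
		    "error: '%s' specifies a conflicting level of"
		    " parallelism\n", clause);
  if (n < 0 || (size_t) n >= len)
    return -1;

  /* The note adds information but does not finish the error's sentence.  */
  int m = snprintf (out + n, len - (size_t) n,
		    "note: previous '%s' clause here\n", previous);
  if (m < 0 || (size_t) m >= len - (size_t) n)
    return -1;

  return 0;
}
```

Because neither line depends on the other grammatically, each can be
translated independently.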

Cesar

2016-11-22  Cesar Philippidis  
	Thomas Schwinge  

	gcc/c-family/
	* c-attribs.c (c_common_attribute_table): Adjust "omp declare target".
	* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND
	and PRAGMA_OACC_CLAUSE_NOHOST.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Likewise.
	(verify_oacc_routine_clauses): New function.
	(maybe_discard_oacc_function): New function.
	(execute_oacc_device_lower): Don't generate code for NOHOST.
	* omp-low.h (verify_oacc_routine_clauses): Declare.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_BIND and
	OMP_CLAUSE_NOHOST.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
	* tree.c (omp_clause_num_ops): Likewise.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Handle OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST.
	* tree.h (OMP_CLAUSE_BIND_NAME): Define.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 964efe9..4b8 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -322,7 +322,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_omp_declare_simd_attribute, false },
   { "simd",		  0, 1, true,  false, false,
 			  handle_simd_attribute, false },
-  { "omp declare target", 0, 0, true, false, false,
+  { "omp declare target", 0, -1, true, false, false,
 			  handle_omp_declare_target_attribute, false },
   { "omp declare target link", 0, 0, true, false, false,
 			  handle_omp_declare_target_attribute, false },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 6d9cb08..dd2722a 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -149,6 +149,7 @@ enum pragma_omp_clause {
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC = PRAGMA_CILK_CLAUSE_VECTORLENGTH + 1,
   PRAGMA_OACC_CLAUSE_AUTO,
+  PRAGMA_OACC_CLAUSE_BIND,
   PRAGMA_OACC_CLAUSE_COPY,
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
@@ -158,6 +159,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
+  PRAGMA_OACC_CLAUSE_NOHOST,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
   PRAGMA_OACC_CLAUSE_PRESENT,
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 

[PATCH, Fortran, accaf, v1] Add caf-API-calls to asynchronously handle allocatable components in derived type coarrays.

2016-11-22 Thread Andre Vehreschild
Hi all,

The attached patch extends the API of the caf-libs to support asynchronous
allocation of allocatable components. Allocatable components in derived type
coarrays differ from regular coarrays or coarrayed components. The latter
have to be allocated on all images or on none, and their allocation is a
point of synchronisation.

For allocatable components, F2008 allows some to be allocated on some images
and not on others. Furthermore, registering with the caf-lib that an
allocatable component is present in a derived type coarray is no longer a
synchronisation point. To implement these features, two new types of coarray
registration have been introduced: the first just registers the component
with the caf-lib, and the second does the allocation. In addition, the
caf-API has been extended with a query function to learn about the allocation
status of a component on a remote image.
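
The two-step scheme described above (register first, allocate later, query
the status) can be pictured with a small single-process sketch. All names
below (reg_mode, component_token, component_register and so on) are
hypothetical and are not the identifiers added to libcaf.h; a real
implementation would also have to deal with remote images.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical registration modes for this sketch.  */
enum reg_mode
{
  REGISTER_ONLY,	/* Make the component known, allocate nothing.  */
  ALLOCATE_ONLY		/* Allocate an already registered component.  */
};

struct component_token
{
  bool registered;
  void *memory;
  size_t size;
};

/* Returns 0 on success, -1 on misuse or allocation failure.  */
int
component_register (struct component_token *tok, enum reg_mode mode,
		    size_t size)
{
  switch (mode)
    {
    case REGISTER_ONLY:
      tok->registered = true;
      tok->memory = NULL;
      tok->size = 0;
      return 0;

    case ALLOCATE_ONLY:
      /* Allocation is only valid after registration and only once.  */
      if (!tok->registered || tok->memory != NULL)
	return -1;
      tok->memory = calloc (1, size);
      if (tok->memory == NULL)
	return -1;
      tok->size = size;
      return 0;
    }
  return -1;
}

/* Query function: is the component currently allocated?  */
bool
component_is_present (const struct component_token *tok)
{
  return tok->registered && tok->memory != NULL;
}

/* Deallocate but keep the registration.  */
void
component_deallocate (struct component_token *tok)
{
  free (tok->memory);
  tok->memory = NULL;
  tok->size = 0;
}
```

A token that has only been registered reports "not present"; after the
allocate step it reports "present", and deallocation returns it to the
registered-only state.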

Sorry that the patch is rather lengthy. Most of this is due to the change to
the signature of structure_alloc_comps. The routine and its wrappers are used
rather often, and every use needed the corresponding change.

I know I left two or three TODOs in the patch to remind me of things I have to
investigate further. For the current state these TODOs are no reason to hold
back the patch. The third party library opencoarrays implements the mpi-part of
the caf-model and will change in sync. It would of course be advantageous to
just have to say: With gcc-7 gfortran implements allocatable components in
derived coarrays nearly completely.

I know we are in stage 3. But the patch bootstraps and regtests ok on
x86_64-linux/F23. So, is it ok for trunk or shall it go to 7.2?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
libgfortran/ChangeLog:

2016-11-22  Andre Vehreschild  

* caf/libcaf.h: Add new action types for (de-)registration of
allocatable components in derived type coarrays.  Add _caf_is_present
prototype.
* caf/single.c (_gfortran_caf_register): Add support for registration
only and allocation of already registered allocatable components in
derived type coarrays.
(_gfortran_caf_deregister): Add mode to deallocate but not deregister
an allocatable component in a derived type coarray.
(_gfortran_caf_is_present): New function.  Query whether an
allocatable component in a derived type coarray on a remote image is
allocated.


gcc/testsuite/ChangeLog:

2016-11-22  Andre Vehreschild  

* gfortran.dg/coarray/alloc_comp_1.f90: Fix tree-dump scans to adhere
to the changed interfaces.
* gfortran.dg/coarray_alloc_comp_1.f08: Likewise.
* gfortran.dg/coarray_allocate_7.f08: Likewise.
* gfortran.dg/coarray_lib_alloc_1.f90: Likewise.
* gfortran.dg/coarray_lib_alloc_2.f90: Likewise.
* gfortran.dg/coarray_lib_alloc_3.f90: Likewise.
* gfortran.dg/coarray_lib_comm_1.f90: Likewise.
* gfortran.dg/coarray_lib_alloc_4.f90: New test.

gcc/fortran/ChangeLog:

2016-11-22  Andre Vehreschild  

* check.c (gfc_check_allocated): Bypass the caf_get call and check on
the array.
* gfortran.h: Add optional flag to gfc_caf_attr.
* gfortran.texi: Document new enum values and _caf_is_present function.
* primary.c (caf_variable_attr): Add optional flag to indicate that the
expression is reffing a component.
(gfc_caf_attr): Likewise.
* trans-array.c (gfc_array_deallocate): Handle deallocation mode for
coarray deregistration.
(gfc_trans_dealloc_allocated): Likewise.
(duplicate_allocatable_coarray): This function is similar to
duplicate_allocatable but tailored to handle coarrays.
(structure_alloc_comps): Add a mode for handling coarrays that is no
longer encoded in the purpose.  This makes the use cases of the
routine more flexible without repeating.  Allocatable components in
derived type coarrays are now registered only when nullifying an
object and allocated before copying data into them.
(gfc_nullify_alloc_comp): Use the caf_mode of structure_alloc_comps
now.
(gfc_deallocate_alloc_comp): Likewise.
(gfc_deallocate_alloc_comp_no_caf): Likewise.
(gfc_reassign_alloc_comp_caf): Likewise.
(gfc_copy_alloc_comp): Likewise.
(gfc_copy_only_alloc_comp): Likewise.
(gfc_alloc_allocatable_for_assignment): Make use of the cheaper way of
reallocating a coarray without deregistering and reregistering it.
(gfc_trans_deferred_array): Initialize the coarray token correctly for
deferred variables and tear them down on exit.
* trans-array.h: Change some prototypes to add the coarray (de-)
registration modes.
* trans-decl.c (gfc_build_builtin_function_decls): Generate 

Re: formatting cleanups

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 01:45:07PM -0500, Nathan Sidwell wrote:
> - tree ii_tree = array_exprs[ii][jj];
> - (*node)[ii][jj].is_vector = true;
> - (*node)[ii][jj].value = ARRAY_NOTATION_ARRAY (ii_tree);
> - (*node)[ii][jj].start = ARRAY_NOTATION_START (ii_tree);
> - (*node)[ii][jj].length =
> -   fold_build1 (CONVERT_EXPR, integer_type_node,
> -ARRAY_NOTATION_LENGTH (ii_tree));
> - (*node)[ii][jj].stride =
> -   fold_build1 (CONVERT_EXPR, integer_type_node,
> -ARRAY_NOTATION_STRIDE (ii_tree));
> -   }
> +  for (size_t ii = 0; ii < size; ii++)
> +if (TREE_CODE ((*list)[ii]) == ARRAY_NOTATION_REF)
> +  for (size_t jj = 0; jj < rank; jj++)
> + {
> +   tree ii_tree = array_exprs[ii][jj];
> +   (*node)[ii][jj].is_vector = true;
> +   (*node)[ii][jj].value = ARRAY_NOTATION_ARRAY (ii_tree);
> +   (*node)[ii][jj].start = ARRAY_NOTATION_START (ii_tree);
> +   (*node)[ii][jj].length =
> + fold_build1 (CONVERT_EXPR, integer_type_node,
> +  ARRAY_NOTATION_LENGTH (ii_tree));
> +   (*node)[ii][jj].stride =
> + fold_build1 (CONVERT_EXPR, integer_type_node,
> +  ARRAY_NOTATION_STRIDE (ii_tree));

When you are already changing this, the = should be on the next line.
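
For readers unfamiliar with the convention behind this comment (and the
earlier requests to put && on a new line), here is a small sketch of the
layout being asked for: when an expression is split, the line break goes
before the operator. The struct and function names are invented for the
example.

```c
#include <stddef.h>

/* Hypothetical start/length/stride triplet.  */
struct triplet
{
  int start;
  int length;
  int stride;
};

/* Index of the last element addressed by T.  */
int
triplet_last_index (const struct triplet *t)
{
  /* The break comes before '=', which starts the continuation line.  */
  int last_element_index
    = t->start + (t->length - 1) * t->stride;
  return last_element_index;
}

/* Nonzero if T describes a non-empty, strided range.  */
int
triplet_is_valid (const struct triplet *t)
{
  /* Likewise, each '&&' starts a line instead of ending one.  */
  return (t != NULL
	  && t->length > 0
	  && t->stride != 0);
}
```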

Jakub


formatting cleanups

2016-11-22 Thread Nathan Sidwell

I noticed some wonky formatting.  Fixed as obvious.

nathan
--
Nathan Sidwell
2016-11-22  Nathan Sidwell  

	gcc/
	* gcc-ar.c (main): Fix indentation.
	* gcov-io.c (gcov_write_summary): Remove extraneous {...}
	* ggc-page.c (move_ptes_to_front): Fix formatting.
	* hsa-dump.c (dump_hsa_cfun): Fix indentation.
	* sel-sched-ir.h: Remove trailing blank lines.

	gcc/c-family/
	* array-notation-common.c (cilkplus_extract_an_triplets): Fix
	indentation.

Index: gcc-ar.c
===
--- gcc-ar.c	(revision 242695)
+++ gcc-ar.c	(working copy)
@@ -162,7 +162,7 @@ main (int ac, char **av)
 
 	  len = strlen (arg);
 	  if (len > 0)
-		  len--;
+	len--;
 	  end = arg + len;
 
 	  /* Always add a dir separator for the prefix list.  */
Index: gcov-io.c
===
--- gcov-io.c	(revision 242695)
+++ gcov-io.c	(working copy)
@@ -421,13 +421,11 @@ gcov_write_summary (gcov_unsigned_t tag,
 histo_bitvector[bv_ix] = 0;
   csum = &summary->ctrs[GCOV_COUNTER_ARCS];
   for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
-{
-  if (csum->histogram[h_ix].num_counters > 0)
-{
-  histo_bitvector[h_ix / 32] |= 1 << (h_ix % 32);
-  h_cnt++;
-}
-}
+if (csum->histogram[h_ix].num_counters)
+  {
+	histo_bitvector[h_ix / 32] |= 1 << (h_ix % 32);
+	h_cnt++;
+  }
   gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH (h_cnt));
   gcov_write_unsigned (summary->checksum);
   for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
Index: hsa-dump.c
===
--- hsa-dump.c	(revision 242695)
+++ hsa-dump.c	(working copy)
@@ -1130,10 +1130,10 @@ dump_hsa_cfun (FILE *f)
 }
 
   FOR_ALL_BB_FN (bb, cfun)
-  {
-hsa_bb *hbb = (struct hsa_bb *) bb->aux;
-dump_hsa_bb (f, hbb);
-  }
+{
+  hsa_bb *hbb = (struct hsa_bb *) bb->aux;
+  dump_hsa_bb (f, hbb);
+}
 }
 
 /* Dump textual representation of HSA IL instruction INSN to stderr.  */
Index: sel-sched-ir.h
===
--- sel-sched-ir.h	(revision 242695)
+++ sel-sched-ir.h	(working copy)
@@ -1669,11 +1669,3 @@ extern void alloc_sched_pools (void);
 extern void free_sched_pools (void);
 
 #endif /* GCC_SEL_SCHED_IR_H */
-
-
-
-
-
-
-
-
Index: c-family/array-notation-common.c
===
--- c-family/array-notation-common.c	(revision 242695)
+++ c-family/array-notation-common.c	(working copy)
@@ -621,21 +621,21 @@ cilkplus_extract_an_triplets (vec

Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-22 Thread Denis Chertykov
Do you have any objections, George ?

2016-11-22 8:05 GMT+03:00 Pitchumani Sivanupandi
:
> Ping!
>
> On Monday 14 November 2016 07:03 PM, Pitchumani Sivanupandi wrote:
>>
>> Ping!
>>
>> On Thursday 10 November 2016 01:53 PM, Pitchumani Sivanupandi wrote:
>>>
>>> On Wednesday 09 November 2016 08:05 PM, Georg-Johann Lay wrote:

 On 09.11.2016 10:14, Pitchumani Sivanupandi wrote:
>
> On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote:
>>
>> On 08.11.2016 08:08, Pitchumani Sivanupandi wrote:
>>>
>>> I have updated patch to include the flash size as well. Took that
>>> info from
>>> device headers (it was fed into crt's device information note section
>>> also).


 The new option would render -mn-flash superfluous, but we should
 keep it for
 backward compatibility.
>>>
>>> Ok.

 Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from
 LINK_RELAX_SPEC?  And what happens if relaxation is off?
>>>
>>> Yes. Removed link_pmem_wrap from link_relax.
>>> Disabling relaxation doesn't change -mpmem-wrap-around behavior.
>>> 
>>> flashsize-and-wrap-around.patch
>>
>>
>>> diff --git a/gcc/config/avr/avr-mcus.def
>>> b/gcc/config/avr/avr-mcus.def
>>> index 6bcc6ff..9d4aa1a 100644
>>
>>
>>>  /*
>>
>> 
>
>  /* Classic, > 8K, <= 64K.  */
> -AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
> 0x0060, 0x0, 1)
> -AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT43USB355__",0x0060, 0x0, 1)
> -AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT76C711__",  0x0060, 0x0, 1)
> +AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL,
> 0x0060, 0x0, 1, 0x6000)
> +AVR_MCU ("at43usb355",   ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT43USB355__",0x0060, 0x0, 1, 0x6000)
> +AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT76C711__",  0x0060, 0x0, 1, 0x4000)
> +AVR_MCU ("at43usb320",   ARCH_AVR3, AVR_ISA_NONE,
> "__AVR_AT43USB320__",0x0060, 0x0, 1, 0x1)
>  /* Classic, == 128K.  */
> -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
> 0x0060, 0x0, 2)
> -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
> "__AVR_ATmega103__", 0x0060, 0x0, 2)
> -AVR_MCU ("at43usb320",   ARCH_AVR31, AVR_ISA_NONE,
> "__AVR_AT43USB320__",   0x0060, 0x0, 2)
> +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,
> 0x0060, 0x0, 2, 0x2)
> +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP,
> "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2)
>  /* Classic + MOVW + JMP/CALL.  */


 If at43usb320 is in the wrong multilib, then this should be handled as
 separate issue / patch together with its own PR. Sorry for the confusion.  
 I
 just noticed that some fields don't match...

 It is not even clear to me from the data sheet if avr3 is the correct
 multilib or perhaps avr35 (if it supports MOVW) or even avr5 (if it also 
 has
 MUL) as there is no reference to the exact instruction set -- Atmochip will
 know.

 Moreover, such a change should be sync'ed with avr-libc as all multilib
 stuff is hand-wired there: no use of --print-foo meta information retrieval
 by avr-libc :-((

 I filed PR78275 and https://savannah.nongnu.org/bugs/index.php?49565 for
 this one.

>>> That's better. I've attached the updated patch. If OK, could someone
>>> commit please?
>>>
>>> I'll try if I could find some more info for AT43USB320.
>>>
>>> Regards,
>>> Pitchumani
>>>
>>
>


Re: [patch,avr] Fix PR60300: Minor prologue improvement.

2016-11-22 Thread Denis Chertykov
2016-11-22 15:41 GMT+03:00 Georg-Johann Lay :
> This patch is a minor improvement of prologue length.  It now allows frame
> sizes of up to 11 to be allocated by RCALL + PUSH 0 sequences but limits the
> number of RCALLs to 3.
>
> The PR has some discussion on size vs. speed considerations w.r.t. using
> RCALL in prologues, and following that I picked the rather arbitrary upper
> bound of 3 RCALLs.  Previously, the maximal frame size optimized to such
> sequences was 6, which also never produced more than 3 RCALLs.
>
> Ok for trunk?
>
>
> Johann
>
> gcc/
> PR target/60300
> * config/avr/constraints.md (Csp): Widen range to [-11..6].
> * config/avr/avr.c (avr_prologue_setup_frame): Limit number
> of RCALLs in prologue to 3.

Approved.
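
The arithmetic behind the limits quoted above can be sketched as follows.
This is an illustration under stated assumptions, not the code in avr.c: it
assumes each RCALL reserves as many stack bytes as the program counter is
wide (2 or 3), each PUSH reserves one byte, and RCALLs are used greedily.
The names plan_frame, MAX_RCALLS and MAX_FRAME are invented for the sketch.

```c
#include <stdbool.h>

#define MAX_RCALLS 3	/* Upper bound on RCALLs picked in the patch.  */
#define MAX_FRAME 11	/* Largest frame size handled this way.  */

struct frame_plan
{
  bool ok;	/* True if an RCALL/PUSH sequence is acceptable.  */
  int rcalls;	/* Number of RCALL instructions.  */
  int pushes;	/* Number of PUSH instructions.  */
};

/* Plan the allocation of SIZE bytes of frame on a device whose program
   counter occupies PC_BYTES bytes on the stack.  */
struct frame_plan
plan_frame (int size, int pc_bytes)
{
  struct frame_plan plan = { false, 0, 0 };

  if (size < 1 || size > MAX_FRAME || (pc_bytes != 2 && pc_bytes != 3))
    return plan;

  plan.rcalls = size / pc_bytes;
  plan.pushes = size % pc_bytes;
  plan.ok = plan.rcalls <= MAX_RCALLS;
  return plan;
}
```

Under these assumptions a frame of 11 bytes with a 3-byte PC needs 3 RCALLs
plus 2 PUSHes, a frame of 6 bytes with a 2-byte PC needs exactly 3 RCALLs,
and a frame of 8 bytes with a 2-byte PC would need 4 RCALLs and is therefore
rejected.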


[PATCH, Fortran, cosmetics] Use convenience functions and constants

2016-11-22 Thread Andre Vehreschild
Hi all,

While hacking further on allocatable components in derived type coarrays, I
came across the code fragments that the attached patch improves.

Bootstraps and regtests ok on x86_64-linux/F23. Ok for trunk?

Regards,
Andre

PS: The patch that motivated these changes follows as soon as its regtesting
has finished.
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


cosmetics_4.clog
Description: Binary data
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 1708f7c..45e1369 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7855,9 +7859,7 @@ duplicate_allocatable (tree dest, tree src, tree type, int rank,

   if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (dest)))
 {
-  tmp = null_pointer_node;
-  tmp = fold_build2_loc (input_location, MODIFY_EXPR, type, dest, tmp);
-  gfc_add_expr_to_block (&block, tmp);
+  gfc_add_modify (&block, dest, fold_convert (type, null_pointer_node));
   null_data = gfc_finish_block (&block);

   gfc_init_block (&block);
@@ -7869,9 +7871,7 @@ duplicate_allocatable (tree dest, tree src, tree type, int rank,
   if (!no_malloc)
{
  tmp = gfc_call_malloc (&block, type, size);
- tmp = fold_build2_loc (input_location, MODIFY_EXPR, void_type_node,
-dest, fold_convert (type, tmp));
- gfc_add_expr_to_block (&block, tmp);
+ gfc_add_modify (&block, dest, fold_convert (type, tmp));
}

   if (!no_memcpy)
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index ba71a21..2e6ef2a 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -5093,8 +5103,8 @@ generate_coarray_sym_init (gfc_symbol *sym)
 build_int_cst (integer_type_node, reg_type),
 token, gfc_build_addr_expr (pvoid_type_node, desc),
 null_pointer_node, /* stat.  */
-null_pointer_node, /* errgmsg, errmsg_len.  */
-build_int_cst (integer_type_node, 0));
+null_pointer_node, /* errgmsg.  */
+integer_zero_node); /* errmsg_len.  */
   gfc_add_expr_to_block (&caf_init_block, tmp);
   gfc_add_modify (&caf_init_block, decl, fold_convert (TREE_TYPE (decl),
  gfc_conv_descriptor_data_get (desc)));


Go patch committed: Rewrite panic/defer code from C to Go

2016-11-22 Thread Ian Lance Taylor
This patch to the Go frontend and libgo rewrites the panic/defer code
from C to Go.  The actual stack unwind code is still in C, but the
rest of the code, notably all the memory allocation, is now in Go.
The names are changed to the names used in the Go 1.7 runtime, but the
code is necessarily somewhat different.

The __go_makefunc_can_recover function is dropped, as the uses of it
were removed by Richard Henderson's work in
https://golang.org/cl/198770044.

This moves more memory allocation from C to Go, which will simplify
the move to the new concurrent garbage collector.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

2016-11-22  Ian Lance Taylor  

* go-gcc.cc (Gcc_backend::Gcc_backend): Add builtin function
__builtin_frame_address.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 242581)
+++ gcc/go/go-gcc.cc(working copy)
@@ -828,6 +828,15 @@ Gcc_backend::Gcc_backend()
   this->define_builtin(BUILT_IN_FRAME_ADDRESS, "__builtin_frame_address",
   NULL, t, false, false);
 
+  // The runtime calls __builtin_extract_return_addr when recording
+  // the address to which a function returns.
+  this->define_builtin(BUILT_IN_EXTRACT_RETURN_ADDR,
+  "__builtin_extract_return_addr", NULL,
+  build_function_type_list(ptr_type_node,
+   ptr_type_node,
+   NULL_TREE),
+  false, false);
+
   // The compiler uses __builtin_trap for some exception handling
   // cases.
   this->define_builtin(BUILT_IN_TRAP, "__builtin_trap", NULL,
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242600)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-bf4762823c4543229867436399be3ae30b4d13bb
+7593cc83a03999331c5e2dc65a9306c5fe57dfd0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/backend.h
===
--- gcc/go/gofrontend/backend.h (revision 242581)
+++ gcc/go/gofrontend/backend.h (working copy)
@@ -707,7 +707,7 @@ class Backend
   // Create a statement that runs all deferred calls for FUNCTION.  This should
   // be a statement that looks like this in C++:
   //   finish:
-  // try { UNDEFER; } catch { CHECK_DEFER; goto finish; }
+  // try { DEFER_RETURN; } catch { CHECK_DEFER; goto finish; }
   virtual Bstatement*
   function_defer_statement(Bfunction* function, Bexpression* undefer,
Bexpression* check_defer, Location) = 0;
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 242581)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -280,7 +280,7 @@ Node::op_format() const
{
  switch (e->func_expression()->runtime_code())
{
-   case Runtime::PANIC:
+   case Runtime::GOPANIC:
  op << "panic";
  break;
 
@@ -300,11 +300,11 @@ Node::op_format() const
  op << "make";
  break;
 
-   case Runtime::DEFER:
+   case Runtime::DEFERPROC:
  op << "defer";
  break;
 
-   case Runtime::RECOVER:
+   case Runtime::GORECOVER:
  op << "recover";
  break;
 
@@ -1189,7 +1189,7 @@ Escape_analysis_assign::expression(Expre
  {
switch (fe->runtime_code())
  {
- case Runtime::PANIC:
+ case Runtime::GOPANIC:
{
  // Argument could leak through recover.
  Node* panic_arg = Node::make_node(call->args()->front());
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 242581)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -8951,7 +8951,7 @@ Builtin_call_expression::do_get_backend(
 arg = Expression::convert_for_assignment(gogo, empty, arg, location);
 
 Expression* panic =
-Runtime::make_call(Runtime::PANIC, location, 1, arg);
+Runtime::make_call(Runtime::GOPANIC, location, 1, arg);
 return panic->get_backend(context);
   }
 
@@ -8972,8 +8972,8 @@ Builtin_call_expression::do_get_backend(
// because it changes whether it can recover a panic or not.
// See test7 in test/recover1.go.
 Expression* recover = Runtime::make_call((this->is_deferred()
-  ? Runtime::DEFERRED_RECOVER
- 

Re: gomp-nvptx branch - middle-end changes

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 08:25:45PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> > with one nit below that needs immediate action and the rest can be resolved
> > incrementally.  I'd like to check in afterwards the attached patch, at least
> > for now, so that non-offloaded SIMD code is less affected.
> 
> Testing your patch revealed an issue in Fortran offloaded code; types of
> boolean_type_node in f951 and boolean_false_node in lto1 (when 
> omp_device_lower
> runs) don't match.  I'm attaching a revised patch that addresses it by simply
> using an integer type (there are also two other minor issues, below).

Ok.

> > Please change this into
> > (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> > for now, so that we don't waste compile time even when clearly it
> > isn't needed, and incrementally change the inliner to propagate
> > the property.
> 
> As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
> needs to accept ACCEL_COMPILER.  Applied like this:
> 
> +  virtual bool gate (function *ARG_UNUSED (fun))
> +{
> +  /* FIXME: this should use PROP_gimple_lomp_dev.  */
> +#ifdef ACCEL_COMPILER
> +  return true;
> +#else
> +  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
> +#endif
> +}

Makes sense.
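
The gate logic discussed above can be tried in isolation with a small
sketch. It mirrors the shape of the quoted snippet but is not the pass
itself: pass_gate_sketch is a made-up name, the flags are passed in as
parameters, and ENABLE_OFFLOADING is given a default only so that the
example builds on its own.

```c
#include <stdbool.h>

/* In this sketch ENABLE_OFFLOADING defaults to 1 when the build does not
   define it; that default is an assumption of the example.  */
#ifndef ENABLE_OFFLOADING
#define ENABLE_OFFLOADING 1
#endif

/* Decide whether the lowering pass should run.  */
bool
pass_gate_sketch (bool flag_openmp, bool in_lto_p)
{
#ifdef ACCEL_COMPILER
  /* The offloading compiler always runs the pass.  */
  (void) flag_openmp;
  (void) in_lto_p;
  return true;
#else
  /* The host compiler runs it only when offloading is configured and the
     code is OpenMP or comes in through LTO.  */
  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
#endif
}
```

Compiled without ACCEL_COMPILER, the function returns false only when both
flags are false; compiled with -DACCEL_COMPILER it always returns true.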

> > @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
> >if (max_vf == 0)
> >  {
> >max_vf = omp_max_vf ();
> > +  if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> > +  OMP_CLAUSE__SIMT_))
> > +   {
> > + int max_simt = omp_max_simt_vf ();
> > + max_vf = MAX (max_vf, max_simt);
> > +   }
> 
> I don't believe here there's a need to take a maximum.  Cloning the loop upfront
> means that SIMD+SIMT styles are not going to mix within a single loop.  I've
> simplified it to an if-then-else in the revised patch.

Ok.

> > @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
> >bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
> >for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
> >  offloaded = rgn->type == GIMPLE_OMP_TARGET;
> > -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> > +  bool is_simt
> > += (offloaded
> > +   && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> > +  OMP_CLAUSE__SIMT_)
> > +   && safelen_int > 1);
> 
> Here computation of 'offloaded' is no longer needed, because presence of
> OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.
> 
> I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
> Not sure if that's intentional -- as I understand it's possible to look at the
> enclosing context's clauses because 'omp ordered' must be closely nested with

Right now omp ordered simd for non-simt basically causes vf 1, because the
vectorizer isn't ready for having non-vectorized portions of code within
vectorized loop.

> the corresponding loop.  I've added a FIXME in the patch.

Ok for trunk, thanks.

Jakub


Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-22 Thread Alexander Monakov
On Tue, 15 Nov 2016, Alexander Monakov wrote:
> On Tue, 15 Nov 2016, Alexander Monakov wrote:
> > Yep, I do see new test execution failures with both Intel MIC and PTX offloading
> > on device-1.f90, device-3.f90 and target2.f90.  Here's an actually-tested patch
> > for the first two (on target2.f90 there's a different problem).
> 
> And here's a patch for target2.f90.  I don't have a perfect understanding of
> mapping clauses, but the test appears to need to explicitly map pointer
> variables, at a minimum.  Also, 'map (from: r)' is missing on the last target
> region.
> 
>   * testsuite/libgomp.fortran/target2.f90 (foo): Add mapping clauses to
>   target construct.

Ping.

> diff --git a/libgomp/testsuite/libgomp.fortran/target2.f90 b/libgomp/testsuite/libgomp.fortran/target2.f90
> index 42f704f..7119774 100644
> --- a/libgomp/testsuite/libgomp.fortran/target2.f90
> +++ b/libgomp/testsuite/libgomp.fortran/target2.f90
> @@ -63,7 +63,7 @@ contains
>r = r .or. (any (k(5:n-5) /= 17)) .or. (lbound (k, 1) /= 4) .or. (ubound (k, 1) /= n)
>  !$omp end target
>  if (r) call abort
> -!$omp target map (to: d(2:n+1), n)
> +!$omp target map (to: d(2:n+1), f, j) map (from: r)
>r = a /= 7
>r = r .or. (any (b /= 8)) .or. (lbound (b, 1) /= 3) .or. (ubound (b, 1) /= n)
>r = r .or. (any (c /= 9)) .or. (lbound (c, 1) /= 5) .or. (ubound (c, 1) /= n + 4)
> 
> 


Re: gomp-nvptx branch - middle-end changes

2016-11-22 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> with one nit below that needs immediate action and the rest can be resolved
> incrementally.  I'd like to check in afterwards the attached patch, at least
> for now, so that non-offloaded SIMD code is less affected.

Testing your patch revealed an issue in Fortran offloaded code; types of
boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
runs) don't match.  I'm attaching a revised patch that addresses it by simply
using an integer type (there are also two other minor issues, below).

> Please change this into
> (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> for now, so that we don't waste compile time even when clearly it
> isn't needed, and incrementally change the inliner to propagate
> the property.

As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
needs to accept ACCEL_COMPILER.  Applied like this:

+  virtual bool gate (function *ARG_UNUSED (fun))
+{
+  /* FIXME: this should use PROP_gimple_lomp_dev.  */
+#ifdef ACCEL_COMPILER
+  return true;
+#else
+  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
+#endif
+}


In your GOMP_USE_SIMT() patch,

> @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
>if (max_vf == 0)
>  {
>max_vf = omp_max_vf ();
> +  if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> +OMP_CLAUSE__SIMT_))
> + {
> +   int max_simt = omp_max_simt_vf ();
> +   max_vf = MAX (max_vf, max_simt);
> + }

I don't believe here there's a need to take a maximum.  Cloning the loop upfront
means that SIMD+SIMT styles are not going to mix within a single loop.  I've
simplified it to an if-then-else in the revised patch.

> @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
>bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
>for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
>  offloaded = rgn->type == GIMPLE_OMP_TARGET;
> -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> +  bool is_simt
> += (offloaded
> +   && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> +OMP_CLAUSE__SIMT_)
> +   && safelen_int > 1);

Here computation of 'offloaded' is no longer needed, because presence of
OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.

I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
Not sure if that's intentional -- as I understand it's possible to look at the
enclosing context's clauses because 'omp ordered' must be closely nested with
the corresponding loop.  I've added a FIXME in the patch.

Alexander

	* internal-fn.c (expand_GOMP_USE_SIMT): New function.
	* tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands.
	(omp_clause_code_name): Add _simt_ name.
	(walk_tree_1): Handle OMP_CLAUSE__SIMT_.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_.
	(scan_omp_simd): New function.
	(scan_omp_1_stmt): Use it in target regions if needed.
	(omp_max_vf): Don't max with omp_max_simt_vf.
	(lower_rec_simd_input_clauses): Use omp_max_simt_vf if
	OMP_CLAUSE__SIMT_ is present.
	(lower_rec_input_clauses): Compute maybe_simt from presence of
	OMP_CLAUSE__SIMT_.
	(lower_lastprivate_clauses): Likewise.
	(expand_omp_simd): Likewise.
	(execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT.
	* internal-fn.def (GOMP_USE_SIMT): New internal function.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 6cd8522..b1dbc98 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -158,6 +158,14 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_USE_SIMT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
without SIMT execution this should be expanded in omp_device_lower pass.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index f055230..9a03e17 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -141,6 +141,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6c52bff..eab0af5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -278,6 

Re: [PATCH] Delete GCJ

2016-11-22 Thread Sandra Loosemore

On 11/21/2016 04:23 PM, Matthias Klose wrote:

On 21.11.2016 18:16, Rainer Orth wrote:

Hi Matthias,


ahh, didn't see that :-/ Now fixed, is this clearer now?

The options @option{--with-target-bdw-gc-include} and
@option{--with-target-bdw-gc-lib} must always specified together for

^ be


thanks to all sorting out the documentation issues. Now attaching the updated
diff. Ok to commit?


The documentation part is OK now.

-Sandra



Re: [PATCH] Fix PR78230

2016-11-22 Thread Jeff Law

On 11/08/2016 07:43 PM, Kito Cheng wrote:

gcc/testsuite/ChangeLog:

2016-11-09  Kito Cheng 

PR target/78230
* gcc.dg/torture/pr66178.c (test): Use uintptr_t instead of int.
(test2) Ditto.

OK.
jeff


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-22 Thread Jeff Law

On 11/08/2016 05:09 PM, Martin Sebor wrote:

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.
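
To make the class of bug concrete, here is a minimal sketch of the kind of call the checker could flag once it knows the size of a malloc'ed destination. The helper names formatted_length and would_overflow and the sizes are made up for illustration and are not part of the patch; the code measures the output with snprintf instead of actually overflowing the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the patch): return the number of
   bytes, including the terminating NUL, that sprintf (dst, "%i", value)
   would write.  snprintf with a null buffer and size 0 only measures.  */
static size_t
formatted_length (int value)
{
  int n = snprintf (NULL, 0, "%i", value);
  assert (n >= 0);
  return (size_t) n + 1;
}

/* Return nonzero if formatting VALUE with "%i" would overflow a
   destination of SIZE bytes obtained from malloc.  */
static int
would_overflow (size_t size, int value)
{
  char *dst = malloc (size);
  if (!dst)
    return 0;
  int overflow = formatted_length (value) > size;
  if (!overflow)
    sprintf (dst, "%i", value);   /* Safe: the output fits.  */
  free (dst);
  return overflow;
}
```

With these definitions, would_overflow (4, 12345) is nonzero because "12345" plus the NUL needs 6 bytes, which is the situation a size-aware -Wformat-length could diagnose at compile time.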

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to be computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically 
allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.

Sorry.  Just not getting to many of the pre-stage1 close patches as fast
as I'd like.


My only real concern here is that if we call compute_builtin_object_size 
without having initialized the passes, then we initialize, compute, then 
finalize.  Subsequent calls will go through the same process -- the key 
being each one re-computes the internal state which might get expensive.


Wouldn't it just make more sense to pull up the init/fini calls, either 
explicitly (which likely means externalizing the init/fini routines) or 
by wrapping all this stuff in a class and instantiating a suitable object?


I think the answer to your memory management question is that internal 
state is likely not marked as a GC root and thus if you get a GC call 
pieces of internal state are not seen as reachable, but you still can 
reference them.  I.e., you end up with dangling pointers.


Usually all you'd have to do is mark them so that gengtype will see 
them.  Bitmaps, trees, rtl, are all good examples.  So marking the 
bitmap would look like:


static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where 
someone has an object expected to be live across passes, but it isn't 
reachable because someone failed to register a GC root.


Jeff


Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Uros Bizjak
On Tue, Nov 22, 2016 at 4:54 PM, Marc Glisse  wrote:
> On Tue, 22 Nov 2016, Uros Bizjak wrote:
>
>> New makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
>> different way. While older makes pass only "-j", newer makes pass e.g.
>> "-j4" when -j is specified on the command line. The detection of the "-j"
>> make argument doesn't work in the latter case.
>>
>> Attached patch reworks this functionality to detect -j correctly in all
>> cases.
>
>
> Hello,
>
> I didn't read the patch, but do you think this also fixes PR 53155 ?

 Looking at the PR, I don't think so - but I did test my patch with
CentOS 5.11 (with make 3.81) and detection worked there without
problems.
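
The patch itself is not shown in this message, so the following is only a sketch of the matching rule under discussion, written in C for illustration. The function name has_jobs_flag is hypothetical; it treats a word as a jobs flag if it is exactly "-j" or "-j" followed only by digits.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical matcher, not the code from the patch: return 1 if FLAGS
   (a whitespace-separated option string such as MFLAGS) contains a word
   that is exactly "-j" or "-j" followed only by digits, e.g. "-j4".  */
static int
has_jobs_flag (const char *flags)
{
  const char *p = flags;
  while (*p)
    {
      /* Skip the whitespace between words.  */
      while (*p && isspace ((unsigned char) *p))
        p++;
      const char *word = p;
      while (*p && !isspace ((unsigned char) *p))
        p++;
      size_t len = (size_t) (p - word);
      if (len >= 2 && word[0] == '-' && word[1] == 'j')
        {
          size_t i = 2;
          while (i < len && isdigit ((unsigned char) word[i]))
            i++;
          if (i == len)
            return 1;
        }
    }
  return 0;
}
```

Under this rule both the old style ("-k -j") and the new style ("-j4 --jobserver-auth=3,4") are detected, while a long option such as "--jobserver-auth=3,4" on its own is not mistaken for -j.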

Maybe MAKEFLAGS should be used instead of MFLAGS, since the docs [1] mention
that MFLAGS is intended for historical compatibility?

[1] https://www.gnu.org/software/make/manual/html_node/Options_002fRecursion.html

Uros.


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-22 Thread Kyrill Tkachov


On 18/11/16 12:50, Segher Boessenkool wrote:

On Fri, Nov 18, 2016 at 09:29:13AM +, Kyrill Tkachov wrote:

So your COMPONENTS_FOR_BB returns both components in a pair whenever one
of those is needed?  That should work afaics.

I mean I still want to have one component per register and since
emit_{prologue,epilogue}_components knows how to form pairs from the
components passed down to it I just need to restrict the number of
components in any particular basic block to an even number.
So say a function can wrap 5 registers: x22,x23,x24,x25,x26.
I want get_separate_components to return 5 components since in that hook
we don't know how these registers are distributed across each basic block.
components_for_bb has that information.
In components_for_bb I want to restrict the components for a basic block to
an even number, so if normally all 5 registers would be valid for wrapping
in that bb I'd only choose 4 so I could form 2 pairs. But selecting only 4
of the 5 registers, say only x22,x23,x24,x25 leads to x26 not being saved
or restored at all, even during the normal prologue and epilogue because
x26 was marked as a component in components_for_bb and therefore omitted
from the prologue and epilogue.
So I'm thinking x26 should be removed from the wrappable components of
a basic block by disqualify_components. I'm trying that approach now.

My suggestion was, in components_for_bb, whenever you mark x22 as needed
you also mark x23 as needed, and whenever you mark x23 as needed you also
mark x22.  I think this is a lot simpler?
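
As a rough illustration of that suggestion (and not the aarch64 backend's actual data structures), the sketch below models components as bits in a mask and closes a basic block's set over register pairs. The type component_mask, the helper names, and the assumption that pairs are an even register plus the next odd one are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit N of the mask stands for register xN.  This is
   not the aarch64 backend's data structure, only an illustration.  */
typedef uint32_t component_mask;

/* Partner of REGNO under the assumed pairing (x22,x23), (x24,x25), ...
   i.e. an even register with the next odd one.  */
static unsigned
pair_partner (unsigned regno)
{
  return regno ^ 1u;
}

/* Given the components a basic block needs, also mark each one's partner
   so that every save/restore can be emitted as a pair.  */
static component_mask
close_over_pairs (component_mask needed)
{
  component_mask result = needed;
  for (unsigned regno = 0; regno < 32; regno++)
    if (needed & ((component_mask) 1 << regno))
      result |= (component_mask) 1 << pair_partner (regno);
  return result;
}
```

Note that with an odd number of wrappable registers, as in the x22..x26 example, this sketch would pull in x27 as the partner of x26; a real implementation would have to restrict partners to the registers that are actually wrappable.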


But then we'd have cases where we're saving and restoring x23
even when it's not necessary.
In any case, I tried it out and it didn't fix the gobmk issue, though it did
reduce the code size increase somewhat.

With the patch already posted at [1] the net result is still positive on
both SPECINT and SPECFP.

I also ran the numbers on a Cortex-A57. The changes are less pronounced
with SPECINT being neutral (gobmk shows only a 0.8% regression) and SPECFP
having a small improvement, due to povray improving by 2.9%.

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01352.html



Segher




Re: [PR target/78213] Do not ICE on non-empty -fself-test

2016-11-22 Thread Bernd Schmidt

On 11/16/2016 11:45 AM, Aldy Hernandez wrote:


I would prefer Jakub's suggestion of running in finish_options().


I suspect we'll want both. Selftests should really run in an environment 
that's as close as possible to what would normally be going on in the 
compiler.



I assume there are other places throughout the self-tests that depend on
NOT continuing the compilation process, and I'd hate to plug each one.

Would the attached patch be acceptable to both of you?


Good enough for now.


Bernd


Re: [PATCH][ARM] PR target/78439: Update movdi constraints for Cortex-A8 tuning to handle LDRD/STRD

2016-11-22 Thread Ramana Radhakrishnan
On Tue, Nov 22, 2016 at 9:57 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This PR is an ICE while bootstrapping GCC with Cortex-A8 tuning, which we
> also get from the default ARMv7-A tuning.
> The ldrd/strd peepholes were recently made more aggressive and in this case
> they transform:
> (insn 13 33 40 2 (set (mem/c:SI (plus:SI (reg/f:SI 11 fp)
> (const_int -28 [0xffe4])) [3 d.num_comps+0 S4
> A64])
> (reg:SI 12 ip [orig:117 _20 ] [117])) "cp-demangle.c":32 632
> {*arm_movsi_vfp}
>  (expr_list:REG_DEAD (reg:SI 12 ip [orig:117 _20 ] [117])
> (nil)))
> (insn 40 13 39 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 11 fp)
> (const_int -24 [0xffe8])) [2 d.subs+0 S4 A32])
> (reg/f:SI 13 sp)) "cp-demangle.c":51 632 {*arm_movsi_vfp}
>  (nil))
>
> into:
> (insn 68 33 39 2 (set (mem/c:DI (plus:SI (reg/f:SI 11 fp)
> (const_int -28 [0xffe4])) [3 d.num_comps+0 S8
> A64])
> (reg:DI 12 ip)) "cp-demangle.c":51 -1
>  (nil))
>
> This is okay, but the *movdi_vfp_cortexa8 pattern doesn't deal with the IP
> being the source
> of the store. The reason is that when the LDRD/STRD peepholes and machinery
> was introduced back in r197530
> it created the 'q' constraint which should be used for the register operands
> of the DImode stores and loads
> ('q' means CORE_REGS when LDRD/STRD is enabled in ARM mode and GENERAL_REGS
> otherwise). That revision
> updated the movdi_vfp pattern to use it in alternatives 4,5,6 but neglected
> to update the Cortex-A8-specific
> pattern. This is a sign that we should perhaps get rid of this special-cased
> pattern at some point, but for now

I would expect any patch that does this "i.e. remove the pattern" to
be tested to see the impact of the difference in constraints. AFAIR
the pattern was added to distinguish between Neon for DImode
operations and non-Neon for DImode variations many moons ago. So
please do the archeology and measurements (look at output from crafty
for a variety of options and a variety of cores before cleaning all
this up).

Ramana


> this simple patch updates the appropriate alternatives to use the 'q'
> constraint so that output_move_double
> can output the correct LDRD/STRD instruction.
>
> Bootstrapped on arm-none-linux-gnueabihf with --with-arch=armv7-a that
> exercises this code (bootstrap currently fails
> without this patch) and tested with /-mtune=cortex-a8.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-11-22  Kyrylo Tkachov  
>
> PR target/78439
> * config/arm/vfp.md (*movdi_vfp_cortexa8): Use 'q' constraints for the
> register operand in alternatives 4,5,6.
>
> 2016-11-22  Kyrylo Tkachov  
>
> PR target/78439
> * gcc.c-torture/compile/pr78439.c: New test.


Re: [PATCH] Add avx5124fmaps,avx5124vnniw to sse-22.c target pragma (PR target/78451)

2016-11-22 Thread Uros Bizjak
On Tue, Nov 22, 2016 at 5:12 PM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, these 2 ISAs were added to just the first of the two
> Intel specific target pragmas (the first one is used in the sse-22.c test
> itself, the second one when it is included from sse-22a.c).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-11-22  Jakub Jelinek  
>
> PR target/78451
> * gcc.target/i386/sse-22.c: Add avx5124fmaps,avx5124vnniw to
> GCC target pragma before including immintrin.h.

OK.

Thanks,
Uros.


Re: [PATCH] Replace _mm_setzero_[hd]i with _mm_setzero_si128 (PR target/78451)

2016-11-22 Thread Uros Bizjak
On Tue, Nov 22, 2016 at 5:09 PM, Jakub Jelinek  wrote:
> Hi!
>
> _mm_setzero_di is problematic, because it is outside of AVX512* guarded
> area, but it actually requires SSE2 which might not be enabled.
> As discussed in the PR, I see neither the _mm_setzero_[dh]i routines
> in ICC headers nor in AVX/AVX512 manuals, and fail to see what the
> difference is between those and the standard _mm_setzero_si128.
> All these functions return __m128i containing all zeros, how exactly
> it is constructed should be irrelevant after folding during gimplification
> (all 3 routines gimplify to the same return stmt, __m128i is
> typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
> and therefore all 3 are return (__m128i) { 0, 0 }, it doesn't matter
> how those 0s were constructed).
>
> This patch removes those two routines, uses _mm_setzero_si128 instead,
> and I've also done some limited formatting fixes (mainly I tried to
> fix up calls with no space before ( ).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note that there is still _mm512_setzero_qi and _mm512_setzero_hi,
> shall those be replaced with _mm512_setzero_si512 too?
> Even those 2 aren't mentioned in ICC headers nor AVX512 manuals.

Yes, please also remove these two.

Patch to replace them with _mm512_setzero_si512 is pre-approved.
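
The point that it does not matter how the zeros are constructed can be sketched with GCC's generic vector extension, without relying on the x86 intrinsic headers. The type names v2di and v4si and the helper functions below are made up for this sketch, and it assumes GCC or Clang.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the __m128i typedef quoted above, minus __may_alias__;
   the names v2di and v4si are made up for this sketch.  */
typedef long long v2di __attribute__ ((__vector_size__ (16)));
typedef int v4si __attribute__ ((__vector_size__ (16)));

/* All-zero vector built directly from two 64-bit zeros.  */
static v2di
zero_from_literal (void)
{
  return (v2di) { 0, 0 };
}

/* All-zero vector built from four 32-bit zeros and then reinterpreted;
   a cast between same-sized vector types preserves the bits.  */
static v2di
zero_from_ints (void)
{
  v4si z = { 0, 0, 0, 0 };
  return (v2di) z;
}

/* Compare two vectors byte for byte.  */
static int
same_bits (v2di a, v2di b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Both constructors yield the same 16 zero bytes, which is why separate per-element-type "setzero" routines add nothing over a single one.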

> 2016-11-22  Jakub Jelinek  
>
> PR target/78451
> * config/i386/avx512vlintrin.h (_mm_setzero_di): Removed.
> (_mm_maskz_mov_epi64): Use _mm_setzero_si128 instead of
> _mm_setzero_di.
> (_mm_maskz_load_epi64): Likewise.
> (_mm_setzero_hi): Removed.
> (_mm_maskz_loadu_epi64): Use _mm_setzero_si128 instead of
> _mm_setzero_di.
> (_mm_abs_epi64, _mm_maskz_abs_epi64, _mm_maskz_srl_epi64,
> _mm_maskz_unpackhi_epi64, _mm_maskz_unpacklo_epi64,
> _mm_maskz_compress_epi64, _mm_srav_epi64, _mm_maskz_srav_epi64,
> _mm_maskz_sllv_epi64, _mm_maskz_srlv_epi64, _mm_rolv_epi64,
> _mm_maskz_rolv_epi64, _mm_rorv_epi64, _mm_maskz_rorv_epi64,
> _mm_min_epi64, _mm_max_epi64, _mm_max_epu64, _mm_min_epu64,
> _mm_lzcnt_epi64, _mm_maskz_lzcnt_epi64, _mm_conflict_epi64,
> _mm_maskz_conflict_epi64, _mm_sra_epi64, _mm_maskz_sra_epi64,
> _mm_maskz_sll_epi64, _mm_rol_epi64, _mm_maskz_rol_epi64,
> _mm_ror_epi64, _mm_maskz_ror_epi64, _mm_alignr_epi64,
> _mm_maskz_alignr_epi64, _mm_srai_epi64, _mm_maskz_slli_epi64):
> Likewise.
> (_mm_cvtepi32_epi8, _mm256_cvtepi32_epi8, _mm_cvtsepi32_epi8,
> _mm256_cvtsepi32_epi8, _mm_cvtusepi32_epi8, _mm256_cvtusepi32_epi8,
> _mm_cvtepi32_epi16, _mm256_cvtepi32_epi16, _mm_cvtsepi32_epi16,
> _mm256_cvtsepi32_epi16, _mm_cvtusepi32_epi16, _mm256_cvtusepi32_epi16,
> _mm_cvtepi64_epi8, _mm256_cvtepi64_epi8, _mm_cvtsepi64_epi8,
> _mm256_cvtsepi64_epi8, _mm_cvtusepi64_epi8, _mm256_cvtusepi64_epi8,
> _mm_cvtepi64_epi16, _mm256_cvtepi64_epi16, _mm_cvtsepi64_epi16,
> _mm256_cvtsepi64_epi16, _mm_cvtusepi64_epi16, _mm256_cvtusepi64_epi16,
> _mm_cvtepi64_epi32, _mm256_cvtepi64_epi32, _mm_cvtsepi64_epi32,
> _mm256_cvtsepi64_epi32, _mm_cvtusepi64_epi32, _mm256_cvtusepi64_epi32,
> _mm_maskz_set1_epi32, _mm_maskz_set1_epi64): Formatting fixes.
> (_mm_maskz_cvtps_ph, _mm256_maskz_cvtps_ph): Use _mm_setzero_si128
> instead of _mm_setzero_hi.
> (_mm256_permutex_pd, _mm256_maskz_permutex_epi64, _mm256_insertf32x4,
> _mm256_maskz_insertf32x4, _mm256_inserti32x4, _mm256_maskz_inserti32x4,
> _mm256_extractf32x4_ps, _mm256_maskz_extractf32x4_ps,
> _mm256_shuffle_i32x4, _mm256_maskz_shuffle_i32x4, _mm256_shuffle_f64x2,
> _mm256_maskz_shuffle_f64x2, _mm256_shuffle_f32x4,
> _mm256_maskz_shuffle_f32x4, _mm256_maskz_shuffle_pd,
> _mm_maskz_shuffle_pd, _mm256_maskz_shuffle_ps, _mm_maskz_shuffle_ps,
> _mm256_maskz_srli_epi32, _mm_maskz_srli_epi32, _mm_maskz_srli_epi64,
> _mm256_mask_slli_epi32, _mm256_maskz_slli_epi32, _mm256_mask_slli_epi64,
> _mm256_maskz_slli_epi64, _mm256_roundscale_ps,
> _mm256_maskz_roundscale_ps, _mm256_roundscale_pd,
> _mm256_maskz_roundscale_pd, _mm_roundscale_ps, _mm_maskz_roundscale_ps,
> _mm_roundscale_pd, _mm_maskz_roundscale_pd, _mm256_getmant_ps,
> _mm256_maskz_getmant_ps, _mm_getmant_ps, _mm_maskz_getmant_ps,
> _mm256_getmant_pd, _mm256_maskz_getmant_pd, _mm_getmant_pd,
> _mm_maskz_getmant_pd, _mm256_maskz_shuffle_epi32,
> _mm_maskz_shuffle_epi32, _mm256_rol_epi32, _mm256_maskz_rol_epi32,
> _mm_rol_epi32, _mm_maskz_rol_epi32, _mm256_ror_epi32,
> _mm256_maskz_ror_epi32, _mm_ror_epi32, _mm_maskz_ror_epi32,
> _mm_maskz_alignr_epi32, _mm_maskz_alignr_epi64,
> _mm256_maskz_srai_epi32, 

[PATCH] PR78465 Remove runtime tests for macros

2016-11-22 Thread Jonathan Wakely

Andrew MacLeod did some digging and found that this test was changed
from using #if to using a runtime if and abort() because the LOCK_FREE
macros resolved to runtime calls at one point. However, they later got
changed to predefined macros, and so can be changed back to using #if.

This should fix the regression on Solaris, where the mismatched
abort() declaration causes it to FAIL.
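
For readers unfamiliar with the technique, here is a small sketch of a preprocessor-time check using the C11 <stdatomic.h> counterpart of one of these macros. It is not the testsuite file: the accessor int_lock_free_level is hypothetical, and the accepted range 0..2 follows the C standard's definition of the macro, whereas the libstdc++ test insists on 1 or 2.

```c
#include <assert.h>
#include <stdatomic.h>

/* Because the macro is a preprocessor constant, an invalid value stops
   the build instead of needing a run-time abort ().  */
#ifndef ATOMIC_INT_LOCK_FREE
# error "ATOMIC_INT_LOCK_FREE must be a macro"
#elif ATOMIC_INT_LOCK_FREE < 0 || ATOMIC_INT_LOCK_FREE > 2
# error "ATOMIC_INT_LOCK_FREE must be 0, 1 or 2"
#endif

/* Hypothetical accessor so the value can also be inspected at run time.  */
static int
int_lock_free_level (void)
{
  return ATOMIC_INT_LOCK_FREE;
}
```

Since nothing here needs to be executed to detect a bad value, a test written this way no longer needs main or a declaration of abort.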

PR libstdc++/78465
* testsuite/29_atomics/headers/atomic/macros.cc: Replace runtime tests
with preprocessor conditions.

Tested x86_64-linux, committed to trunk.

commit e6dcd511c9d9641e15f637cf1337149abf97c1e4
Author: Jonathan Wakely 
Date:   Tue Nov 22 16:17:52 2016 +

PR78465 Remove runtime tests for  macros

PR libstdc++/78465
* testsuite/29_atomics/headers/atomic/macros.cc: Replace runtime tests
with preprocessor conditions.

diff --git a/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc b/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
index 9ef8c78..4cb3e1a 100644
--- a/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
+++ b/libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++11 } }
+// { dg-do preprocess { target c++11 } }
 
 // Copyright (C) 2008-2016 Free Software Foundation, Inc.
 //
@@ -21,42 +21,61 @@
 
 #ifndef ATOMIC_BOOL_LOCK_FREE 
 # error "ATOMIC_BOOL_LOCK_FREE must be a macro"
+#elif ATOMIC_BOOL_LOCK_FREE != 1 && ATOMIC_BOOL_LOCK_FREE != 2
+# error "ATOMIC_BOOL_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_CHAR_LOCK_FREE 
 # error "ATOMIC_CHAR_LOCK_FREE must be a macro"
+#elif ATOMIC_CHAR_LOCK_FREE != 1 && ATOMIC_CHAR_LOCK_FREE != 2
+# error "ATOMIC_CHAR_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_CHAR16_T_LOCK_FREE 
 # error "ATOMIC_CHAR16_T_LOCK_FREE must be a macro"
+#elif ATOMIC_CHAR16_T_LOCK_FREE != 1 && ATOMIC_CHAR16_T_LOCK_FREE != 2
 #endif
 
 #ifndef ATOMIC_CHAR32_T_LOCK_FREE 
 # error "ATOMIC_CHAR32_T_LOCK_FREE must be a macro"
+#elif ATOMIC_CHAR32_T_LOCK_FREE != 1 && ATOMIC_CHAR32_T_LOCK_FREE != 2
+# error "ATOMIC_CHAR32_T_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_WCHAR_T_LOCK_FREE 
 # error "ATOMIC_WCHAR_T_LOCK_FREE must be a macro"
+#elif ATOMIC_WCHAR_T_LOCK_FREE != 1 && ATOMIC_WCHAR_T_LOCK_FREE != 2
+# error "ATOMIC_WCHAR_T_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_SHORT_LOCK_FREE 
 # error "ATOMIC_SHORT_LOCK_FREE must be a macro"
+#elif ATOMIC_SHORT_LOCK_FREE != 1 && ATOMIC_SHORT_LOCK_FREE != 2
+# error "ATOMIC_SHORT_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_INT_LOCK_FREE 
 # error "ATOMIC_INT_LOCK_FREE must be a macro"
+#elif ATOMIC_INT_LOCK_FREE != 1 && ATOMIC_INT_LOCK_FREE != 2
+# error "ATOMIC_INT_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_LONG_LOCK_FREE 
 # error "ATOMIC_LONG_LOCK_FREE must be a macro"
+#elif ATOMIC_LONG_LOCK_FREE != 1 && ATOMIC_LONG_LOCK_FREE != 2
+# error "ATOMIC_LONG_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_LLONG_LOCK_FREE 
 # error "ATOMIC_LLONG_LOCK_FREE must be a macro"
+#elif ATOMIC_LLONG_LOCK_FREE != 1 && ATOMIC_LLONG_LOCK_FREE != 2
+# error "ATOMIC_LLONG_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_POINTER_LOCK_FREE 
 # error "ATOMIC_POINTER_LOCK_FREE must be a macro"
+#elif ATOMIC_POINTER_LOCK_FREE != 1 && ATOMIC_POINTER_LOCK_FREE != 2
+# error "ATOMIC_POINTER_LOCK_FREE must be 1 or 2"
 #endif
 
 #ifndef ATOMIC_FLAG_INIT
@@ -66,49 +85,3 @@
 #ifndef ATOMIC_VAR_INIT
 #error "ATOMIC_VAR_INIT_must_be_a_macro"
 #endif
-
-
-extern void abort(void);
-
-int main ()
-{
-#if (ATOMIC_BOOL_LOCK_FREE != 1 && ATOMIC_BOOL_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_CHAR_LOCK_FREE != 1 && ATOMIC_CHAR_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_CHAR16_T_LOCK_FREE != 1 && ATOMIC_CHAR16_T_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_CHAR32_T_LOCK_FREE != 1 && ATOMIC_CHAR32_T_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_WCHAR_T_LOCK_FREE != 1 && ATOMIC_WCHAR_T_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_SHORT_LOCK_FREE != 1 && ATOMIC_SHORT_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_INT_LOCK_FREE != 1 && ATOMIC_INT_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_LONG_LOCK_FREE != 1 && ATOMIC_LONG_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_LLONG_LOCK_FREE != 1 && ATOMIC_LLONG_LOCK_FREE != 2)
-   abort ();
-#endif
-
-#if (ATOMIC_POINTER_LOCK_FREE != 1 && ATOMIC_POINTER_LOCK_FREE != 2)
-   abort ();
-#endif
-}


Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Jonathan Wakely

On 22/11/16 16:54 +0100, Marc Glisse wrote:

On Tue, 22 Nov 2016, Uros Bizjak wrote:


New makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of the "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.


Hello,

I didn't read the patch, but do you think this also fixes PR 53155 ?


No, probably not, as it only changes the "-j N" case, not the "-j"
case in your PR, which doesn't match because the -j gets combined with
other make flags.



[PATCH] Add avx5124fmaps,avx5124vnniw to sse-22.c target pragma (PR target/78451)

2016-11-22 Thread Jakub Jelinek
Hi!

As mentioned in the PR, these 2 ISAs were added to just the first of the two
Intel specific target pragmas (the first one is used in the sse-22.c test
itself, the second one when it is included from sse-22a.c).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-22  Jakub Jelinek  

PR target/78451
* gcc.target/i386/sse-22.c: Add avx5124fmaps,avx5124vnniw to
GCC target pragma before including immintrin.h.

--- gcc/testsuite/gcc.target/i386/sse-22.c.jj   2016-11-18 20:04:24.0 +0100
+++ gcc/testsuite/gcc.target/i386/sse-22.c  2016-11-22 12:31:43.721234017 +0100
@@ -218,7 +218,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int,
 
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx5124fmaps,avx5124vnniw")
 #endif
 #include 
 test_1 (_cvtss_sh, unsigned short, float, 1)


Jakub


Re: PR78153

2016-11-22 Thread Prathamesh Kulkarni
On 22 November 2016 at 20:53, Richard Biener  wrote:
> On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 22 November 2016 at 20:18, Richard Biener  wrote:
>> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 21 November 2016 at 15:10, Richard Biener  wrote:
>> >> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> Hi,
>> >> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
>> >> >> PTRDIFF_MAX.
>> >> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
>> >> >> in the attached patch.
>> >> >>
>> >> >> However it regressed strlenopt-3.c:
>> >> >>
>> >> >> Consider fn1() from strlenopt-3.c:
>> >> >>
>> >> >> __attribute__((noinline, noclone)) size_t
>> >> >> fn1 (char *p, char *q)
>> >> >> {
>> >> >>   size_t s = strlen (q);
>> >> >>   strcpy (p, q);
>> >> >>   return s - strlen (p);
>> >> >> }
>> >> >>
>> >> >> The optimized dump shows the following:
>> >> >>
>> >> >> __attribute__((noclone, noinline))
>> >> >> fn1 (char * p, char * q)
>> >> >> {
>> >> >>   size_t s;
>> >> >>   size_t _7;
>> >> >>   long unsigned int _9;
>> >> >>
>> >> >>   :
>> >> >>   s_4 = strlen (q_3(D));
>> >> >>   _9 = s_4 + 1;
>> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >>   _7 = 0;
>> >> >>   return _7;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> which introduces the regression, because the test expects "return 0;" 
>> >> >> in fn1().
>> >> >>
>> >> >> The issue seems to be in vrp2:
>> >> >>
>> >> >> Before the patch:
>> >> >> Visiting statement:
>> >> >> s_4 = strlen (q_3(D));
>> >> >> Found new range for s_4: VARYING
>> >> >>
>> >> >> Visiting statement:
>> >> >> _1 = s_4;
>> >> >> Found new range for _1: [s_4, s_4]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> Visiting statement:
>> >> >> _7 = s_4 - _1;
>> >> >> Applying pattern match.pd:111, gimple-match.c:27997
>> >> >> Match-and-simplified s_4 - _1 to 0
>> >> >> Intersecting
>> >> >>   [0, 0]
>> >> >> and
>> >> >>   [0, +INF]
>> >> >> to
>> >> >>   [0, 0]
>> >> >> Found new range for _7: [0, 0]
>> >> >>
>> >> >> __attribute__((noclone, noinline))
>> >> >> fn1 (char * p, char * q)
>> >> >> {
>> >> >>   size_t s;
>> >> >>   long unsigned int _1;
>> >> >>   long unsigned int _9;
>> >> >>
>> >> >>   :
>> >> >>   s_4 = strlen (q_3(D));
>> >> >>   _9 = s_4 + 1;
>> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >>   _1 = s_4;
>> >> >>   return 0;
>> >> >>
>> >> >> }
>> >> >>
>> >> >>
>> >> >> After the patch:
>> >> >> Visiting statement:
>> >> >> s_4 = strlen (q_3(D));
>> >> >> Intersecting
>> >> >>   [0, 9223372036854775806]
>> >> >> and
>> >> >>   [0, 9223372036854775806]
>> >> >> to
>> >> >>   [0, 9223372036854775806]
>> >> >> Found new range for s_4: [0, 9223372036854775806]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> Visiting statement:
>> >> >> _1 = s_4;
>> >> >> Intersecting
>> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> >> and
>> >> >>   [0, 9223372036854775806]
>> >> >> to
>> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> >> Found new range for _1: [0, 9223372036854775806]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> Visiting statement:
>> >> >> _7 = s_4 - _1;
>> >> >> Intersecting
>> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> and
>> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> to
>> >> >>   ~[9223372036854775807, 9223372036854775809]
>> >> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
>> >> >> marking stmt to be not simulated again
>> >> >>
>> >> >> __attribute__((noclone, noinline))
>> >> >> fn1 (char * p, char * q)
>> >> >> {
>> >> >>   size_t s;
>> >> >>   long unsigned int _1;
>> >> >>   size_t _7;
>> >> >>   long unsigned int _9;
>> >> >>
>> >> >>   :
>> >> >>   s_4 = strlen (q_3(D));
>> >> >>   _9 = s_4 + 1;
>> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >> >>   _1 = s_4;
>> >> >>   _7 = s_4 - _1;
>> >> >>   return _7;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> Then forwprop4 turns
>> >> >> _1 = s_4
>> >> >> _7 = s_4 - _1
>> >> >> into
>> >> >> _7 = 0
>> >> >>
>> >> >> and we end up with:
>> >> >> _7 = 0
>> >> >> return _7
>> >> >> in optimized dump.
>> >> >>
>> >> >> Running ccp again after forwprop4 trivially solves the issue, however
>> >> >> I am not sure if we want to run ccp again ?
>> >> >>
>> >> >> The issue is probably with extract_range_from_ssa_name():
>> >> >> For _1 = s_4
>> >> >>
>> >> >> Before patch:
>> >> >> VR for s_4 is set to varying.
>> >> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
>> >> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal 
>> >> >> to s_4,
>> >> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
>> >> >> match.pd pattern x - x -> 0).
>> >> >>
>> >> >> After patch:
>> >> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
>> >> >> And correspondingly VR 

[PATCH] Replace _mm_setzero_[hd]i with _mm_setzero_si128 (PR target/78451)

2016-11-22 Thread Jakub Jelinek
Hi!

_mm_setzero_di is problematic, because it is outside of AVX512* guarded
area, but it actually requires SSE2 which might not be enabled.
As discussed in the PR, I can find the _mm_setzero_[dh]i routines neither
in the ICC headers nor in the AVX/AVX512 manuals, and I fail to see what the
difference is between those and the standard _mm_setzero_si128.
All these functions return __m128i containing all zeros, how exactly
it is constructed should be irrelevant after folding during gimplification
(all 3 routines gimplify to the same return stmt, __m128i is
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
and therefore all 3 are return (__m128i) { 0, 0 }, it doesn't matter
how those 0s were constructed).
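
To illustrate the point about construction being irrelevant, here is a minimal sketch using GCC's generic vector extensions rather than the real intrinsics. The typedef names ending in _sketch and the function all_zero_vectors_equal are hypothetical and are not taken from the GCC headers; the three initializers only stand in for "three different ways of building zero".

```c
#include <string.h>

/* Same shape as the __m128i typedef quoted above; the name is hypothetical.  */
typedef long long m128i_sketch
  __attribute__ ((__vector_size__ (16), __may_alias__));
typedef int v4si_sketch __attribute__ ((__vector_size__ (16)));
typedef short v8hi_sketch __attribute__ ((__vector_size__ (16)));

/* Build an all-zero 128-bit vector in three different ways and return 1
   if the resulting 16 bytes are identical in all three cases.  */
int
all_zero_vectors_equal (void)
{
  m128i_sketch a = (m128i_sketch) { 0LL, 0LL };
  m128i_sketch b = (m128i_sketch) (v4si_sketch) { 0, 0, 0, 0 };
  m128i_sketch c = (m128i_sketch) (v8hi_sketch) { 0, 0, 0, 0, 0, 0, 0, 0 };

  return memcmp (&a, &b, sizeof a) == 0
	 && memcmp (&b, &c, sizeof b) == 0;
}
```

The casts between same-sized vector types only reinterpret the bits, so all three values are the same 16 zero bytes.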

This patch removes those two routines, uses _mm_setzero_si128 instead,
and I've also done some limited formatting fixes (mainly I tried to
fix up calls with no space before ( ).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note that there are still _mm512_setzero_qi and _mm512_setzero_hi;
shall those be replaced with _mm512_setzero_si512 too?
Those two aren't mentioned in the ICC headers or the AVX512 manuals either.

2016-11-22  Jakub Jelinek  

PR target/78451
* config/i386/avx512vlintrin.h (_mm_setzero_di): Removed.
(_mm_maskz_mov_epi64): Use _mm_setzero_si128 instead of
_mm_setzero_di.
(_mm_maskz_load_epi64): Likewise.
(_mm_setzero_hi): Removed.
(_mm_maskz_loadu_epi64): Use _mm_setzero_si128 instead of
_mm_setzero_di.
(_mm_abs_epi64, _mm_maskz_abs_epi64, _mm_maskz_srl_epi64,
_mm_maskz_unpackhi_epi64, _mm_maskz_unpacklo_epi64,
_mm_maskz_compress_epi64, _mm_srav_epi64, _mm_maskz_srav_epi64,
_mm_maskz_sllv_epi64, _mm_maskz_srlv_epi64, _mm_rolv_epi64,
_mm_maskz_rolv_epi64, _mm_rorv_epi64, _mm_maskz_rorv_epi64,
_mm_min_epi64, _mm_max_epi64, _mm_max_epu64, _mm_min_epu64,
_mm_lzcnt_epi64, _mm_maskz_lzcnt_epi64, _mm_conflict_epi64,
_mm_maskz_conflict_epi64, _mm_sra_epi64, _mm_maskz_sra_epi64,
_mm_maskz_sll_epi64, _mm_rol_epi64, _mm_maskz_rol_epi64,
_mm_ror_epi64, _mm_maskz_ror_epi64, _mm_alignr_epi64,
_mm_maskz_alignr_epi64, _mm_srai_epi64, _mm_maskz_slli_epi64):
Likewise.
(_mm_cvtepi32_epi8, _mm256_cvtepi32_epi8, _mm_cvtsepi32_epi8,
_mm256_cvtsepi32_epi8, _mm_cvtusepi32_epi8, _mm256_cvtusepi32_epi8,
_mm_cvtepi32_epi16, _mm256_cvtepi32_epi16, _mm_cvtsepi32_epi16,
_mm256_cvtsepi32_epi16, _mm_cvtusepi32_epi16, _mm256_cvtusepi32_epi16,
_mm_cvtepi64_epi8, _mm256_cvtepi64_epi8, _mm_cvtsepi64_epi8,
_mm256_cvtsepi64_epi8, _mm_cvtusepi64_epi8, _mm256_cvtusepi64_epi8,
_mm_cvtepi64_epi16, _mm256_cvtepi64_epi16, _mm_cvtsepi64_epi16,
_mm256_cvtsepi64_epi16, _mm_cvtusepi64_epi16, _mm256_cvtusepi64_epi16,
_mm_cvtepi64_epi32, _mm256_cvtepi64_epi32, _mm_cvtsepi64_epi32,
_mm256_cvtsepi64_epi32, _mm_cvtusepi64_epi32, _mm256_cvtusepi64_epi32,
_mm_maskz_set1_epi32, _mm_maskz_set1_epi64): Formatting fixes.
(_mm_maskz_cvtps_ph, _mm256_maskz_cvtps_ph): Use _mm_setzero_si128
instead of _mm_setzero_hi.
(_mm256_permutex_pd, _mm256_maskz_permutex_epi64, _mm256_insertf32x4,
_mm256_maskz_insertf32x4, _mm256_inserti32x4, _mm256_maskz_inserti32x4,
_mm256_extractf32x4_ps, _mm256_maskz_extractf32x4_ps,
_mm256_shuffle_i32x4, _mm256_maskz_shuffle_i32x4, _mm256_shuffle_f64x2,
_mm256_maskz_shuffle_f64x2, _mm256_shuffle_f32x4,
_mm256_maskz_shuffle_f32x4, _mm256_maskz_shuffle_pd,
_mm_maskz_shuffle_pd, _mm256_maskz_shuffle_ps, _mm_maskz_shuffle_ps,
_mm256_maskz_srli_epi32, _mm_maskz_srli_epi32, _mm_maskz_srli_epi64,
_mm256_mask_slli_epi32, _mm256_maskz_slli_epi32, _mm256_mask_slli_epi64,
_mm256_maskz_slli_epi64, _mm256_roundscale_ps,
_mm256_maskz_roundscale_ps, _mm256_roundscale_pd,
_mm256_maskz_roundscale_pd, _mm_roundscale_ps, _mm_maskz_roundscale_ps,
_mm_roundscale_pd, _mm_maskz_roundscale_pd, _mm256_getmant_ps,
_mm256_maskz_getmant_ps, _mm_getmant_ps, _mm_maskz_getmant_ps,
_mm256_getmant_pd, _mm256_maskz_getmant_pd, _mm_getmant_pd,
_mm_maskz_getmant_pd, _mm256_maskz_shuffle_epi32,
_mm_maskz_shuffle_epi32, _mm256_rol_epi32, _mm256_maskz_rol_epi32,
_mm_rol_epi32, _mm_maskz_rol_epi32, _mm256_ror_epi32,
_mm256_maskz_ror_epi32, _mm_ror_epi32, _mm_maskz_ror_epi32,
_mm_maskz_alignr_epi32, _mm_maskz_alignr_epi64,
_mm256_maskz_srai_epi32, _mm_maskz_srai_epi32, _mm_srai_epi64,
_mm_maskz_srai_epi64, _mm256_maskz_permutex_pd,
_mm256_maskz_permute_pd, _mm256_maskz_permute_ps, _mm_maskz_permute_pd,
_mm_maskz_permute_ps, _mm256_permutexvar_ps): Formatting fixes.
(_mm_maskz_slli_epi64, _mm_rol_epi64, _mm_maskz_rol_epi64,
_mm_ror_epi64, 

Re: [Patch, Fortran, OOP] PR 78443: Incorrect behavior with non_overridable keyword

2016-11-22 Thread Janus Weil
2016-11-22 16:16 GMT+01:00 Steve Kargl :
>> here is a patch for a wrong-code problem with non_overridable
>> type-bound procedures. For details see the PR. Regtests cleanly. Ok
>> for trunk?
>
> OK.

Thanks, Steve. Committed as r242703.


>> Since the patch is very simple and it fixes wrong code which can
>> silently give bad runtime results, I think backporting to the release
>> branches might be a good idea as well. Ok?
>
> OK.

Will do soon (within a week or so).

Cheers,
Janus


Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-11-22 Thread Bin.Cheng
On Mon, Nov 21, 2016 at 9:34 PM, Doug Gilmore  wrote:
> I haven't seen any followups to this discussion of Bin's patch to
> PR68303 and PR69710, the patch submission:
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02000.html
>
> Discussion:
> http://gcc.gnu.org/ml/gcc-patches/2016-07/msg00761.html
> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg01551.html
> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg00372.html
> http://gcc.gnu.org/ml/gcc-patches/2016-06/msg01550.html
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02162.html
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02155.html
> http://gcc.gnu.org/ml/gcc-patches/2016-05/msg02154.html
>
>
> so I did some investigation to get a better understanding of the
> issues involved.
Hi Doug,
Thanks for looking into this problem.
>
> On 07/13/2016 01:59 PM, Jeff Law wrote:
>> On 05/25/2016 05:22 AM, Bin Cheng wrote:
>>> Hi, As analyzed in PR68303 and PR69710, vectorizer generates
>>> duplicated computations in loop's pre-header basic block when
>>> creating base address for vector reference to the same memory object.
>> Not a huge surprise.  Loop optimizations generally have a tendency
>> to create and/or expose CSE opportunities.  Unrolling is a common
>> culprit, there's certainly the possibility for header duplication,
>> code motions and IV rewriting to also expose/create redundant code.
>>
>> ...
>>
>>  But, 1) It
>>> doesn't fix all the problem on x86_64.  Root cause is computation for
>>> base address of the first reference is somehow moved outside of
>>> loop's pre-header, local CSE can't help in this case.
>> That's a bit odd -- have you investigated why this is outside the loop header?
>> ...
> I didn't look at this issue per se, but I did try running DOM between
> autovectorization and IVS.  Just running DOM had little effect, what
> was crucial was adding the change Bin mentioned in his original
> message:
>
> Besides CSE issue, this patch also re-associates address
> expressions in vect_create_addr_base_for_vector_ref, specifically,
> it splits constant offset and adds it back near the expression
> root in IR.  This is necessary because GCC only handles
> re-association for commutative operators in CSE.
>
> I attached a patch for these changes only.  These are the important
> modifications that address some of the IVS-related issues exposed
> by PR68303. I found that adding the CSE change (or calling DOM between
> autovectorization and IVOPTS) is not needed, and from what I have
I checked the code again.  As you said, the re-association part is important
to enable CSE opportunities, no matter when and which pass handles it.
After re-association, the computations of the base addresses look like:

//preheader
b_1 = g_Input + var_offset_1;
vectp_1 = b_1 + cst_offset_1;
b_2 = g_Input + var_offset_2;
vectp_2 = b_2 + cst_offset_2;
...
b_n = g_Input + var_offset_n;
vectp_n = b_n + cst_offset_n;

//loop
MEM[vectp_1];
MEM[vectp_2];
...
MEM[vectp_n];

In fact, var_offset_1, var_offset_2, ..., var_offset_n are all equal to each
other.  So the addresses have the form "g_Input + var_offset + cst_offset_x"
and differ from each other only in the constant offset.  The purpose of CSE is
to propagate all parts of this address to IVOPTs; otherwise IVOPTs only knows
the IVs as below:

iv_use_1: {b_1 + cst_offset_1, step}_loop
iv_use_2: {b_2 + cst_offset_2, step}_loop
...
iv_use_n: {b_n + cst_offset_n, step}_loop
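
As a rough illustration of the preheader computation above, the following sketch contrasts recomputing the base for every reference with sharing one base.  The function names addrs_naive and addrs_cse are hypothetical; this is plain C, not what the vectorizer or IVOPTs actually emit.

```c
#include <stddef.h>

/* Before CSE: each reference recomputes its own base b_i, although the
   variable offset is the same for all of them.  */
void
addrs_naive (char *g_input, size_t var_offset,
	     const size_t *cst_offset, size_t n, char **vectp)
{
  for (size_t i = 0; i < n; i++)
    {
      char *b_i = g_input + var_offset;	/* b_1, b_2, ..., b_n */
      vectp[i] = b_i + cst_offset[i];
    }
}

/* After CSE: one shared base, so the addresses visibly differ only in
   their constant offset.  */
void
addrs_cse (char *g_input, size_t var_offset,
	   const size_t *cst_offset, size_t n, char **vectp)
{
  char *b = g_input + var_offset;
  for (size_t i = 0; i < n; i++)
    vectp[i] = b + cst_offset[i];
}
```

Both functions produce the same addresses; the second form just makes the common base explicit.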

> seen, actually makes the code worse.
>
> Applying only the modifications to
> vect_create_addr_base_for_vector_ref, additional simplifications will
> be done when induction variables are found (function
> find_induction_variables).  These simplifications are indicated by the
> appearance of lines:
>
> Applying pattern match.pd:1056, generic-match.c:11865
This doesn't look related to this problem to me.  The simplification needed
for this problem is CSE, which is not what match.pd does.

>
> in the IVOPTs dump file.  Now IVOPTs transforms the code so that
> constants now appear in the computation of the effective addresses for
> the memory OPs.  However the code generated by IVOPTS still uses a
> separate base register for each memory reference.  Later DOM3
> transforms the code to use just one base register, which is the form

Indeed, CSE now looks unnecessary for fixing the problem; we can rely on the
DOM pass to discover the equality among the new bases (b_1, b_2, ..., b_n).
This actually echoes my humble opinion: we shouldn't rely on IVOPTs to fix all
bad code issues.  On the other hand, for cases in which these bases
(b_1, b_2, ..., b_n) are not equal to each other, there is not much to lose
this way either.

> the code needs to be in for the preliminary phase of IVOPTs where
> "IV uses" associated with memory OPs are placed into groups.  At the
> time of this grouping, checks are done to ensure that for each member
> of a group the constant offsets don't overflow the 

Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Marc Glisse

On Tue, 22 Nov 2016, Uros Bizjak wrote:


New makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of the "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.
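
The patch itself is not reproduced here, so the following is only a sketch of the matching rule described above, written in C for illustration.  The function name has_jobs_flag and the assumption that flags are separated by whitespace are hypothetical and are not taken from the testsuite.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if the whitespace-separated flag string contains a token that
   is exactly "-j" or "-j" followed only by digits (e.g. "-j4").  */
int
has_jobs_flag (const char *mflags)
{
  const char *p = mflags;

  while (*p)
    {
      /* Skip separators, then find the end of the next token.  */
      while (*p && isspace ((unsigned char) *p))
	p++;
      const char *tok = p;
      while (*p && !isspace ((unsigned char) *p))
	p++;
      size_t len = (size_t) (p - tok);

      if (len >= 2 && tok[0] == '-' && tok[1] == 'j')
	{
	  size_t i = 2;
	  while (i < len && isdigit ((unsigned char) tok[i]))
	    i++;
	  if (i == len)
	    return 1;
	}
    }
  return 0;
}
```

A long option such as "--jobserver-auth=3,4" is not matched, because its second character is '-' rather than 'j'.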


Hello,

I didn't read the patch, but do you think this also fixes PR 53155 ?

--
Marc Glisse


[PATCH] Fix up handle_pragma_target (PR target/78451)

2016-11-22 Thread Jakub Jelinek
Hi!

#pragma GCC target, when used more than once without being
undone through #pragma GCC pop_options in between, seems to act weirdly
and is the reason why the sse-22a.c testcase now fails on x86_64/i686-linux.
The problem is that to some extent
#pragma GCC target ("f1", "f2,f3")
#pragma GCC target ("f4,f5", "f6")
acts as
#pragma GCC target ("f1", "f2,f3", "f4,f5", "f6")
(when computing the current set of global options e.g.), but
when a target node is being created for a function, we don't use the
current global options at the point of declaration, but instead use
current_target_pragma TREE_LIST with the current target pragma options;
that list is properly saved/restored on push_options/pop_options pragma,
but a new GCC target pragma overwrites the previous list rather than
appending to it, so to some other extent the above two pragmas act as
just #pragma GCC target ("f4,f5", "f6").
In particular, in sse-22a.c test we start with #pragma GCC target
containing huge list of ISAs, then #include  header, and
there most inlines are wrapped in #pragma GCC push_options/#pragma GCC
target (someisa) and #pragma GCC pop_options, but there are some inlines
that aren't wrapped at all.  The effect of that is that those wrapped
routines get their target attribute solely from the innermost target option,
while those not wrapped ones get one from their innermost GCC target,
which is the huge list of ISAs.  I think this is undesirable, the
pragmas should stack (append to each other).  If users want to override
completely to something different, they can push_options/pop_options around
the former, or #pragma GCC target ("no-isa1,no-isa2,isa3").
Note that sse-22a.c fails because of this with -save-temps even in GCC 6.

The patch just treats these consistently as appending to current set of
options.  So if one does:
#pragma GCC push_options
#pragma GCC target ("isa1")
#pragma GCC push_options
#pragma GCC target ("isa2")
void foo () { ... }
#pragma GCC pop_options
#pragma GCC pop_options
the foo function gets both isa1 and isa2 target attributes (in that order).
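
The difference between overwriting and appending can be shown with a toy model.  This is not GCC code: struct pragma_list and the helpers pragma_replace, pragma_append and pragma_has are hypothetical stand-ins for the current_target_pragma list, and push_options/pop_options are modelled simply by copying the struct.

```c
#include <string.h>

#define MAX_OPTS 8

/* Toy stand-in for the list of options set by target pragmas so far.  */
struct pragma_list
{
  const char *opt[MAX_OPTS];
  int n;
};

/* Old behaviour as described above: a new pragma replaces the list.  */
void
pragma_replace (struct pragma_list *cur, const char *arg)
{
  cur->n = 0;
  cur->opt[cur->n++] = arg;
}

/* Behaviour after the patch: a new pragma is appended to the list.  */
void
pragma_append (struct pragma_list *cur, const char *arg)
{
  if (cur->n < MAX_OPTS)
    cur->opt[cur->n++] = arg;
}

/* Return 1 if ARG is among the options currently in effect.  */
int
pragma_has (const struct pragma_list *cur, const char *arg)
{
  for (int i = 0; i < cur->n; i++)
    if (strcmp (cur->opt[i], arg) == 0)
      return 1;
  return 0;
}
```

With pragma_append, two successive pragmas leave both options in effect, in order; with pragma_replace only the last one survives.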

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-22  Jakub Jelinek  

PR target/78451
* c-pragma.c (handle_pragma_target): Don't replace
current_target_pragma, but chainon the new args to the current one.

* gcc.target/i386/pr78451.c: New test.
* gcc.target/i386/pr69255-1.c: Use #pragma GCC push_options
and #pragma GCC pop_options around the first #pragma GCC target.
* gcc.target/i386/pr69255-2.c: Likewise.
* gcc.target/i386/pr69255-3.c: Likewise.

--- gcc/c-family/c-pragma.c.jj  2016-10-31 13:28:06.0 +0100
+++ gcc/c-family/c-pragma.c 2016-11-22 11:34:34.535159762 +0100
@@ -893,7 +893,7 @@ handle_pragma_target(cpp_reader *ARG_UNU
   args = nreverse (args);
 
   if (targetm.target_option.pragma_parse (args, NULL_TREE))
-   current_target_pragma = args;
+   current_target_pragma = chainon (current_target_pragma, args);
 }
 }
 
--- gcc/testsuite/gcc.target/i386/pr78451.c.jj  2016-11-22 11:57:24.743002256 +0100
+++ gcc/testsuite/gcc.target/i386/pr78451.c 2016-11-22 11:56:51.0 +0100
@@ -0,0 +1,35 @@
+/* PR target/78451 */
+/* { dg-options "-O2 -mno-avx512f" } */
+
+#pragma GCC push_options
+#pragma GCC target ("avx512bw")
+
+static inline int __attribute__ ((__always_inline__))
+bar (void)
+{
+  return 0;
+}
+
+#pragma GCC push_options
+#pragma GCC target ("avx512vl")
+
+int
+foo (void)
+{
+  return bar ();
+}
+
+#pragma GCC pop_options
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("avx512vl")
+#pragma GCC target ("avx512bw")
+
+int
+baz (void)
+{
+  return bar ();
+}
+
+#pragma GCC pop_options
--- gcc/testsuite/gcc.target/i386/pr69255-1.c.jj   2016-09-06 22:29:59.0 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-1.c   2016-11-22 16:20:32.790498858 +0100
@@ -2,7 +2,9 @@
 /* { dg-do compile } */
 /* { dg-options "-msse4 -mno-avx" } */
 
+#pragma GCC push_options
 #pragma GCC target "avx512vl"
+#pragma GCC pop_options
 #pragma GCC target "no-avx512vl"
 __attribute__ ((__vector_size__ (32))) long long a;
 __attribute__ ((__vector_size__ (16))) int b;
@@ -13,5 +15,5 @@ foo (const long long *p)
   a = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error "needs isa option -m32 -mavx512vl" } */
 }
 
-/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { target *-*-* } 13 } */
-/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { target *-*-* } 13 } */
+/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" { target *-*-* } 15 } */
+/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" "" { target *-*-* } 15 } */
--- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj   2016-09-06 22:29:59.0 +0200
+++ gcc/testsuite/gcc.target/i386/pr69255-2.c   2016-11-22 16:20:44.760346741 

Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread David Malcolm
On Tue, 2016-11-22 at 15:45 +0100, Jakub Jelinek wrote:
> On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote:
> > On 11/22/2016 02:37 PM, Jakub Jelinek wrote:
> > > Can't it be done only if xloc.file contains any fancy characters?
> > 
> > Sure, but why? Strings generally get emitted with quotes around
> > them, I
> > don't see a good reason for filenames to be different, especially
> > if it
> > makes the output easier to parse.
> 
> Because printing common filenames matches what we emit in
> diagnostics,
> what e.g. sanitizers emit at runtime diagnostics, what we emit as
> locations
> in gimple dumps etc.

It sounds like a distinction between human-readable vs
machine-readable.

How about something like the following, which only adds the quotes if
outputting the RTL FE's input format?

Does this fix the failing tests?
From 642d511fdba3a33fb18ce46c549f7c972ed6b14e Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Tue, 22 Nov 2016 11:06:41 -0500
Subject: [PATCH] print-rtl.c: conditionalize quotes for filenames

gcc/ChangeLog:
	* print-rtl.c (rtx_writer::print_rtx_operand_code_i): Only use
	quotes for filenames when in compact mode.
---
 gcc/print-rtl.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 77e6b05..5370602 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -371,7 +371,10 @@ rtx_writer::print_rtx_operand_code_i (const_rtx in_rtx, int idx)
   if (INSN_HAS_LOCATION (in_insn))
 	{
 	  expanded_location xloc = insn_location (in_insn);
-	  fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
+	  if (m_compact)
+	fprintf (m_outfile, " \"%s\":%i", xloc.file, xloc.line);
+	  else
+	fprintf (m_outfile, " %s:%i", xloc.file, xloc.line);
 	}
 #endif
 }
-- 
1.8.5.3



[testsuite,committed] Fix prototype of memset in a test case.

2016-11-22 Thread Georg-Johann Lay
One test case used unsigned long for the 3rd parameter of memset, which 
should be size_t.  This made the test crash for targets where correct 
parameter passing depends on correct prototypes.
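
As a small sketch of the point, the declaration below uses the compiler-provided __SIZE_TYPE__ macro (available in GCC and compatible compilers), which is the type that size_t is defined from.  The helper names size_type_is_size_t_sized and zero_buffer_ok are hypothetical and only exist for this illustration.

```c
#include <stddef.h>

/* Declaration in the style of the fixed test: the third parameter uses
   __SIZE_TYPE__ instead of a hard-coded unsigned long.  */
extern void *memset (void *, int, __SIZE_TYPE__);

/* Return 1 if __SIZE_TYPE__ has the same width as size_t.  */
int
size_type_is_size_t_sized (void)
{
  return sizeof (__SIZE_TYPE__) == sizeof (size_t);
}

/* Return 1 if memset, called through the declaration above, clears the
   whole buffer.  */
int
zero_buffer_ok (void)
{
  unsigned char buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

  memset (buf, 0, sizeof buf);
  for (size_t i = 0; i < sizeof buf; i++)
    if (buf[i] != 0)
      return 0;
  return 1;
}
```

On targets where unsigned long and size_t differ in width or in how they are passed, only the __SIZE_TYPE__ form matches what the library's memset expects.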


Fixed and committed as obvious.

Johann


gcc/testsuite/
* gcc.c-torture/execute/pr30778.c (memset): Use size_t for 3rd
parameter in declaration.

Index: gcc.c-torture/execute/pr30778.c
===
--- gcc.c-torture/execute/pr30778.c (revision 242541)
+++ gcc.c-torture/execute/pr30778.c (working copy)
@@ -1,4 +1,4 @@
-extern void *memset (void *, int, unsigned long);
+extern void *memset (void *, int, __SIZE_TYPE__);
 extern void abort (void);

 struct reg_stat {


[PATCH] Fix PR78472

2016-11-22 Thread Richard Biener

The following fixes a C/C++ interoperability issue with LTO when
zero-sized fields appear in one variant of a struct but not in another.
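
For readers unfamiliar with zero-sized fields, here is a minimal sketch based on the struct used in the new test case.  The struct tags with_zero and without_zero and the helper functions are hypothetical; whether the two structs have the same size depends on the ABI, so the sketch only relies on the named member behaving identically.

```c
/* The layout from the test case: a 4-bit field followed by an unnamed
   zero-width bit-field.  */
struct with_zero
{
  unsigned i:4;
  unsigned :0;
};

/* The same struct without the zero-width bit-field.  */
struct without_zero
{
  unsigned i:4;
};

/* Store V into the 4-bit member and read it back.  */
unsigned
roundtrip_with (unsigned v)
{
  struct with_zero s;
  s.i = v;
  return s.i;
}

unsigned
roundtrip_without (unsigned v)
{
  struct without_zero s;
  s.i = v;
  return s.i;
}

/* ABI-dependent: reports whether both variants occupy the same size.  */
int
same_size (void)
{
  return sizeof (struct with_zero) == sizeof (struct without_zero);
}
```

The unnamed field carries no data of its own, which is why ignoring it when comparing or hashing the two variants is plausible.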

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-11-22  Richard Biener  

PR lto/78472
* tree.c (gimple_canonical_types_compatible_p): Ignore zero-sized
fields.

lto/
* lto.c (hash_canonical_type): Ignore zero-sized fields.

* g++.dg/lto/pr78472_0.c: New testcase.
* g++.dg/lto/pr78472_1.C: Likewise.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 242657)
+++ gcc/tree.c  (working copy)
@@ -13506,10 +13506,12 @@ gimple_canonical_types_compatible_p (con
 f1 || f2;
 f1 = TREE_CHAIN (f1), f2 = TREE_CHAIN (f2))
  {
-   /* Skip non-fields.  */
-   while (f1 && TREE_CODE (f1) != FIELD_DECL)
+   /* Skip non-fields and zero-sized fields.  */
+   while (f1 && (TREE_CODE (f1) != FIELD_DECL
+ || integer_zerop (DECL_SIZE (f1))))
  f1 = TREE_CHAIN (f1);
-   while (f2 && TREE_CODE (f2) != FIELD_DECL)
+   while (f2 && (TREE_CODE (f2) != FIELD_DECL
+ || integer_zerop (DECL_SIZE (f2))))
  f2 = TREE_CHAIN (f2);
if (!f1 || !f2)
  break;
Index: gcc/lto/lto.c
===
--- gcc/lto/lto.c   (revision 242657)
+++ gcc/lto/lto.c   (working copy)
@@ -372,7 +372,8 @@ hash_canonical_type (tree type)
   tree f;
 
   for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
-   if (TREE_CODE (f) == FIELD_DECL)
+   if (TREE_CODE (f) == FIELD_DECL
+   && ! integer_zerop (DECL_SIZE (f)))
  {
iterative_hash_canonical_type (TREE_TYPE (f), hstate);
nf++;
Index: gcc/testsuite/g++.dg/lto/pr78472_0.c
===
--- gcc/testsuite/g++.dg/lto/pr78472_0.c(revision 0)
+++ gcc/testsuite/g++.dg/lto/pr78472_0.c(working copy)
@@ -0,0 +1,12 @@
+// { dg-lto-do link }
+
+extern struct S
+{
+  unsigned i:4;
+  unsigned :0;
+} s;
+static void *f(void)
+{
+  return 
+}
+int main() {}
Index: gcc/testsuite/g++.dg/lto/pr78472_1.C
===
--- gcc/testsuite/g++.dg/lto/pr78472_1.C(revision 0)
+++ gcc/testsuite/g++.dg/lto/pr78472_1.C(working copy)
@@ -0,0 +1,9 @@
+struct S
+{
+  unsigned i:4;
+  unsigned :0;
+} s;
+static void *f(void)
+{
+  return 
+}


Re: PR78153

2016-11-22 Thread Richard Biener
On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:

> On 22 November 2016 at 20:18, Richard Biener  wrote:
> > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 21 November 2016 at 15:10, Richard Biener  wrote:
> >> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> Hi,
> >> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
> >> >> PTRDIFF_MAX.
> >> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
> >> >> in the attached patch.
> >> >>
> >> >> However it regressed strlenopt-3.c:
> >> >>
> >> >> Consider fn1() from strlenopt-3.c:
> >> >>
> >> >> __attribute__((noinline, noclone)) size_t
> >> >> fn1 (char *p, char *q)
> >> >> {
> >> >>   size_t s = strlen (q);
> >> >>   strcpy (p, q);
> >> >>   return s - strlen (p);
> >> >> }
> >> >>
> >> >> The optimized dump shows the following:
> >> >>
> >> >> __attribute__((noclone, noinline))
> >> >> fn1 (char * p, char * q)
> >> >> {
> >> >>   size_t s;
> >> >>   size_t _7;
> >> >>   long unsigned int _9;
> >> >>
> >> >>   :
> >> >>   s_4 = strlen (q_3(D));
> >> >>   _9 = s_4 + 1;
> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >> >>   _7 = 0;
> >> >>   return _7;
> >> >>
> >> >> }
> >> >>
> >> >> which introduces the regression, because the test expects "return 0;" 
> >> >> in fn1().
> >> >>
> >> >> The issue seems to be in vrp2:
> >> >>
> >> >> Before the patch:
> >> >> Visiting statement:
> >> >> s_4 = strlen (q_3(D));
> >> >> Found new range for s_4: VARYING
> >> >>
> >> >> Visiting statement:
> >> >> _1 = s_4;
> >> >> Found new range for _1: [s_4, s_4]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> Visiting statement:
> >> >> _7 = s_4 - _1;
> >> >> Applying pattern match.pd:111, gimple-match.c:27997
> >> >> Match-and-simplified s_4 - _1 to 0
> >> >> Intersecting
> >> >>   [0, 0]
> >> >> and
> >> >>   [0, +INF]
> >> >> to
> >> >>   [0, 0]
> >> >> Found new range for _7: [0, 0]
> >> >>
> >> >> __attribute__((noclone, noinline))
> >> >> fn1 (char * p, char * q)
> >> >> {
> >> >>   size_t s;
> >> >>   long unsigned int _1;
> >> >>   long unsigned int _9;
> >> >>
> >> >>   :
> >> >>   s_4 = strlen (q_3(D));
> >> >>   _9 = s_4 + 1;
> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >> >>   _1 = s_4;
> >> >>   return 0;
> >> >>
> >> >> }
> >> >>
> >> >>
> >> >> After the patch:
> >> >> Visiting statement:
> >> >> s_4 = strlen (q_3(D));
> >> >> Intersecting
> >> >>   [0, 9223372036854775806]
> >> >> and
> >> >>   [0, 9223372036854775806]
> >> >> to
> >> >>   [0, 9223372036854775806]
> >> >> Found new range for s_4: [0, 9223372036854775806]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> Visiting statement:
> >> >> _1 = s_4;
> >> >> Intersecting
> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> >> and
> >> >>   [0, 9223372036854775806]
> >> >> to
> >> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> >> Found new range for _1: [0, 9223372036854775806]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> Visiting statement:
> >> >> _7 = s_4 - _1;
> >> >> Intersecting
> >> >>   ~[9223372036854775807, 9223372036854775809]
> >> >> and
> >> >>   ~[9223372036854775807, 9223372036854775809]
> >> >> to
> >> >>   ~[9223372036854775807, 9223372036854775809]
> >> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
> >> >> marking stmt to be not simulated again
> >> >>
> >> >> __attribute__((noclone, noinline))
> >> >> fn1 (char * p, char * q)
> >> >> {
> >> >>   size_t s;
> >> >>   long unsigned int _1;
> >> >>   size_t _7;
> >> >>   long unsigned int _9;
> >> >>
> >> >>   :
> >> >>   s_4 = strlen (q_3(D));
> >> >>   _9 = s_4 + 1;
> >> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >> >>   _1 = s_4;
> >> >>   _7 = s_4 - _1;
> >> >>   return _7;
> >> >>
> >> >> }
> >> >>
> >> >> Then forwprop4 turns
> >> >> _1 = s_4
> >> >> _7 = s_4 - _1
> >> >> into
> >> >> _7 = 0
> >> >>
> >> >> and we end up with:
> >> >> _7 = 0
> >> >> return _7
> >> >> in optimized dump.
> >> >>
> >> >> Running ccp again after forwprop4 trivially solves the issue, however
> >> >> I am not sure if we want to run ccp again ?
> >> >>
> >> >> The issue is probably with extract_range_from_ssa_name():
> >> >> For _1 = s_4
> >> >>
> >> >> Before patch:
> >> >> VR for s_4 is set to varying.
> >> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
> >> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to 
> >> >> s_4,
> >> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
> >> >> match.pd pattern x - x -> 0).
> >> >>
> >> >> After patch:
> >> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
> >> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
> >> >> so IIUC, we then lose the information that _1 is equal to s_4,
> >> >
> >> > We don't lose it, it's in its set of equivalencies.
> >> Ah, I missed that, thanks. For some reason I had mis-conception 

Re: PR78153

2016-11-22 Thread Prathamesh Kulkarni
On 22 November 2016 at 20:18, Richard Biener  wrote:
> On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 21 November 2016 at 15:10, Richard Biener  wrote:
>> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
>> >
>> >> Hi,
>> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
>> >> PTRDIFF_MAX.
>> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
>> >> in the attached patch.
>> >>
>> >> However it regressed strlenopt-3.c:
>> >>
>> >> Consider fn1() from strlenopt-3.c:
>> >>
>> >> __attribute__((noinline, noclone)) size_t
>> >> fn1 (char *p, char *q)
>> >> {
>> >>   size_t s = strlen (q);
>> >>   strcpy (p, q);
>> >>   return s - strlen (p);
>> >> }
>> >>
>> >> The optimized dump shows the following:
>> >>
>> >> __attribute__((noclone, noinline))
>> >> fn1 (char * p, char * q)
>> >> {
>> >>   size_t s;
>> >>   size_t _7;
>> >>   long unsigned int _9;
>> >>
>> >>   :
>> >>   s_4 = strlen (q_3(D));
>> >>   _9 = s_4 + 1;
>> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >>   _7 = 0;
>> >>   return _7;
>> >>
>> >> }
>> >>
>> >> which introduces the regression, because the test expects "return 0;" in 
>> >> fn1().
>> >>
>> >> The issue seems to be in vrp2:
>> >>
>> >> Before the patch:
>> >> Visiting statement:
>> >> s_4 = strlen (q_3(D));
>> >> Found new range for s_4: VARYING
>> >>
>> >> Visiting statement:
>> >> _1 = s_4;
>> >> Found new range for _1: [s_4, s_4]
>> >> marking stmt to be not simulated again
>> >>
>> >> Visiting statement:
>> >> _7 = s_4 - _1;
>> >> Applying pattern match.pd:111, gimple-match.c:27997
>> >> Match-and-simplified s_4 - _1 to 0
>> >> Intersecting
>> >>   [0, 0]
>> >> and
>> >>   [0, +INF]
>> >> to
>> >>   [0, 0]
>> >> Found new range for _7: [0, 0]
>> >>
>> >> __attribute__((noclone, noinline))
>> >> fn1 (char * p, char * q)
>> >> {
>> >>   size_t s;
>> >>   long unsigned int _1;
>> >>   long unsigned int _9;
>> >>
>> >>   :
>> >>   s_4 = strlen (q_3(D));
>> >>   _9 = s_4 + 1;
>> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >>   _1 = s_4;
>> >>   return 0;
>> >>
>> >> }
>> >>
>> >>
>> >> After the patch:
>> >> Visiting statement:
>> >> s_4 = strlen (q_3(D));
>> >> Intersecting
>> >>   [0, 9223372036854775806]
>> >> and
>> >>   [0, 9223372036854775806]
>> >> to
>> >>   [0, 9223372036854775806]
>> >> Found new range for s_4: [0, 9223372036854775806]
>> >> marking stmt to be not simulated again
>> >>
>> >> Visiting statement:
>> >> _1 = s_4;
>> >> Intersecting
>> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> and
>> >>   [0, 9223372036854775806]
>> >> to
>> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
>> >> Found new range for _1: [0, 9223372036854775806]
>> >> marking stmt to be not simulated again
>> >>
>> >> Visiting statement:
>> >> _7 = s_4 - _1;
>> >> Intersecting
>> >>   ~[9223372036854775807, 9223372036854775809]
>> >> and
>> >>   ~[9223372036854775807, 9223372036854775809]
>> >> to
>> >>   ~[9223372036854775807, 9223372036854775809]
>> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
>> >> marking stmt to be not simulated again
>> >>
>> >> __attribute__((noclone, noinline))
>> >> fn1 (char * p, char * q)
>> >> {
>> >>   size_t s;
>> >>   long unsigned int _1;
>> >>   size_t _7;
>> >>   long unsigned int _9;
>> >>
>> >>   :
>> >>   s_4 = strlen (q_3(D));
>> >>   _9 = s_4 + 1;
>> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
>> >>   _1 = s_4;
>> >>   _7 = s_4 - _1;
>> >>   return _7;
>> >>
>> >> }
>> >>
>> >> Then forwprop4 turns
>> >> _1 = s_4
>> >> _7 = s_4 - _1
>> >> into
>> >> _7 = 0
>> >>
>> >> and we end up with:
>> >> _7 = 0
>> >> return _7
>> >> in optimized dump.
>> >>
>> >> Running ccp again after forwprop4 trivially solves the issue, however
>> >> I am not sure if we want to run ccp again ?
>> >>
>> >> The issue is probably with extract_range_from_ssa_name():
>> >> For _1 = s_4
>> >>
>> >> Before patch:
>> >> VR for s_4 is set to varying.
>> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
>> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to 
>> >> s_4,
>> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
>> >> match.pd pattern x - x -> 0).
>> >>
>> >> After patch:
>> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
>> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
>> >> so IIUC, we then lose the information that _1 is equal to s_4,
>> >
>> > We don't lose it, it's in its set of equivalencies.
>> Ah, I missed that, thanks. For some reason I had the misconception that
>> equivalences store variables which have the same value ranges but are
>> not necessarily equal.
>> >
>> >> and vrp doesn't transform _7 = s_4 - _1 to _7 = 0.
>> >> forwprop4 does that because it sees that s_4 and _1 are equivalent.
>> >> Does this sound correct ?
>> >
>> > Yes.  So the issue is really that vrp_visit_assignment_or_call calls
>> > gimple_fold_stmt_to_constant_1 with 

Re: [Patch, Fortran, OOP] PR 78443: Incorrect behavior with non_overridable keyword

2016-11-22 Thread Steve Kargl
On Tue, Nov 22, 2016 at 01:14:46PM +0100, Janus Weil wrote:
> 
> here is a patch for a wrong-code problem with non_overridable
> type-bound procedures. For details see the PR. Regtests cleanly. Ok
> for trunk?

OK.

> Since the patch is very simple and it fixes wrong code which can
> silently give bad runtime results, I think backporting to the release
> branches might be a good idea as well. Ok?

OK.

-- 
Steve


[testsuite,committed]: Restrict 2 test cases to big targets.

2016-11-22 Thread Georg-Johann Lay

This adds requirements for 2 test cases:

loop-split.c needs at least a 32-bit int.  Use int32plus, as I didn't 
intend to change the test case itself.


gcc.dg/stack-layout-dynamic-1.c aligns the stack to 16 bits so ptr32plus 
seems reasonable.


Committed  to trunk.

Johann



gcc/testsuite/
* gcc.dg/loop-split.c: Require int32plus.
* gcc.dg/stack-layout-dynamic-1.c: Require ptr32plus.

Index: gcc.dg/loop-split.c
===
--- gcc.dg/loop-split.c (revision 242541)
+++ gcc.dg/loop-split.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+/* { dg-require-effective-target int32plus } */

 #ifdef __cplusplus
 extern "C" int printf (const char *, ...);
Index: gcc.dg/stack-layout-dynamic-1.c
===
--- gcc.dg/stack-layout-dynamic-1.c (revision 242541)
+++ gcc.dg/stack-layout-dynamic-1.c (working copy)
@@ -2,6 +2,7 @@
in one pass together with normal local variables.  */
 /* { dg-do compile } */
 /* { dg-options "-O0 -fomit-frame-pointer" } */
+/* { dg-require-effective-target ptr32plus } */

 extern void bar (void *, void *, void *);
 void foo (void)


Re: [PATCH, testsuite]: Fix detection of -j make argument

2016-11-22 Thread Jeff Law

On 11/22/2016 05:25 AM, Uros Bizjak wrote:

Hello!

Newer makes (e.g. GNU Make 4.2.1) pass the -j argument in MFLAGS in a
different way. While older makes pass only "-j", newer makes pass e.g.
"-j4" when -j is specified on the command line. The detection of the "-j"
make argument doesn't work in the latter case.

Attached patch reworks this functionality to detect -j correctly in all cases.
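
As a rough illustration of the difference described above (this is not the
Makefile logic from the patch, only a sketch in C++ with a made-up helper
name, has_jobs_flag), a check that accepts both the bare "-j" and the "-jN"
spelling could look like this:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if a whitespace-separated flags
// string (such as the contents of MFLAGS) contains "-j" or "-jN".
bool has_jobs_flag(const std::string& mflags)
{
  std::istringstream in(mflags);
  std::string word;
  while (in >> word)
    {
      // Match "-j" exactly or "-j" followed by anything (e.g. "-j4").
      // Long options such as "--jobserver-auth=..." start with "--"
      // and therefore do not match.
      if (word.compare(0, 2, "-j") == 0)
        return true;
    }
  return false;
}
```

With this sketch, has_jobs_flag("-k -j4") and has_jobs_flag("-j") are both
true, while a string holding only "-k" is rejected.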

gcc/ChangeLog

2016-11-22  Uros Bizjak  

* Makefile.in ($(lang_checks_parallelized)): Fix detection
of -j argument.

gcc/ada/ChangeLog

2016-11-22  Uros Bizjak  

* gcc-interface/Make-lang.in (check-acats): Fix detection
of -j argument.

libstdc++-v3/ChangeLog

2016-11-22  Uros Bizjak  

* testsuite/Makefile.am
(check-DEJAGNU $(check_DEJAGNU_normal_targets)): Fix detection
of -j argument.
* testsuite/Makefile.in: Regenerate.

Patch was bootstrapped and regression tested on x86_64-linux-gnu with
"GNU Make 4.2.1" and "GNU Make 3.81". Ada was not checked, but the
change is consistent with other changes.

OK for mainline SVN and release branches?

OK on the rest of the bits, for the trunk and any release branches.

jeff


Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-22 Thread Jonathan Wakely

On 22/11/16 16:59 +0200, Ville Voutilainen wrote:

On 22 November 2016 at 15:36, Jonathan Wakely  wrote:

+#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  template<typename _T1, typename _T2>
+inline
+typename enable_if<__not_<__and_<__is_swappable<_T1>,
+__is_swappable<_T2>>>::value>::type
+swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete;



Is there any advantage to using __not_ here, rather than just:

   typename enable_if<!__and_<__is_swappable<_T1>,
  __is_swappable<_T2>>::value>::type

?

__not_ is useful as a sub-expression of an __and_ / __or_ expression,
but at the top level doesn't seem to buy anything, and is more typing,
and requires indenting the code further.



There's no particular advantage, it's just a habitual way to write a mixture of
__and_s and __not_s that I suffer from, whichever way the nesting is.
I'm also not consistent:

+inline enable_if_t && is_swappable_v<_Tp>)>
+swap(optional<_Tp>&, optional<_Tp>&) = delete;



Yes, I noticed that :-)


so I can certainly change all these swaps to use operator! rather than
__not_. Is the
patch otherwise ok for trunk? What about the tuple part?


Yes, OK changing the top-level __not_s to operator!

I haven't reviewed the tuple part fully yet.
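
For reference, the two spellings compute the same condition. A minimal
sketch, using invented stand-ins (toy_and, toy_not) rather than the internal
__and_/__not_ helpers, and arbitrary standard traits in place of
__is_swappable:

```cpp
#include <type_traits>

// Hypothetical stand-ins for the library-internal __and_ and __not_.
template<typename A, typename B>
  struct toy_and : std::integral_constant<bool, A::value && B::value> { };

template<typename P>
  struct toy_not : std::integral_constant<bool, !P::value> { };

// Condition spelled with a top-level toy_not wrapper ...
template<typename T1, typename T2>
  constexpr bool cond_with_not()
  {
    return toy_not<toy_and<std::is_integral<T1>,
                           std::is_integral<T2>>>::value;
  }

// ... and the same condition spelled with operator!.
template<typename T1, typename T2>
  constexpr bool cond_with_bang()
  {
    return !toy_and<std::is_integral<T1>, std::is_integral<T2>>::value;
  }

static_assert(cond_with_not<int, double>() == cond_with_bang<int, double>(),
              "both spellings agree");
static_assert(cond_with_not<int, long>() == cond_with_bang<int, long>(),
              "both spellings agree");
```

The operator! form needs one template instantiation less and one level
less of nesting, which is the point made in the review.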



Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

2016-11-22 Thread Bernd Edlinger
Hi,

does this follow-up patch look reasonable?
See: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01945.html


Is it OK for trunk?


Thanks
Bernd.

On 11/21/16 21:46, Christophe Lyon wrote:
> On 18 November 2016 at 16:50, Bernd Edlinger  
> wrote:
>> On 11/18/16 12:58, Christophe Lyon wrote:
>>> On 17 November 2016 at 10:23, Kyrill Tkachov
>>>  wrote:

 On 09/11/16 12:58, Bernd Edlinger wrote:
>
> Hi!
>
>
> This patch enables the ldrd/strd peephole rules unconditionally.
>
> It is meant to fix cases, where the patch to reduce the sha512
> stack usage splits ldrd/strd instructions into separate ldr/str insns,
> but is technically independent from the other patch:
>
> See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html
>
> It was necessary to change check_effective_target_arm_prefer_ldrd_strd
> to retain the true prefer_ldrd_strd tuning flag.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?


 This is ok.
 Thanks,
 Kyrill

>>>
>>> Hi Bernd,
>>>
>>> Since you committed this patch (r242549), I'm seeing the new test
>>> failing on some arm*-linux-gnueabihf configurations:
>>>
>>> FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10
>>> FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times strd 9
>>>
>>> See 
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html
>>> for a map of failures.
>>>
>>> Am I missing something?
>>
>> Hi Christophe,
>>
>> as always many thanks for your testing...
>>
>> I have apparently only looked at the case -mfloat-abi=soft here, which
>> is what my other patch is going to address.  But all targets with
>> -mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd
>> and vstr.64 instead of strd, which should be accepted as well.
>>
>> So the attached patch should fix at least most of the fallout.
>>
>
> I've tested it, and indeed it fixes the failures I've reported.
>
> Thanks
>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
 >> 2016-11-18  Bernd Edlinger  
 >>
 >> * gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu.
 >>
 >>Index: gcc/testsuite/gcc.target/arm/pr53447-5.c
 >>===
 >>--- gcc/testsuite/gcc.target/arm/pr53447-5.c (revision 242588)
 >>+++ gcc/testsuite/gcc.target/arm/pr53447-5.c (working copy)
 >>@@ -15,5 +15,8 @@ void foo(long long* p)
 >>   p[9] -= p[10];
 >> }
 >>
 >>-/* { dg-final { scan-assembler-times "ldrd" 10 } } */
 >>-/* { dg-final { scan-assembler-times "strd" 9 } } */
 >>+/* We accept neon instructions vldr.64 and vstr.64 as well.
 >>+   Note: DejaGnu counts patterns with alternatives twice,
 >>+   so actually there are only 10 loads and 9 stores.  */
 >>+/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */
 >>+/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */
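
The doubled counts presumably come from the capturing parentheses: a list of
all matches that also contains every parenthesized subexpression has two
entries per match when the pattern has one group. A small sketch that mimics
this kind of counting with std::regex (the function name is made up for
illustration, and this is not the DejaGnu code):

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Count list entries the way an "all matches plus submatches" list
// would: every match contributes the full match plus one entry per
// capture group.
std::size_t count_matches_with_groups(const std::string& text,
                                      const std::string& pattern)
{
  std::regex re(pattern);
  std::size_t entries = 0;
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it)
    entries += it->size();   // 1 + number of capture groups
  return entries;
}
```

In this sketch a pattern with one capturing group yields two entries per
matched instruction, while the same pattern written with a non-capturing
group, "(?:ldrd|vldr\\.64)", yields one.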


Re: [v3 PATCH] LWG 2766, LWG 2749

2016-11-22 Thread Ville Voutilainen
On 22 November 2016 at 15:36, Jonathan Wakely  wrote:
>> +#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
>> +  template<typename _T1, typename _T2>
>> +inline
>> +typename enable_if<__not_<__and_<__is_swappable<_T1>,
>> +__is_swappable<_T2>>>::value>::type
>> +swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete;
>
>
> Is there any advantage to using __not_ here, rather than just:
>
>    typename enable_if<!__and_<__is_swappable<_T1>,
>   __is_swappable<_T2>>::value>::type
>
> ?
>
> __not_ is useful as a sub-expression of an __and_ / __or_ expression,
> but at the top level doesn't seem to buy anything, and is more typing,
> and requires indenting the code further.


There's no particular advantage, it's just a habitual way to write a mixture of
__and_s and __not_s that I suffer from, whichever way the nesting is.
I'm also not consistent:

+inline enable_if_t && is_swappable_v<_Tp>)>
+swap(optional<_Tp>&, optional<_Tp>&) = delete;

so I can certainly change all these swaps to use operator! rather than
__not_. Is the
patch otherwise ok for trunk? What about the tuple part?


Re: PR78153

2016-11-22 Thread Richard Biener
On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:

> On 21 November 2016 at 15:10, Richard Biener  wrote:
> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> As suggested by Martin in PR78153 strlen's return value cannot exceed
> >> PTRDIFF_MAX.
> >> So I set its range to [0, PTRDIFF_MAX - 1] in extract_range_basic()
> >> in the attached patch.
> >>
> >> However it regressed strlenopt-3.c:
> >>
> >> Consider fn1() from strlenopt-3.c:
> >>
> >> __attribute__((noinline, noclone)) size_t
> >> fn1 (char *p, char *q)
> >> {
> >>   size_t s = strlen (q);
> >>   strcpy (p, q);
> >>   return s - strlen (p);
> >> }
> >>
> >> The optimized dump shows the following:
> >>
> >> __attribute__((noclone, noinline))
> >> fn1 (char * p, char * q)
> >> {
> >>   size_t s;
> >>   size_t _7;
> >>   long unsigned int _9;
> >>
> >>   :
> >>   s_4 = strlen (q_3(D));
> >>   _9 = s_4 + 1;
> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >>   _7 = 0;
> >>   return _7;
> >>
> >> }
> >>
> >> which introduces the regression, because the test expects "return 0;" in 
> >> fn1().
> >>
> >> The issue seems to be in vrp2:
> >>
> >> Before the patch:
> >> Visiting statement:
> >> s_4 = strlen (q_3(D));
> >> Found new range for s_4: VARYING
> >>
> >> Visiting statement:
> >> _1 = s_4;
> >> Found new range for _1: [s_4, s_4]
> >> marking stmt to be not simulated again
> >>
> >> Visiting statement:
> >> _7 = s_4 - _1;
> >> Applying pattern match.pd:111, gimple-match.c:27997
> >> Match-and-simplified s_4 - _1 to 0
> >> Intersecting
> >>   [0, 0]
> >> and
> >>   [0, +INF]
> >> to
> >>   [0, 0]
> >> Found new range for _7: [0, 0]
> >>
> >> __attribute__((noclone, noinline))
> >> fn1 (char * p, char * q)
> >> {
> >>   size_t s;
> >>   long unsigned int _1;
> >>   long unsigned int _9;
> >>
> >>   :
> >>   s_4 = strlen (q_3(D));
> >>   _9 = s_4 + 1;
> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >>   _1 = s_4;
> >>   return 0;
> >>
> >> }
> >>
> >>
> >> After the patch:
> >> Visiting statement:
> >> s_4 = strlen (q_3(D));
> >> Intersecting
> >>   [0, 9223372036854775806]
> >> and
> >>   [0, 9223372036854775806]
> >> to
> >>   [0, 9223372036854775806]
> >> Found new range for s_4: [0, 9223372036854775806]
> >> marking stmt to be not simulated again
> >>
> >> Visiting statement:
> >> _1 = s_4;
> >> Intersecting
> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> and
> >>   [0, 9223372036854775806]
> >> to
> >>   [0, 9223372036854775806]  EQUIVALENCES: { s_4 } (1 elements)
> >> Found new range for _1: [0, 9223372036854775806]
> >> marking stmt to be not simulated again
> >>
> >> Visiting statement:
> >> _7 = s_4 - _1;
> >> Intersecting
> >>   ~[9223372036854775807, 9223372036854775809]
> >> and
> >>   ~[9223372036854775807, 9223372036854775809]
> >> to
> >>   ~[9223372036854775807, 9223372036854775809]
> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809]
> >> marking stmt to be not simulated again
> >>
> >> __attribute__((noclone, noinline))
> >> fn1 (char * p, char * q)
> >> {
> >>   size_t s;
> >>   long unsigned int _1;
> >>   size_t _7;
> >>   long unsigned int _9;
> >>
> >>   :
> >>   s_4 = strlen (q_3(D));
> >>   _9 = s_4 + 1;
> >>   __builtin_memcpy (p_5(D), q_3(D), _9);
> >>   _1 = s_4;
> >>   _7 = s_4 - _1;
> >>   return _7;
> >>
> >> }
> >>
> >> Then forwprop4 turns
> >> _1 = s_4
> >> _7 = s_4 - _1
> >> into
> >> _7 = 0
> >>
> >> and we end up with:
> >> _7 = 0
> >> return _7
> >> in optimized dump.
> >>
> >> Running ccp again after forwprop4 trivially solves the issue, however
> >> I am not sure if we want to run ccp again ?
> >>
> >> The issue is probably with extract_range_from_ssa_name():
> >> For _1 = s_4
> >>
> >> Before patch:
> >> VR for s_4 is set to varying.
> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name.
> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to 
> >> s_4,
> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using
> >> match.pd pattern x - x -> 0).
> >>
> >> After patch:
> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1]
> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1]
> >> so IIUC, we then lose the information that _1 is equal to s_4,
> >
> > We don't lose it, it's in its set of equivalencies.
> Ah, I missed that, thanks. For some reason I had the misconception that
> equivalences store variables which have the same value ranges but are
> not necessarily equal.
> >
> >> and vrp doesn't transform _7 = s_4 - _1 to _7 = 0.
> >> forwprop4 does that because it sees that s_4 and _1 are equivalent.
> >> Does this sound correct ?
> >
> > Yes.  So the issue is really that vrp_visit_assignment_or_call calls
> > gimple_fold_stmt_to_constant_1 with vrp_valueize[_1] which when
> > we do not have a singleton VR_RANGE does not fall back to looking
> > at equivalences (there's not a good cheap way to do that currently because
> > VRP doesn't keep a proper copy lattice but simply IORs equivalences
> 
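
To make the point about equivalences concrete, here is a toy model (not
GCC's value_range or its lattice; every name below is invented for
illustration) in which folding a - b first looks for singleton ranges and
then falls back to the recorded equivalences:

```cpp
#include <cstdint>
#include <set>
#include <string>

// Toy stand-in for a value range with an equivalence set.
struct toy_range
{
  std::uint64_t lo;
  std::uint64_t hi;
  std::set<std::string> equiv;   // names known to hold the same value
};

struct toy_ssa_name
{
  std::string name;
  toy_range vr;
};

// True if the two names are recorded as equivalent.
bool toy_known_equal(const toy_ssa_name& a, const toy_ssa_name& b)
{
  return a.name == b.name
         || a.vr.equiv.count(b.name) != 0
         || b.vr.equiv.count(a.name) != 0;
}

// Try to fold a - b to a constant.  Returns true and stores the
// constant in *result on success.
bool toy_fold_minus(const toy_ssa_name& a, const toy_ssa_name& b,
                    std::uint64_t* result)
{
  // Case 1: both operands have singleton ranges.
  if (a.vr.lo == a.vr.hi && b.vr.lo == b.vr.hi)
    {
      *result = a.vr.lo - b.vr.lo;
      return true;
    }
  // Case 2: fall back to equivalences (x - x -> 0).
  if (toy_known_equal(a, b))
    {
      *result = 0;
      return true;
    }
  return false;
}
```

With s_4 and _1 both given the range [0, 9223372036854775806] and s_4
recorded in the equivalence set of _1, the second case is the one that folds
the subtraction; without that fallback nothing in the ranges alone says the
result is 0.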

Re: Fix PR78154

2016-11-22 Thread Richard Biener
On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote:

> On 21 November 2016 at 15:34, Richard Biener  wrote:
> > On Fri, 18 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 17 November 2016 at 15:24, Richard Biener  wrote:
> >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 17 November 2016 at 14:21, Richard Biener  wrote:
> >> >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> Hi Richard,
> >> >> >> Following your suggestion in PR78154, the patch checks if stmt
> >> >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
> >> >> >> and returns true in that case.
> >> >> >>
> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> >> >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress.
> >> >> >> Would it be OK to commit this patch in stage-3 ?
> >> >> >
> >> >> > As people noted we have returns_nonnull for this and that is already
> >> >> > checked.  So please make sure the builtins get this attribute instead.
> >> >> OK thanks, I will add the returns_nonnull attribute to the required
> >> >> string builtins.
> >> >> I noticed some of the string builtins don't have RET1 in builtins.def:
> >> >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF.
> >> >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar
> >> >> to entries for memmove, strcpy ?
> >> >
> >> > Yes, I think so.
> >> Hi,
> >> In the attached patch I added returns_nonnull attribute to
> >> ATTR_RET1_NOTHROW_NONNULL_LEAF,
> >> and changed few builtins like strcat, strncpy, strncat and
> >> corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF.
> >> Does the patch look correct ?
> >
> > Hmm, given you only change ATTR_RET1_NOTHROW_NONNULL_LEAF means that
> > the gimple_stmt_nonzero_warnv_p code is incomplete -- it should
> > infer returns_nonnull itself from RET1 (which is fnspec("1") basically)
> > and the nonnull attribute on the argument.  So
> >
> >   unsigned rf = gimple_call_return_flags (stmt);
> >   if (rf & ERF_RETURNS_ARG)
> >{
> >  tree arg = gimple_call_arg (stmt, rf & ERF_RETURN_ARG_MASK);
> >  if (range of arg is ! VARYING)
> >use range of arg;
> >  else if (infer_nonnull_range_by_attribute (stmt, arg))
> > ... nonnull ...
> >
> Hi,
> Thanks for the suggestions, modified gimple_stmt_nonzero_warnv_p
> accordingly in this version.
> For functions like stpcpy that return nonnull but not one of their
> arguments, I added a new enum ATTR_RETNONNULL_NOTHROW_LEAF.
> Is that OK ?
> Bootstrapped+tested on x86_64-unknown-linux-gnu.
> Cross-testing on arm*-*-*, aarch64*-*-* in progress.

+   value_range *vr = get_value_range (arg);
+   if ((vr && vr->type != VR_VARYING)
+   || infer_nonnull_range_by_attribute (stmt, arg))
+ return true;
+ }

actually that's not quite correct (failed to notice the function
doesn't return a range but whether the range is nonnull).  For
nonnull it's just

  if (infer_nonnull_range_by_attribute (stmt, arg))
return true;

in the extract_range_basic call handling we could handle
ERF_RETURNS_ARG by returning the range of the argument (if not varying).

Thus the patch is ok with the above condition changed.  Please refer
to the recently opened PR from the ChangeLog.
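
The rule from the review can be sketched outside of GCC as follows. The
constants are toy values chosen for this example (the real ERF_* flags are
defined in GCC's headers and are not reproduced here), and toy_call is an
invented structure: the result of a call is known to be nonnull when the
call returns one of its arguments and that argument is declared nonnull.

```cpp
#include <vector>

// Toy encoding: low two bits select the returned argument, the next
// bit says whether the call returns an argument at all.
constexpr unsigned TOY_RETURN_ARG_MASK = 3u;
constexpr unsigned TOY_RETURNS_ARG = 1u << 2;

struct toy_call
{
  unsigned return_flags;           // TOY_RETURNS_ARG | argument index
  std::vector<bool> arg_nonnull;   // per-argument "declared nonnull"
};

// True if the call's result can be inferred to be nonnull.
bool toy_result_nonnull(const toy_call& call)
{
  if (!(call.return_flags & TOY_RETURNS_ARG))
    return false;
  unsigned idx = call.return_flags & TOY_RETURN_ARG_MASK;
  return idx < call.arg_nonnull.size() && call.arg_nonnull[idx];
}
```

A memmove-like call would be modelled as { TOY_RETURNS_ARG | 0, { true,
true, false } }, for which toy_result_nonnull returns true.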

Thanks,
Richard.


[arm-embedded] [PATCH, GCC/ARM 1/2] Add multilib support for embedded bare-metal targets

2016-11-22 Thread Thomas Preudhomme

Hi,

We have decided to backport this patch to add support for multilib for embedded 
bare-metal targets to our embedded-6-branch.


*** gcc/ChangeLog.arm ***

2016-11-22  Thomas Preud'homme  

Backport from mainline
2016-11-22 Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.


Best regards,

Thomas
--- Begin Message ---

Ping?

Best regards,

Thomas

On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 27/10/16 15:26, Thomas Preudhomme wrote:

Hi Kyrill,

On 27/10/16 10:45, Kyrill Tkachov wrote:

Hi Thomas,

On 24/10/16 09:06, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch adds the t-rmprofile multilib Makefile
fragment for the former objective. Multilibs are built for all M profile
architectures involved (ARMv6S-M, ARMv7-M and ARMv7E-M) as well as for ARMv7.
The ARMv7 multilib is used for R profile architectures but also for A profile
architectures.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme 

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile
value
for ARM.


Testing:

== aprofile ==
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* default spec (gcc -dumpspecs) is the same before and after the patchset for
aprofile
* No difference in --print-multi-directory between before and after the
patchset for aprofile for all combinations of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI

== rmprofile ==
* aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
ABI) and directory naming
* Difference in --print-multi-directory between before [1] and after the
patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI modulo the name and directory structure changes

[1] as per patch applied in ARM embedded branches
https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup






== aprofile + rmprofile ==
* aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is
not supported


Is this ok for master branch?

Best regards,

Thomas


+# Arch Matches
+MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES   += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES   += march?armv7=march?armv7-a
+MULTILIB_MATCHES   += march?armv7=march?armv7ve
+MULTILIB_MATCHES   += march?armv7=march?armv8-a
+MULTILIB_MATCHES   += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a+crc
+endif

I think you want to update the patch to handle -march=armv8.2-a and
armv8.2-a+fp16
Thanks,
Kyrill


Indeed. Please find updated ChangeLog and patch (attached):

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.

Ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 # pragmatic.
 tmake_profile_file="arm/t-aprofile"
 ;;
+			rmprofile)
+# Note that arm/t-rmprofile is a
+# stand-alone make file fragment to be
+# used only with itself.  We do not
+# specifically use the
+# TM_MULTILIB_OPTION framework because
+# this shorthand is more
+# pragmatic.
+tmake_profile_file="arm/t-rmprofile"
+;;
 			default)
 ;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-# arm/t-aprofile is only designed to work
-# without any with-cpu, with-arch, with-mode,
-# with-fpu or with-float options.
+# arm/t-aprofile and arm/t-rmprofile are only
+# designed to work without any with-cpu,
+# with-arch, 

Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 03:38:04PM +0100, Bernd Schmidt wrote:
> On 11/22/2016 02:37 PM, Jakub Jelinek wrote:
> >Can't it be done only if xloc.file contains any fancy characters?
> 
> Sure, but why? Strings generally get emitted with quotes around them, I
> don't see a good reason for filenames to be different, especially if it
> makes the output easier to parse.

Because printing common filenames matches what we emit in diagnostics,
what e.g. sanitizers emit at runtime diagnostics, what we emit as locations
in gimple dumps etc.

Jakub


Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally

2016-11-22 Thread Kyrill Tkachov


On 22/11/16 14:42, Bernd Edlinger wrote:

Hi,

does this follow-up patch look reasonable?
See: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01945.html


Is it OK for trunk?



Ah yes, this one slipped my attention.
This is ok.
Thanks,
Kyrill


Thanks
Bernd.

On 11/21/16 21:46, Christophe Lyon wrote:

On 18 November 2016 at 16:50, Bernd Edlinger  wrote:

On 11/18/16 12:58, Christophe Lyon wrote:

On 17 November 2016 at 10:23, Kyrill Tkachov
 wrote:

On 09/11/16 12:58, Bernd Edlinger wrote:

Hi!


This patch enables the ldrd/strd peephole rules unconditionally.

It is meant to fix cases, where the patch to reduce the sha512
stack usage splits ldrd/strd instructions into separate ldr/str insns,
but is technically independent from the other patch:

See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html

It was necessary to change check_effective_target_arm_prefer_ldrd_strd
to retain the true prefer_ldrd_strd tuning flag.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


This is ok.
Thanks,
Kyrill


Hi Bernd,

Since you committed this patch (r242549), I'm seeing the new test
failing on some arm*-linux-gnueabihf configurations:

FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10
FAIL:  gcc.target/arm/pr53447-5.c scan-assembler-times strd 9

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html
for a map of failures.

Am I missing something?

Hi Christophe,

as always many thanks for your testing...

I have apparently only looked at the case -mfloat-abi=soft here, which
is what my other patch is going to address.  But all targets with
-mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd
and vstr.64 instead of strd, which should be accepted as well.

So the attached patch should fix at least most of the fallout.


I've tested it, and indeed it fixes the failures I've reported.

Thanks


Is it OK for trunk?


Thanks
Bernd.

  >> 2016-11-18  Bernd Edlinger  
  >>
  >>  * gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu.
  >>
  >>Index: gcc/testsuite/gcc.target/arm/pr53447-5.c
  >>===
  >>--- gcc/testsuite/gcc.target/arm/pr53447-5.c  (revision 242588)
  >>+++ gcc/testsuite/gcc.target/arm/pr53447-5.c  (working copy)
  >>@@ -15,5 +15,8 @@ void foo(long long* p)
  >>   p[9] -= p[10];
  >> }
  >>
  >>-/* { dg-final { scan-assembler-times "ldrd" 10 } } */
  >>-/* { dg-final { scan-assembler-times "strd" 9 } } */
  >>+/* We accept neon instructions vldr.64 and vstr.64 as well.
  >>+   Note: DejaGnu counts patterns with alternatives twice,
  >>+   so actually there are only 10 loads and 9 stores.  */
  >>+/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */
  >>+/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */




Re: [arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

2016-11-22 Thread Thomas Preudhomme

Hi,

We decided to also apply this patch to the ARM embedded 6 branch.

Best regards,

Thomas

On 17/12/15 09:32, Thomas Preud'homme wrote:

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Wednesday, December 16, 2015 7:59 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
Kyrylo Tkachov
Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM
multilibs

Currently in config.gcc, only the first multilib in a multilib list is checked
for validity and the following elements are ignored, due to the break which
only breaks out of the loop in shell. A loop is also done over the multilib
list elements despite no combination being legal. This patch reworks the code
to address both issues.

ChangeLog entry is as follows:


2015-11-24  Thomas Preud'homme  

* config.gcc: Error out when conflicting multilib is detected.  Do not
loop over multilibs since no combination is legal.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 59aee2c..be3c720 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3772,38 +3772,40 @@ case "${target}" in
# Add extra multilibs
if test "x$with_multilib_list" != x; then
arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
-   for arm_multilib in ${arm_multilibs}; do
-   case ${arm_multilib} in
-   aprofile)
+   case ${arm_multilibs} in
+   aprofile)
# Note that arm/t-aprofile is a
# stand-alone make file fragment to be
# used only with itself.  We do not
# specifically use the
# TM_MULTILIB_OPTION framework because
# this shorthand is more
-   # pragmatic. Additionally it is only
-   # designed to work without any
-   # with-cpu, with-arch with-mode
+   # pragmatic.
+   tmake_profile_file="arm/t-aprofile"
+   ;;
+   default)
+   ;;
+   *)
+   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+   exit 1
+   ;;
+   esac
+
+   if test "x${tmake_profile_file}" != x ; then
+   # arm/t-aprofile is only designed to work
+   # without any with-cpu, with-arch, with-mode,
# with-fpu or with-float options.
-   if test "x$with_arch" != x \
-   || test "x$with_cpu" != x \
-   || test "x$with_float" != x \
-   || test "x$with_fpu" != x \
-   || test "x$with_mode" != x ; then
-   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile" 1>&2
-   exit 1
-   fi
-   tmake_file="${tmake_file} arm/t-aprofile"
-   break
-   ;;
-   default)
-   ;;
-   *)
-   echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-   exit 1
-   ;;
-   esac
-   done
+   if test "x$with_arch" != x \
+   || test "x$with_cpu" != x \
+   || test "x$with_float" != x \
+   || test "x$with_fpu" != x \
+   || test "x$with_mode" != x ; then
+   echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}" 1>&2
+   exit 1
+   fi
+
+   tmake_file="${tmake_file} ${tmake_profile_file}"
+   fi
fi
;;


Tested with the following multilib lists:
  + foo -> "Error: --with-multilib-list=foo not supported" as 

Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Dominik Vogt
On Tue, Nov 22, 2016 at 09:25:03AM -0500, David Malcolm wrote:
> On Tue, 2016-11-22 at 14:37 +0100, Jakub Jelinek wrote:
> > On Tue, Nov 22, 2016 at 02:32:39PM +0100, Bernd Schmidt wrote:
> > > On 11/22/2016 02:18 PM, Dominik Vogt wrote:
> > > 
> > > > > @@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx,
> > > > > int idx)
> > > > >   if (INSN_HAS_LOCATION (in_insn))
> > > > >   {
> > > > > expanded_location xloc = insn_location (in_insn);
> > > > > -   fprintf (outfile, " %s:%i", xloc.file, xloc.line);
> > > > > +   fprintf (outfile, " \"%s\":%i", xloc.file,
> > > > > xloc.line);
> > > > 
> > > > Was this change intentional?  We've got to update a scan
> > > > -assembler
> > > > statement in an s390 test to reflect the additional double quotes
> > > > in the output string.  Not a big deal, just wanted to make sure
> > > > this is not an accident.
> 
> Sorry about the breakage.
> 
> How widespread is the problem?

In the s390 tests, it is only a single scan-assembler.  Not sure
whether these are affected or not:

gcc.dg/debug/dwarf2/pr29609-1.c:/* { dg-final { scan-assembler "pr29609-1.c:18" } } */
gcc.dg/debug/dwarf2/pr29609-2.c:/* { dg-final { scan-assembler "pr29609-2.c:27" } } */
...
gcc.dg/debug/dwarf2/pr36690-1.c:/* { dg-final { scan-assembler "pr36690-1.c:11" } } */
gcc.dg/debug/dwarf2/pr36690-2.c:/* { dg-final { scan-assembler "pr36690-2.c:24" } } */
gcc.dg/debug/dwarf2/pr36690-3.c:/* { dg-final { scan-assembler "pr36690-3.c:19" } } */
...
gcc.dg/debug/dwarf2/pr37616.c:/* { dg-final { scan-assembler "pr37616.c:17" } } */
...
gcc.dg/debug/dwarf2/short-circuit.c:/* { dg-final { scan-assembler "short-circuit.c:11" } } */
...

(List generated with

  $ cd testsuite
  $ grep -r "scan-assembler.*[.]c.\?.\?.\?:" .
)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread Bernd Schmidt

On 11/22/2016 02:37 PM, Jakub Jelinek wrote:

> Can't it be done only if xloc.file contains any fancy characters?


Sure, but why? Strings generally get emitted with quotes around them, I 
don't see a good reason for filenames to be different, especially if it 
makes the output easier to parse.



> If it does (where fancy should be anything other than [a-zA-Z/_0-9.-] or
> some other reasonable definition, certainly space, quotes, backslash, etc.
> would count),
> shouldn't we adjust it (e.g. use \" instead of ", handle control characters
> etc.)?


The way I see it, spaces in filenames are regrettably somewhat common. 
Backslashes and quotes rather less so, to the point I really don't see a 
need to worry about them at the moment, and the necessary quoting could 
be added later if really necessary.



Bernd



Re: [PATCH] (v2) Add a "compact" mode to print_rtx_function

2016-11-22 Thread David Malcolm
On Tue, 2016-11-22 at 14:37 +0100, Jakub Jelinek wrote:
> On Tue, Nov 22, 2016 at 02:32:39PM +0100, Bernd Schmidt wrote:
> > On 11/22/2016 02:18 PM, Dominik Vogt wrote:
> > 
> > > > @@ -284,7 +292,7 @@ print_rtx_operand_code_i (const_rtx in_rtx,
> > > > int idx)
> > > >   if (INSN_HAS_LOCATION (in_insn))
> > > > {
> > > >   expanded_location xloc = insn_location (in_insn);
> > > > - fprintf (outfile, " %s:%i", xloc.file, xloc.line);
> > > > + fprintf (outfile, " \"%s\":%i", xloc.file,
> > > > xloc.line);
> > > 
> > > Was this change intentional?  We've got to update a
> > > scan-assembler statement in an s390 test to reflect the additional
> > > double quotes in the output string.  Not a big deal, just wanted
> > > to make sure this is not an accident.

Sorry about the breakage.

How widespread is the problem?

> > The idea was to make the output less ambiguous for file names with
> > spaces.
> 
> Can't it be done only if xloc.file contains any fancy characters?
> If it does (where fancy should be anything other than
> [a-zA-Z/_0-9.-] or some other reasonable definition, certainly
> space, quotes, backslash, etc. would count), shouldn't we adjust it
> (e.g. use \" instead of ", handle control characters etc.)?

The idea was that quotes also make the output somewhat easier for the
RTL frontend to parse, though reading the latest version of the RTL
frontend patches, it looks like I don't make use of them yet.

Another approach would be to only use the quotes when the dump is in
"compact" mode, since compact mode is the format that the RTL frontend
parses: the RTL dumps emitted by DejaGnu don't use it, instead using
the older style.
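
For reference, the escaping Jakub asks about could look roughly like the
sketch below.  It is only an illustration under stated assumptions:
escape_file_name is a hypothetical helper, not something in the posted
patch or in GCC, and the choice of octal escapes for control characters
is one possible convention among several.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of GCC: copy NAME into BUF (of SIZE
   bytes), escaping '"' and '\\' with a backslash and control characters
   as three-digit octal escapes.  Returns 0 on success, -1 if BUF is too
   small.  */
static int
escape_file_name (const char *name, char *buf, size_t size)
{
  size_t used = 0;

  if (size == 0)
    return -1;

  for (const unsigned char *p = (const unsigned char *) name; *p; p++)
    {
      char tmp[5];
      int n;

      if (*p == '"' || *p == '\\')
        n = snprintf (tmp, sizeof tmp, "\\%c", *p);
      else if (iscntrl (*p))
        n = snprintf (tmp, sizeof tmp, "\\%03o", (unsigned int) *p);
      else
        n = snprintf (tmp, sizeof tmp, "%c", *p);

      /* Keep room for the escaped bytes plus the terminating NUL.  */
      if (used + (size_t) n + 1 > size)
        return -1;
      for (int i = 0; i < n; i++)
        buf[used++] = tmp[i];
    }

  buf[used] = '\0';
  return 0;
}
```

A caller would escape xloc.file into a local buffer and then print it with
the quoted format from the patch, i.e. fprintf (outfile, " \"%s\":%i",
buf, xloc.line).  Names containing only spaces pass through unchanged, so
the common case Bernd mentions keeps its current output.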



Re: [PATCH 1/4] Remove build dependence on HSA run-time

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 02:27:44PM +0100, Martin Jambor wrote:
> I have basically copied what libgfortran did, with additional checking
> for HAVE_UNISTD_H when attempting to implement secure_getenv in its
> absence (which is maybe unnecessary but should not do any harm) and I
> also needed to add -D_GNU_SOURCE to plugin compilation flags.
> Finally, I have changed all getenv users in the plugin to use
> secure_getenv.

I'm not sure about switching all getenv users to secure_getenv: for the
specification of the library to dlopen it is essential, for the rest it
is debatable; but it is your choice.

> +hsa_status_t hsa_executable_validate(hsa_executable_t executable,
> + uint32_t *result);
> +uint64_t hsa_queue_add_write_index_acq_rel(const hsa_queue_t *queue,
> +   uint64_t value);
...
> +hsa_status_t hsa_executable_readonly_variable_define(
> +hsa_executable_t executable, hsa_agent_t agent, const char *variable_name,
> +void *address);

If hsa.h is our header rather than one imported from somewhere else,
can you tweak the formatting (space before (, and in the last case above
wrap after the return type to allow more arguments on a line)?
If it is just imported from somewhere else, please disregard.

Otherwise LGTM.

Jakub
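
For readers unfamiliar with the fallback discussed above, a replacement
for a missing secure_getenv is typically along the lines of the sketch
below.  This is an assumption about the general shape, not the code from
the patch: the name fallback_secure_getenv is hypothetical (chosen so it
cannot clash with a libc-provided secure_getenv), and it assumes
<unistd.h> is available.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fallback for systems without secure_getenv: behave like
   getenv, but ignore the environment when the process runs with
   elevated privileges (real and effective IDs differ).  */
static char *
fallback_secure_getenv (const char *name)
{
  if (getuid () != geteuid () || getgid () != getegid ())
    return NULL;
  return getenv (name);
}
```

The point of routing the dlopen library name through such a function is
that a set-user-ID program should not let the caller's environment choose
which shared object gets loaded.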


[testsuite,committed]: Fix a test that assumed int is 32 bits.

2016-11-22 Thread Georg-Johann Lay
Committed as obvious because the test case is clearly about a vector of 
4 * int.


Johann

gcc/testsuite/
* c-c++-common/builtin-shuffle-1.c (V): Use 4 * int in vector.


Index: c-c++-common/builtin-shuffle-1.c
===
--- c-c++-common/builtin-shuffle-1.c(revision 242541)
+++ c-c++-common/builtin-shuffle-1.c(working copy)
@@ -1,7 +1,7 @@
 /* PR c++/78089 */
 /* { dg-do run } */

-typedef int V __attribute__((vector_size (16)));
+typedef int V __attribute__((vector_size (4 * __SIZEOF_INT__)));
 V a, b, c;

 int
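
The effect of the change can be checked with a small standalone sketch
like the one below.  It is not taken from the test case: sum_lanes is a
hypothetical helper, and the code relies on the GNU C vector_size
attribute and the predefined __SIZEOF_INT__ macro.

```c
/* Vector of four ints whose byte size follows the target's int width,
   instead of hard-coding 16 bytes (which is only four ints when int is
   32 bits wide).  */
typedef int V __attribute__ ((vector_size (4 * __SIZEOF_INT__)));

_Static_assert (sizeof (V) == 4 * sizeof (int),
                "V must hold exactly four ints");

/* Hypothetical helper: add up the four lanes of X.  */
static int
sum_lanes (V x)
{
  return x[0] + x[1] + x[2] + x[3];
}
```

On a target with 16-bit int, vector_size (16) would describe eight lanes
rather than four, which is why the original typedef did not match what
the test intends.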


Re: [PATCH 3/4] OpenMP lowering changes from the hsa branch

2016-11-22 Thread Martin Jambor
Hi,

On Fri, Nov 18, 2016 at 11:38:56AM +0100, Jakub Jelinek wrote:
> On Sun, Nov 13, 2016 at 10:42:01PM +0100, Martin Jambor wrote:
> > +  size_t collapse = gimple_omp_for_collapse (for_stmt);
> > +  struct omp_for_data_loop *loops
> > += (struct omp_for_data_loop *)
> > +alloca (gimple_omp_for_collapse (for_stmt)
> > +   * sizeof (struct omp_for_data_loop));
> 
> Use
>   struct omp_for_data_loop *loops
> = XALLOCAVEC (struct omp_for_data_loop,
> gimple_omp_for_collapse (for_stmt));
> instead?

I have changed it as you suggested.
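
As a side note on the idiom, XALLOCAVEC folds the cast and the size
computation of the open-coded alloca call into one macro.  The sketch
below shows the pattern outside the GCC tree; the macro definition is a
local stand-in mirroring libiberty's, and struct loop_info and
total_trip_count are hypothetical names, not GCC's omp_for_data_loop
code.

```c
#include <alloca.h>
#include <stddef.h>

/* Local stand-in mirroring libiberty's XALLOCAVEC so that the sketch
   builds outside the GCC tree.  */
#ifndef XALLOCAVEC
#define XALLOCAVEC(T, N) ((T *) alloca (sizeof (T) * (N)))
#endif

/* Hypothetical, reduced stand-in for struct omp_for_data_loop.  */
struct loop_info
{
  long n1, n2, step;
};

/* Hypothetical example: fill a stack-allocated array of COLLAPSE loop
   descriptors and return the sum of their trip counts.  */
static long
total_trip_count (size_t collapse)
{
  struct loop_info *loops = XALLOCAVEC (struct loop_info, collapse);
  long total = 0;

  for (size_t i = 0; i < collapse; i++)
    {
      loops[i].n1 = 0;
      loops[i].n2 = (long) (i + 1) * 10;
      loops[i].step = 1;
      total += (loops[i].n2 - loops[i].n1) / loops[i].step;
    }
  return total;
}
```

The element type appears once in the macro call, so the cast and the
sizeof cannot drift apart the way they can in the hand-written version.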

> 
> > @@ -14133,7 +14183,7 @@ const pass_data pass_data_expand_omp =
> >  {
> >GIMPLE_PASS, /* type */
> >"ompexp", /* name */
> > -  OPTGROUP_NONE, /* optinfo_flags */
> > +  OPTGROUP_OPENMP, /* optinfo_flags */
> >TV_NONE, /* tv_id */
> >PROP_gimple_any, /* properties_required */
> >PROP_gimple_eomp, /* properties_provided */
> 
> What about the simdclone, omptargetlink, diagnose_omp_blocks passes?
> What about openacc specific passes (oaccdevlow)?  And Alex is
> hopefully going to add ompdevlow pass soon.

I was not sure about those at first, but I suppose all of them should
also be in the same group (though I hope the name is still fine), so I
added them.  I will make sure that ompdevlow pass will be in it as
well, whether it gets in before or after this.

> 
> Otherwise LGTM.

Thanks,  the updated patch is below.  I have tested the whole patch
set by bootstrapping, lto-bootstrapping and testing on x86_64-linux
and bootstrapping and testing on aarch64-linux.  I will commit it when
the first patch is approved.

Thank you very much for the review,

Martin


2016-11-21  Martin Jambor  

gcc/
* dumpfile.h (OPTGROUP_OPENMP): Define.
* dumpfile.c (optgroup_options): Added OPTGROUP_OPENMP.
* gimple.h (gf_mask): Added elements GF_OMP_FOR_GRID_INTRA_GROUP and
GF_OMP_FOR_GRID_GROUP_ITER.
(gimple_omp_for_grid_phony): Added checking assert.
(gimple_omp_for_set_grid_phony): Likewise.
(gimple_omp_for_grid_intra_group): New function.
(gimple_omp_for_set_grid_intra_group): Likewise.
(gimple_omp_for_grid_group_iter): Likewise.
(gimple_omp_for_set_grid_group_iter): Likewise.
* omp-low.c (check_omp_nesting_restrictions): Allow GRID loop where
previously only distribute loop was permitted.
(lower_lastprivate_clauses): Allow non tcc_comparison predicates.
(grid_get_kernel_launch_attributes): Support multiple HSA grid
dimensions.
(grid_expand_omp_for_loop): Likewise and also support standalone
distribute constructs.  New parameter INTRA_GROUP, updated both users.
(grid_expand_target_grid_body): Support standalone distribute
constructs.
(pass_data_expand_omp): Changed optinfo_flags to OPTGROUP_OPENMP.
(pass_data_expand_omp_ssa): Likewise.
(pass_data_lower_omp): Likewise.
(pass_data_diagnose_omp_blocks): Likewise.
(pass_data_oacc_device_lower): Likewise.
(pass_data_omp_target_link): Likewise.
(grid_lastprivate_predicate): New function.
(lower_omp_for_lastprivate): Call grid_lastprivate_predicate for
gridified loops.
(lower_omp_for): Support standalone distribute constructs.
(grid_prop): New type.
(grid_safe_assignment_p): Check for assignments to group_sizes, new
parameter GRID.
(grid_seq_only_contains_local_assignments): New parameter GRID, pass
it to callee.
(grid_find_single_omp_among_assignments_1): Likewise, improve missed
optimization info messages.
(grid_find_single_omp_among_assignments): Likewise.
(grid_find_ungridifiable_statement): Do not bail out for SIMDs.
(grid_parallel_clauses_gridifiable): New function.
(grid_inner_loop_gridifiable_p): Likewise.
(grid_dist_follows_simple_pattern): Likewise.
(grid_gfor_follows_tiling_pattern): Likewise.
(grid_call_permissible_in_distribute_p): Likewise.
(grid_handle_call_in_distribute): Likewise.
(grid_dist_follows_tiling_pattern): Likewise.
(grid_target_follows_gridifiable_pattern): Support standalone distribute
constructs.
(grid_var_segment): New enum.
(grid_mark_variable_segment): New function.
(grid_copy_leading_local_assignments): Call grid_mark_variable_segment
if a new argument says so.
(grid_process_grid_body): New function.
(grid_eliminate_combined_simd_part): Likewise.
(grid_mark_tiling_loops): Likewise.
(grid_mark_tiling_parallels_and_loops): Likewise.
(grid_process_kernel_body_copy): Support standalone distribute
constructs.
(grid_attempt_target_gridification): New grid variable holding overall
gridification state.  Support standalone distribute constructs and
collapse clauses.
* 

Re: [PATCH] Propagate cv qualifications in variant_alternative

2016-11-22 Thread Jonathan Wakely

On 21/11/16 22:46 -0800, Tim Shen wrote:

PR libstdc++/78441
* include/std/variant: Propagate cv qualifications to types returned
by variant_alternative.
* testsuite/20_util/variant/compile.cc: Tests.


OK for trunk, thanks.


