Re: [PATCH v2] combine: Improve change_zero_ext, call simplify_set afterwards.

2016-12-22 Thread Georg-Johann Lay

On 21.12.2016 18:46, Segher Boessenkool wrote:

On Wed, Dec 21, 2016 at 01:58:18PM +0100, Georg-Johann Lay wrote:

$ avr-gcc
/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c -S
-O1 -mmcu=avr4 -S -v

/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c: In
function 'yasm_lc3b__parse_insn':
/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c:19:1:
error: insn does not satisfy its constraints:
 }
 ^
(jump_insn 58 98 59 9 (set (pc)
(if_then_else (eq (and:HI (reg:HI 31 r31)
(const_int 1 [0x1]))
(const_int 0 [0]))
(label_ref 70)
(pc)))
"/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c":11
415 {*sbrx_and_branchhi}
 (int_list:REG_BR_PROB 375 (nil))
 -> 70)




Combine comes up with the following insn:
(jump_insn 58 57 59 7 (set (pc)
(if_then_else (eq (and:HI (subreg:HI (mem:QI (reg/v/f:HI 75 [
operands ]) [1 *operands_17(D)+0 S1 A8]) 0)
(const_int 1 [0x1]))
(const_int 0 [0]))
(label_ref 70)
(pc)))
"/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c":11
415 {*sbrx_and_branchhi}
 (int_list:REG_BR_PROB 375 (nil))
 -> 70)

which cannot be correct because avr.md::*sbrx_and_branchhi reads:

(define_insn "*sbrx_and_branch"
  [(set (pc)
(if_then_else
 (match_operator 0 "eqne_operator"
 [(and:QISI
   (match_operand:QISI 1 "register_operand" "r")
   (match_operand:QISI 2 "single_one_operand" "n"))
  (const_int 0)])
 (label_ref (match_operand 3 "" ""))
 (pc)))]
  "" { ... } ...)

Hence we have a memory operand (subreg of mem)) where only a register is
allowed.  Reg alloc then reloads the mem:QI into R31, but R31 is the
last hard reg, i.e. R31 cannot hold HImode.


If you don't have instruction scheduling subregs of mem are allowed (and
are counted as registers).  Combine asks recog, and it think this is fine.

Why does reload use r31 here?  Why does it think that is valid for HImode?


I can't tell you, all I'm seeing bunch of new FAILs after r243578.

For a simpler test case like the following

unsigned u;
int volatile v;

void btest1 (void)
{
if (u & 1)
v = 0;
v = 0;
}

the combine pattern is also used but the register pressure is smaller.
After quite some spills, reload then comes up with

(insn 18 7 8 2 (set (reg:QI 24 r24)
(mem/c:QI (symbol_ref:HI ("u")  ) [1 
u+0 S1 A8])) "testbit.c":6 56 {movqi_insn}

 (nil))
(jump_insn 8 18 9 2 (set (pc)
(if_then_else (eq (and:HI (reg:HI 24 r24)
(const_int 1 [0x1]))
(const_int 0 [0]))
(label_ref 11)
(pc))) "testbit.c":6 415 {*sbrx_and_branchhi}
 (int_list:REG_BR_PROB 4600 (nil))
 -> 11)

i.e. the paradoxical subreg could be resolved somehow and R25 is
uninitialized.  It this the purpose of the combine change to come
up with uninitialized regs because it is known that just the
initialized parts are used by the code?

Even if subregs are allowed, paradoxical subregs look really odd here.

The ICE'ing test case, reload comes up with

(insn 97 57 98 9 (set (reg:HI 30 r30)
(reg/v/f:HI 20 r20 [orig:75 operands ] [75])) "pr26833.c":11 68 
{*movhi}

 (nil))
(insn 98 97 58 9 (set (reg:QI 31 r31)
(mem:QI (reg:HI 30 r30) [1 *operands_17(D)+0 S1 A8])) 
"pr26833.c":11 56 {movqi_insn}

 (nil))
(jump_insn 58 98 59 9 (set (pc)
(if_then_else (eq (and:HI (reg:HI 31 r31)
(const_int 1 [0x1]))
(const_int 0 [0]))
(label_ref 70)
(pc))) "pr26833.c":11 415 {*sbrx_and_branchhi}
 (int_list:REG_BR_PROB 375 (nil))
 -> 70)

insn 98 is valid but overrides parts of HI:30 set by insn 97.

As it appears, the QI memory is loaded to R31 which is valid, and
then reload drops in that register as replacement for the paradoxical
subreg and comes up with HI:31 in insn 58.

Actually a zero-extend would be needed, does it?

As far as I know, reload generates only a very limited subset of insns,
namely additions to compute frame addresses and mov insns.  But
reload is not supposed to generate extends or other arithmetic.

If the purpose of the change to combine.c was to optimize code,
then it would be preferred to avoid HI altogether and just use QImode.
avr.md has a similar insn for QI, so that would really give a
reasonable match.

Johann



[gimplefe] change dump format of realpart_expr and imagpart_expr with TDF_GIMPLE

2016-12-22 Thread Prathamesh Kulkarni
Hi,
For TDF_GIMPLE, the attached patch changes dump
format of realpart_expr from REALPART_EXPR to __real var
since the latter form is accepted by gimplefe. Similarly for imagpart_expr.
Is this OK after bootstrap+test ?

Thanks,
Prathamesh
diff --git a/gcc/testsuite/gcc.dg/gimplefe-20.c 
b/gcc/testsuite/gcc.dg/gimplefe-20.c
new file mode 100644
index 000..99b3180
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-20.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */
+
+_Complex a;
+
+double __GIMPLE() f()
+{
+  double t1;
+  double t2;
+  double _1;
+
+bb1:
+  t1_2 = __real a;
+  t2_3 = __imag a;
+  _1 = t1_2 + t2_3;
+  return _1;
+}
+
+/* { dg-final { scan-tree-dump "__real a" "ssa" } } */
+/* { dg-final { scan-tree-dump "__imag a" "ssa" } } */
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 5b3e23e..40c0bc6 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2451,15 +2451,31 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, int flags,
   break;
 
 case REALPART_EXPR:
-  pp_string (pp, "REALPART_EXPR <");
-  dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-  pp_greater (pp);
+  if (flags & TDF_GIMPLE)
+   {
+ pp_string (pp, "__real ");
+ dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+   }
+  else
+   {
+ pp_string (pp, "REALPART_EXPR <");
+ dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_greater (pp);
+   }
   break;
 
 case IMAGPART_EXPR:
-  pp_string (pp, "IMAGPART_EXPR <");
-  dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-  pp_greater (pp);
+  if (flags & TDF_GIMPLE)
+   {
+ pp_string (pp, "__imag ");
+ dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+   }
+  else
+   {
+ pp_string (pp, "IMAGPART_EXPR <");
+ dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_greater (pp);
+   }
   break;
 
 case VA_ARG_EXPR:


[committed] Add two new -Wnonnull tests

2016-12-22 Thread Jakub Jelinek
Hi!

I've committed following patch as obvious to add two tests that were
discussed in the recent thread.

2016-12-22  Jakub Jelinek  

PR middle-end/78858
* c-c++-common/ubsan/pr78858.c: New test.
* gcc.dg/nonnull-5.c: New test.

--- gcc/testsuite/c-c++-common/ubsan/pr78858.c.jj   2016-12-22 
12:43:45.347067480 +0100
+++ gcc/testsuite/c-c++-common/ubsan/pr78858.c  2016-12-22 12:43:34.0 
+0100
@@ -0,0 +1,10 @@
+/* PR middle-end/78858 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -Wnonnull" } */
+
+void
+foo (char **x, const char *y)
+{
+  *x = (char *) __builtin_malloc (__builtin_strlen (y) + 1);   /* { dg-bogus 
"argument 1 null where non-null expected" } */
+  __builtin_strcpy (*x, y);
+}
--- gcc/testsuite/gcc.dg/nonnull-5.c.jj 2016-12-22 12:34:34.875208653 +0100
+++ gcc/testsuite/gcc.dg/nonnull-5.c2016-12-22 12:34:27.924298825 +0100
@@ -0,0 +1,11 @@
+/* Reduced from https://sourceware.org/bugzilla/show_bug.cgi?id=20978 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wnonnull" } */
+
+int
+foo (const char *name)
+{
+  if (name)
+return 6;
+  return __builtin_strlen (name);  /* { dg-warning "argument 1 null where 
non-null expected" } */
+}

Jakub


[testsuite,patch,committed] ad PR52641: More case for int32plus, size32plus, etc.

2016-12-22 Thread Georg-Johann Lay

http://gcc.gnu.org/r243885

Committed this addition of restrictions, mostly for 16-bit int or
16-bit size targtes.

Johann


gcc/testsuite/
PR testsuite/52641
* gcc.dg/pr35258.c (main) : Use an integer value that has
at least a size of 4.
* gcc.dg/Walloca-1.c (foo1): Use alloca with 3 (instead of 9).
* gcc.dg/graphite/pr46185.c: Require int32plus, size32plus.
* gcc.dg/graphite/isl-ast-op-select.c: Same.
* gcc.dg/graphite/pr42205-1.c: Require int32plus.
* gcc.dg/graphite/pr42221.c: Same.
* gcc.dg/tree-ssa/pr65136.c: Same.
* gcc.dg/tree-ssa/sra-20.c: Same.
* gcc.dg/graphite/scop-0.c: Require size32plus.
* gcc.dg/graphite/scop-22.c: Same.
* gcc.dg/graphite/scop-3.c: Same.
* gcc.dg/graphite/scop-dsyr2k.c: Same.
* gcc.dg/graphite/scop-dsyrk.c: Same.
* gcc.dg/graphite/scop-mvt.c: Same.
* gcc.dg/graphite/scop-sor.c: Same.
* gcc.dg/tree-ssa/pr68529-3.c: Same.
* gcc.dg/tree-ssa/pr66449.c [long != pointer]: Use intptr_t if.
* gcc.dg/tree-ssa/pr70919.c [int <= 2]: Use 32-bit int as needed.
* gcc.dg/tree-ssa/pr71408.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-8.c (f2) [long != pointer]: Use
uintptr_t instead of long for int representation of address.
* gcc.dg/tree-ssa/tailcall-7-run.c: Require trampolines.

Index: gcc.dg/Walloca-1.c
===
--- gcc.dg/Walloca-1.c	(revision 243884)
+++ gcc.dg/Walloca-1.c	(working copy)
@@ -27,7 +27,7 @@ void foo1 (size_t len, size_t len2, size
   // { dg-warning "unbounded use of 'alloca'" "" { target { ! lp64 } } 26 }
   useit (s);
 
-  s = alloca(9);		/* { dg-warning "is too large" } */
+  s = alloca (3);		/* { dg-warning "is too large" } */
   useit (s);
 
   if (len < 2000)
Index: gcc.dg/graphite/isl-ast-op-select.c
===
--- gcc.dg/graphite/isl-ast-op-select.c	(revision 243884)
+++ gcc.dg/graphite/isl-ast-op-select.c	(working copy)
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target size32plus } */
+/* { dg-require-effective-target int32plus } */
 /* { dg-options "-O2 -floop-nest-optimize" } */
 
 static void kernel_gemm(int ni, int nj, int nk, double alpha, double beta, double C[1024][1024], double A[1024][1024], double B[1024][1024])
Index: gcc.dg/graphite/pr42205-1.c
===
--- gcc.dg/graphite/pr42205-1.c	(revision 243884)
+++ gcc.dg/graphite/pr42205-1.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target int32plus } */
 /* { dg-options "-O1 -ffast-math -floop-interchange" } */
 
 int adler32(int adler, char *buf, int n)
Index: gcc.dg/graphite/pr42221.c
===
--- gcc.dg/graphite/pr42221.c	(revision 243884)
+++ gcc.dg/graphite/pr42221.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target int32plus } */
 /* { dg-options "-Os -fgraphite-identity" } */
 
 static void b2w(unsigned int *out, const unsigned char *in, unsigned int len)
Index: gcc.dg/graphite/pr46185.c
===
--- gcc.dg/graphite/pr46185.c	(revision 243884)
+++ gcc.dg/graphite/pr46185.c	(working copy)
@@ -1,4 +1,6 @@
 /* { dg-do run } */
+/* { dg-require-effective-target size32plus } */
+/* { dg-require-effective-target int32plus } */
 /* { dg-options "-O2 -floop-interchange -ffast-math -fno-ipa-cp" } */
 
 #define DEBUG 0
Index: gcc.dg/graphite/scop-0.c
===
--- gcc.dg/graphite/scop-0.c	(revision 243884)
+++ gcc.dg/graphite/scop-0.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target size32plus } */
 int foo (void);
 void bar (void);
 
Index: gcc.dg/graphite/scop-22.c
===
--- gcc.dg/graphite/scop-22.c	(revision 243884)
+++ gcc.dg/graphite/scop-22.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target size32plus } */
 double u[1782225];
 
 void foo(int N, int *res)
Index: gcc.dg/graphite/scop-3.c
===
--- gcc.dg/graphite/scop-3.c	(revision 243884)
+++ gcc.dg/graphite/scop-3.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target size32plus } */
 int toto()
 {
   int i, j, k;
Index: gcc.dg/graphite/scop-dsyr2k.c
===
--- gcc.dg/graphite/scop-dsyr2k.c	(revision 243884)
+++ gcc.dg/graphite/scop-dsyr2k.c	(working copy)
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target size32plus } */
 #define NMAX 3000
 
 static double a[NMAX][NMAX], b[NMAX][NMAX], c[NMAX][NMAX];
Index: gcc.dg/graphite/scop-dsyrk.c
===
--- gcc.dg/graphite/scop-dsyrk.c	(revision 243884)

Re: [PATCH] varasm: Propagate litpool decl alignment to generated RTX.

2016-12-22 Thread Andreas Krebbel
On 12/20/2016 11:38 AM, Richard Biener wrote:
> On Fri, Dec 16, 2016 at 9:29 PM, Andreas Krebbel
>  wrote:
>> When pushing a value into the literal pool the resulting decl might
>> get a higher alignment than the original expression depending on how a
>> target defines CONSTANT_ALIGNMENT.  Generating an RTX for the constant
>> pool access we currently use the alignment from the original
>> expression.  Changed with the attached patch.
> 
> And it might be even smaller alignment...  or do we not allow that?
I did assume that this is not supposed to happen. Adding an assertion 
triggering in that case
survived bootstrap and testsuite. s390x only. It basically boils down to 
whether align_variable and
set_mem_attributes/get_object_alignment come to different conclusions about the 
alignment starting
at either the var decl or the original expression.

...
>> Bootstrapped and regtested on x86_64 and s390x.
>>
>> No regressions.
>>
>> Ok?
> 
> Ok.
> 
> Richard.

Ok for GCC 6 branch as well?

-Andreas-




[PATCH] Fix tree-optimization/78886.

2016-12-22 Thread Martin Liška
Patch is pre-approved by Jakub. I guess the same patch can be install to both
release branches after it finishes regression tests, right?

Thanks,
Martin
>From b0efc394e047ebc25386f8cb99f595b8d4c7f83a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 21 Dec 2016 16:21:45 +0100
Subject: [PATCH] Fix tree-optimization/78886.

gcc/testsuite/ChangeLog:

2016-12-22  Martin Liska  

	PR tree-optimization/78886
	* gcc.dg/tree-ssa/pr78886.c: New test.

gcc/ChangeLog:

2016-12-22  Martin Liska  

	PR tree-optimization/78886
	* tree-ssa-strlen.c (handle_builtin_malloc): Return when LHS
	is equal to NULL.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr78886.c | 10 ++
 gcc/tree-ssa-strlen.c   |  3 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr78886.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr78886.c b/gcc/testsuite/gcc.dg/tree-ssa/pr78886.c
new file mode 100644
index 000..97799301547
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr78886.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+void *malloc(unsigned long x);
+
+void foo(void)
+{
+ volatile int i;
+ malloc(1);
+ i;
+}
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 67075f07e29..4a05725ca7c 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -1869,6 +1869,9 @@ handle_builtin_malloc (enum built_in_function bcode, gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   tree lhs = gimple_call_lhs (stmt);
+  if (lhs == NULL_TREE)
+return;
+
   gcc_assert (get_stridx (lhs) == 0);
   int idx = new_stridx (lhs);
   tree length = NULL_TREE;
-- 
2.11.0



Re: [PATCH] Fix tree-optimization/78886.

2016-12-22 Thread Jakub Jelinek
On Thu, Dec 22, 2016 at 02:08:35PM +0100, Martin Liška wrote:
> Patch is pre-approved by Jakub. I guess the same patch can be install to both
> release branches after it finishes regression tests, right?

Yes, thanks.

> >From b0efc394e047ebc25386f8cb99f595b8d4c7f83a Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 21 Dec 2016 16:21:45 +0100
> Subject: [PATCH] Fix tree-optimization/78886.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-12-22  Martin Liska  
> 
>   PR tree-optimization/78886
>   * gcc.dg/tree-ssa/pr78886.c: New test.
> 
> gcc/ChangeLog:
> 
> 2016-12-22  Martin Liska  
> 
>   PR tree-optimization/78886
>   * tree-ssa-strlen.c (handle_builtin_malloc): Return when LHS
>   is equal to NULL.

Jakub


Re: [gimplefe] change dump format of realpart_expr and imagpart_expr with TDF_GIMPLE

2016-12-22 Thread Richard Biener
On December 22, 2016 12:27:38 PM GMT+01:00, Prathamesh Kulkarni 
 wrote:
>Hi,
>For TDF_GIMPLE, the attached patch changes dump
>format of realpart_expr from REALPART_EXPR to __real var
>since the latter form is accepted by gimplefe. Similarly for
>imagpart_expr.
>Is this OK after bootstrap+test ?

OK.

Richard.

>Thanks,
>Prathamesh




Re: [PATCH, gcc/MIPS] Add options to disable/enable madd.fmt/msub.fmt instructions

2016-12-22 Thread Yunqiang Su

> 在 2016年12月22日,13:11,Paul Hua  写道:
> 
> Hi,
> 
>> +On MIPS targets, set the @option{-mno-unfused-madd4} option by default.
>> +On some platform, like Loongson 3A/3B 1000/2000/3000, madd.fmt/msub.fmt is
>> +broken, which may which may generate wrong calculator result.
> 
> The Loongson 3A/3B 1000/2000/3000 madd.fmt/msub.fmt are fused madd 
> instructions.
> Can you try this patch:
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00255.html .
> 
> Is this patch testing on trunk? I build the trunk failure for a long time.

I don’t think your patch is correct.
No matter Loongson’s madd.fmt are fused or unfused, they give wrong result.
So for Loongson, madd.fmt should be disabled at all.
you can test it with:

#include 

double a = 0.6;
double b = 0.4;
double c = 0.6;
double d = 0.4;

int main(void)
{
double x = a * b - c * d;
printf("%le\n", x);
return 0;
}

[PATCH] Implement no_sanitize function attribute

2016-12-22 Thread Martin Liška
Hello.

As I previously agreed with Jakub, I prepared patch which adds
no_sanitize function attribute (same what clang support).

That encompasses following changes:
1) all no_sanitize_* function attributes are parsed and stored to 
no_sanitize_flags
in DECL_ATTRIBUTES
2) instead of flag_sanitize & X, let's call sanitize_flags_p (X), where the 
DECL_ATTRIBUTES
are checked within the function
3) I prepared many test-cases which test every (almost) single sub-option of 
UBSAN and all
functions are decorated with no_sanitize attribute disabling all sanitization 
except the one
tested in a particular test
4) documentation entry is introduced

Misc changes:
a) I would like to rename SANITIZE_NONDEFAULT to SANITIZE_UNDEFINED_NONDEFAULT
b) Documentation fix for -sanitize=bounds-strict is added

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Martin
>From 046ce41931ea1d4cb51c14b74f2bcddc2cb4c6ca Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 21 Dec 2016 11:13:34 +0100
Subject: [PATCH] Implement no_sanitize function attribute

gcc/cp/ChangeLog:

2016-12-22  Martin Liska  

	* class.c (build_base_path): Use sanitize_flags_p predicate.
	* cp-gimplify.c (cp_genericize_r): Likewise.
	(cp_genericize_tree): Likewise.
	(cp_genericize): Likewise.
	* cp-ubsan.c (cp_ubsan_instrument_vptr_p): Likewise.
	* decl.c (compute_array_index_type): Likewise.
	(start_preparsed_function): Likewise.
	* decl2.c (one_static_initialization_or_destruction): Likewise.
	* init.c (build_vec_init): Likewise.
	* lambda.c (maybe_add_lambda_conv_op): Use
	add_no_sanitize_value.
	* typeck.c (cp_build_binary_op): Use sanitize_flags_p.
	(build_static_cast_1): Likewise.

gcc/c-family/ChangeLog:

2016-12-22  Martin Liska  

	* c-attribs.c (add_no_sanitize_value): New function.
	(handle_no_sanitize_attribute): Likewise.
	(handle_no_sanitize_address_attribute): Use
	add_no_sanitize_value.
	(handle_no_sanitize_thread_attribute): New function.
	(handle_no_address_safety_analysis_attribute): Use
	add_no_sanitize_value.
	(handle_no_sanitize_undefined_attribute): Likewise.
	* c-common.h (add_no_sanitize_value): Declare.
	* c-ubsan.c (ubsan_instrument_division): Use sanitize_flags_p.
	(ubsan_instrument_shift): Likewise.
	(ubsan_instrument_bounds): Likewise.
	(ubsan_maybe_instrument_array_ref): Likewise.
	(ubsan_maybe_instrument_reference_or_call): Likewise.
	* c-ubsan.h (do_ubsan_in_current_function): Remove declaration.

gcc/testsuite/ChangeLog:

2016-12-22  Martin Liska  

	* c-c++-common/ubsan/align-10.c: New test.
	* c-c++-common/ubsan/attrib-5.c: New test.
	* c-c++-common/ubsan/attrib-6.c: New test.
	* c-c++-common/ubsan/bounds-14.c: New test.
	* c-c++-common/ubsan/div-by-zero-8.c: New test.
	* c-c++-common/ubsan/float-cast-overflow-11.c: New test.
	* c-c++-common/ubsan/float-div-by-zero-2.c: New test.
	* c-c++-common/ubsan/load-bool-enum-2.c: New test.
	* c-c++-common/ubsan/nonnull-6.c: New test.
	* c-c++-common/ubsan/null-12.c: New test.
	* c-c++-common/ubsan/object-size-11.c: New test.
	* c-c++-common/ubsan/overflow-3.c: New test.
	* c-c++-common/ubsan/unreachable-5.c: New test.
	* c-c++-common/ubsan/vla-5.c: New test.
	* g++.dg/ubsan/return-8.C: New test.
	* g++.dg/ubsan/vptr-12.C: New test.
	* gcc.dg/ubsan/bounds-4.c: New test.
	* gcc.dg/ubsan/c99-shift-7.c: New test.
	* gcc.dg/ubsan/c99-shift-8.c: New test.

gcc/ChangeLog:

2016-12-22  Martin Liska  

	* asan.c (asan_sanitize_stack_p): Use sanitize_flags_p.
	(gate_asan): Likewise.
	* asan.h (asan_no_sanitize_address_p): Remove.
	* builtins.def: Use renamed SANITIZE_UNDEFINED_NONDEFAULT.
	* common.opt: Likewise.
	* convert.c (convert_to_integer_1): Use sanitize_flags_p.
	* doc/extend.texi: Document the new attribute.
	* doc/invoke.texi: Improve documentation of
	-sanitize=bounds-strict.
	* flag-types.h (enum sanitize_code): Rename
	SANITIZE_NONDEFAULT to SANITIZE_UNDEFINED_NONDEFAULT.
	* gcc.c (sanitize_spec_function): Use the renamed enum value.
	* gimple-fold.c (optimize_atomic_compare_exchange_p): Use
	sanitize_flags_p.
	* gimplify.c (gimplify_decl_expr): Likewise.
	(gimplify_function_tree): Likewise.
	* ipa-inline.c (sanitize_attrs_match_for_inline_p): Likewise.
	* opts.c (parse_no_sanitize_attribute): New function.
	(common_handle_option): Use the renamed enum value.
	* opts.h (parse_no_sanitize_attribute): Declare.
	* tree.c (sanitize_flags_p): New function.
	* tree.h (sanitize_flags_p): New declaration.
	* tsan.c (gate): Use sanitize_flags_p.
	* ubsan.c (ubsan_expand_null_ifn): Likewise.
	(instrument_mem_ref): Likewise.
	(instrument_bool_enum_load): Likewise.
	(do_ubsan_in_current_function): Remove.
	(pass_ubsan::execute): Use sanitize_flags_p.
	* ubsan.h (do_ubsan_in_current_function): Remove.

gcc/c/ChangeLog:

2016-12-22  Martin Liska  

	* c-convert.c (convert): Use sanitize_flags_p.
	* c-decl.c (grokdeclarator): Likewise.
	* c-typeck.c (convert_for_assignment): Likewise.
	(c_finish_return): Likewise.
	(build_binary_op): Likewise.
---
 gcc/asan.c  

Re: [PATCH v2] combine: Improve change_zero_ext, call simplify_set afterwards.

2016-12-22 Thread Dominik Vogt
On Thu, Dec 22, 2016 at 12:00:37PM +0100, Georg-Johann Lay wrote:
> On 21.12.2016 18:46, Segher Boessenkool wrote:
> >On Wed, Dec 21, 2016 at 01:58:18PM +0100, Georg-Johann Lay wrote:
> >>$ avr-gcc
> >>/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c -S
> >>-O1 -mmcu=avr4 -S -v
> >>
> >>/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c: In
> >>function 'yasm_lc3b__parse_insn':
> >>/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c:19:1:
> >>error: insn does not satisfy its constraints:
> >> }
> >> ^
> >>(jump_insn 58 98 59 9 (set (pc)
> >>(if_then_else (eq (and:HI (reg:HI 31 r31)
> >>(const_int 1 [0x1]))
> >>(const_int 0 [0]))
> >>(label_ref 70)
> >>(pc)))
> >>"/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c":11
> >>415 {*sbrx_and_branchhi}
> >> (int_list:REG_BR_PROB 375 (nil))
> >> -> 70)
> >
> >
> >>Combine comes up with the following insn:
> >>(jump_insn 58 57 59 7 (set (pc)
> >>(if_then_else (eq (and:HI (subreg:HI (mem:QI (reg/v/f:HI 75 [
> >>operands ]) [1 *operands_17(D)+0 S1 A8]) 0)
> >>(const_int 1 [0x1]))
> >>(const_int 0 [0]))
> >>(label_ref 70)
> >>(pc)))
> >>"/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c":11
> >>415 {*sbrx_and_branchhi}
> >> (int_list:REG_BR_PROB 375 (nil))
> >> -> 70)
> >>
> >>which cannot be correct because avr.md::*sbrx_and_branchhi reads:
> >>
> >>(define_insn "*sbrx_and_branch"
> >>  [(set (pc)
> >>(if_then_else
> >> (match_operator 0 "eqne_operator"
> >> [(and:QISI
> >>   (match_operand:QISI 1 "register_operand" "r")
> >>   (match_operand:QISI 2 "single_one_operand" "n"))
> >>  (const_int 0)])
> >> (label_ref (match_operand 3 "" ""))
> >> (pc)))]
> >>  "" { ... } ...)
> >>
> >>Hence we have a memory operand (subreg of mem)) where only a register is
> >>allowed.  Reg alloc then reloads the mem:QI into R31, but R31 is the
> >>last hard reg, i.e. R31 cannot hold HImode.
> >
> >If you don't have instruction scheduling subregs of mem are allowed (and
> >are counted as registers).  Combine asks recog, and it think this is fine.
> >
> >Why does reload use r31 here?  Why does it think that is valid for HImode?
> 
> I can't tell you, all I'm seeing bunch of new FAILs after r243578.
> For a simpler test case like the following
> 
> unsigned u;
> int volatile v;
> 
> void btest1 (void)
> {
> if (u & 1)
> v = 0;
> v = 0;
> }
> 
> the combine pattern is also used but the register pressure is smaller.
> After quite some spills, reload then comes up with
> 
> (insn 18 7 8 2 (set (reg:QI 24 r24)
> (mem/c:QI (symbol_ref:HI ("u")  )
> [1 u+0 S1 A8])) "testbit.c":6 56 {movqi_insn}
>  (nil))
> (jump_insn 8 18 9 2 (set (pc)
> (if_then_else (eq (and:HI (reg:HI 24 r24)
> (const_int 1 [0x1]))
> (const_int 0 [0]))
> (label_ref 11)
> (pc))) "testbit.c":6 415 {*sbrx_and_branchhi}
>  (int_list:REG_BR_PROB 4600 (nil))
>  -> 11)
> 
> i.e. the paradoxical subreg could be resolved somehow and R25 is
> uninitialized.  It this the purpose of the combine change to come
> up with uninitialized regs because it is known that just the
> initialized parts are used by the code?

If a

  (zero_extract (foo) (const_int) (const_int)

does not match in combine, it calls change_zero_ext to replace it
with

  (and (lshiftrt (foo) (const_int)) (const_int)

(The zero_extract expression is generated by combine for insn
55+56->57.)  Prior to the patch, this was done only if the
zero_extract has the same mode as "foo".  With the patch, it also
deals with mode expanding zero_extracts, e.g.

  (zero_extract:DI (foo:SI) (const_int) (const_int))

->

  (and:DI (subreg:DI (lshiftrt:SI (const_int) 0) (const_int))

This is what combine has done:

Trying 55, 56 -> 57:
...
Failed to match this instruction:
(set (reg:HI 78)
(zero_extract:HI (mem:QI (reg/v/f:HI 75 [ operands ]) [1
*operands_17(D)+0 S1 A8])
(const_int 1 [0x1])
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:HI 78)
(and:HI (subreg:HI (mem:QI (reg/v/f:HI 75 [ operands ]) [1
*operands_17(D)+0 S1 A8]) 0)
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (cc0)
(compare (reg:HI 78)
(const_int 0 [0])))
allowing combination of insns 55, 56 and 57

Later it combinded insn 57 -> 58 to the expression that has
triggered the bad register allocation.  This looks all correct to
me.

> Even if subregs are allowed, paradoxical subregs look really odd here.
> 
> The ICE'ing test case, reload comes up with
> 
> (insn 97 57 98 9 (set (reg:HI 30 r30)
> (reg/v/f:HI 20 r20 [orig:75 operands ] [75])) "pr26833.c":11
> 68 {*movhi}

Re: [PATCH v4] Run tests only if the machine supports the instruction set.

2016-12-22 Thread Andreas Krebbel
> gcc/ChangeLog-archlevel
> 
>   * config/s390/s390-c.c (s390_cpu_cpp_builtins_internal): Define
>   __S390_ARCH_LEVEL__.
> gcc/testsuite/ChangeLog-setmem
> 
>   * gcc.target/s390/md/setmem_long-1.c: Use "s390_useable_hw".
>   * gcc.target/s390/md/rXsbg_mode_sXl.c: Likewise.
>   * gcc.target/s390/md/andc-splitter-1.c: Likewise.
>   * gcc.target/s390/md/andc-splitter-2.c: Likewise.
>   * lib/gcc-dg.exp (gcc-dg-runtest): Export torture_current_flags.
>   * gcc.target/s390/s390.exp: Import torture_current_flags.
>   (check_effective_target_s390_useable_hw): New.
>   (check_effective_target_s390_z900_hw): New.
>   (check_effective_target_s390_z990_hw): New.
>   (check_effective_target_s390_z9_ec_hw): New.
>   (check_effective_target_s390_z10_hw): New.
>   (check_effective_target_s390_z196_hw): New.
>   (check_effective_target_s390_zEC12_hw): New.
>   (check_effective_target_s390_z13_hw): New.
>   (check_effective_target_z10_instructions): Removed.
>   (torture tests): Add optimization level without -march=.
>   Reorder torture tests for good cache usage.

Applied. Thanks!

-Andreas-



Re: [PATCH v2] combine: Improve change_zero_ext, call simplify_set afterwards.

2016-12-22 Thread Georg-Johann Lay

On 22.12.2016 15:27, Dominik Vogt wrote:

On Thu, Dec 22, 2016 at 12:00:37PM +0100, Georg-Johann Lay wrote:

On 21.12.2016 18:46, Segher Boessenkool wrote:

On Wed, Dec 21, 2016 at 01:58:18PM +0100, Georg-Johann Lay wrote:

$ avr-gcc
/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c -S
-O1 -mmcu=avr4 -S -v

/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c: In
function 'yasm_lc3b__parse_insn':
/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c:19:1:
error: insn does not satisfy its constraints:
}
^
(jump_insn 58 98 59 9 (set (pc)
   (if_then_else (eq (and:HI (reg:HI 31 r31)
   (const_int 1 [0x1]))
   (const_int 0 [0]))
   (label_ref 70)
   (pc)))
"/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c":11
415 {*sbrx_and_branchhi}
(int_list:REG_BR_PROB 375 (nil))
-> 70)




Combine comes up with the following insn:
(jump_insn 58 57 59 7 (set (pc)
   (if_then_else (eq (and:HI (subreg:HI (mem:QI (reg/v/f:HI 75 [
operands ]) [1 *operands_17(D)+0 S1 A8]) 0)
   (const_int 1 [0x1]))
   (const_int 0 [0]))
   (label_ref 70)
   (pc)))
"/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.c-torture/compile/pr26833.c":11
415 {*sbrx_and_branchhi}
(int_list:REG_BR_PROB 375 (nil))
-> 70)

which cannot be correct because avr.md::*sbrx_and_branchhi reads:

(define_insn "*sbrx_and_branch"
 [(set (pc)
   (if_then_else
(match_operator 0 "eqne_operator"
[(and:QISI
  (match_operand:QISI 1 "register_operand" "r")
  (match_operand:QISI 2 "single_one_operand" "n"))
 (const_int 0)])
(label_ref (match_operand 3 "" ""))
(pc)))]
 "" { ... } ...)

Hence we have a memory operand (subreg of mem)) where only a register is
allowed.  Reg alloc then reloads the mem:QI into R31, but R31 is the
last hard reg, i.e. R31 cannot hold HImode.


If you don't have instruction scheduling subregs of mem are allowed (and
are counted as registers).  Combine asks recog, and it think this is fine.

Why does reload use r31 here?  Why does it think that is valid for HImode?


I can't tell you, all I'm seeing bunch of new FAILs after r243578.
For a simpler test case like the following

unsigned u;
int volatile v;

void btest1 (void)
{
if (u & 1)
v = 0;
v = 0;
}

the combine pattern is also used but the register pressure is smaller.
After quite some spills, reload then comes up with

(insn 18 7 8 2 (set (reg:QI 24 r24)
(mem/c:QI (symbol_ref:HI ("u")  )
[1 u+0 S1 A8])) "testbit.c":6 56 {movqi_insn}
 (nil))
(jump_insn 8 18 9 2 (set (pc)
(if_then_else (eq (and:HI (reg:HI 24 r24)
(const_int 1 [0x1]))
(const_int 0 [0]))
(label_ref 11)
(pc))) "testbit.c":6 415 {*sbrx_and_branchhi}
 (int_list:REG_BR_PROB 4600 (nil))
 -> 11)

i.e. the paradoxical subreg could be resolved somehow and R25 is
uninitialized.  It this the purpose of the combine change to come
up with uninitialized regs because it is known that just the
initialized parts are used by the code?


If a

  (zero_extract (foo) (const_int) (const_int)

does not match in combine, it calls change_zero_ext to replace it
with

  (and (lshiftrt (foo) (const_int)) (const_int)

(The zero_extract expression is generated by combine for insn
55+56->57.)  Prior to the patch, this was done only if the
zero_extract has the same mode as "foo".  With the patch, it also
deals with mode expanding zero_extracts, e.g.

  (zero_extract:DI (foo:SI) (const_int) (const_int))

->

  (and:DI (subreg:DI (lshiftrt:SI (const_int) 0) (const_int))

This is what combine has done:

Trying 55, 56 -> 57:
...
Failed to match this instruction:
(set (reg:HI 78)
(zero_extract:HI (mem:QI (reg/v/f:HI 75 [ operands ]) [1
*operands_17(D)+0 S1 A8])
(const_int 1 [0x1])
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:HI 78)
(and:HI (subreg:HI (mem:QI (reg/v/f:HI 75 [ operands ]) [1
*operands_17(D)+0 S1 A8]) 0)
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (cc0)
(compare (reg:HI 78)
(const_int 0 [0])))
allowing combination of insns 55, 56 and 57

Later it combinded insn 57 -> 58 to the expression that has
triggered the bad register allocation.  This looks all correct to
me.


Even if subregs are allowed, paradoxical subregs look really odd here.

The ICE'ing test case, reload comes up with

(insn 97 57 98 9 (set (reg:HI 30 r30)
(reg/v/f:HI 20 r20 [orig:75 operands ] [75])) "pr26833.c":11
68 {*movhi}
 (nil))
(insn 98 97 58 9 (set (reg:QI 31 r31)
(mem:QI (reg:HI 30 r30) [1 *operands_17(D)+0 S1 A8]))
"pr26833.c":11 56 {movqi_insn}
 (nil))
(jump_insn 58 98 59 9 (set (pc)
(if_then_else (eq (and:HI (reg:HI 31 r31)
(const_int 1 [0x1]))

Re: C++ PATCH for c++/42329, P0522 and other template template parm issues

2016-12-22 Thread Jason Merrill
On Wed, Dec 21, 2016 at 4:06 PM, Jason Merrill  wrote:
> The fourth patch fixes 42329, where we were failing to bind a template
> template-parameter to a base class of the argument type.

We need to make sure that the argument is a class before we try this.

Tested x86_64-pc-linux-gnu, applied to trunk.
commit 9c6ba5bf60d35afce20f8bee7e2bb2ac2efe37a4
Author: jason 
Date:   Thu Dec 22 15:19:54 2016 +

PR c++/78898 - ICE on constructor with TTP

PR c++/42329
* pt.c (unify): Don't look for a class template from a non-class.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@243890 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 301eb52..7711546 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20292,7 +20292,8 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
 
   if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
{
- if (strict_in & UNIFY_ALLOW_DERIVED)
+ if ((strict_in & UNIFY_ALLOW_DERIVED)
+ && CLASS_TYPE_P (arg))
{
  /* First try to match ARG directly.  */
  tree t = try_class_unification (tparms, targs, parm, arg,
diff --git a/gcc/testsuite/g++.dg/template/ttp30.C 
b/gcc/testsuite/g++.dg/template/ttp30.C
new file mode 100644
index 000..f7b9ce7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp30.C
@@ -0,0 +1,6 @@
+// PR c++/78898
+
+struct A {
+  template  A(T);
+  template  class SmartPtr> A(SmartPtr) { A(0); }
+};


Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Richard Sandiford
Matthew Fortune  writes:
> Sandra Loosemore  writes:
>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
>> > By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
>> > and runtime option -m(no-)unfused-madd4,
>> > to disable generate madd.fmt instructions.
>> 
>> Your patch also needs a documentation change so that the new
>> command-line option is listed in the GCC manual with other MIPS target
>> options.
>
> Any opinions on option names to control this? Is it best to target the 
> specific
> feature that is non-compliant on loongson or apply a general -mfix-loongson
> type option?
>
> I'm not sure I have a strong opinion either way but there do seem to be
> multiple possible variants.

Wasn't sure from this thread whether Loongson simply had a fused
implementation (without intermediate rounding) or whether the
instructions gave numerically incorrect results for some inputs.
It sounds from a later thread like it's generating incorrect results,
is that right?  If so, then FWIW I agree an -mfix option would be more
consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
integer division result and one of the -mfix-sb1 errata was an incorrect
single-precision float division result.  The latter case could have been
handled by an option to disable DIV.S and DIV.PS, but the -mfix option
gave more control.

If instead the problem is that the instructions are fused then that's
also what the original MIPS 4 parts did, so maybe an option to control
fusedness would make sense.

Thanks,
Richard


Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Yunqiang Su

> 在 2016年12月22日,23:31,Richard Sandiford  写道:
> 
> Matthew Fortune  writes:
>> Sandra Loosemore  writes:
>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
 By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
 and runtime option -m(no-)unfused-madd4,
 to disable generate madd.fmt instructions.
>>> 
>>> Your patch also needs a documentation change so that the new
>>> command-line option is listed in the GCC manual with other MIPS target
>>> options.
>> 
>> Any opinions on option names to control this? Is it best to target the 
>> specific
>> feature that is non-compliant on loongson or apply a general -mfix-loongson
>> type option?
>> 
>> I'm not sure I have a strong opinion either way but there do seem to be
>> multiple possible variants.
> 
> Wasn't sure from this thread whether Loongson simply had a fused
> implementation (without intermediate rounding) or whether the
> instructions gave numerically incorrect results for some inputs.

I test to define ISA_HAS_FUSED_MADD4 true and 
define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
both of the are incorrect (the expect value is 0).

The test case is 

#include 

double a = 0.6;
double b = 0.4;
double c = 0.6;
double d = 0.4;

int main(void)
{
double x = a * b - c * d;
printf("%le\n", x);
return 0;
}


> It sounds from a later thread like it's generating incorrect results,
> is that right?  If so, then FWIW I agree an -mfix option would be more
> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
> integer division result and one of the -mfix-sb1 errata was an incorrect
> single-precision float division result.  The latter case could have been
> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
> gave more control.
> 
> If instead the problem is that the instructions are fused then that's
> also what the original MIPS 4 parts did, so maybe an option to control
> fusedness would make sense.

The result to thread it fused or unfused, is different, while neither of them
is correct.

> 
> Thanks,
> Richard



Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Yunqiang Su

> 在 2016年12月22日,23:48,Yunqiang Su  写道:
> 
>> 
>> 在 2016年12月22日,23:31,Richard Sandiford  写道:
>> 
>> Matthew Fortune  writes:
>>> Sandra Loosemore  writes:
 On 12/21/2016 11:54 AM, Yunqiang Su wrote:
> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
> and runtime option -m(no-)unfused-madd4,
> to disable generate madd.fmt instructions.
 
 Your patch also needs a documentation change so that the new
 command-line option is listed in the GCC manual with other MIPS target
 options.
>>> 
>>> Any opinions on option names to control this? Is it best to target the 
>>> specific
>>> feature that is non-compliant on loongson or apply a general -mfix-loongson
>>> type option?
>>> 
>>> I'm not sure I have a strong opinion either way but there do seem to be
>>> multiple possible variants.
>> 
>> Wasn't sure from this thread whether Loongson simply had a fused
>> implementation (without intermediate rounding) or whether the
>> instructions gave numerically incorrect results for some inputs.
> 
> I test to define ISA_HAS_FUSED_MADD4 true and 
> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
> both of the are incorrect (the expect value is 0).
> 
> The test case is 
> 
> #include 
> 
> double a = 0.6;
> double b = 0.4;
> double c = 0.6;
> double d = 0.4;
> 
> int main(void)
> {
>double x = a * b - c * d;
>printf("%le\n", x);
>return 0;
> }
> 
> 
>> It sounds from a later thread like it's generating incorrect results,
>> is that right?  If so, then FWIW I agree an -mfix option would be more
>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
>> integer division result and one of the -mfix-sb1 errata was an incorrect
>> single-precision float division result.  The latter case could have been
>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
>> gave more control.
>> 
>> If instead the problem is that the instructions are fused then that's
>> also what the original MIPS 4 parts did, so maybe an option to control
>> fusedness would make sense.
> 
> The result to thread it fused or unfused, is different, while neither of them
> is correct.

ohh, the result are same, and neither is correct.
both of them are 1.332268e-17.

> 
>> 
>> Thanks,
>> Richard



Re: [PATCH,rs6000] Fix PR11488 for rs6000 target

2016-12-22 Thread Richard Sandiford
Pat Haugen  writes:
> This patch attempts to fix problems with the first scheduling pass
> creating too much register pressure. It does this by enabling the target
> hook to compute the pressure classes for rs6000 target since the first
> thing I observed while investigating the testcase in the subject PR is
> that IRA was picking NON_SPECIAL_REGS as a pressure class which led to
> the sched-pressure code computing too high of a value for number of regs
> available for pseudos preferring GENERAL_REGS. It also enables
> -fsched-pressure by default, using the 'model' algorithm.
>
> I ran various runs of cpu20006 to determine the set of pressure classes
> and which sched-pressure algorithm to use. Net result is that with these
> patches I see 6 benchmarks improve in the 2.4-6% range but there are
> also a couple 2% degradations which will need follow up in GCC 8. There
> was also one benchmark that showed a much bigger improvement with the
> 'weighted' sched-pressure algorithm that also needs follow up
> ('weighted' was not chosen as default since it showed more
> degradations).

I remember trying this on either Power 8 or Power 7 (not sure which) and
seeing something similar.  I think the problem was that both algorithms
use pressure classes to detect pressure points.  For -mvsx we used
VSX_REGS as a pressure class, so all scalar floating-point operations
would be counted against that class.  Since only FLOAT_REGS can be used
for scalar operations, we'd be overoptimistic about the number of scalar
floating-point values that could be live at once.

I think both algorithms were affected by this.  The difference was that
the "weighted" approach was in general more pessimistic (particularly in
its handling of register deaths) and so was less sensitive to the accuracy
of the pressure calculation.  The "model" approach was more optimistic and
and so needed a fairly accurate pressure calculation.  In this situation it
would overcommit and try to use twice as many floating-point registers as
were available.

I think the fix would be to keep tallies for both FLOAT_REGS and VSX_REGS.
The pressure on FLOAT_REGS would then be the maximum of the FLOAT_REGS tally
and (the VSX_REGS tally - the number of registers outside FLOAT_REGS).
The pressure on VSX_REGS would be the sum of the VSX_REGS and FLOAT_REGS
tallies.  In the end I never had time to try that though.

Thanks,
Richard


Re: Reorganise machmode.h headers

2016-12-22 Thread Richard Sandiford
Jeff Law  writes:
> On 11/16/2016 09:32 AM, Richard Sandiford wrote:
>> Later patches will make machmode.h rely on wide-int.h and the
>> new poly-int.h, so it needs to appear later in the coretypes.h
>> include list.
>>
>> Previously machmode.h included insn-modes.h, which as well as
>> the main mode enum contains configuration information like
>> MAX_BITSIZE_MODE_ANY_INT.  This still needs to come first,
>> since files like wide-int.h depend on the configuration
>> information.
>>
>> Similarly, later patches will make the auto-generated inline
>> mode size functions use poly-int.h, so the patch splits them
>> out into their own header file and includes it after the
>> integer utilities.
>>
>> The patch also makes the generator files include machmode.h
>> via coretypes.h.  Previously they did it by more indirect means.
>>
>> Finally, the patch makes wide-int-print.h available via coretypes.h
>> too.  There didn't seem to be any reason to force only the print
>> routines to be included directly, and it would be painful to extend
>> that approach to the new polynomial integer classes.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> [ This patch is part of the SVE series posted here:
>>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>>
>> gcc/
>> 2016-11-16  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>>
>>  * Makefile.in (MACHMODE_H): Remove insn-modes.h
>>  (CORETYPES_H): New define.
>>  (MOSTLYCLEANFILES): Add insn-modes-inline.h.
>>  (insn-modes-inline.h, s-modes-inline-h): New rules.
>>  (generated_files): Add insn-modes-inline.h.
>>  (RTL_BASE_H, TREE_CORE_H): Use CORETYPES_H instead of coretypes.h.
>>  (build/gensupport.o, build/print-rtl.o, build/read-md.o): Likewise.
>>  (build/read-rtl.o, build/rtl.o, build/vec.o, build/hash-table.o)
>>  (build/inchash.o, build/gencondmd.o, build/genattr.o): Likewise.
>>  (build/genattr-common.o, build/genattrtab.o, build/genautomata.o)
>>  (build/gencheck.o, build/gencodes.o, build/genconditions.o): Likewise.
>>  (build/genconfig.o, build/genconstants.o, build/genemit.o): Likewise.
>>  (build/genenums.o, build/genextract.o, build/genflags.o): Likewise.
>>  (build/gentarget-def.o, build/genmddeps.o, build/genopinit.o)
>>  (build/genoutput.o, build/genpeep.o, build/genpreds.o): Likewise.
>>  (build/genrecog.o, build/genmddump.o, build/genmatch.o): Likewise.
>>  (build/gencfn-macros.o, build/gcov-iov.o): Likewise.
>>  * coretypes.h: Include everything up to real.h for generators.
>>  Include insn-modes.h first.  Include wide-int-print.h after
>>  wide-int.h.  Include insn-modes-inline.h and then machmode.h.
>>  * machmode.h: Don't include insn-modes.h here.
>>  * function-tests.c: Remove includes of signop.h, machmode.h,
>>  double-int.h and wide-int.h.
>>  * rtl.h: Likewise.
>>  * gcc-rich-location.c: Remove includes of machmode.h, double-int.h
>>  and wide-int.h.
>>  * optc-save-gen.awk: Likewise.
>>  * gencheck.c (BITS_PER_UNIT): Delete dummy definition.
>>  * godump.c: Remove include of wide-int-print.h.
>>  * pretty-print.h: Likewise.
>>  * wide-int-print.cc: Likewise.
>>  * wide-int.cc: Likewise.
>>  * hash-map-tests.c: Remove include of signop.h.
>>  * hash-set-tests.c: Likewise.
>>  * rtl-tests.c: Likewise.
>>  * mkconfig.sh: Remove include of machmode.h.
>>  * genmodes.c (emit_insn_modes_h): Split emission of inline functions
>>  into...
>>  (emit_insn_modes_inline_h): ...this new function.  Emit the code
>>  into an insn-modes-inline.h header file, adding appropriate
>>  include guards and end comments.
>>  (emit_insn_modes_c_header): Remove include of machmode.h.
>>  (emit_min_insn_modes_c_header): Include coretypes.h rather than
>>  machmode.h.
>>  (main): Handle -i flag and call emit_insn_modes_inline_h when
>>  it is passed.
> So I don't see anything here particularly problematical.  My question is 
> whether or not there's anything significant to be gained to moving 
> forward with this kit, assuming the 67 piece kit is not likely to move 
> forward.

Yeah, I agree it's not worth it now that those patches won't go in.

> I do think you'll need some tweaks to the contrib/header-tools which 
> know about the core headers and dependencies.  Hopefully what's in there 
> is easy enough to figure out how to twiddle appropriately.

Thanks for the pointer, will make a note to look at that for GCC 8.

Richard


Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Richard Sandiford
Yunqiang Su  writes:
>> 在 2016年12月22日,23:48,Yunqiang Su  写道:
>> 
>>> 
>>> 在 2016年12月22日,23:31,Richard Sandiford
>>>  写道:
>>> 
>>> Matthew Fortune  writes:
 Sandra Loosemore  writes:
> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
>> and runtime option -m(no-)unfused-madd4,
>> to disable generate madd.fmt instructions.
> 
> Your patch also needs a documentation change so that the new
> command-line option is listed in the GCC manual with other MIPS target
> options.
 
 Any opinions on option names to control this? Is it best to target
 the specific
 feature that is non-compliant on loongson or apply a general -mfix-loongson
 type option?
 
 I'm not sure I have a strong opinion either way but there do seem to be
 multiple possible variants.
>>> 
>>> Wasn't sure from this thread whether Loongson simply had a fused
>>> implementation (without intermediate rounding) or whether the
>>> instructions gave numerically incorrect results for some inputs.
>> 
>> I test to define ISA_HAS_FUSED_MADD4 true and 
>> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
>> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
>> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
>> both of the are incorrect (the expect value is 0).
>> 
>> The test case is 
>> 
>> #include 
>> 
>> double a = 0.6;
>> double b = 0.4;
>> double c = 0.6;
>> double d = 0.4;
>> 
>> int main(void)
>> {
>>double x = a * b - c * d;
>>printf("%le\n", x);
>>return 0;
>> }
>> 
>> 
>>> It sounds from a later thread like it's generating incorrect results,
>>> is that right?  If so, then FWIW I agree an -mfix option would be more
>>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
>>> integer division result and one of the -mfix-sb1 errata was an incorrect
>>> single-precision float division result.  The latter case could have been
>>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
>>> gave more control.
>>> 
>>> If instead the problem is that the instructions are fused then that's
>>> also what the original MIPS 4 parts did, so maybe an option to control
>>> fusedness would make sense.
>> 
>> The result to thread it fused or unfused, is different, while neither of them
>> is correct.
>
> ohh, the result are same, and neither is correct.
> both of them are 1.332268e-17.

That's the expected result for an implementation in which the subtraction
is fused with the first multiplication without intermediate rounding.
The second 0.4 * 0.6 isn't exactly representable and rounds down, to a
value slightly less than 0.24.  Then the fused operation subtracts this
value from the exact result of the first 0.4 * 0.6 (0.24), giving a
value slightly greater than 0.

I see Paul Hua's patch does add Loongson to the list of targets
with a fused implementation (should have checked earlier, sorry).
So I think after that patch we would do the right thing.  (In particular,
-ffp-contract=off would then disable the fusing.)

Thanks,
Richard


[PATCH][x86_64] Enable AVX512 VPOPCNTD/VPOPCNTQ instructions

2016-12-22 Thread Andrew Senkevich
Hi,

this patch enables AVX512 VPOPCNTD/VPOPCNTQ instructions recently
added in Instruction Set Extensions
(https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf).

gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET,
OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET): New.
* config.gcc: Add avx512vpopcntdqintrin.h.
* config/i386/avx512vpopcntdqintrin.h: New.
* config/i386/cpuid.h (bit_AVX512VPOPCNTDQ): New.
* config/i386/i386-builtin-types.def: Add new types.
* config/i386/i386-builtin.def (__builtin_ia32_vpopcountd_v16si,
__builtin_ia32_vpopcountd_v16si_mask,
__builtin_ia32_vpopcountd_v16si_maskz, __builtin_ia32_vpopcountq_v8di,
__builtin_ia32_vpopcountq_v8di_mask,
__builtin_ia32_vpopcountq_v8di_maskz): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVX512VPOPCNTDQ__.
* config/i386/i386.c (ix86_target_string): Add -mavx512vpopcntdq.
(PTA_AVX512VPOPCNTDQ): Define.
* config/i386/i386.h (TARGET_AVX512VPOPCNTDQ,
TARGET_AVX512VPOPCNTDQ_P): Define.
* config/i386/i386.opt: Add mavx512vpopcntdq.
* config/i386/immintrin.h: Include avx512vpopcntdqintrin.h.
* config/i386/sse.md (unspec): Add UNSPEC_VPOPCNTDQ.
(define_insn "vpopcount"): New.
(define_insn "vpopcountv16si_mask"): Ditto.
(define_insn "vpopcountv16si_maskz"): Ditto.
(define_insn "vpopcountv8di_mask"): Ditto.
(define_insn "vpopcountv8di_maskz"): Ditto.
(define_mode_iterator VI_AVX512F): Ditto.

gcc/testsuite/
* g++.dg/other/i386-2.C: Add -mavx512vpopcntdq.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/builtin_target.c: Handle new option.
* gcc.target/i386/funcspec-56.inc: Test new attributes.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: New test.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.

libgcc/
* config/i386/cpuinfo.h (processor_features): Add
FEATURE_AVX512VPOPCNTDQ.
* config/i386/cpuinfo.c (get_available_features): Habdle new
feature.


diff --git a/gcc/common/config/i386/i386-common.c
b/gcc/common/config/i386/i386-common.c
index 98224f5..a425af5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
   (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET)
 #define OPTION_MASK_ISA_AVX5124FMAPS_SET OPTION_MASK_ISA_AVX5124FMAPS
 #define OPTION_MASK_ISA_AVX5124VNNIW_SET OPTION_MASK_ISA_AVX5124VNNIW
+#define OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET OPTION_MASK_ISA_AVX512VPOPCNTDQ
 #define OPTION_MASK_ISA_RTM_SET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
@@ -183,6 +184,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI
 #define OPTION_MASK_ISA_AVX5124FMAPS_UNSET OPTION_MASK_ISA_AVX5124FMAPS
 #define OPTION_MASK_ISA_AVX5124VNNIW_UNSET OPTION_MASK_ISA_AVX5124VNNIW
+#define OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET OPTION_MASK_ISA_AVX512VPOPCNTDQ
 #define OPTION_MASK_ISA_RTM_UNSET OPTION_MASK_ISA_RTM
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
@@ -409,6 +411,8 @@ ix86_handle_option (struct gcc_options *opts,
   opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
   opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
   opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
+  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET;
  }
   return true;

@@ -481,6 +485,21 @@ ix86_handle_option (struct gcc_options *opts,
  }
   return true;

+case OPT_mavx512vpopcntdq:
+  if (value)
+ {
+  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX512VPOPCNTDQ_SET;
+  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
+ }
+  else
+ {
+  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET;
+  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX512VPOPCNTDQ_UNSET;
+ }
+  return true;
+
 case OPT_mavx512dq:
   if (value)
  {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7afbc54..f9e9399 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -375,7 +375,8 @@ i[34567]86-*-*)
avx512vlintrin.h avx512vlbwintrin.h avx512vldqintrin.h
avx512ifmaintrin.h avx512ifmavlintrin.h avx512vbmiintrin.h
avx512vbmivlintrin.h avx5124fmapsintrin.h avx5124vnniwintrin.h
-   clwbintrin

Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Yunqiang Su

> 在 2016年12月23日,00:18,Richard Sandiford  写道:
> 
> Yunqiang Su  writes:
>>> 在 2016年12月22日,23:48,Yunqiang Su  写道:
>>> 
 
 在 2016年12月22日,23:31,Richard Sandiford
  写道:
 
 Matthew Fortune  writes:
> Sandra Loosemore  writes:
>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
>>> By this patch, I add a build-time option ` --with-unfused-madd4=yes/no',
>>> and runtime option -m(no-)unfused-madd4,
>>> to disable generate madd.fmt instructions.
>> 
>> Your patch also needs a documentation change so that the new
>> command-line option is listed in the GCC manual with other MIPS target
>> options.
> 
> Any opinions on option names to control this? Is it best to target
> the specific
> feature that is non-compliant on loongson or apply a general 
> -mfix-loongson
> type option?
> 
> I'm not sure I have a strong opinion either way but there do seem to be
> multiple possible variants.
 
 Wasn't sure from this thread whether Loongson simply had a fused
 implementation (without intermediate rounding) or whether the
 instructions gave numerically incorrect results for some inputs.
>>> 
>>> I test to define ISA_HAS_FUSED_MADD4 true and 
>>> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
>>> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
>>> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
>>> both of the are incorrect (the expect value is 0).
>>> 
>>> The test case is 
>>> 
>>> #include 
>>> 
>>> double a = 0.6;
>>> double b = 0.4;
>>> double c = 0.6;
>>> double d = 0.4;
>>> 
>>> int main(void)
>>> {
>>>   double x = a * b - c * d;
>>>   printf("%le\n", x);
>>>   return 0;
>>> }
>>> 
>>> 
 It sounds from a later thread like it's generating incorrect results,
 is that right?  If so, then FWIW I agree an -mfix option would be more
 consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
 integer division result and one of the -mfix-sb1 errata was an incorrect
 single-precision float division result.  The latter case could have been
 handled by an option to disable DIV.S and DIV.PS, but the -mfix option
 gave more control.
 
 If instead the problem is that the instructions are fused then that's
 also what the original MIPS 4 parts did, so maybe an option to control
 fusedness would make sense.
>>> 
>>> The result to thread it fused or unfused, is different, while neither of 
>>> them
>>> is correct.
>> 
>> ohh, the result are same, and neither is correct.
>> both of them are 1.332268e-17.
> 
> That's the expected result for an implementation in which the subtraction
> is fused with the first multiplication without intermediate rounding.
> The second 0.4 * 0.6 isn't exactly representable and rounds down, to a
> value slightly less than 0.24.  Then the fused operation subtracts this
> value from the exact result of the first 0.4 * 0.6 (0.24), giving a
> value slightly greater than 0.

I see. But I think 1.332268e-17 is too big than expected.
1-e17 for double is totally unacceptable.

> 
> I see Paul Hua's patch does add Loongson to the list of targets
> with a fused implementation (should have checked earlier, sorry).
> So I think after that patch we would do the right thing.  (In particular,
> -ffp-contract=off would then disable the fusing.)

will -ffp-contract=off disable some other optimization?
If so, I don’t think that will be an ideal choice for distributions, like 
Debian. 

> 
> Thanks,
> Richard



Re: [PATCH] PR78879

2016-12-22 Thread Jeff Law

On 12/21/2016 11:08 PM, Yuan, Pengfei wrote:

Hi,

The following patch fixes
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78879

There are some other invocations of unordered_remove in
tree-ssa-threadupdate.c, which may also need to be replaced
with ordered_remove.

Regards,
Yuan, Pengfei


2016-12-22  Yuan Pengfei  

PR middle-end/78879
* tree-ssa-threadupdate.c (mark_threaded_blocks): Replace
unordered_remove with ordered_remove.
How specifically does this fix the problem.  I suspect you're just 
papering over the bug by changing the order of thread processing.


You'll note that when a thread is canceled we call the 
paths.unordered_remove without incrementing i.  AFAICT you're just 
changing order in which paths out of the array.



Jeff



Re: [v3 PATCH] Implement LWG 2842, in_place_t check for optional::optional(U&&) should decay U.

2016-12-22 Thread Jonathan Wakely

On 19/12/16 16:07 +0200, Ville Voutilainen wrote:

Tested on Linux-x64.

The perfect forwarder needs to sfinae out of the way of the in_place_t
signature,
and the in_place_t signature needs to check is_constructible. I also
did some housekeeping
to get rid of the int-pack constraints, because in case of empty packs
it's unclear
what a compiler is supposed to do for them.


OK for trunk.



Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2016-12-22 Thread Martin Liška
On 12/21/2016 09:52 AM, Jakub Jelinek wrote:
> On Tue, Dec 20, 2016 at 12:26:41PM +0100, Martin Liška wrote:
>> Ok, llvm folks are unwilling to accept the new API function, thus I've 
>> decided to come up
>> with approach suggested by Jakub. Briefly, when expanding ASAN_POISON 
>> internal function,
>> we create a new variable (with the same name as the original one). The 
>> variable is poisoned
>> at the location of the ASAN_POISON and all usages just call ASAN_CHECK that 
>> would trigger
>> use-after-scope run-time error. Situation where ASAN_POISON has a LHS is 
>> very rare and
>> is very likely to be a bug. Thus suggested not super-optimized approach 
>> should not be
>> problematic.
> 
> Do you have a testcase for the case where there is a write to the var after
> poison that is then made non-addressable?  use-after-scope-9.c only covers
> the read.
> 
>> I'm not sure about the introduction of 'create_var' function, maybe we would 
>> need some
>> refactoring. Thoughts?
> 
> It doesn't belong to gimple-expr.c and the name is way too generic, we have
> many create var functions already.  And this one is very specialized.
> 
>> 2016-12-19  Martin Liska  
>>
>>  * asan.c (asan_expand_poison_ifn): New function.
>>  * asan.h (asan_expand_poison_ifn):  Likewise.
> 
> Too many spaces.
> 
>> +  tree shadow_var = create_var (TREE_TYPE (poisoned_var),
>> +IDENTIFIER_POINTER (DECL_NAME (var_decl)));
> 
> For the shadow var creation, IMHO you should
> 1) use a hash table, once you add a shadow variable for a certain variable
>for the first time, reuse it for all the other cases; you can have many
>ASAN_POISON () calls for the same underlying variable

Thanks for review.

Done by hash_map.

> 2) as I said, use just a function in sanopt.c for this,
>create_asan_shadow_var or whatever

Also done.

> 3) I think you just want to do copy_node, plus roughly what
>copy_decl_for_dup_finish does (and set DECL_ARTIFICIAL and
>DECL_IGNORED_P) - except that you don't have copy_body_data
>so you can't use it directly (well, you could create copy_body_data
>just for that purpose and set src_fn and dst_fn to current_function_decl
>and the rest to NULL)

I decided to use the function with prepared copy_body_data ;)

> 
> I'd really like to see the storing to poisoned var becoming non-addressable
> in action (if it can ever happen, so it isn't just theoretical) to look at
> what it does.

Well, having following sample:

int
main (int argc, char **argv)
{
  int *ptr = 0;

  {
int a;
ptr = &a;
*ptr = 12345;
  }

  *ptr = 12345;
  return *ptr;
}

Right after rewriting into SSA it looks as follows:

main (int argc, char * * argv)
{
  int a;
  int * ptr;
  int _8;

   [0.00%]:
  a_9 = 12345;
  a_10 = ASAN_POISON ();
  a_11 = 12345;
  _8 = a_11;
  return _8;

}

Thus, I guess it not possible to do a write.

Following v2 can bootstrap on ppc64le-redhat-linux and survives regression 
tests.

Martin

> 
>   Jakub
> 

>From 66fabb9d15ebfb21e25b4fc81bad8deb6877e198 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 19 Dec 2016 15:36:11 +0100
Subject: [PATCH] Speed up use-after-scope (v2): rewrite into SSA

gcc/ChangeLog:

2016-12-22  Martin Liska  

	* asan.c (create_asan_shadow_var): New function.
	(asan_expand_poison_ifn): Likewise.
	* asan.h (asan_expand_poison_ifn): New declaration.
	* internal-fn.c (expand_ASAN_POISON): Likewise.
	* internal-fn.def (ASAN_POISON): New builtin.
	* sanopt.c (pass_sanopt::execute): Expand
	asan_expand_poison_ifn.
	* tree-inline.c (copy_decl_for_dup_finish): Make function
	external.
	* tree-inline.h (copy_decl_for_dup_finish): Likewise.
	* tree-ssa.c (is_asan_mark_p): New function.
	(execute_update_addresses_taken): Rewrite local variables
	(identified just by use-after-scope as addressable) into SSA.

gcc/testsuite/ChangeLog:

2016-12-22  Martin Liska  

	* gcc.dg/asan/use-after-scope-3.c: Add additional flags.
	* gcc.dg/asan/use-after-scope-9.c: Likewise and grep for
	sanopt optimization for ASAN_POISON.
---
 gcc/asan.c| 109 +-
 gcc/asan.h|   2 +
 gcc/internal-fn.c |   7 ++
 gcc/internal-fn.def   |   1 +
 gcc/sanopt.c  |  11 +++
 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c |   1 +
 gcc/testsuite/gcc.dg/asan/use-after-scope-9.c |   2 +
 gcc/tree-inline.c |   2 +-
 gcc/tree-inline.h |   1 +
 gcc/tree-ssa.c|  69 +---
 10 files changed, 193 insertions(+), 12 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 53acff0a2fb..187934ad11b 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -32,8 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "memmodel.h"
 #include "tm_p.h"
+#include "ssa.h"
 #include "stringpo

Re: [v3 PATCH] Implement 2801, Default-constructibility of unique_ptr.

2016-12-22 Thread Jonathan Wakely

On 20/12/16 22:52 +0200, Ville Voutilainen wrote:

diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index 56e6ec0..63dff37 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -175,10 +175,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Constructors.

  /// Default constructor, creates a unique_ptr that owns nothing.
+  template >,
+is_default_constructible<_Up>>::value,
+ bool>::type = false>


Instead of repeating this condition half a dozen times, we could put
it in the __uniq_ptr_impl class template and reuse it, as in the
attached patch (and similarly for the unique_ptr specialization).
What do you think?


  constexpr unique_ptr() noexcept
  : _M_t()
-  { static_assert(!is_pointer::value,
-"constructed with null function pointer deleter"); }
+  { }


The bodies of these constructors should be indented now that they're
templates.



--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/default.cc
@@ -0,0 +1,40 @@
+// { dg-do compile { target c++11 } }
+
+// Copyright (C) 2011-2016 Free Software Foundation, Inc.


Is this substantially copied from an existing file, or should it just
be 2016? (Not that it really matters, as I don't think we should have
copyright notices on tests like this at all, but that's something to
worry about another time.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h
index 56e6ec0..f9ab9a6 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -130,6 +130,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	};
 
 public:
+  using _DeleterConstraint = enable_if<
+	__and_<__not_, is_default_constructible<_Dp>>>::value>;
+
   using pointer = typename _Ptr<_Tp, _Dp>::type;
 
   __uniq_ptr_impl() = default;
@@ -152,6 +155,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template >
 class unique_ptr
 {
+  using _DeleterConstraint = __uniq_ptr_impl<_Tp, _Dp>::_DeleterConstraint;
+
   __uniq_ptr_impl<_Tp, _Dp> _M_t;
 
 public:
@@ -175,10 +180,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Constructors.
 
   /// Default constructor, creates a unique_ptr that owns nothing.
-  constexpr unique_ptr() noexcept
-  : _M_t()
-  { static_assert(!is_pointer::value,
-		 "constructed with null function pointer deleter"); }
+  template
+	constexpr unique_ptr() noexcept
+	: _M_t()
+	{ }
 
   /** Takes ownership of a pointer.
*
@@ -186,11 +191,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*
* The deleter will be value-initialized.
*/
-  explicit
-  unique_ptr(pointer __p) noexcept
-  : _M_t(__p)
-  { static_assert(!is_pointer::value,
-		 "constructed with null function pointer deleter"); }
+  template
+	explicit
+	unique_ptr(pointer __p) noexcept
+	: _M_t(__p)
+	{ }
 
   /** Takes ownership of a pointer.
*
@@ -218,7 +223,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  "rvalue deleter bound to reference"); }
 
   /// Creates a unique_ptr that owns nothing.
-  constexpr unique_ptr(nullptr_t) noexcept : unique_ptr() { }
+  template
+	constexpr unique_ptr(nullptr_t) noexcept : unique_ptr() { }
 
   // Move constructors.
 


Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2016-12-22 Thread Jakub Jelinek
On Thu, Dec 22, 2016 at 06:03:50PM +0100, Martin Liška wrote:
> Done by hash_map.

Ok.

> > 3) I think you just want to do copy_node, plus roughly what
> >copy_decl_for_dup_finish does (and set DECL_ARTIFICIAL and
> >DECL_IGNORED_P) - except that you don't have copy_body_data
> >so you can't use it directly (well, you could create copy_body_data
> >just for that purpose and set src_fn and dst_fn to current_function_decl
> >and the rest to NULL)
> 
> I decided to use the function with prepared copy_body_data ;)

Ok.

> > I'd really like to see the storing to poisoned var becoming non-addressable
> > in action (if it can ever happen, so it isn't just theoretical) to look at
> > what it does.
> 
> Well, having following sample:
> 
> int
> main (int argc, char **argv)
> {
>   int *ptr = 0;
> 
>   {
> int a;
> ptr = &a;
> *ptr = 12345;
>   }
> 
>   *ptr = 12345;
>   return *ptr;
> }
> 
> Right after rewriting into SSA it looks as follows:
> 
> main (int argc, char * * argv)
> {
>   int a;
>   int * ptr;
>   int _8;
> 
>[0.00%]:
>   a_9 = 12345;
>   a_10 = ASAN_POISON ();
>   a_11 = 12345;
>   _8 = a_11;
>   return _8;
> 
> }

But we do not want to rewrite into SSA that way, but instead as

main (int argc, char * * argv)
{
  int a;
  int * ptr;
  int _8;

   [0.00%]:
  a_9 = 12345;
  a_10 = ASAN_POISON ();
  ASAN_POISON (a_10);
  a_11 = 12345;
  _8 = a_11;
  return _8;

}

or something similar, so that you can 1) emit a diagnostics at the spot
where the out of scope store happens 2) differentiate between reads from
out of scope var and stores to out of scope var

What we need is to hook into tree-into-ssa.c for this, where a_11 is
created, find out that there is a store to a var that has ASAN_POISON result
as currently active definition.  Something like if we emit ASAN_POISON
for some var, during tree-into-ssa.c if we see a store to that var that we
need to rewrite into SSA pretend there is a read from that var first at
that location and if it is result of ASAN_POISON, emit the additional
stmt.

Jakub


Re: [PATCH v2] combine: Improve change_zero_ext, call simplify_set afterwards.

2016-12-22 Thread Segher Boessenkool
On Thu, Dec 22, 2016 at 04:18:34PM +0100, Georg-Johann Lay wrote:
> >>>If you don't have instruction scheduling subregs of mem are allowed (and
> >>>are counted as registers).  Combine asks recog, and it think this is 
> >>>fine.
> >>>
> >>>Why does reload use r31 here?  Why does it think that is valid for 
> >>>HImode?
> >>
> >>I can't tell you, all I'm seeing bunch of new FAILs after r243578.

The avr port does not define CANNOT_CHANGE_MODE_CLASS.  It probably
should.

> >>i.e. the paradoxical subreg could be resolved somehow and R25 is
> >>uninitialized.  It this the purpose of the combine change to come
> >>up with uninitialized regs because it is known that just the
> >>initialized parts are used by the code?

The purpose of the combine change is to write widening extracts in a
more general form, so that backends for processors that can do such
more general things do not have to write hundreds (literally) extra
patterns for all the cases that could be written as zero_extract.

> >>(insn 98 97 58 9 (set (reg:QI 31 r31)
> >>(mem:QI (reg:HI 30 r30) [1 *operands_17(D)+0 S1 A8]))
> >>"pr26833.c":11 56 {movqi_insn}
> >> (nil))
> >>(jump_insn 58 98 59 9 (set (pc)
> >>(if_then_else (eq (and:HI (reg:HI 31 r31)
> >>(const_int 1 [0x1]))
> >>(const_int 0 [0]))
> >>(label_ref 70)
> >>(pc))) "pr26833.c":11 415 {*sbrx_and_branchhi}
> >> (int_list:REG_BR_PROB 375 (nil))
> >> -> 70)
> >>
> >>insn 98 is valid but overrides parts of HI:30 set by insn 97.
> >>
> >>As it appears, the QI memory is loaded to R31 which is valid, and
> >>then reload drops in that register as replacement for the paradoxical
> >>subreg and comes up with HI:31 in insn 58.
> >>
> >>Actually a zero-extend would be needed, does it?

The AND clears the top bits already.

> >Please correct me if I'm wrong, but I see no bugs in the combined
> >expressions.
> 
> Algebraically it looks correct, but I am unsure if reload is supposed
> to handle such situations.

It is valid RTL.  reload should handle it.

> And I must admit I never saw "paradoxical"
> extractions before; I would expected an extension in that case, not an
> extraction.

An extension _of an extraction_.  Widening extractions are valid (and I
didn't see them until a few weeks ago either, heh -- that's why Dominik
needed this patch ;-) )


Segher


Re: [testsuite,committed] Fix prototype of memset in a test case.

2016-12-22 Thread Richard Sandiford
Georg-Johann Lay  writes:
> One test case used unsigned long for the 3rd parameter of memset, which 
> should be size_t.  This made the test crash for targets where correct 
> parameter passing depends on correct prototypes.
>
> Fixed and committed as obvious.

Catching up on backlog, but... I'm not sure this counts as obvious
if the crash was an ICE.  We should at least fail gracefully.

Thanks,
Richard

>
> Johann
>
>
> gcc/testsuite/
>   * gcc.c-torture/execute/pr30778.c (memset): Use size_t for 3rd
>   parameter in declaration.
>
> Index: gcc.c-torture/execute/pr30778.c
> ===
> --- gcc.c-torture/execute/pr30778.c (revision 242541)
> +++ gcc.c-torture/execute/pr30778.c (working copy)
> @@ -1,4 +1,4 @@
> -extern void *memset (void *, int, unsigned long);
> +extern void *memset (void *, int, __SIZE_TYPE__);
>   extern void abort (void);
>
>   struct reg_stat {


Re: [PATCH v2,rs6000] PR78056: Finish fixing build failure on Power7

2016-12-22 Thread Segher Boessenkool
Hi Kelvin,

On Fri, Dec 16, 2016 at 04:57:12PM -0700, Kelvin Nilsen wrote:
> 2016-12-16  Kelvin Nilsen  
> 
>   PR target/78056
>   * gcc.target/powerpc/pr78056-1.c: New test.
>   * gcc.target/powerpc/pr78056-2.c: New test.
>   * gcc.target/powerpc/pr78056-3.c: New test.
>   * gcc.target/powerpc/pr78056-4.c: New test.
>   * gcc.target/powerpc/pr78056-5.c: New test.
>   * gcc.target/powerpc/pr78056-6.c: New test.
>   * gcc.target/powerpc/pr78056-7.c: New test.
>   * gcc.target/powerpc/pr78056-8.c: New test.
>   * lib/target-supports.exp
>   (check_effective_target_powerpc_popcntb_ok): New procedure to test
>   whether the effective target supports the popcntb instruction.
> 
> gcc/ChangeLog:
> 
> 2016-12-16  Kelvin Nilsen  
> 
>   PR target/78056
>   * doc/sourcebuild.texi (PowerPC-specific attributes): Add
>   documentation of the powerpc_popcntb_ok attribute.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
>   code to issue warning messages if a requested CPU configuration is
>   not supported by the binary (assembler and loader) toolchain.
>   (spe_init_builtins): Add two assertions to prevent ICE if attempt is
>   made to define a built-in function that has been disabled.
>   (paired_init_builtins): Add assertion to prevent ICE if attempt is
>   made to define a built-in function that has been disabled.
>   (altivec_init_builtins): Add comment explaining why definition
>   of the DST built-in functions is not preceded by an assertion
>   check.  Add assertions to prevent ICE if attempts are made to
>   define an altivec predicate or an abs* built-in function that has
>   been disabled.
>   (htm_init_builtins): Add comment explaining why definition of the
>   htm built-in functions is not preceded by an assertion check.

Approved for trunk, please apply.  I don't think things are perfect yet,
but this is a step forward :-)


Segher


Use aligned SSE movs for re-aligned MS ABI pro/epilogues

2016-12-22 Thread Daniel Santos
According to the Microsoft 64-bit ABI specification, registers RDI, RSI 
and XMM6-15 are non-volatile and the stack alignment is 16 bytes.  In 
practice, the Windows implementation appears to not be so picky about 
the 16-byte alignment requirement, probably because it never to save SSE 
registers and instead just never uses them.  This led to a large list 
(https://bugs.winehq.org/show_bug.cgi?id=27680) of Win64 programs 
violating the ABI with impunity, but crashing in Wine until 
force_align_arg_pointer was added to gcc and used in Wine.


Stack re-alignment was originally done prior to int register saves, but 
was moved to after SSE saves in 2010 to better facilitate 
parallelization, and for simplicity's sake, the stack pointer was 
considered invalid after stack re-alignment and SSE movs were emitted 
unaligned relative to the frame pointer.  But now that forced stack 
re-alignment is the new normal for Wine64, it means that it always gets 
the unaligned movs in Wine. This patch set fixes the problem while 
preserving the improved parallelization of int register saves of Richard 
Henderson's patch in 2010.


This patchset is a prerequisite to another I'm still refining that 
out-of-lines these pro/epilogues. I'm still pretty new to this project, 
so I hope I haven't missed anything. (No additional failures in tests.)


Daniel Santos

2016-12-21  Daniel Santos  

* config/i386/i386.h (struct machine_frame_state): New fields
sp_realigned and sp_realigned_offset.

* config/i386/i386.c
(struct ix86_frame): New fields stack_realign_allocate_offset and
stack_realign_offset.
(ix86_compute_frame_layout): Modify re-alignment calculations.
(sp_valid_at, fp_valid_at): New inline functions.
(choose_basereg): New function.
(choose_baseaddr): Add align parameter, use choose_basereg and modify
all callers.
(ix86_emit_save_reg_using_mov, ix86_emit_restore_sse_regs_using_mov):
Use align parameter of choose_baseaddr to generated aligned SSE movs
when possible.
(pro_epilogue_adjust_stack): Modify to track
machine_frame_state::sp_realigned.
(ix86_expand_prologue): Modify stack re-alignment code.
(ix86_emit_leave): Clear machine_frame_state::sp_realigned.
(ix86_expand_epilogue): Modify validity checks of frame and stack
pointers.




C++ PATCH for c++/78906 (ICE with member template variable)

2016-12-22 Thread Jason Merrill
If we're adding the enclosing template args, we also need to switch to
looking at the most general template.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 35fe583ade362f018f095435980424f1244ab92f
Author: Jason Merrill 
Date:   Thu Dec 22 13:27:26 2016 -0500

PR c++/78906 - ICE with member variable template

* pt.c (finish_template_variable): Use most_general_template.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7711546..3fa2ce9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9000,6 +9000,7 @@ finish_template_variable (tree var, tsubst_flags_t 
complain)
   tree tmpl_args = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (templ));
   arglist = add_outermost_template_args (tmpl_args, arglist);
 
+  templ = most_general_template (templ);
   tree parms = DECL_TEMPLATE_PARMS (templ);
   arglist = coerce_innermost_template_parms (parms, arglist, templ, complain,
 /*req_all*/true,
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ54.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ54.C
new file mode 100644
index 000..f52f764
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ54.C
@@ -0,0 +1,13 @@
+// PR c++/78906
+// { dg-do compile { target c++14 } }
+
+template  struct A { static constexpr int digits = 0; };
+template  struct B {
+  template ::digits>
+  static constexpr int XBitMask = 0;
+};
+struct C {
+  using ReferenceHost = B;
+  template  static decltype(ReferenceHost::XBitMask<0>) XBitMask;
+};
+int main() { C::XBitMask<0>; }


Re: C++ PATCH for c++/42329, P0522 and other template template parm issues

2016-12-22 Thread Jason Merrill
On Wed, Dec 21, 2016 at 4:06 PM, Jason Merrill  wrote:
> The last patch implements paper P0522, which resolves DR150 to clarify
> that default arguments do make a template suitable as an argument to a
> template template-parameter based on a new rule that the
> template-parameter must be more specialized (as newly defined) than
> the argument template.  This is a defect report that will apply to all
> standard levels, but since we're in stage3 I limited it by default to
> C++17 for GCC 7; it can also be explicitly enabled with
> -fnew-ttp-matching.

And this adds a feature-test macro I proposed to SG10.
commit 2396126078c49cc17dacc905969febd1bdd7caa6
Author: Jason Merrill 
Date:   Thu Dec 22 13:11:26 2016 -0500

Feature-test macro for P0522R0, matching of template template arguments.

* c-cppbuiltin.c (c_cpp_builtins): Define
__cpp_template_template_args.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index e2419e8..a841e53 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -985,6 +985,8 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define_formatted (pfile, "__STDCPP_DEFAULT_NEW_ALIGNMENT__=%d",
aligned_new_threshold);
}
+  if (flag_new_ttp)
+   cpp_define (pfile, "__cpp_template_template_args=201611");
 }
   /* Note that we define this for C as well, so that we know if
  __attribute__((cleanup)) will interface with EH.  */
diff --git a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C 
b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
index 086fd25..f61b9f5 100644
--- a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
+++ b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
@@ -398,6 +398,12 @@
 #  error "__cpp_structured_bindings != 201606"
 #endif
 
+#ifndef __cpp_template_template_args
+#  error "__cpp_template_template_args"
+#elif __cpp_template_template_args != 201611
+#  error "__cpp_template_template_args != 201611"
+#endif
+
 #ifdef __has_cpp_attribute
 
 #  if ! __has_cpp_attribute(maybe_unused)


[PATCH 2/3] [i386] Keep stack pointer valid after after re-alignment.

2016-12-22 Thread Daniel Santos
This stage adds the fields sp_realigned and sp_realigned_offset to
struct machine_frame_state and adds the concept of the stack pointer
being re-aligned rather than invalid.  The inline functions sp_valid_at
and fp_valid_at are added to test if a given location relative to the
CFA can be accessed with the stack or frame pointer, respectively.

Stack allocation prior to re-alignment is modified so that we allocate
what is needed, but don't allocate unneeded space in the event that no
SSE registers are saved, but frame.sse_reg_save_offset is increased for
alignment.

As this change only alters how SSE registers are saved, moving the
re-alignment AND should not hinder parallelization of int register saves.
---
 gcc/config/i386/i386.c | 69 --
 gcc/config/i386/i386.h | 12 +
 2 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7f7389cbe31..b5f9f36094f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12604,6 +12604,24 @@ choose_baseaddr_len (unsigned int regno, HOST_WIDE_INT 
offset)
   return len;
 }
 
+/* Determine if the stack pointer is valid for accessing the cfa_offset.  */
+
+static inline bool sp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.sp_valid && !(fs.sp_realigned
+ && cfa_offset < fs.sp_realigned_offset);
+}
+
+/* Determine if the frame pointer is valid for accessing the cfa_offset.  */
+
+static inline bool fp_valid_at (HOST_WIDE_INT cfa_offset)
+{
+  const struct machine_frame_state &fs = cfun->machine->fs;
+  return fs.fp_valid && !(fs.sp_valid && fs.sp_realigned
+ && cfa_offset >= fs.sp_realigned_offset);
+}
+
 /* Return an RTX that points to CFA_OFFSET within the stack frame.
The valid base registers are taken from CFUN->MACHINE->FS.  */
 
@@ -12902,15 +12920,18 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx 
offset,
 {
   HOST_WIDE_INT ooffset = m->fs.sp_offset;
   bool valid = m->fs.sp_valid;
+  bool realigned = m->fs.sp_realigned;
 
   if (src == hard_frame_pointer_rtx)
{
  valid = m->fs.fp_valid;
+ realigned = false;
  ooffset = m->fs.fp_offset;
}
   else if (src == crtl->drap_reg)
{
  valid = m->fs.drap_valid;
+ realigned = false;
  ooffset = 0;
}
   else
@@ -12924,6 +12945,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx 
offset,
 
   m->fs.sp_offset = ooffset - INTVAL (offset);
   m->fs.sp_valid = valid;
+  m->fs.sp_realigned = realigned;
 }
 }
 
@@ -13673,6 +13695,7 @@ ix86_expand_prologue (void)
  this is fudged; we're interested to offsets within the local frame.  */
   m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET;
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
 
   ix86_compute_frame_layout (&frame);
 
@@ -13889,11 +13912,10 @@ ix86_expand_prologue (void)
 that we must allocate the size of the register save area before
 performing the actual alignment.  Otherwise we cannot guarantee
 that there's enough storage above the realignment point.  */
-  if (m->fs.sp_offset != frame.sse_reg_save_offset)
+  allocate = frame.stack_realign_allocate_offset - m->fs.sp_offset;
+  if (allocate)
 pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-  GEN_INT (m->fs.sp_offset
-   - frame.sse_reg_save_offset),
-  -1, false);
+  GEN_INT (-allocate), -1, false);
 
   /* Align the stack.  */
   insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
@@ -13901,11 +13923,14 @@ ix86_expand_prologue (void)
GEN_INT (-align_bytes)));
 
   /* For the purposes of register save area addressing, the stack
- pointer is no longer valid.  As for the value of sp_offset,
-see ix86_compute_frame_layout, which we need to match in order
-to pass verification of stack_pointer_offset at the end.  */
+pointer can no longer be used to access anything in the frame
+below m->fs.sp_realigned_offset and the frame pointer cannot be
+used for anything at or above.  */
+  gcc_assert (m->fs.sp_offset == frame.stack_realign_allocate_offset);
   m->fs.sp_offset = ROUND_UP (m->fs.sp_offset, align_bytes);
-  m->fs.sp_valid = false;
+  m->fs.sp_realigned = true;
+  m->fs.sp_realigned_offset = m->fs.sp_offset - frame.nsseregs * 16;
+  gcc_assert (m->fs.sp_realigned_offset == frame.stack_realign_offset);
 }
 
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
@@ -14244,6 +14269,7 @@ ix86_emit_leave (void)
 
   gcc_assert (m->fs.fp_valid);
   m->fs.sp_valid = true;
+  m->fs.sp_realigned = false;
   m->fs.sp_offset = 

[PATCH 1/3] [i386] Move stack frame re-alignment to before SSE saves.

2016-12-22 Thread Daniel Santos
This step adds new fields to struct ix86_frame to track where we started
the stack re-alignment and what we need to allocate prior to
re-alignment.  In ix86_compute_frame_layout, we do the stack frame
re-alignment computation prior to computing the SSE save area so that it
we have an aligned SSE save area.

This new also assures that the SSE save area is properly aligned when
DRAP is used.
---
 gcc/config/i386/i386.c | 40 +---
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 792e8ec232d..7f7389cbe31 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2453,7 +2453,7 @@ struct GTY(()) stack_local_entry {
[saved regs]
<- regs_save_offset
[padding0]
-
+   <- stack_realign_offset
[saved SSE regs]
<- sse_regs_save_offset
[padding1]  |
@@ -2479,6 +2479,8 @@ struct ix86_frame
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
   HOST_WIDE_INT reg_save_offset;
+  HOST_WIDE_INT stack_realign_allocate_offset;
+  HOST_WIDE_INT stack_realign_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
   /* When save_regs_using_mov is set, emit prologue using
@@ -12457,28 +12459,36 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (TARGET_SEH)
 frame->hard_frame_pointer_offset = offset;
 
+  /* When re-aligning the stack frame, but not saving SSE registers, this
+ is the offset we want to allocate memory for.  */
+  frame->stack_realign_allocate_offset = offset;
+
+  /* The re-aligned stack starts here.  Values before this point are not
+ directly comparable with values below this point.  Use sp_valid_at
+ to determine if the stack pointer is valid for a given offset and
+ fp_valid_at for the frame pointer.  */
+  if (stack_realign_fp)
+offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->stack_realign_offset = offset;
+
   /* Align and set SSE register save area.  */
   if (frame->nsseregs)
 {
   /* The only ABI that has saved SSE registers (Win64) also has a
-16-byte aligned default stack, and thus we don't need to be
-within the re-aligned local stack frame to save them.  In case
-incoming stack boundary is aligned to less than 16 bytes,
-unaligned move of SSE register will be emitted, so there is
-no point to round up the SSE register save area outside the
-re-aligned local stack frame to 16 bytes.  */
-  if (ix86_incoming_stack_boundary >= 128)
+16-byte aligned default stack.  However, many programs violate
+the ABI, and Wine64 forces stack realignment to compensate.
+
+If the incoming stack boundary is at least 16 bytes, or DRAP is
+required and the DRAP re-alignment boundary is at least 16 bytes,
+then we want the SSE register save area properly aligned.  */
+  if (ix86_incoming_stack_boundary >= 128
+  || (stack_realign_drap && stack_alignment_needed >= 16))
offset = ROUND_UP (offset, 16);
   offset += frame->nsseregs * 16;
+  frame->stack_realign_allocate_offset = offset;
 }
-  frame->sse_reg_save_offset = offset;
 
-  /* The re-aligned stack starts here.  Values before this point are not
- directly comparable with values below this point.  In order to make
- sure that no value happens to be the same before and after, force
- the alignment computation below to add a non-zero value.  */
-  if (stack_realign_fp)
-offset = ROUND_UP (offset, stack_alignment_needed);
+  frame->sse_reg_save_offset = offset;
 
   /* Va-arg area */
   frame->va_arg_size = ix86_varargs_gpr_size + ix86_varargs_fpr_size;
-- 
2.11.0



[PATCH 3/3] [i386] Use re-aligned stack pointer for aligned SSE movs

2016-12-22 Thread Daniel Santos
This adds an optional `align' parameter to choose_baseaddr allowing the
caller to request an address that is aligned to some boundary.  Then
ix86_emit_save_regs_using_mov and ix86_emit_restore_regs_using_mov are
modified so that optimally aligned memory is used when such a base
register is available.
---
 gcc/config/i386/i386.c | 110 ++---
 1 file changed, 87 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b5f9f36094f..e60267a903d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12622,15 +12622,40 @@ static inline bool fp_valid_at (HOST_WIDE_INT 
cfa_offset)
  && cfa_offset >= fs.sp_realigned_offset);
 }
 
-/* Return an RTX that points to CFA_OFFSET within the stack frame.
-   The valid base registers are taken from CFUN->MACHINE->FS.  */
+/* Choose a base register based upon alignment requested, speed and/or
+   size.  */
 
-static rtx
-choose_baseaddr (HOST_WIDE_INT cfa_offset)
+static void choose_basereg (HOST_WIDE_INT cfa_offset, rtx &base_reg,
+   HOST_WIDE_INT &base_offset,
+   unsigned int align_reqested, unsigned int *align)
 {
   const struct machine_function *m = cfun->machine;
-  rtx base_reg = NULL;
-  HOST_WIDE_INT base_offset = 0;
+  unsigned int hfp_align;
+  unsigned int drap_align;
+  unsigned int sp_align;
+  bool hfp_ok  = fp_valid_at (cfa_offset);
+  bool drap_ok = m->fs.drap_valid;
+  bool sp_ok   = sp_valid_at (cfa_offset);
+
+  hfp_align = drap_align = sp_align = INCOMING_STACK_BOUNDARY;
+
+  /* Filter out any registers that don't meet the requested alignment
+ criteria.  */
+  if (align_reqested)
+{
+  /* Make sure we weren't given a cfa_offset incongruent with the
+align_reqested.  */
+  gcc_assert (!(cfa_offset & (align_reqested / BITS_PER_UNIT - 1)));
+
+  if (m->fs.realigned)
+   hfp_align = drap_align = sp_align = crtl->stack_alignment_needed;
+  else if (m->fs.sp_realigned)
+   sp_align = crtl->stack_alignment_needed;
+
+  hfp_ok = hfp_ok && hfp_align >= align_reqested;
+  drap_ok = drap_ok && drap_align >= align_reqested;
+  sp_ok = sp_ok && sp_align >= align_reqested;
+}
 
   if (m->use_fast_prologue_epilogue)
 {
@@ -12639,17 +12664,17 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
  while DRAP must be reloaded within the epilogue.  But choose either
  over the SP due to increased encoding size.  */
 
-  if (m->fs.fp_valid)
+  if (hfp_ok)
{
  base_reg = hard_frame_pointer_rtx;
  base_offset = m->fs.fp_offset - cfa_offset;
}
-  else if (m->fs.drap_valid)
+  else if (drap_ok)
{
  base_reg = crtl->drap_reg;
  base_offset = 0 - cfa_offset;
}
-  else if (m->fs.sp_valid)
+  else if (sp_ok)
{
  base_reg = stack_pointer_rtx;
  base_offset = m->fs.sp_offset - cfa_offset;
@@ -12662,13 +12687,13 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
 
   /* Choose the base register with the smallest address encoding.
  With a tie, choose FP > DRAP > SP.  */
-  if (m->fs.sp_valid)
+  if (sp_ok)
{
  base_reg = stack_pointer_rtx;
  base_offset = m->fs.sp_offset - cfa_offset;
   len = choose_baseaddr_len (STACK_POINTER_REGNUM, base_offset);
}
-  if (m->fs.drap_valid)
+  if (drap_ok)
{
  toffset = 0 - cfa_offset;
  tlen = choose_baseaddr_len (REGNO (crtl->drap_reg), toffset);
@@ -12679,7 +12704,7 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
  len = tlen;
}
}
-  if (m->fs.fp_valid)
+  if (hfp_ok)
{
  toffset = m->fs.fp_offset - cfa_offset;
  tlen = choose_baseaddr_len (HARD_FRAME_POINTER_REGNUM, toffset);
@@ -12691,8 +12716,40 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset)
}
}
 }
-  gcc_assert (base_reg != NULL);
 
+/* Set the align return value.  */
+if (align)
+  {
+   if (base_reg == stack_pointer_rtx)
+ *align = sp_align;
+   else if (base_reg == crtl->drap_reg)
+ *align = drap_align;
+   else if (base_reg == hard_frame_pointer_rtx)
+ *align = hfp_align;
+  }
+}
+
+/* Return an RTX that points to CFA_OFFSET within the stack frame and
+   the alignment of address.  If align is non-null, it should point to
+   an alignment value (in bits) that is preferred or zero and will
+   recieve the alignment of the base register that was selected.  The
+   valid base registers are taken from CFUN->MACHINE->FS.  */
+
+static rtx
+choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align)
+{
+  rtx base_reg = NULL;
+  HOST_WIDE_INT base_offset = 0;
+
+  /* If a specific alignment is requested, try to get a base register
+ with that alignment first.  */
+  if (align && *align)
+choose_basereg (cfa_offset

Re: [PATCH, Fortran, alloc_poly, v2] Fix allocation of memory for polymorphic assignment

2016-12-22 Thread Janus Weil
2016-12-20 17:07 GMT+01:00 Andre Vehreschild :
> Hi Janus,
>
>> 1) After adding that code block in gfc_trans_assignment_1, it seems
>> like the comment above is outdated, right?
>
> Thanks for noting.
>
>> 2) Wouldn't it be better to move this block, which does the correct
>> allocation for CLASS variables, into
>> "alloc_scalar_allocatable_for_assignment", where the allocation for
>> all other cases is done?
>
> I tried to, but that would have meant to extend the interface of
> alloc_scalar_allocatable_for_assignment significantly, while at the location
> where I finally added the code, I could use the data available. Secondly
> putting the malloc at the correct location is not possible at alloc_scalar_...
> because the pre-blocks have already been joined to the body. That way the
> malloc was always placed either before even the vptr was set, or after the 
> data
> was copied. Both options were quite hazardous.
>
> I now went to add the allocation into trans_class_assignment (). This allows
> even more reuse of already present and needed data, e.g., the vptr.
>
> Bootstrapped and regtested ok on x86_64-linux/f23. Ok for trunk?

Thanks for the explanations. The patch is ok with me in this form.

Cheers,
Janus


Go patch committed: Fix spurious redefinition error

2016-12-22 Thread Ian Lance Taylor
This patch to the Go frontend by Than McIntosh changes
Struct_type::do_mangled_name to incorporate the field names even for
hidden symbols. This is needed in cases where a package imports a type
"S" that has an anonymous struct, e.g.

  // imported from some other package
  type S struct {
X struct{ _ struct{} }
  }

and then defines a local type that uses a structurally identical
anonymous struct, e.g.

  // defined locally
  type T struct {
U struct{ _ struct{} }
  }

In the case above both types triggered the creation of hash/equal
methods, but the method names were clashing (since both structs had
the same mangled name).

This fixes https://golang.org/issue/18414.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 243805)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-4a0bb435bbb1d1516b486d1998e8dc184576db61
+9a89f32811e6b3a29e22dda46e9c23811f562876
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h(revision 243766)
+++ gcc/go/gofrontend/gogo.h(working copy)
@@ -202,6 +202,27 @@ class Gogo
   }
 
   // Given a name which may or may not have been hidden, return the
+  // name to use within a mangled symbol name.
+  static std::string
+  mangle_possibly_hidden_name(const std::string& name)
+  { 
+// FIXME: This adds in pkgpath twice for hidden symbols, which is
+// less than ideal.
+std::string n;
+if (!Gogo::is_hidden_name(name))
+  n = name;
+else
+  {
+n = ".";
+std::string pkgpath = Gogo::hidden_name_pkgpath(name);
+n.append(Gogo::pkgpath_for_symbol(pkgpath));
+n.append(1, '.');
+n.append(Gogo::unpack_hidden_name(name));
+  }
+return n;
+  }
+
+  // Given a name which may or may not have been hidden, return the
   // name to use in an error message.
   static std::string
   message_name(const std::string& name);
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 243735)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -1337,18 +1337,8 @@ Type::type_descriptor_var_name(Gogo* gog
}
 }
 
-  // FIXME: This adds in pkgpath twice for hidden symbols, which is
-  // pointless.
-  const std::string& name(no->name());
-  if (!Gogo::is_hidden_name(name))
-ret.append(name);
-  else
-{
-  ret.append(1, '.');
-  ret.append(Gogo::pkgpath_for_symbol(Gogo::hidden_name_pkgpath(name)));
-  ret.append(1, '.');
-  ret.append(Gogo::unpack_hidden_name(name));
-}
+  std::string mname(Gogo::mangle_possibly_hidden_name(no->name()));
+  ret.append(mname);
 
   return ret;
 }
@@ -5638,8 +5628,9 @@ Struct_type::do_mangled_name(Gogo* gogo,
  if (p->is_anonymous())
ret->append("0_");
  else
-   {
- std::string n = Gogo::unpack_hidden_name(p->field_name());
+{
+
+  std::string 
n(Gogo::mangle_possibly_hidden_name(p->field_name()));
  char buf[20];
  snprintf(buf, sizeof buf, "%u_",
   static_cast(n.length()));
@@ -8712,17 +8703,7 @@ Interface_type::do_mangled_name(Gogo* go
{
  if (!p->name().empty())
{
- std::string n;
- if (!Gogo::is_hidden_name(p->name()))
-   n = p->name();
- else
-   {
- n = ".";
- std::string pkgpath = Gogo::hidden_name_pkgpath(p->name());
- n.append(Gogo::pkgpath_for_symbol(pkgpath));
- n.append(1, '.');
- n.append(Gogo::unpack_hidden_name(p->name()));
-   }
+ std::string n(Gogo::mangle_possibly_hidden_name(p->name()));
  char buf[20];
  snprintf(buf, sizeof buf, "%u_",
   static_cast(n.length()));


RE: [PATCH, testsuite] MIPS: Cleanup the forcing of assembly output in error tests.

2016-12-22 Thread Moore, Catherine


> -Original Message-
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Wednesday, December 14, 2016 9:56 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Matthew Fortune ; Moore,
> Catherine 
> Subject: [PATCH, testsuite] MIPS: Cleanup the forcing of assembly
> output in error tests.
> 
> Hi,
> 
> Some error tests were forcing assembly output incorrectly, and none
> of them had
> an explanation for why they were using -ffat-lto-objects.
> 
> This patch removes dg-skip-if's for -fno-fat-lto-objects and
> adds -ffat-lto-objects to the test options instead.
> It also adds an explanation for the purpose of -ffat-lto-objects in each
> test.
> 
> Tested with mips-mti-elf.
> 
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/mips/oddspreg-2.c (dg-options): Remove dg-skip-if
> for
>   -fno-fat-lto-objects and add the -ffat-lto-objects option, along
> with
>   an explanation for its purpose.
>   * gcc.target/mips/oddspreg-3.c (dg-options): Likewise.
>   * gcc.target/mips/oddspreg-6.c (dg-options): Likewise.
>   * gcc.target/mips/no-dsp-1.c: Add an explanation for the
> purpose of
>   -ffat-lto-objects.
>   * gcc.target/mips/pr54240.c: Likewise.
>   * gcc.target/mips/r10k-cache-barrier-14.c: Likewise.
>   * gcc.target/mips/soft-float-1.c: Likewise.
> 

This is OK.


New Spanish PO file for 'gcc' (version 6.2.0)

2016-12-22 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/es.po

(This file, 'gcc-6.2.0.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Paul Hua
On aarch64 target the result are 1.332268e-17.
On x86 with fma target the result are also 1.332268e-17.

so, I don't think the Loongson's madd.fmt/msub.fmt is incorrect.

We should do something for usage of fused madd, the all things has
been tested an fedora21 remix for loongson(1).

1, gcc:add loongson to the list of targets with a fused implementation.
2, glibc:__fma() function,we should use madd.fmt like aarch64.
see patch: 
http://www.loongnix.org/cgit/glibc-2.20/commit/?id=14023742e6ef571b61439d0d7bb7939e663fe624
3, kernel: the emulation  when a float exception taken.

(1),http://www.loongnix.org/index.php/Loongnix-20161130%E7%89%88%E6%9C%AC%E5%9C%A8Fedora21_remix%E7%B3%BB%E7%BB%9F%E4%B8%8A%E5%8F%91%E5%B8%83

Thanks,
Paul

On Fri, Dec 23, 2016 at 12:38 AM, Yunqiang Su  wrote:
>
>> 在 2016年12月23日,00:18,Richard Sandiford  写道:
>>
>> Yunqiang Su  writes:
 在 2016年12月22日,23:48,Yunqiang Su  写道:

>
> 在 2016年12月22日,23:31,Richard Sandiford
>  写道:
>
> Matthew Fortune  writes:
>> Sandra Loosemore  writes:
>>> On 12/21/2016 11:54 AM, Yunqiang Su wrote:
 By this patch, I add a build-time option ` 
 --with-unfused-madd4=yes/no',
 and runtime option -m(no-)unfused-madd4,
 to disable generate madd.fmt instructions.
>>>
>>> Your patch also needs a documentation change so that the new
>>> command-line option is listed in the GCC manual with other MIPS target
>>> options.
>>
>> Any opinions on option names to control this? Is it best to target
>> the specific
>> feature that is non-compliant on loongson or apply a general 
>> -mfix-loongson
>> type option?
>>
>> I'm not sure I have a strong opinion either way but there do seem to be
>> multiple possible variants.
>
> Wasn't sure from this thread whether Loongson simply had a fused
> implementation (without intermediate rounding) or whether the
> instructions gave numerically incorrect results for some inputs.

 I test to define ISA_HAS_FUSED_MADD4 true and
 define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
 With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
 and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
 both of the are incorrect (the expect value is 0).

 The test case is

 #include 

 double a = 0.6;
 double b = 0.4;
 double c = 0.6;
 double d = 0.4;

 int main(void)
 {
   double x = a * b - c * d;
   printf("%le\n", x);
   return 0;
 }


> It sounds from a later thread like it's generating incorrect results,
> is that right?  If so, then FWIW I agree an -mfix option would be more
> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
> integer division result and one of the -mfix-sb1 errata was an incorrect
> single-precision float division result.  The latter case could have been
> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
> gave more control.
>
> If instead the problem is that the instructions are fused then that's
> also what the original MIPS 4 parts did, so maybe an option to control
> fusedness would make sense.

 The result to thread it fused or unfused, is different, while neither of 
 them
 is correct.
>>>
>>> ohh, the result are same, and neither is correct.
>>> both of them are 1.332268e-17.
>>
>> That's the expected result for an implementation in which the subtraction
>> is fused with the first multiplication without intermediate rounding.
>> The second 0.4 * 0.6 isn't exactly representable and rounds down, to a
>> value slightly less than 0.24.  Then the fused operation subtracts this
>> value from the exact result of the first 0.4 * 0.6 (0.24), giving a
>> value slightly greater than 0.
>
> I see. But I think 1.332268e-17 is too big than expected.
> 1-e17 for double is totally unacceptable.
>
>>
>> I see Paul Hua's patch does add Loongson to the list of targets
>> with a fused implementation (should have checked earlier, sorry).
>> So I think after that patch we would do the right thing.  (In particular,
>> -ffp-contract=off would then disable the fusing.)
>
> will -ffp-contract=off disable some other optimization?
> If so, I don’t think that will be an ideal choice for distributions, like 
> Debian.
>
>>
>> Thanks,
>> Richard
>


Re: [Patch ,gcc/MIPS] add an build-time/runtime option to disable madd.fmt

2016-12-22 Thread Yunqiang Su

> 在 2016年12月23日,10:47,Paul Hua  写道:
> 
> On aarch64 target the result are 1.332268e-17.
> On x86 with fma target the result are also 1.332268e-17.
> 
> so, I don't think the Loongson's madd.fmt/msub.fmt is incorrect.

OMG. Will this behavior make some app wrong working abnormal?


> 
> We should do something for usage of fused madd, the all things has
> been tested an fedora21 remix for loongson(1).

We met lots of ftbfs on Debian with madd.fmt enabled.
So maybe FMA should not be used for normal applications at all?
and for generic linux distributions FMA shouldn’t be enabled by default.

> 
> 1, gcc:add loongson to the list of targets with a fused implementation.

If so, I think we should do this.

> 2, glibc:__fma() function,we should use madd.fmt like aarch64.
> see patch: 
> http://www.loongnix.org/cgit/glibc-2.20/commit/?id=14023742e6ef571b61439d0d7bb7939e663fe624

Your patch shouldn’t be apply to upstream, as you don’t consider old CPUs 
without FMA.
In fact all MIPS <= MIPSr5 don’t have FMA.

> 3, kernel: the emulation  when a float exception taken.

The big problem is that Loongson use the same encode for (unfused) madd.fmt to 
(fused) madd.fmt.
We cannot trap this in kernel.

> 
> (1),http://www.loongnix.org/index.php/Loongnix-20161130%E7%89%88%E6%9C%AC%E5%9C%A8Fedora21_remix%E7%B3%BB%E7%BB%9F%E4%B8%8A%E5%8F%91%E5%B8%83
> 
> Thanks,
> Paul
> 
> On Fri, Dec 23, 2016 at 12:38 AM, Yunqiang Su  wrote:
>> 
>>> 在 2016年12月23日,00:18,Richard Sandiford  写道:
>>> 
>>> Yunqiang Su  writes:
> 在 2016年12月22日,23:48,Yunqiang Su  写道:
> 
>> 
>> 在 2016年12月22日,23:31,Richard Sandiford
>>  写道:
>> 
>> Matthew Fortune  writes:
>>> Sandra Loosemore  writes:
 On 12/21/2016 11:54 AM, Yunqiang Su wrote:
> By this patch, I add a build-time option ` 
> --with-unfused-madd4=yes/no',
> and runtime option -m(no-)unfused-madd4,
> to disable generate madd.fmt instructions.
 
 Your patch also needs a documentation change so that the new
 command-line option is listed in the GCC manual with other MIPS target
 options.
>>> 
>>> Any opinions on option names to control this? Is it best to target
>>> the specific
>>> feature that is non-compliant on loongson or apply a general 
>>> -mfix-loongson
>>> type option?
>>> 
>>> I'm not sure I have a strong opinion either way but there do seem to be
>>> multiple possible variants.
>> 
>> Wasn't sure from this thread whether Loongson simply had a fused
>> implementation (without intermediate rounding) or whether the
>> instructions gave numerically incorrect results for some inputs.
> 
> I test to define ISA_HAS_FUSED_MADD4 true and
> define ISA_HAS_UNFUSED_MADD4 false, and try to build a test case.
> With ISA_HAS_FUSED_MADD4, the result is about 1e-17,
> and with ISA_HAS_UNFUSED_MADD4, the result is about 1e-17,
> both of the are incorrect (the expect value is 0).
> 
> The test case is
> 
> #include 
> 
> double a = 0.6;
> double b = 0.4;
> double c = 0.6;
> double d = 0.4;
> 
> int main(void)
> {
>  double x = a * b - c * d;
>  printf("%le\n", x);
>  return 0;
> }
> 
> 
>> It sounds from a later thread like it's generating incorrect results,
>> is that right?  If so, then FWIW I agree an -mfix option would be more
>> consistent.  E.g. one of the -mfix-vr4120 errata was an incorrect
>> integer division result and one of the -mfix-sb1 errata was an incorrect
>> single-precision float division result.  The latter case could have been
>> handled by an option to disable DIV.S and DIV.PS, but the -mfix option
>> gave more control.
>> 
>> If instead the problem is that the instructions are fused then that's
>> also what the original MIPS 4 parts did, so maybe an option to control
>> fusedness would make sense.
> 
> The result to thread it fused or unfused, is different, while neither of 
> them
> is correct.
 
 ohh, the result are same, and neither is correct.
 both of them are 1.332268e-17.
>>> 
>>> That's the expected result for an implementation in which the subtraction
>>> is fused with the first multiplication without intermediate rounding.
>>> The second 0.4 * 0.6 isn't exactly representable and rounds down, to a
>>> value slightly less than 0.24.  Then the fused operation subtracts this
>>> value from the exact result of the first 0.4 * 0.6 (0.24), giving a
>>> value slightly greater than 0.
>> 
>> I see. But I think 1.332268e-17 is too big than expected.
>> 1-e17 for double is totally unacceptable.
>> 
>>> 
>>> I see Paul Hua's patch does add Loongson to the list of targets
>>> with a fused implementation (should have checked earlier, sorry).
>>> So I think after that patch we would do the right thing.  (In particular,
>>> -ffp-contract=off would then disable the 

Re: [PATCH] PR78879

2016-12-22 Thread Yuan, Pengfei
> How specifically does this fix the problem.  I suspect you're just 
> papering over the bug by changing the order of thread processing.
> 
> You'll note that when a thread is canceled we call the 
> paths.unordered_remove without incrementing i.  AFAICT you're just 
> changing order in which paths out of the array.
> 
> 
> Jeff

Please check my explanation in
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78879#c11

Thanks!

Yuan, Pengfei