Re: [PATCH][AArch64] Allow const0_rtx operand for atomic compare-exchange patterns

2017-06-19 Thread Andrew Pinski
On Tue, Feb 28, 2017 at 4:29 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> For the testcase in this patch we currently generate:
> foo:
> mov w1, 0
> ldaxr   w2, [x0]
> cmp w2, 3
> bne .L2
> stxr w3, w1, [x0]
> cmp w3, 0
> .L2:
> cset w0, eq
> ret
>
> Note that the STXR could have been storing the WZR register instead of
> moving zero into w1.
> This is due to overly strict predicates and constraints in the store
> exclusive pattern and the
> atomic compare exchange expanders and splitters.
> This simple patch fixes that in the patterns concerned and with it we can
> generate:
> foo:
> ldaxr   w1, [x0]
> cmp w1, 3
> bne .L2
> stxr w2, wzr, [x0]
> cmp w2, 0
> .L2:
> cset w0, eq
> ret
>
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for GCC 8?
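
For context, a reproducer in the spirit of the new testcase looks roughly
like this (a hedged sketch; the exact constants and memory orders of
atomic_cmp_exchange_zero_reg_1.c are assumptions, as the test is not quoted
in this thread):

/* Exchanging in the constant 0 lets the store-exclusive store WZR
   directly instead of first moving 0 into a scratch register.  */
int
foo (int *a)
{
  int expected = 3;
  return __atomic_compare_exchange_n (a, &expected, 0, 0,
                                      __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
}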


This patch broke compiling with -march=+lse

./home/apinski/src/local5/gcc/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c:9:1:
error: unrecognizable insn:
 }
 ^
(insn 6 3 7 2 (parallel [
(set (reg:CC 66 cc)
(unspec_volatile:CC [
(const_int 0 [0])
] UNSPECV_ATOMIC_CMPSW))
(set (reg:SI 78)
(mem/v:SI (reg/v/f:DI 77 [ a ]) [-1  S4 A32]))
(set (mem/v:SI (reg/v/f:DI 77 [ a ]) [-1  S4 A32])
(unspec_volatile:SI [
(const_int 3 [0x3])
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 2 [0x2])
] UNSPECV_ATOMIC_CMPSW))
]) 
"/home/apinski/src/local5/gcc/gcc/testsuite/gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c":8
-1
 (nil))
during RTL pass: vregs

Note also your new testcase is broken even when defaulting to +lse, as
it is not going to match stxr.  I might be the only person who tests
+lse by default :).

Thanks,
Andrew Pinski

>
> Thanks,
> Kyrill
>
> 2017-02-28  Kyrylo Tkachov  
>
> * config/aarch64/atomics.md (atomic_compare_and_swap expander):
> Use aarch64_reg_or_zero predicate for operand 4.
> (aarch64_compare_and_swap define_insn_and_split):
> Use aarch64_reg_or_zero predicate for operand 3.  Add 'Z' constraint.
> (aarch64_store_exclusive): Likewise for operand 2.
>
> 2017-02-28  Kyrylo Tkachov  
>
> * gcc.target/aarch64/atomic_cmp_exchange_zero_reg_1.c: New test.


Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-19 Thread Sebastian Huber

Hello,

Would someone mind reviewing this patch, please?  It was already sent for
review in January this year and got no attention.  Now we are in a
different development stage.


https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01354.html

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [PATCH] Fix x86 ICE with -mtune=amdfam10 -mno-sse2 (PR target/81121)

2017-06-19 Thread Uros Bizjak
On Mon, Jun 19, 2017 at 5:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> This testcase started to ICE when PR70873 fix changed the splitter:
> @@ -5153,11 +5147,11 @@
>  ;; slots when !TARGET_INTER_UNIT_MOVES_TO_VEC disables the general_regs
>  ;; alternative in sse2_loadld.
>  (define_split
> -  [(set (match_operand:MODEF 0 "register_operand")
> +  [(set (match_operand:MODEF 0 "sse_reg_operand")
> (float:MODEF (match_operand:SI 1 "nonimmediate_operand")))]
> -  "TARGET_SSE2 && TARGET_SSE_MATH
> -   && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
> -   && reload_completed && SSE_REG_P (operands[0])
> +  "TARGET_USE_VECTOR_CONVERTS
> +   && optimize_function_for_speed_p (cfun)
> +   && reload_completed
> && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
> && (!EXT_REX_SSE_REG_P (operands[0])
> || TARGET_AVX512VL)"
> Having sse_reg_operand match the output operand does not imply
> TARGET_SSE2 is enabled, but we need it for both the
>   if (mode == V4SFmode)
> emit_insn (gen_floatv4siv4sf2 (operands[3], operands[4]));
>   else
> emit_insn (gen_sse2_cvtdq2pd (operands[3], operands[4]));
> instructions that we want to use in the splitter.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> (or do you want TARGET_SSE2 first or right after
> TARGET_USE_VECTOR_CONVERTS)?

Please put TARGET_SSE2 first in the insn condition.

> 2017-06-19  Jakub Jelinek  
>
> PR target/81121
> * config/i386/i386.md (TARGET_USE_VECTOR_CONVERTS float si->{sf,df}
> splitter): Require TARGET_SSE2 in the condition.
>
> * gcc.target/i386/pr81121.c: New test.

OK with the above change.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2017-06-08 20:50:46.0 +0200
> +++ gcc/config/i386/i386.md 2017-06-19 11:30:38.937491668 +0200
> @@ -5294,6 +5294,7 @@ (define_split
> && optimize_function_for_speed_p (cfun)
> && reload_completed
> && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
> +   && TARGET_SSE2
> && (!EXT_REX_SSE_REG_P (operands[0])
> || TARGET_AVX512VL)"
>[(const_int 0)]
> --- gcc/testsuite/gcc.target/i386/pr81121.c.jj  2017-06-19 11:36:06.545501324 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr81121.c 2017-06-19 11:35:40.0 
> +0200
> @@ -0,0 +1,10 @@
> +/* PR target/81121 */
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -march=amdfam10 -mno-sse2" } */
> +
> +void
> +foo (short *x, short *y)
> +{
> +  float a = 0;
> +  y[0] = x[0] * a;
> +}
>
> Jakub


Backport of r244010 to gcc-6-branch

2017-06-19 Thread Jack Howarth
  The following change from gcc-7-branch and trunk needs to be backported
to gcc-6-branch to allow the Xcode 9 clang compiler to bootstrap it. Tested
on 10.12 with Xcode 9 beta. Okay for gcc-6-branch?
 Jack


r244010-gcc_6_branch-backport.patch
Description: Binary data


Re: [PATCH] Fix multi-versioning issues (PR ipa/80732).

2017-06-19 Thread Martin Liška
On 06/19/2017 12:35 PM, Jan Hubicka wrote:
>> Hello.
>>
>> The following patch tries to resolve the following two issues:
>>
>> a) When one takes address of a function that uses target_clones attribute,
>>default implementation is always returned.
>>
>> b) Using dlsym("foo") should work and thus the resolver function should
>>use the default name. Because of that, default implementation must be
>>renamed.
>>
>> Unfortunately, we currently do not support redirection of ipa_refs, thus a
>> walk_tree is needed to resolve that.  Hopefully there is no other kind of
>> IPA_REF that needs to be handled.
> 
> The cgraph interface for redirection is meant mostly for full IPA passes
> that can not touch bodies directly, in this case I think it is fine to
> walk all references.
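
As a concrete illustration of issues (a) and (b), consider a sketch along
these lines (an assumed example, not the exact pr80732.c testcase):

/* foo is cloned per target; both taking its address and dlsym ("foo")
   should resolve through the ifunc dispatcher rather than the default
   implementation.  */
__attribute__ ((target_clones ("avx2", "default")))
int foo (void) { return 1; }

int (*fptr) (void) = foo;   /* issue (a): address of a versioned function */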
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Martin
> 
>> From 198c8464978c21cd68d4743de5648ecfefd2e09c Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Wed, 17 May 2017 15:56:22 +0200
>> Subject: [PATCH] Fix multi-versioning issues (PR ipa/80732).
>>
>> gcc/ChangeLog:
>>
>> 2017-05-19  Martin Liska  
>>
>>  PR ipa/80732
>>  * attribs.c (make_dispatcher_decl): Do not append '.ifunc'
>>  to dispatcher function name.
>>  * multiple_target.c (replace_function_decl): New function.
>>  (create_dispatcher_calls): Redirect both edges and references.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-05-19  Martin Liska  
>>
>>  PR ipa/80732
>>  * gcc.target/i386/mvc5.c: Scan indirect_function.
>>  * gcc.target/i386/mvc7.c: Likewise.
>>  * gcc.target/i386/pr80732.c: New test.
>> ---
>>  gcc/attribs.c   |   6 +-
>>  gcc/multiple_target.c   | 105 
>> ++--
>>  gcc/testsuite/gcc.target/i386/mvc5.c|   2 +-
>>  gcc/testsuite/gcc.target/i386/mvc7.c|   2 +-
>>  gcc/testsuite/gcc.target/i386/pr80732.c |  85 ++
>>  5 files changed, 161 insertions(+), 39 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr80732.c
>>
>> diff --git a/gcc/attribs.c b/gcc/attribs.c
>> index 4ba0eab8899..5eb19e82795 100644
>> --- a/gcc/attribs.c
>> +++ b/gcc/attribs.c
>> @@ -888,12 +888,8 @@ make_dispatcher_decl (const tree decl)
>>tree func_decl;
>>char *func_name;
>>tree fn_type, func_type;
>> -  bool is_uniq = false;
>>  
>> -  if (TREE_PUBLIC (decl) == 0)
>> -is_uniq = true;
>> -
>> -  func_name = make_unique_name (decl, "ifunc", is_uniq);
>> +  func_name = xstrdup (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
>>  
>>fn_type = TREE_TYPE (decl);
>>func_type = build_function_type (TREE_TYPE (fn_type),
>> diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
>> index 2ee6a9591ba..fba2636ba16 100644
>> --- a/gcc/multiple_target.c
>> +++ b/gcc/multiple_target.c
>> @@ -34,6 +34,27 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "target.h"
>>  #include "attribs.h"
>>  #include "pretty-print.h"
>> +#include "gimple-iterator.h"
>> +#include "gimple-walk.h"
>> +
>> +/* Walker callback that replaces all FUNCTION_DECL of a function that's
>> +   going to be versioned.  */
>> +
>> +static tree
>> +replace_function_decl (tree *op, int *walk_subtrees, void *data)
>> +{
>> +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
>> +  cgraph_function_version_info *info = (cgraph_function_version_info 
>> *)wi->info;
>> +
>> +  if (TREE_CODE (*op) == FUNCTION_DECL
>> +  && info->this_node->decl == *op)
>> +{
>> +  *op = info->dispatcher_resolver;
>> +  *walk_subtrees = 0;
>> +}
>> +
>> +  return NULL;
>> +}
>>  
>>  /* If the call in NODE has multiple target attribute with multiple fields,
>> replace it with dispatcher call and create dispatcher (once).  */
>> @@ -41,51 +62,48 @@ along with GCC; see the file COPYING3.  If not see
>>  static void
>>  create_dispatcher_calls (struct cgraph_node *node)
>>  {
>> -  cgraph_edge *e;
>> -  cgraph_edge *e_next = NULL;
>> +  ipa_ref *ref;
>> +
>> +  if (!DECL_FUNCTION_VERSIONED (node->decl))
>> +return;
>> +
>> +  auto_vec<cgraph_edge *> edges_to_redirect;
>> +  auto_vec<ipa_ref *> references_to_redirect;
>> +
>> +  for (unsigned i = 0; node->iterate_referring (i, ref); i++)
>> +references_to_redirect.safe_push (ref);
>>  
>>/* We need to remember NEXT_CALLER as it could be modified in the loop.  
>> */
>> -  for (e = node->callers; e ;e = (e == NULL) ? e_next : e->next_caller)
>> -{
>> -  tree resolver_decl;
>> -  tree idecl;
>> -  tree decl;
>> -  gimple *call = e->call_stmt;
>> -  struct cgraph_node *inode;
>> -
>> -  /* Checking if call of function is call of versioned function.
>> - Versioned function are not inlined, so there is no need to
>> - check for inline.  */
>> -  if (!call
>> -  || !(decl = gimple_call_fndecl (call))
>> -  || !DECL_FUNCTION_VERSIONED (decl))
>> -   

Re: [PATCH GCC][08/13]Refactoring structure partition for distribution

2017-06-19 Thread Bin.Cheng
On Wed, Jun 14, 2017 at 2:47 PM, Richard Biener
 wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> This patch refactors struct partition for later distribution.  It records
>> bitmap of data references in struct partition rather than vertices' data in
>> partition dependence graph.  It simplifies code as well as enables following
>> rewriting.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Ok.
Hi,
I updated the patch by merging read/write data references together in
struct partition.  This helps remove code duplication.  Is it OK?
Thanks,
bin
2017-06-07  Bin Cheng  

* tree-loop-distribution.c (struct partition): New field recording
its data reference.
(partition_alloc, partition_free): Init and release data refs.
(partition_merge_into): Merge data refs.
(build_rdg_partition_for_vertex): Collect data refs for partition.
(pg_add_dependence_edges): Change parameters from vector to bitmap.
Update uses.
(distribute_loop): Remove data refs from vertex data of partition
graph.
From 9a3e3e96703fd792d71d964b31a12a3ce7dc5448 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 12:29:24 +0100
Subject: [PATCH 07/13] struct-partition-refactoring-20170608.txt

---
 gcc/tree-loop-distribution.c | 179 +++
 1 file changed, 94 insertions(+), 85 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index a013556..03bb735 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -500,30 +500,40 @@ enum partition_kind {
 PKIND_NORMAL, PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
 };
 
+/* Partition for loop distribution.  */
 struct partition
 {
+  /* Statements of the partition.  */
   bitmap stmts;
+  /* Loops of the partition.  */
   bitmap loops;
+  /* True if the partition defines variable which is used outside of loop.  */
   bool reduction_p;
+  /* For builtin partition, true if it executes one iteration more than
+ number of loop (latch) iterations.  */
   bool plus_one;
   enum partition_kind kind;
   /* data-references a kind != PKIND_NORMAL partition is about.  */
   data_reference_p main_dr;
   data_reference_p secondary_dr;
+  /* Number of loop (latch) iterations.  */
   tree niter;
+  /* Data references in the partition.  */
+  bitmap datarefs;
 };
 
 
 /* Allocate and initialize a partition from BITMAP.  */
 
 static partition *
-partition_alloc (bitmap stmts, bitmap loops)
+partition_alloc (void)
 {
   partition *partition = XCNEW (struct partition);
-  partition->stmts = stmts ? stmts : BITMAP_ALLOC (NULL);
-  partition->loops = loops ? loops : BITMAP_ALLOC (NULL);
+  partition->stmts = BITMAP_ALLOC (NULL);
+  partition->loops = BITMAP_ALLOC (NULL);
   partition->reduction_p = false;
   partition->kind = PKIND_NORMAL;
+  partition->datarefs = BITMAP_ALLOC (NULL);
   return partition;
 }
 
@@ -534,6 +544,7 @@ partition_free (partition *partition)
 {
   BITMAP_FREE (partition->stmts);
   BITMAP_FREE (partition->loops);
+  BITMAP_FREE (partition->datarefs);
   free (partition);
 }
 
@@ -581,6 +592,8 @@ partition_merge_into (partition *dest, partition *partition, enum fuse_type ft)
   if (partition_reduction_p (partition))
 dest->reduction_p = true;
 
+  bitmap_ior_into (dest->datarefs, partition->datarefs);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Fuse partitions because %s:\n", fuse_message[ft]);
@@ -1051,10 +1064,11 @@ generate_code_for_partition (struct loop *loop,
 static partition *
 build_rdg_partition_for_vertex (struct graph *rdg, int v)
 {
-  partition *partition = partition_alloc (NULL, NULL);
+  partition *partition = partition_alloc ();
   auto_vec nodes;
-  unsigned i;
+  unsigned i, j;
   int x;
+  data_reference_p dr;
 
   graphds_dfs (rdg, &v, 1, &nodes, false, NULL);
 
@@ -1063,6 +1077,14 @@ build_rdg_partition_for_vertex (struct graph *rdg, int v)
   bitmap_set_bit (partition->stmts, x);
   bitmap_set_bit (partition->loops,
 		  loop_containing_stmt (RDG_STMT (rdg, x))->num);
+
+  for (j = 0; RDG_DATAREFS (rdg, x).iterate (j, &dr); ++j)
+	{
+	  unsigned idx = (unsigned) DR_INDEX (dr);
+	  gcc_assert (idx < datarefs_vec.length ());
+
+	  bitmap_set_bit (partition->datarefs, idx);
+	}
 }
 
   return partition;
@@ -1427,63 +1449,74 @@ partition_contains_all_rw (struct graph *rdg,
 
 static int
 pg_add_dependence_edges (struct graph *rdg, int dir,
-			 vec<data_reference_p> drs1,
-			 vec<data_reference_p> drs2)
+			 bitmap drs1, bitmap drs2)
 {
-  data_reference_p dr1, dr2;
+  unsigned i, j;
+  bitmap_iterator bi, bj;
+  data_reference_p dr1, dr2, saved_dr1;
 
   /* dependence direction - 0 is no dependence, -1 is back,
  1 is forth, 2 is both (we can stop then, merging will occur).  */
-  for (int ii = 0; drs1.iterate (ii, &dr1); ++ii)
-    for (int jj = 0; drs2.iterate (jj, &dr2); ++jj)
-  

Re: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-19 Thread Richard Earnshaw (lists)
On 16/06/17 22:08, Michael Collison wrote:
> This patch improves code generation for literal vector construction by 
> expanding and exposing the pattern to rtl optimization earlier. The current 
> implementation delays splitting the pattern until after reload which results 
> in poor code generation for the following code:
> 
> 
> #include "arm_neon.h"
> 
> int16x8_t
> foo ()
> {
>   return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
> }
> 
> Trunk generates:
> 
> foo:
>   movi v1.2s, 0
>   movi v0.4h, 0x8
>   dup d2, v1.d[0]
>   ins v2.d[1], v0.d[0]
>   orr v0.16b, v2.16b, v2.16b
>   ret
> 
> With the patch we now generate:
> 
> foo:
>   movi v1.4h, 0x8
>   movi v0.4s, 0
>   ins v0.d[1], v1.d[0]
>   ret
> 
> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?
> 
> 2017-06-15  Michael Collison  
> 
>   * config/aarch64/aarch64-simd.md(aarch64_combine_internal):
>   Convert from define_insn_and_split into define_expand
>   * config/aarch64/aarch64.c(aarch64_split_simd_combine):
>   Allow register and subreg operands.
> 

Your changelog entry is confusing.  You've deleted the
aarch64_combine_internal pattern entirely, having merged some of
its functionality directly into its caller (aarch64_combine).

So I think it should read:

* config/aarch64/aarch64-simd.md (aarch64_combine): Directly call
aarch64_split_simd_combine.
(aarch64_combine_internal): Delete pattern.
* ...

Note also there should be a space between the file name and the open
bracket for the first function name.

Why don't you need the big-endian code path any more?

R.

> 
> pr7057.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index c462164..4a253a9 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2807,27 +2807,11 @@
>op1 = operands[1];
>op2 = operands[2];
>  }
> -  emit_insn (gen_aarch64_combine_internal<mode> (operands[0], op1, op2));
> -  DONE;
> -}
> -)
>  
> -(define_insn_and_split "aarch64_combine_internal<mode>"
> -  [(set (match_operand:<VDBL> 0 "register_operand" "=&w")
> -(vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w")
> -(match_operand:VDC 2 "register_operand" "w")))]
> -  "TARGET_SIMD"
> -  "#"
> -  "&& reload_completed"
> -  [(const_int 0)]
> -{
> -  if (BYTES_BIG_ENDIAN)
> -aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
> -  else
> -aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
> +  aarch64_split_simd_combine (operands[0], op1, op2);
> +
>DONE;
>  }
> -[(set_attr "type" "multiple")]
>  )
>  
>  (define_expand "aarch64_simd_combine"
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 2e385c4..46bd78b 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1650,7 +1650,8 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
>  
>gcc_assert (VECTOR_MODE_P (dst_mode));
>  
> -  if (REG_P (dst) && REG_P (src1) && REG_P (src2))
> +  if (register_operand (dst, dst_mode) && register_operand (src1, src_mode)
> +  && register_operand (src2, src_mode))
>  {
>rtx (*gen) (rtx, rtx, rtx);
>  
> 



[rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
Many parallel set insns are of the form of a single set that also sets
the condition code flags.  In this case the cost of such an insn is
normally the cost of the part that doesn't set the flags, since updating
the condition flags is simply a side effect.

At present all such insns are treated as having unknown cost (ie 0) and
combine assumes that such insns are infinitely more expensive than any
other insn sequence with a non-zero cost.

This patch addresses this problem by allowing insn_rtx_cost to ignore
the condition setting part of a PARALLEL iff there is exactly one
comparison set and one non-comparison set.  If the only set operation is
a comparison we still use that as the basis of the insn cost.

* rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
comparison set and one other set, use the cost of the
non-comparison set.

Bootstrapped on aarch64-none-linux-gnu

OK?

R.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index d9f57c3..5cae793 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5260,23 +5260,41 @@ insn_rtx_cost (rtx pat, bool speed)
   int i, cost;
   rtx set;
 
-  /* Extract the single set rtx from the instruction pattern.
- We can't use single_set since we only have the pattern.  */
+  /* Extract the single set rtx from the instruction pattern.  We
+ can't use single_set since we only have the pattern.  We also
+ consider PARALLELs of a normal set and a single comparison.
+ In that case we use the cost of the non-comparison SET operation,
+ which is most-likely to be the real cost of this operation.  */
   if (GET_CODE (pat) == SET)
 set = pat;
   else if (GET_CODE (pat) == PARALLEL)
 {
   set = NULL_RTX;
+  rtx comparison = NULL_RTX;
+
   for (i = 0; i < XVECLEN (pat, 0); i++)
 	{
 	  rtx x = XVECEXP (pat, 0, i);
 	  if (GET_CODE (x) == SET)
 	{
-	  if (set)
-		return 0;
-	  set = x;
+	  if (GET_CODE (SET_SRC (x)) == COMPARE)
+		{
+		  if (comparison)
+		return 0;
+		  comparison = x;
+		}
+	  else
+		{
+		  if (set)
+		return 0;
+		  set = x;
+		}
 	}
 	}
+
+  if (!set && comparison)
+	set = comparison;
+
   if (!set)
 	return 0;
 }


[PATCH] ASAN: handle addressable params (PR sanitize/81040).

2017-06-19 Thread Martin Liška
Hi.

The following patch addresses an issue where we have a function argument whose
address is taken and -fsanitize=address does not wrap the argument with a red
zone.  It's done in the sanopt pass, where I create a new automatic variable
which is used in the function instead of the original argument.
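
The transformation is roughly this (a hedged sketch of the idea in
user-level C, not the pass's literal output):

extern void use (int *);

/* Before: the parameter itself is addressable, so ASan cannot place
   a red zone around it.  */
int
foo (int arg)
{
  use (&arg);
  return arg;
}

/* After, conceptually: the address is taken of a new automatic
   variable instead, which ASan can poison around.  */
int
foo_rewritten (int arg)
{
  int arg_copy = arg;
  use (&arg_copy);
  return arg_copy;
}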

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
And I can bootstrap-asan on the same machine.

Ready to be installed?
Martin
From f8a48a3f361d9914dd45c1896e8c5ba607a62b06 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 14 Jun 2017 11:40:01 +0200
Subject: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

gcc/testsuite/ChangeLog:

2017-06-19  Martin Liska  

	PR sanitize/81040
	* g++.dg/asan/function-argument-1.C: New test.
	* g++.dg/asan/function-argument-2.C: New test.
	* g++.dg/asan/function-argument-3.C: New test.

gcc/ChangeLog:

2017-06-19  Martin Liska  

	PR sanitize/81040
	* sanopt.c (rewrite_usage_of_param): New function.
	(sanitize_rewrite_addressable_params): Likewise.
	(pass_sanopt::execute): Call rewrite_usage_of_param.
---
 gcc/sanopt.c| 118 
 gcc/testsuite/g++.dg/asan/function-argument-1.C |  30 ++
 gcc/testsuite/g++.dg/asan/function-argument-2.C |  24 +
 gcc/testsuite/g++.dg/asan/function-argument-3.C |  27 ++
 4 files changed, 199 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-1.C
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-2.C
 create mode 100644 gcc/testsuite/g++.dg/asan/function-argument-3.C

diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index 16bdba76042..10464841972 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -37,6 +37,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "cfghooks.h"
 
 /* This is used to carry information about basic blocks.  It is
attached to the AUX field of the standard CFG block.  */
@@ -858,6 +862,117 @@ sanitize_asan_mark_poison (void)
 }
 }
 
+static tree
+rewrite_usage_of_param (tree *op, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  std::pair<tree, tree> *replacement = (std::pair<tree, tree> *)wi->info;
+
+  if (*op == replacement->first)
+{
+  *op = replacement->second;
+  *walk_subtrees = 0;
+}
+
+  return NULL;
+}
+
+/* For a given function FUN, rewrite all addressable parameters so that
+   a new automatic variable is introduced.  Right after function entry
+   a parameter is assigned to the variable.  */
+
+static void
+sanitize_rewrite_addressable_params (function *fun)
+{
+  basic_block entry_bb = NULL;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+   arg; arg = DECL_CHAIN (arg))
+{
+  if (TREE_ADDRESSABLE (arg) && !TREE_ADDRESSABLE (TREE_TYPE (arg)))
+	{
+	  /* The parameter is no longer addressable.  */
+	  tree type = TREE_TYPE (arg);
+	  TREE_ADDRESSABLE (arg) = 0;
+
+	  /* Create a new automatic variable.  */
+	  tree var = build_decl (DECL_SOURCE_LOCATION (arg),
+ VAR_DECL, DECL_NAME (arg), type);
+	  TREE_ADDRESSABLE (var) = 1;
+	  DECL_ARTIFICIAL (var) = 1;
+	  DECL_SEEN_IN_BIND_EXPR_P (var) = 0;
+
+	  gimple_add_tmp_var (var);
+
+	  if (dump_file)
+	fprintf (dump_file,
+		 "Rewritting parameter whos address is taken: %s\n",
+		 IDENTIFIER_POINTER (DECL_NAME (arg)));
+
+	  gimple_seq stmts = NULL;
+
+	  /* Assign value of parameter to newly created variable.  */
+	  if ((TREE_CODE (type) == COMPLEX_TYPE
+	   || TREE_CODE (type) == VECTOR_TYPE))
+	{
+	  /* We need to create a SSA name that will be used for the
+		 assignment.  */
+	  tree tmp = make_ssa_name (type);
+	  gimple *g = gimple_build_assign (tmp, arg);
+	  gimple_set_location (g, DECL_SOURCE_LOCATION (arg));
+	  gimple_seq_add_stmt (&stmts, g);
+	  g = gimple_build_assign (var, tmp);
+	  gimple_set_location (g, DECL_SOURCE_LOCATION (arg));
+	  gimple_seq_add_stmt (&stmts, g);
+	}
+	  else
+	{
+	  gimple *g = gimple_build_assign (var, arg);
+	  gimple_set_location (g, DECL_SOURCE_LOCATION (arg));
+	  gimple_seq_add_stmt (&stmts, g);
+	}
+
+	  /* Replace all usages of PARM_DECL with the newly
+	 created variable VAR.  */
+	  basic_block bb;
+	  gimple_stmt_iterator gsi;
+	  FOR_EACH_BB_FN (bb, fun)
+	{
+	  std::pair<tree, tree> replacement (arg, var);
+	  struct walk_stmt_info wi;
+	  memset (&wi, 0, sizeof (wi));
+	  wi.info = (void *) &replacement;
+
+	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+		{
+		  gimple *stmt = gsi_stmt (gsi);
+		  gimple_stmt_iterator it = gsi_for_stmt (stmt);
+		  walk_gimple_stmt (&it, NULL, rewrite_usage_of_param, &wi);
+		}
+	  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+		{

Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
Hi!

On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
> Many parallel set insns are of the form of a single set that also sets
> the condition code flags.  In this case the cost of such an insn is
> normally the cost of the part that doesn't set the flags, since updating
> the condition flags is simply a side effect.
> 
> At present all such insns are treated as having unknown cost (ie 0) and
> combine assumes that such insns are infinitely more expensive than any
> other insn sequence with a non-zero cost.

That's not what combine does: it optimistically assumes any combination
with unknown costs is an improvement.

> This patch addresses this problem by allowing insn_rtx_cost to ignore
> the condition setting part of a PARALLEL iff there is exactly one
> comparison set and one non-comparison set.  If the only set operation is
> a comparison we still use that as the basis of the insn cost.

I'll test this on a zillion archs, see what the effect is.

Have you considered costing general parallels as well?


Segher


[PR c++/81119] Wshadow regression

2017-06-19 Thread Nathan Sidwell
This fixes pr 81119.  My rewriting of name lookup was a little too eager 
to warn about constructor hiding.  This restores the earlier behaviour 
of only warning when hiding via a function.


nathan
--
Nathan Sidwell
2017-06-19  Nathan Sidwell  

	PR c++/81119
	* name-lookup.c (update_binding): Only warn about constructors
	hidden by functions.

	PR c++/81119
	* g++.dg/warn/pr81119.C: New.

Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 249364)
+++ cp/name-lookup.c	(working copy)
@@ -1784,6 +1784,14 @@ update_binding (cp_binding_level *level,
   else
 	goto conflict;
 
+  if (to_type != old_type
+	  && warn_shadow
+	  && MAYBE_CLASS_TYPE_P (TREE_TYPE (to_type))
+	  && !(DECL_IN_SYSTEM_HEADER (decl)
+	   && DECL_IN_SYSTEM_HEADER (to_type)))
+	warning (OPT_Wshadow, "%q#D hides constructor for %q#D",
+		 decl, to_type);
+
   to_val = ovl_insert (decl, old);
 }
   else if (!old)
@@ -1849,21 +1857,6 @@ update_binding (cp_binding_level *level,
 	  add_decl_to_level (level, to_add);
 	}
 
-  if (to_type != old_type)
-	{
-	  gcc_checking_assert (!old_type
-			   && TREE_CODE (to_type) == TYPE_DECL
-			   && DECL_ARTIFICIAL (to_type));
-
-	  tree type = TREE_TYPE (to_type);
-	  if (to_type != decl
-	  && MAYBE_CLASS_TYPE_P (type) && warn_shadow
-	  && (!DECL_IN_SYSTEM_HEADER (decl)
-		  || !DECL_IN_SYSTEM_HEADER (to_type)))
-	warning (OPT_Wshadow, "%q#D hides constructor for %q#T",
-		 decl, type);
-	}
-
   if (slot)
 	{
 	  if (STAT_HACK_P (*slot))
Index: testsuite/g++.dg/warn/pr81119.C
===
--- testsuite/g++.dg/warn/pr81119.C	(nonexistent)
+++ testsuite/g++.dg/warn/pr81119.C	(working copy)
@@ -0,0 +1,20 @@
+// PR c++/81119 Wshadow regression
+// { dg-additional-options "-Wshadow" }
+
+struct A;
+typedef A A; // No warning, does not hide
+
+struct B; // { dg-message "previous" }
+typedef int B; // { dg-error "conflicting" }
+
+struct C;
+void C (); // { dg-warning "hides constructor" }
+void C (int); // warning not repeated
+
+struct D;
+int D; // no warning, not a function
+
+struct E;
+
+enum X 
+  {E}; // no warning, not a function


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags.  In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
> 
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.

So try this testcase on ARM.

unsigned long x, y, z;
int b;
void test()
{
   b = __builtin_sub_overflow (y,z, &x);
}


Without the patch, combine rips apart a compare and subtract insn
because it sees it as having cost zero and substitutes it with separate
compare and subtract insns.

ie before:


ldr r3, .L5
ldr r2, .L5+4
ldr r3, [r3]
ldr r2, [r2]
cmp r3, r2  <=
movcs   r0, #0
movcc   r0, #1
ldr ip, .L5+8
ldr r1, .L5+12
sub r3, r3, r2  <=
str r3, [ip]
str r0, [r1]
bx  lr

after:

ldr r3, .L5
ldr r2, .L5+4
ldr r3, [r3]
ldr r2, [r2]
subs r3, r3, r2  <
movcc   r1, #1
movcs   r1, #0
ldr r0, .L5+8
ldr r2, .L5+12
str r3, [r0]
str r1, [r2]
bx  lr

The combine log before the patch shows:

allowing combination of insns 10 and 51
original costs 0 + 8 = 0
replacement costs 4 + 12 = 16

So it is clearly deciding that the original costs are greater than the
replacement costs.

> 
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set.  If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
> 
> I'll test this on a zillion archs, see what the effect is.
> 
> Have you considered costing general parallels as well?
> 
> 

I thought about it but concluded that there's no generically correct
answer.  It might be the max of all the individual sets or it might be
the sum, or it might be somewhere in between.  For example on ARM the
load/store multiple operations are expressed as parallels, but their
cost will depend on how many loads/stores happen in parallel within the
hardware.

I think we'd need a new back-end hook to handle the other cases sensibly.

R.

> Segher
> 



Re: [PATCH] Initialize live_switch_vars for SWITCH_BODY == STATEMENT_LIST (PR sanitizer/80879).

2017-06-19 Thread Jakub Jelinek
On Fri, May 26, 2017 at 01:05:28PM +0200, Martin Liška wrote:
> Hello.
> 
> Unfortunately I guarded the use-after-scope tracking of live switch variables
> just to BIND_EXPR.  However, the bind expression can be included in a
> STATEMENT_LIST.  Handling that enables proper tracking and fixes the added test.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
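
The shape of code involved is roughly this (a hedged sketch, not the exact
use-after-scope-switch-4.c testcase):

extern void use (int *);

void
f (int i)
{
  switch (i)
    {
    case 1:;
      int x = 0;   /* variable declared inside the switch body */
      use (&x);
      break;
    default:
      break;
    }
}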

> From a7f63e228118b3f256d9e774fdeeb8c85c0da437 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 25 May 2017 17:53:06 +0200
> Subject: [PATCH] Initialize live_switch_vars for SWITCH_BODY == STATEMENT_LIST
>  (PR sanitizer/80879).
> 
> gcc/ChangeLog:
> 
> 2017-05-25  Martin Liska  
> 
>   * gimplify.c (gimplify_switch_expr):
>   Initialize live_switch_vars for SWITCH_BODY == STATEMENT_LIST.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-05-25  Martin Liska  
> 
>   * gcc.dg/asan/use-after-scope-switch-4.c: New test.

Ok, thanks.

Jakub


Re: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

2017-06-19 Thread Jakub Jelinek
On Mon, Jun 19, 2017 at 03:50:42PM +0200, Martin Liška wrote:
> @@ -858,6 +862,117 @@ sanitize_asan_mark_poison (void)
>  }
>  }
>  

Missing function comment.

> +static tree
> +rewrite_usage_of_param (tree *op, int *walk_subtrees, void *data)
> +{
> +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> +  std::pair<tree, tree> *replacement = (std::pair<tree, tree> *)wi->info;

Missing space after )

> +
> +  if (*op == replacement->first)
> +{
> +  *op = replacement->second;
> +  *walk_subtrees = 0;
> +}
> +
> +  return NULL;
> +}

> +static void
> +sanitize_rewrite_addressable_params (function *fun)
> +{
> +  basic_block entry_bb = NULL;
> +
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +   arg; arg = DECL_CHAIN (arg))
> +{
> +  if (TREE_ADDRESSABLE (arg) && !TREE_ADDRESSABLE (TREE_TYPE (arg)))
> + {
> +   /* The parameter is no longer addressable.  */
> +   tree type = TREE_TYPE (arg);
> +   TREE_ADDRESSABLE (arg) = 0;
> +
> +   /* Create a new automatic variable.  */
> +   tree var = build_decl (DECL_SOURCE_LOCATION (arg),
> +  VAR_DECL, DECL_NAME (arg), type);
> +   TREE_ADDRESSABLE (var) = 1;
> +   DECL_ARTIFICIAL (var) = 1;
> +   DECL_SEEN_IN_BIND_EXPR_P (var) = 0;

I think it is highly inefficient to walk the whole IL for every addressable
argument.  Can't you first find out what PARM_DECLs you need to change,
stick the corresponding VAR_DECL somewhere (dunno, e.g. a vector with pairs
perhaps sorted by DECL_UID, or stick it into DECL_VALUE_EXPR or whatever),
then if you create at least one, walk whole IL and replace all the
PARM_DECLs you want to replace, then finally clear the TREE_ADDRESSABLE
flag for all of them and emit the initialization sequence?
Then something needs to be done for debugging too.  If it is without VTA,
then probably just having DECL_VALUE_EXPR is good enough, otherwise
(VTA) you probably don't want that (or can reset it at that point), but
instead emit after the initialization stmt a debug stmt that the variable
value now lives in a different var.  Though ideally we want the debugger
to be able to also change the value of the var, that might be harder.
With DECL_VALUE_EXPR on the other side the debug info will be incorrect in
the prologue until it is assigned to the slot.

> +
> +   gimple_add_tmp_var (var);
> +
> +   if (dump_file)
> + fprintf (dump_file,
> +  "Rewritting parameter whos address is taken: %s\n",
> +  IDENTIFIER_POINTER (DECL_NAME (arg)));

s/tting/ting/, s/whos/whose/ 
> +
> +   gimple_seq stmts = NULL;
> +
> +   /* Assign value of parameter to newly created variable.  */
> +   if ((TREE_CODE (type) == COMPLEX_TYPE
> +|| TREE_CODE (type) == VECTOR_TYPE))
> + {
> +   /* We need to create a SSA name that will be used for the
> +  assignment.  */
> +   tree tmp = make_ssa_name (type);
> +   gimple *g = gimple_build_assign (tmp, arg);
> +   gimple_set_location (g, DECL_SOURCE_LOCATION (arg));
> +   gimple_seq_add_stmt (&stmts, g);
> +   g = gimple_build_assign (var, tmp);
> +   gimple_set_location (g, DECL_SOURCE_LOCATION (arg));
> +   gimple_seq_add_stmt (&stmts, g);
> + }
> +   else
> + {
> +   gimple *g = gimple_build_assign (var, arg);
> +   gimple_set_location (g, DECL_SOURCE_LOCATION (arg));
> +   gimple_seq_add_stmt (&stmts, g);
> + }

I don't understand the distinction.  If you turn the original parm
for complex/vector DECL_GIMPLE_REG_P, you should need the exact same code
(but I think it would be better to use the default SSA_NAME of the PARM_DECL
if it is a gimple reg type, rather than use the PARM_DECL itself
and wait for update_ssa).

Jakub


Re: [PATCH] Fix PR81112

2017-06-19 Thread Jakub Jelinek
On Mon, Jun 19, 2017 at 09:16:58AM +0200, Richard Biener wrote:
> 
> The following fixes an omission in find_constructor_constant_at_offset
> to handle RANGE_EXPR in array constructor indices.  The handling is
> conservative in that it only handles the first index in the range.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
> sofar.
> 
> Richard.
> 
> 2017-06-19  Richard Biener  
> 
>   PR ipa/81112
>   * ipa-prop.c (find_constructor_constant_at_offset): Handle
>   RANGE_EXPR conservatively.
> 
>   * g++.dg/torture/pr81112.C: New testcase.
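
(For reference, a RANGE_EXPR index arises from GNU range designators in an
array constructor; an illustrative C sketch, though the PR's testcase is C++:)

/* The [2 ... 7] designator produces a RANGE_EXPR index in the CONSTRUCTOR.  */
struct S { int a[16]; };
static const struct S s = { .a = { [2 ... 7] = 42 } };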

The testcase fails on i686-linux, fixed thusly, tested on
{x86_64,i686}-linux, committed to trunk.

2017-06-19  Jakub Jelinek  

PR ipa/81112
* g++.dg/torture/pr81112.C: Add -Wno-psabi to dg-additional-options.

--- gcc/testsuite/g++.dg/torture/pr81112.C.jj   2017-06-19 16:15:41.0 
+0200
+++ gcc/testsuite/g++.dg/torture/pr81112.C  2017-06-19 16:32:37.812078027 
+0200
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-additional-options "-Wno-psabi" }
 
 class AssertionResult {
 bool success_;


Jakub


Re: [PATCH] Fix yet another -fsanitize=undefined ubsan_encode_value ICE (PR sanitizer/81125)

2017-06-19 Thread Richard Biener
On Mon, 19 Jun 2017, Jakub Jelinek wrote:

> Hi!
> 
> And here is another ICE.  While we have a current_function_decl
> in this case, gimple_add_tmp_var (called by create_tmp_var) and
> mark_addressable still don't work too well when the current function
> is a C++ ctor or dtor that the FE then duplicates.
> 
> Fixed by telling ubsan_encode_value whether it is called from the
> FE (when it shouldn't use gimple_add_tmp_var nor mark_addressable
> and should use TARGET_EXPR), or from GIMPLE passes (when it should
> do what it did before with in_expand_p == false) or from RTL expansion
> (when it should do what it did with in_expand_p == true).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ugh.

Ok.

Thanks,
Richard.

> 2017-06-19  Jakub Jelinek  
> 
>   PR sanitizer/81125
>   * ubsan.h (enum ubsan_encode_value_phase): New.
>   (ubsan_encode_value): Change second argument to
>   enum ubsan_encode_value_phase with default value of
>   UBSAN_ENCODE_VALUE_GENERIC.
>   * ubsan.c (ubsan_encode_value): Change second argument to
>   enum ubsan_encode_value_phase PHASE from bool IN_EXPAND_P,
>   adjust uses, for UBSAN_ENCODE_VALUE_GENERIC use just
>   create_tmp_var_raw instead of create_tmp_var and use a
>   TARGET_EXPR.
>   (ubsan_expand_bounds_ifn, ubsan_build_overflow_builtin,
>   instrument_bool_enum_load, ubsan_instrument_float_cast): Adjust
>   ubsan_encode_value callers.
> 
>   * g++.dg/ubsan/pr81125.C: New test.
> 
> --- gcc/ubsan.h.jj2017-06-19 08:26:17.0 +0200
> +++ gcc/ubsan.h   2017-06-19 10:06:52.723664974 +0200
> @@ -42,6 +42,13 @@ enum ubsan_print_style {
>UBSAN_PRINT_ARRAY
>  };
>  
> +/* This controls ubsan_encode_value behavior.  */
> +enum ubsan_encode_value_phase {
> +  UBSAN_ENCODE_VALUE_GENERIC,
> +  UBSAN_ENCODE_VALUE_GIMPLE,
> +  UBSAN_ENCODE_VALUE_RTL
> +};
> +
>  extern bool ubsan_expand_bounds_ifn (gimple_stmt_iterator *);
>  extern bool ubsan_expand_null_ifn (gimple_stmt_iterator *);
>  extern bool ubsan_expand_objsize_ifn (gimple_stmt_iterator *);
> @@ -49,7 +56,8 @@ extern bool ubsan_expand_vptr_ifn (gimpl
>  extern bool ubsan_instrument_unreachable (gimple_stmt_iterator *);
>  extern tree ubsan_create_data (const char *, int, const location_t *, ...);
>  extern tree ubsan_type_descriptor (tree, enum ubsan_print_style = 
> UBSAN_PRINT_NORMAL);
> -extern tree ubsan_encode_value (tree, bool = false);
> +extern tree ubsan_encode_value (tree, enum ubsan_encode_value_phase
> +   = UBSAN_ENCODE_VALUE_GENERIC);
>  extern bool is_ubsan_builtin_p (tree);
>  extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree,
> tree, tree *);
> --- gcc/ubsan.c.jj2017-06-19 10:27:33.0 +0200
> +++ gcc/ubsan.c   2017-06-19 10:13:53.434541556 +0200
> @@ -114,10 +114,10 @@ decl_for_type_insert (tree type, tree de
>  /* Helper routine, which encodes a value in the pointer_sized_int_node.
> Arguments with precision <= POINTER_SIZE are passed directly,
> the rest is passed by reference.  T is a value we are to encode.
> -   IN_EXPAND_P is true if this function is called during expansion.  */
> +   PHASE determines when this function is called.  */
>  
>  tree
> -ubsan_encode_value (tree t, bool in_expand_p)
> +ubsan_encode_value (tree t, enum ubsan_encode_value_phase phase)
>  {
>tree type = TREE_TYPE (t);
>const unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
> @@ -144,7 +144,7 @@ ubsan_encode_value (tree t, bool in_expa
> /* The reason for this is that we don't want to pessimize
>code by making vars unnecessarily addressable.  */
> tree var;
> -   if (current_function_decl)
> +   if (phase != UBSAN_ENCODE_VALUE_GENERIC)
>   {
> var = create_tmp_var (type);
> mark_addressable (var);
> @@ -154,7 +154,7 @@ ubsan_encode_value (tree t, bool in_expa
> var = create_tmp_var_raw (type);
> TREE_ADDRESSABLE (var) = 1;
>   }
> -   if (in_expand_p)
> +   if (phase == UBSAN_ENCODE_VALUE_RTL)
>   {
> rtx mem
>   = assign_stack_temp_for_type (TYPE_MODE (type),
> @@ -164,7 +164,7 @@ ubsan_encode_value (tree t, bool in_expa
> expand_assignment (var, t, false);
> return build_fold_addr_expr (var);
>   }
> -   if (current_function_decl)
> +   if (phase != UBSAN_ENCODE_VALUE_GENERIC)
>   {
> tree tem = build2 (MODIFY_EXPR, void_type_node, var, t);
> t = build_fold_addr_expr (var);
> @@ -725,9 +725,9 @@ ubsan_expand_bounds_ifn (gimple_stmt_ite
> ? BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS
> : BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT;
>tree fn = builtin_decl_explicit (bcode);
> -  tree val
> - = force_gimple_operand_gsi (gsi, ubsan_encode_value 

[PATCH] Fix PR81118

2017-06-19 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-06-19  Richard Biener  

PR middle-end/81118
* tree-cfgcleanup.c (cleanup_tree_cfg_noloop): Clear niter
estimates if we changed anything.

* gcc.dg/torture/pr81118.c: New testcase.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 249358)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -839,7 +839,12 @@ cleanup_tree_cfg_noloop (void)
   timevar_pop (TV_TREE_CLEANUP_CFG);
 
   if (changed && current_loops)
-loops_state_set (LOOPS_NEED_FIXUP);
+{
+  /* Removing edges and/or blocks may make recorded bounds refer
+ to stale GIMPLE stmts now, so clear them.  */
+  free_numbers_of_iterations_estimates (cfun);
+  loops_state_set (LOOPS_NEED_FIXUP);
+}
 
   return changed;
 }
Index: gcc/testsuite/gcc.dg/torture/pr81118.c
===
--- gcc/testsuite/gcc.dg/torture/pr81118.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr81118.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-w" } */
+
+int a[7], b;
+int c()
+{
+  int d, e;
+  for (; d; d++)
+if (a[d])
+  if (b)
+   return;
+  else if (d >= e)
+   return 0;
+}


Re: [PR target/25111] New patterns for m68k bit insns

2017-06-19 Thread Andreas Schwab
On Nov 19 2016, Jeff Law  wrote:

> diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
> index 7b7f373..2085619 100644
> --- a/gcc/config/m68k/m68k.md
> +++ b/gcc/config/m68k/m68k.md
> @@ -5336,6 +5336,45 @@
>  }
>[(set_attr "type" "bitrw")])
>  
> +(define_insn "*bsetdreg"
> +  [(set (match_operand:SI 0 "register_operand" "+d")

I think you want "=d" instead of "+d".  That fixes PR 80970.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH GCC][06/13]Preserve loop nest in whole distribution life time

2017-06-19 Thread Bin.Cheng
On Tue, Jun 13, 2017 at 12:08 PM, Richard Biener
 wrote:
> On Tue, Jun 13, 2017 at 1:06 PM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng  wrote:
>>> Hi,
>>> This simple patch computes and preserves the loop nest vector for the whole
>>> distribution lifetime.  The loop nest will be used multiple times in
>>> on-demand data dependence computation.
>>>
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Don't like it too much but I guess we can see if refactoring it back
>> to pass down
>> loop_nest can work.
>>
>> Ok.
>
> Oh.
>
> +/* The loop (nest) to be distributed.  */
> +static vec *loop_nest;
> +
>
> please make it
>
> static vec loop_nest;
>
> instead to avoid a pointless indirection (vec<> just contains a
> pointer to allocated storage).
Hi Richard,
This is the updated patch according to your comment; is it OK?

Thanks,
bin

2017-06-17  Bin Cheng  

* tree-loop-distribution.c (loop_nest): New global var.
(build_rdg): Use loop directly, rather than loop nest.
(pg_add_dependence_edges): Remove loop nest parameter.  Use global
variable directly.
(distribute_loop): Compute global variable loop nest.  Update use.
From 6acd21a433606955b756ada75a33f3f61e2e0b6c Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 11:56:28 +0100
Subject: [PATCH 05/13] loop-nest-20170609.txt

---
 gcc/tree-loop-distribution.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index f409e94..8183090 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -66,6 +66,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 
 
+/* The loop (nest) to be distributed.  */
+static vec loop_nest;
+
 /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
 struct rdg_vertex
 {
@@ -454,22 +457,22 @@ free_rdg (struct graph *rdg)
   free_graph (rdg);
 }
 
-/* Build the Reduced Dependence Graph (RDG) with one vertex per
-   statement of the loop nest LOOP_NEST, and one edge per data dependence or
-   scalar dependence.  */
+/* Build the Reduced Dependence Graph (RDG) with one vertex per statement of
+   LOOP, and one edge per flow dependence or control dependence from control
+   dependence CD.  */
 
 static struct graph *
-build_rdg (vec<loop_p> loop_nest, control_dependences *cd)
+build_rdg (struct loop *loop, control_dependences *cd)
 {
   struct graph *rdg;
   vec<data_reference_p> datarefs;
 
   /* Create the RDG vertices from the stmts of the loop nest.  */
   auto_vec stmts;
-  stmts_from_loop (loop_nest[0], );
+  stmts_from_loop (loop, );
   rdg = new_graph (stmts.length ());
   datarefs.create (10);
-  if (!create_rdg_vertices (rdg, stmts, loop_nest[0], ))
+  if (!create_rdg_vertices (rdg, stmts, loop, ))
 {
   datarefs.release ();
   free_rdg (rdg);
@@ -479,7 +482,7 @@ build_rdg (vec<loop_p> loop_nest, control_dependences *cd)
 
   create_rdg_flow_edges (rdg);
   if (cd)
-create_rdg_cd_edges (rdg, cd, loop_nest[0]);
+create_rdg_cd_edges (rdg, cd, loop);
 
   datarefs.release ();
 
@@ -1418,7 +1421,7 @@ partition_contains_all_rw (struct graph *rdg,
and DRS2 and modify and return DIR according to that.  */
 
 static int
-pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir,
+pg_add_dependence_edges (struct graph *rdg, int dir,
 			 vec<data_reference_p> drs1,
 			 vec<data_reference_p> drs2)
 {
@@ -1439,8 +1442,8 @@ pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir,
 	std::swap (dr1, dr2);
 	this_dir = -this_dir;
 	  }
-	ddr = initialize_data_dependence_relation (dr1, dr2, loops);
-	compute_affine_dependence (ddr, loops[0]);
+	ddr = initialize_data_dependence_relation (dr1, dr2, loop_nest);
+	compute_affine_dependence (ddr, loop_nest[0]);
 	if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
 	  this_dir = 2;
 	else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE)
@@ -1508,11 +1511,14 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts,
 
   *destroy_p = false;
   *nb_calls = 0;
-  auto_vec<loop_p, 3> loop_nest;
+  loop_nest.create (0);
   if (!find_loop_nest (loop, &loop_nest))
-return 0;
+{
+  loop_nest.release ();
+  return 0;
+}
 
-  rdg = build_rdg (loop_nest, cd);
+  rdg = build_rdg (loop, cd);
   if (!rdg)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -1520,6 +1526,7 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts,
 		 "Loop %d not distributed: failed to build the RDG.\n",
 		 loop->num);
 
+  loop_nest.release ();
   return 0;
 }
 
@@ -1643,15 +1650,15 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts,
 	/* dependence direction - 0 is no dependence, -1 is back,
 	   1 is forth, 2 is both (we can stop then, merging will occur).  */
 	int dir = 0;
-	dir = pg_add_dependence_edges (rdg, loop_nest, dir,
+	

[PATCH] Fix -fsanitize=undefined ubsan_encode_value ICE (PR sanitizer/81111)

2017-06-19 Thread Jakub Jelinek
Hi!

Martin's recent patch that introduced sanitize_flags_p causes us to
instrument operations even when current_function_decl is NULL.  If it
is a valid constant expression it will be folded away soon, otherwise
usually we emit a runtime initializer in the static ctors function for
it.  In any case, neither gimple_add_tmp_var (which create_tmp_var calls)
nor mark_addressable actually works in that case; fixed thusly,
bootstrapped/regtested on x86_64-linux and i686-linux plus
bootstrapped/regtested with bootstrap-ubsan, ok for trunk?

2017-06-19  Jakub Jelinek  

PR sanitizer/81111
* ubsan.c (ubsan_encode_value): If current_function_decl is NULL,
use create_tmp_var_raw instead of create_tmp_var, mark it addressable
just by setting TREE_ADDRESSABLE on the result and use a TARGET_EXPR.

* g++.dg/ubsan/pr81111.C: New test.

--- gcc/ubsan.c.jj  2017-06-16 13:27:48.0 +0200
+++ gcc/ubsan.c 2017-06-16 16:28:29.099155949 +0200
@@ -145,9 +145,17 @@ ubsan_encode_value (tree t, bool in_expa
{
  /* The reason for this is that we don't want to pessimize
 code by making vars unnecessarily addressable.  */
- tree var = create_tmp_var (type);
- tree tem = build2 (MODIFY_EXPR, void_type_node, var, t);
- mark_addressable (var);
+ tree var;
+ if (current_function_decl)
+   {
+ var = create_tmp_var (type);
+ mark_addressable (var);
+   }
+ else
+   {
+ var = create_tmp_var_raw (type);
+ TREE_ADDRESSABLE (var) = 1;
+   }
  if (in_expand_p)
{
  rtx mem
@@ -158,8 +166,17 @@ ubsan_encode_value (tree t, bool in_expa
  expand_assignment (var, t, false);
  return build_fold_addr_expr (var);
}
- t = build_fold_addr_expr (var);
- return build2 (COMPOUND_EXPR, TREE_TYPE (t), tem, t);
+ if (current_function_decl)
+   {
+ tree tem = build2 (MODIFY_EXPR, void_type_node, var, t);
+ t = build_fold_addr_expr (var);
+ return build2 (COMPOUND_EXPR, TREE_TYPE (t), tem, t);
+   }
+ else
+   {
+ var = build4 (TARGET_EXPR, type, var, t, NULL_TREE, NULL_TREE);
+ return build_fold_addr_expr (var);
+   }
}
   else
return build_fold_addr_expr (t);
--- gcc/testsuite/g++.dg/ubsan/pr81111.C.jj 2017-06-16 15:39:57.752886010 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr81111.C 2017-06-16 15:39:37.0 
+0200
@@ -0,0 +1,45 @@
+// PR sanitizer/81111
+// { dg-do compile }
+// { dg-options "-fsanitize=shift" }
+
+template <typename V>
+struct N
+{
+  static const V m = (((V)(-1) < 0)
+ ? (V)1 << (sizeof(V) * __CHAR_BIT__ - ((V)(-1) < 0))
+ : (V) 0);
+};
+
+template <typename V>
+const V N<V>::m;
+
+template <typename V>
+struct O
+{
+  static const V m = (V)1 << sizeof(V) * __CHAR_BIT__;
+};
+
+template <typename V>
+const V O<V>::m;
+
+void
+foo ()
+{
+  N::m;
+  N::m;
+#ifdef __SIZEOF_INT128__
+  N<__int128>::m;
+  N::m;
+#endif
+}
+
+void
+bar ()
+{
+  O::m;
+  O::m;
+#ifdef __SIZEOF_INT128__
+  O<__int128>::m;
+  O::m;
+#endif
+}

Jakub


Re: [PATCH GCC][09/13]Simply cost model merges partitions with the same references

2017-06-19 Thread Bin.Cheng
On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
 wrote:
> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>> Hi,
>> The current primitive cost model merges partitions with data references
>> sharing the same base address.  I believe it's designed to maximize data
>> reuse in distribution, but that should be done by a dedicated data reuse
>> algorithm.  At this stage of merging, we should be conservative and only
>> merge partitions with the same references.
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> Well, I'd say "conservative" is merging more, not less.  For example
> splitting a[i+1] from a[i]
> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
> merging.  Maybe
> DR_INIT within a cacheline or so.
>
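
A minimal sketch of the case Richard describes (assumed example):

void
f (int *a, int *b, int n)
{
  for (int i = 0; i < n - 1; i++)
    {
      b[i] = a[i];      /* data-ref with DR_INIT 0 from base a */
      a[i + 1] = 0;     /* same DR_BASE_ADDRESS, DR_INIT one element later */
    }
}

Splitting the two statements into separate partitions would walk over a
twice, which is why treating nearby DR_INITs as equal can preserve locality.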
> How many extra distributions in say SPEC do you get from this change alone?
Hi,
I collected data for spec2006 only with/without this patch.  I am a
bit surprised that it doesn't change the number of distributed loops.
>
> It shows also that having partition->reads_and_writes would be nice
> ...  the code duplication
Yeah, I merged read/write data references in the previous patch, so now this
duplication is gone.  Updated patch attached.  Is it OK?

Thanks.
bin
2017-06-07  Bin Cheng  

* tree-loop-distribution.c (ref_base_address): Delete.
(similar_memory_accesses): Rename ...
(share_memory_accesses): ... to this.  Check if partitions access
the same memory reference.
(distribute_loop): Call share_memory_accesses.
From 98af3ab3b309ac7e7b2fb3c6b55eb19a9004225c Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 12:41:36 +0100
Subject: [PATCH 08/13] share-memory-access-20170608.txt

---
 gcc/tree-loop-distribution.c | 71 
 1 file changed, 33 insertions(+), 38 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 03bb735..2e5a828 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1268,30 +1268,16 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition)
 }
 }
 
-/* For a data reference REF, return the declaration of its base
-   address or NULL_TREE if the base is not determined.  */
-
-static tree
-ref_base_address (data_reference_p dr)
-{
-  tree base_address = DR_BASE_ADDRESS (dr);
-  if (base_address
-  && TREE_CODE (base_address) == ADDR_EXPR)
-return TREE_OPERAND (base_address, 0);
-
-  return base_address;
-}
-
-/* Returns true when PARTITION1 and PARTITION2 have similar memory
-   accesses in RDG.  */
+/* Returns true when PARTITION1 and PARTITION2 access the same memory
+   object in RDG.  */
 
 static bool
-similar_memory_accesses (struct graph *rdg, partition *partition1,
-			 partition *partition2)
+share_memory_accesses (struct graph *rdg,
+		   partition *partition1, partition *partition2)
 {
-  unsigned i, j, k, l;
+  unsigned i, j;
   bitmap_iterator bi, bj;
-  data_reference_p ref1, ref2;
+  data_reference_p dr1, dr2;
 
   /* First check whether in the intersection of the two partitions are
  any loads or stores.  Common loads are the situation that happens
@@ -1301,23 +1287,32 @@ similar_memory_accesses (struct graph *rdg, partition *partition1,
 	|| RDG_MEM_READS_STMT (rdg, i))
   return true;
 
-  /* Then check all data-references against each other.  */
-  EXECUTE_IF_SET_IN_BITMAP (partition1->stmts, 0, i, bi)
-if (RDG_MEM_WRITE_STMT (rdg, i)
-	|| RDG_MEM_READS_STMT (rdg, i))
-  EXECUTE_IF_SET_IN_BITMAP (partition2->stmts, 0, j, bj)
-	if (RDG_MEM_WRITE_STMT (rdg, j)
-	|| RDG_MEM_READS_STMT (rdg, j))
-	  {
-	FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, i), k, ref1)
-	  {
-		tree base1 = ref_base_address (ref1);
-		if (base1)
-		  FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, j), l, ref2)
-		if (base1 == ref_base_address (ref2))
-		  return true;
-	  }
-	  }
+  /* Then check whether the two partitions access the same memory object.  */
+  EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
+{
+  gcc_assert (i < datarefs_vec.length ());
+  dr1 = datarefs_vec[i];
+
+  if (!DR_BASE_ADDRESS (dr1)
+	  || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1))
+	continue;
+
+  EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj)
+	{
+	  gcc_assert (j < datarefs_vec.length ());
+	  dr2 = datarefs_vec[j];
+
+	  if (!DR_BASE_ADDRESS (dr2)
+	  || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2))
+	continue;
+
+	  if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0)
+	  && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0)
+	  && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0)
+	  && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0))
+	return true;
+	}
+}
 
   return false;
 }
@@ -1654,7 +1649,7 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts,
   for (int j = i + 1;
 	   

[PATCH] Fix yet another -fsanitize=undefined ubsan_encode_value ICE (PR sanitizer/81125)

2017-06-19 Thread Jakub Jelinek
Hi!

And here is another ICE.  While we have a current_function_decl
in this case, gimple_add_tmp_var (called by create_tmp_var) and
mark_addressable still don't work too well when the current function
is a C++ ctor or dtor that the FE then duplicates.

Fixed by telling ubsan_encode_value whether it is called from the
FE (when it shouldn't use gimple_add_tmp_var nor mark_addressable
and should use TARGET_EXPR), or from GIMPLE passes (when it should
do what it did before with in_expand_p == false) or from RTL expansion
(when it should do what it did with in_expand_p == true).
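
To illustrate (the call sites below are made up for illustration; the
enum values and the GENERIC default are the ones added by the patch):

  /* FE: default phase, no gimple_add_tmp_var/mark_addressable,
     wraps the temporary in a TARGET_EXPR.  */
  tree a1 = ubsan_encode_value (op);
  /* GIMPLE passes: what in_expand_p == false used to do.  */
  tree a2 = ubsan_encode_value (op, UBSAN_ENCODE_VALUE_GIMPLE);
  /* RTL expansion: what in_expand_p == true used to do.  */
  tree a3 = ubsan_encode_value (op, UBSAN_ENCODE_VALUE_RTL);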

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-06-19  Jakub Jelinek  

PR sanitizer/81125
* ubsan.h (enum ubsan_encode_value_phase): New.
(ubsan_encode_value): Change second argument to
enum ubsan_encode_value_phase with default value of
UBSAN_ENCODE_VALUE_GENERIC.
* ubsan.c (ubsan_encode_value): Change second argument to
enum ubsan_encode_value_phase PHASE from bool IN_EXPAND_P,
adjust uses, for UBSAN_ENCODE_VALUE_GENERIC use just
create_tmp_var_raw instead of create_tmp_var and use a
TARGET_EXPR.
(ubsan_expand_bounds_ifn, ubsan_build_overflow_builtin,
instrument_bool_enum_load, ubsan_instrument_float_cast): Adjust
ubsan_encode_value callers.

* g++.dg/ubsan/pr81125.C: New test.

--- gcc/ubsan.h.jj  2017-06-19 08:26:17.0 +0200
+++ gcc/ubsan.h 2017-06-19 10:06:52.723664974 +0200
@@ -42,6 +42,13 @@ enum ubsan_print_style {
   UBSAN_PRINT_ARRAY
 };
 
+/* This controls ubsan_encode_value behavior.  */
+enum ubsan_encode_value_phase {
+  UBSAN_ENCODE_VALUE_GENERIC,
+  UBSAN_ENCODE_VALUE_GIMPLE,
+  UBSAN_ENCODE_VALUE_RTL
+};
+
 extern bool ubsan_expand_bounds_ifn (gimple_stmt_iterator *);
 extern bool ubsan_expand_null_ifn (gimple_stmt_iterator *);
 extern bool ubsan_expand_objsize_ifn (gimple_stmt_iterator *);
@@ -49,7 +56,8 @@ extern bool ubsan_expand_vptr_ifn (gimpl
 extern bool ubsan_instrument_unreachable (gimple_stmt_iterator *);
 extern tree ubsan_create_data (const char *, int, const location_t *, ...);
 extern tree ubsan_type_descriptor (tree, enum ubsan_print_style = 
UBSAN_PRINT_NORMAL);
-extern tree ubsan_encode_value (tree, bool = false);
+extern tree ubsan_encode_value (tree, enum ubsan_encode_value_phase
+ = UBSAN_ENCODE_VALUE_GENERIC);
 extern bool is_ubsan_builtin_p (tree);
 extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree,
  tree, tree *);
--- gcc/ubsan.c.jj  2017-06-19 10:27:33.0 +0200
+++ gcc/ubsan.c 2017-06-19 10:13:53.434541556 +0200
@@ -114,10 +114,10 @@ decl_for_type_insert (tree type, tree de
 /* Helper routine, which encodes a value in the pointer_sized_int_node.
Arguments with precision <= POINTER_SIZE are passed directly,
the rest is passed by reference.  T is a value we are to encode.
-   IN_EXPAND_P is true if this function is called during expansion.  */
+   PHASE determines when this function is called.  */
 
 tree
-ubsan_encode_value (tree t, bool in_expand_p)
+ubsan_encode_value (tree t, enum ubsan_encode_value_phase phase)
 {
   tree type = TREE_TYPE (t);
   const unsigned int bitsize = GET_MODE_BITSIZE (TYPE_MODE (type));
@@ -144,7 +144,7 @@ ubsan_encode_value (tree t, bool in_expa
  /* The reason for this is that we don't want to pessimize
 code by making vars unnecessarily addressable.  */
  tree var;
- if (current_function_decl)
+ if (phase != UBSAN_ENCODE_VALUE_GENERIC)
{
  var = create_tmp_var (type);
  mark_addressable (var);
@@ -154,7 +154,7 @@ ubsan_encode_value (tree t, bool in_expa
  var = create_tmp_var_raw (type);
  TREE_ADDRESSABLE (var) = 1;
}
- if (in_expand_p)
+ if (phase == UBSAN_ENCODE_VALUE_RTL)
{
  rtx mem
= assign_stack_temp_for_type (TYPE_MODE (type),
@@ -164,7 +164,7 @@ ubsan_encode_value (tree t, bool in_expa
  expand_assignment (var, t, false);
  return build_fold_addr_expr (var);
}
- if (current_function_decl)
+ if (phase != UBSAN_ENCODE_VALUE_GENERIC)
{
  tree tem = build2 (MODIFY_EXPR, void_type_node, var, t);
  t = build_fold_addr_expr (var);
@@ -725,9 +725,9 @@ ubsan_expand_bounds_ifn (gimple_stmt_ite
  ? BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS
  : BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT;
   tree fn = builtin_decl_explicit (bcode);
-  tree val
-   = force_gimple_operand_gsi (gsi, ubsan_encode_value (orig_index), true,
-   NULL_TREE, true, GSI_SAME_STMT);
+  tree val = ubsan_encode_value (orig_index, UBSAN_ENCODE_VALUE_GIMPLE);
+  val = force_gimple_operand_gsi (gsi, 

Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets

2017-06-19 Thread Thomas Preudhomme



On 19/06/17 15:31, Christophe Lyon wrote:

On 19 June 2017 at 16:11, Thomas Preudhomme
 wrote:



On 19/06/17 10:16, Thomas Preudhomme wrote:




On 19/06/17 08:41, Christophe Lyon wrote:


Hi Thomas,


On 15 June 2017 at 18:18, Thomas Preudhomme
 wrote:


Hi,

Conditions checked for ARM targets in vector-related effective targets
are inconsistent:

* sometimes arm*-*-* is checked
* sometimes Neon is checked
* sometimes arm_neon_ok and sometimes arm_neon is used for neon check
* sometimes check_effective_target_* is used, sometimes
is-effective-target

This patch consolidates all of these checks into using is-effective-target
arm_neon and when little endian was checked, the check is kept.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-06  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_vect_int):
Replace
current ARM check by ARM NEON's availability check.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_whole_vector_shift): Likewise.
(check_effective_target_vect_bswap): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_vect_float): Likewise.
(check_effective_target_vect_perm): Likewise.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern):
Likewise.
(check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi_pattern):
Likewise.
(check_effective_target_vect_widen_mult_hi_to_si_pattern):
Likewise.
(check_effective_target_vect_widen_shift): Likewise.
(check_effective_target_vect_extract_even_odd): Likewise.
(check_effective_target_vect_interleave): Likewise.
(check_effective_target_vect_multiple_sizes): Likewise.
(check_effective_target_vect64): Likewise.
(check_effective_target_vect_max_reduc): Likewise.

Testing: Testsuite shows no regression when targeting ARMv7-A with
-mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
default FPU and float ABI (soft). Testing was done with both
compare_tests
and the updated dg-cmp-results proposed in
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html

Is this ok for trunk?



I applied your patch on top of r249233, and noticed quite a few changes:

http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html


Note that "Big-Regression" cases are caused by the fact that there
are PASS->XPASS transitions and XFAILs that disappear with your patch,
and many (3000-4000) PASSes disappear.
Is that intended?



It certainly is not. I'd like to investigate this but the link to results
for
rev 249233 is broken. Could you provide me with the results you have for
that so
that I can compare manually?



Actually yes it is, at least for the configurations with default (which
still uses -mfpu=vfp in r249233) or VFP (whatever version) FPU. I've checked
all the ->NA and ->UNSUPPORTED for the arm-none-linux-gnueabi configuration
and none of them has a dg directive to select the neon unit (such as
dg-additional-options ).
I've also looked at arm-none-linux-gnueabihf configuration with neon FPU and
there is no regression there.

I therefore think this is all normal and expected. Note that under current
trunk this should be different because neon-fp16 would be selected instead
of vfp for default FPU with Cortex-A9.



OK, thanks for checking. So the version you sent on June 15th is OK?


Yes.


I can start a validation against current trunk, after Richard's series,
it probably makes sense, doesn't it?


I think it'll give cleaner results yes. Note that the one with an explicit 
-mfpu=vfp* without neon will still have a lot of changes but at least the one 
with default FPU should be more readable.


Thanks,

Christophe


Best regards,

Thomas


Re: [PATCH GCC][06/13]Preserve loop nest in whole distribution life time

2017-06-19 Thread Richard Biener
On Mon, Jun 19, 2017 at 3:32 PM, Bin.Cheng  wrote:
> On Tue, Jun 13, 2017 at 12:08 PM, Richard Biener
>  wrote:
>> On Tue, Jun 13, 2017 at 1:06 PM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng  wrote:
 Hi,
 This simple patch computes and preserves loop nest vector for whole 
 distribution
 life time.  The loop nest will be used multiple times in on-demand data 
 dependence
 computation.

 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> Don't like it too much but I guess we can see if refactoring it back
>>> to pass down
>>> loop_nest can work.
>>>
>>> Ok.
>>
>> Oh.
>>
>> +/* The loop (nest) to be distributed.  */
>> +static vec<loop_p> *loop_nest;
>> +
>>
>> please make it
>>
>> static vec<loop_p> loop_nest;
>>
>> instead to avoid a pointless indirection (vec<> just contains a
>> pointer to allocated storage).
> Hi Richard,
> This is the updated patch according to your comment, is it OK?

Ok.

Richard.

> Thanks,
> bin
>
> 2017-06-17  Bin Cheng  
>
> * tree-loop-distribution.c (loop_nest): New global var.
> (build_rdg): Use loop directly, rather than loop nest.
> (pg_add_dependence_edges): Remove loop nest parameter.  Use global
> variable directly.
> (distribute_loop): Compute global variable loop nest.  Update use.


Re: [PATCH GCC][07/13]Preserve data references for whole distribution life time

2017-06-19 Thread Bin.Cheng
On Tue, Jun 13, 2017 at 12:14 PM, Richard Biener
 wrote:
> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng  wrote:
>> Hi,
>> This patch collects and preserves all data references in loop for whole
>> distribution life time.  It will be used afterwards.
>>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>
> +/* Vector of data references in the loop to be distributed.  */
> +static vec<data_reference_p> *datarefs_vec;
> +
> +/* Map of data reference in the loop to a unique id.  */
> +static hash_map<data_reference_p, int> *datarefs_map;
> +
>
> no need to make those pointers.  It's not a unique id but
> the index into the datarefs_vec vector, right?
>
> loop distribution doesn't yet use dr->aux so it would be nice
> to avoid the hash_map in favor of using that field.
>
> #define DR_INDEX (dr) ((uintptr_t)(dr)->aux)
>
> +  if (datarefs_vec->length () > 64)
>
> There is PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS)
> with a default value of 1000.  Please use that instead of magic numbers.
>
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file,
> +"Loop %d not distributed: more than 64 memory references.\n",
> +loop->num);
> +
> +  free_rdg (rdg);
> +  loop_nest->release ();
> +  delete loop_nest;
> +  free_data_refs (*datarefs_vec);
> +  delete datarefs_vec;
> +  return 0;
> +}
>
> auto_* were so nice ...
Hi Richard,
This is the updated patch.  It removes datarefs_map as well as checks
number of data references against the parameter.  Is it OK?

Thanks,
bin
2017-06-07  Bin Cheng  

* tree-loop-distribution.c (params.h): Include header file.
(MAX_DATAREFS_NUM, DR_INDEX): New macro.
(datarefs_vec): New global var.
(create_rdg_vertices): Use datarefs_vec directly.
(free_rdg): Don't free data references.
(build_rdg): Update use.  Don't free data references.
(distribute_loop): Compute global variable for data references.
Bail out if there are too many data references.


Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets

2017-06-19 Thread Thomas Preudhomme



On 19/06/17 10:16, Thomas Preudhomme wrote:



On 19/06/17 08:41, Christophe Lyon wrote:

Hi Thomas,


On 15 June 2017 at 18:18, Thomas Preudhomme
 wrote:

Hi,

Conditions checked for ARM targets in vector-related effective targets
are inconsistent:

* sometimes arm*-*-* is checked
* sometimes Neon is checked
* sometimes arm_neon_ok and sometimes arm_neon is used for neon check
* sometimes check_effective_target_* is used, sometimes is-effective-target

This patch consolidates all of these checks into using is-effective-target
arm_neon and when little endian was checked, the check is kept.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-06-06  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_vect_int): Replace
current ARM check by ARM NEON's availability check.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_whole_vector_shift): Likewise.
(check_effective_target_vect_bswap): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_vect_float): Likewise.
(check_effective_target_vect_perm): Likewise.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
(check_effective_target_vect_widen_shift): Likewise.
(check_effective_target_vect_extract_even_odd): Likewise.
(check_effective_target_vect_interleave): Likewise.
(check_effective_target_vect_multiple_sizes): Likewise.
(check_effective_target_vect64): Likewise.
(check_effective_target_vect_max_reduc): Likewise.

Testing: Testsuite shows no regression when targeting ARMv7-A with
-mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
default FPU and float ABI (soft). Testing was done with both compare_tests
and the updated dg-cmp-results proposed in
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html

Is this ok for trunk?



I applied your patch on top of r249233, and noticed quite a few changes:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html


Note that "Big-Regression" cases are caused by the fact that there
are PASS->XPASS transitions and XFAILs that disappear with your patch,
and many (3000-4000) PASSes disappear.
Is that intended?


It certainly is not. I'd like to investigate this but the link to results for
rev 249233 is broken. Could you provide me with the results you have for that so
that I can compare manually?


Actually yes it is, at least for the configurations with default (which still 
uses -mfpu=vfp in r249233) or VFP (whatever version) FPU. I've checked all the 
->NA and ->UNSUPPORTED for the arm-none-linux-gnueabi configuration and none of 
them has a dg directive to select the neon unit (such as dg-additional-options 
). I've also looked at 
arm-none-linux-gnueabihf configuration with neon FPU and there is no regression 
there.


I therefore think this is all normal and expected. Note that under current trunk 
this should be different because neon-fp16 would be selected instead of vfp for 
default FPU with Cortex-A9.


Best regards,

Thomas


[C++ PATCH] small pt.c cleanup

2017-06-19 Thread Nathan Sidwell
I discovered a cleanup I'd forgotten about.  The only unobvious thing 
about this patch is that just before the second hunk we've already set 
DECL_RTL to NULL.   No need to do it twice.


nathan
--
Nathan Sidwell
2017-06-19  Nathan Sidwell  

	* pt.c (coerce_template_parms): Fix indentation.
	(tsubst_decl): Remove repeated SET_DECL_RTL.  Move VAR_P handling
	in to single block.

Index: pt.c
===
--- pt.c	(revision 249364)
+++ pt.c	(working copy)
@@ -8101,10 +8101,10 @@ coerce_template_parms (tree parms,
   parm = TREE_VEC_ELT (parms, parm_idx);
  
   if (parm == error_mark_node)
-  {
-TREE_VEC_ELT (new_inner_args, arg_idx) = error_mark_node;
-continue;
-  }
+	{
+	  TREE_VEC_ELT (new_inner_args, arg_idx) = error_mark_node;
+	  continue;
+	}
 
   /* Calculate the next argument.  */
   if (arg_idx < nargs)
@@ -12947,13 +12947,11 @@ tsubst_decl (tree t, tree args, tsubst_f
 	/* The initializer must not be expanded until it is required;
 	   see [temp.inst].  */
 	DECL_INITIAL (r) = NULL_TREE;
-	if (VAR_P (r))
-	  SET_DECL_MODE (r, VOIDmode);
-	if (CODE_CONTAINS_STRUCT (TREE_CODE (t), TS_DECL_WRTL))
-	  SET_DECL_RTL (r, NULL);
 	DECL_SIZE (r) = DECL_SIZE_UNIT (r) = 0;
 	if (VAR_P (r))
 	  {
+	SET_DECL_MODE (r, VOIDmode);
+
 	/* Possibly limit visibility based on template args.  */
 	DECL_VISIBILITY (r) = VISIBILITY_DEFAULT;
 	if (DECL_VISIBILITY_SPECIFIED (t))


Re: [PATCH, GCC/testsuite/ARM] Consistently check for neon in vect effective targets

2017-06-19 Thread Christophe Lyon
On 19 June 2017 at 16:11, Thomas Preudhomme
 wrote:
>
>
> On 19/06/17 10:16, Thomas Preudhomme wrote:
>>
>>
>>
>> On 19/06/17 08:41, Christophe Lyon wrote:
>>>
>>> Hi Thomas,
>>>
>>>
>>> On 15 June 2017 at 18:18, Thomas Preudhomme
>>>  wrote:

 Hi,

 Conditions checked for ARM targets in vector-related effective targets
 are inconsistent:

 * sometimes arm*-*-* is checked
 * sometimes Neon is checked
 * sometimes arm_neon_ok and sometimes arm_neon is used for neon check
 * sometimes check_effective_target_* is used, sometimes
 is-effective-target

 This patch consolidates all of these checks into using is-effective-target
 arm_neon and when little endian was checked, the check is kept.

 ChangeLog entry is as follows:

 *** gcc/testsuite/ChangeLog ***

 2017-06-06  Thomas Preud'homme  

 * lib/target-supports.exp (check_effective_target_vect_int):
 Replace
 current ARM check by ARM NEON's availability check.
 (check_effective_target_vect_intfloat_cvt): Likewise.
 (check_effective_target_vect_uintfloat_cvt): Likewise.
 (check_effective_target_vect_floatint_cvt): Likewise.
 (check_effective_target_vect_floatuint_cvt): Likewise.
 (check_effective_target_vect_shift): Likewise.
 (check_effective_target_whole_vector_shift): Likewise.
 (check_effective_target_vect_bswap): Likewise.
 (check_effective_target_vect_shift_char): Likewise.
 (check_effective_target_vect_long): Likewise.
 (check_effective_target_vect_float): Likewise.
 (check_effective_target_vect_perm): Likewise.
 (check_effective_target_vect_perm_byte): Likewise.
 (check_effective_target_vect_perm_short): Likewise.
 (check_effective_target_vect_widen_sum_hi_to_si_pattern):
 Likewise.
 (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
 (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
 (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
 (check_effective_target_vect_widen_mult_qi_to_hi_pattern):
 Likewise.
 (check_effective_target_vect_widen_mult_hi_to_si_pattern):
 Likewise.
 (check_effective_target_vect_widen_shift): Likewise.
 (check_effective_target_vect_extract_even_odd): Likewise.
 (check_effective_target_vect_interleave): Likewise.
 (check_effective_target_vect_multiple_sizes): Likewise.
 (check_effective_target_vect64): Likewise.
 (check_effective_target_vect_max_reduc): Likewise.

 Testing: Testsuite shows no regression when targeting ARMv7-A with
 -mfpu=neon-vfpv4 and -mfloat-abi=hard or when targeting Cortex-M3 with
 default FPU and float ABI (soft). Testing was done with both
 compare_tests
 and the updated dg-cmp-results proposed in
 https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01030.html

 Is this ok for trunk?

>>>
>>> I applied your patch on top of r249233, and noticed quite a few changes:
>>>
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/249233-consistent_neon_check.patch/report-build-info.html
>>>
>>>
>>> Note that "Big-Regression" cases are caused by the fact that there
>>> are PASS->XPASS transitions and XFAILs that disappear with your patch,
>>> and many (3000-4000) PASSes disappear.
>>> Is that intended?
>>
>>
>> It certainly is not. I'd like to investigate this but the link to results
>> for
>> rev 249233 is broken. Could you provide me with the results you have for
>> that so
>> that I can compare manually?
>
>
> Actually yes it is, at least for the configurations with default (which
> still uses -mfpu=vfp in r249233) or VFP (whatever version) FPU. I've checked
> all the ->NA and ->UNSUPPORTED for the arm-none-linux-gnueabi configuration
> and none of them has a dg directive to select the neon unit (such as
> dg-additional-options ).
> I've also looked at arm-none-linux-gnueabihf configuration with neon FPU and
> there is no regression there.
>
> I therefore think this is all normal and expected. Note that under current
> trunk this should be different because neon-fp16 would be selected instead
> of vfp for default FPU with Cortex-A9.
>

OK, thanks for checking. So the version you sent on June 15th is OK?
I can start a validation against current trunk, after Richard's series,
it probably makes sense, doesn't it?

Thanks,

Christophe

> Best regards,
>
> Thomas


Re: [PATCH] Fix -fsanitize=undefined ubsan_encode_value ICE (PR sanitizer/81111)

2017-06-19 Thread Richard Biener
On Mon, 19 Jun 2017, Jakub Jelinek wrote:

> Hi!
> 
> Martin's recent patch that introduced sanitize_flags_p causes us to
> instrument operations even when current_function_decl is NULL.  If it
> is a valid constant expression it will be folded away soon, otherwise
> usually we emit a runtime initializer in the static ctors function for
> it.  In any case, neither gimple_add_tmp_var that create_tmp_var calls
> nor mark_addressable actually work in that case, fixed thusly,
> bootstrapped/regtested on x86_64-linux and i686-linux plus
> bootstrapped/regtested with bootstrap-ubsan, ok for trunk?

Ok.

Richard.

> 2017-06-19  Jakub Jelinek  
> 
>   PR sanitizer/81111
>   * ubsan.c (ubsan_encode_value): If current_function_decl is NULL,
>   use create_tmp_var_raw instead of create_tmp_var, mark it addressable
>   just by setting TREE_ADDRESSABLE on the result and use a TARGET_EXPR.
> 
>   * g++.dg/ubsan/pr81111.C: New test.
> 
> --- gcc/ubsan.c.jj  2017-06-16 13:27:48.0 +0200
> +++ gcc/ubsan.c   2017-06-16 16:28:29.099155949 +0200
> @@ -145,9 +145,17 @@ ubsan_encode_value (tree t, bool in_expa
>   {
> /* The reason for this is that we don't want to pessimize
>code by making vars unnecessarily addressable.  */
> -   tree var = create_tmp_var (type);
> -   tree tem = build2 (MODIFY_EXPR, void_type_node, var, t);
> -   mark_addressable (var);
> +   tree var;
> +   if (current_function_decl)
> + {
> +   var = create_tmp_var (type);
> +   mark_addressable (var);
> + }
> +   else
> + {
> +   var = create_tmp_var_raw (type);
> +   TREE_ADDRESSABLE (var) = 1;
> + }
> if (in_expand_p)
>   {
> rtx mem
> @@ -158,8 +166,17 @@ ubsan_encode_value (tree t, bool in_expa
> expand_assignment (var, t, false);
> return build_fold_addr_expr (var);
>   }
> -   t = build_fold_addr_expr (var);
> -   return build2 (COMPOUND_EXPR, TREE_TYPE (t), tem, t);
> +   if (current_function_decl)
> + {
> +   tree tem = build2 (MODIFY_EXPR, void_type_node, var, t);
> +   t = build_fold_addr_expr (var);
> +   return build2 (COMPOUND_EXPR, TREE_TYPE (t), tem, t);
> + }
> +   else
> + {
> +   var = build4 (TARGET_EXPR, type, var, t, NULL_TREE, NULL_TREE);
> +   return build_fold_addr_expr (var);
> + }
>   }
>else
>   return build_fold_addr_expr (t);
> --- gcc/testsuite/g++.dg/ubsan/pr81111.C.jj   2017-06-16 15:39:57.752886010 +0200
> +++ gcc/testsuite/g++.dg/ubsan/pr81111.C  2017-06-16 15:39:37.0 +0200
> @@ -0,0 +1,45 @@
> +// PR sanitizer/81111
> +// { dg-do compile }
> +// { dg-options "-fsanitize=shift" }
> +
> +template <typename V>
> +struct N
> +{
> +  static const V m = (((V)(-1) < 0)
> +   ? (V)1 << (sizeof(V) * __CHAR_BIT__ - ((V)(-1) < 0))
> +   : (V) 0);
> +};
> +
> +template<typename V>
> +const V N<V>::m;
> +
> +template <typename V>
> +struct O
> +{
> +  static const V m = (V)1 << sizeof(V) * __CHAR_BIT__;
> +};
> +
> +template<typename V>
> +const V O<V>::m;
> +
> +void
> +foo ()
> +{
> +  N<int>::m;
> +  N<long long>::m;
> +#ifdef __SIZEOF_INT128__
> +  N<__int128>::m;
> +  N<unsigned __int128>::m;
> +#endif
> +}
> +
> +void
> +bar ()
> +{
> +  O<int>::m;
> +  O<long long>::m;
> +#ifdef __SIZEOF_INT128__
> +  O<__int128>::m;
> +  O<unsigned __int128>::m;
> +#endif
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags.  In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
> 
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.

Actually the logic is

  int reject = old_cost > 0 && new_cost > old_cost;


So reject will never be true if old cost is zero.

R.
> 
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set.  If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
> 
> I'll test this on a zillion archs, see what the effect is.
> 
> Have you considered costing general parallels as well?
> 
> 
> Segher
> 



Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 03:28:20PM +0100, Richard Earnshaw (lists) wrote:
> > That's not what combine does: it optimistically assumes any combination
> > with unknown costs is an improvement.
> 
> So try this testcase on ARM.
> 
> unsigned long x, y, z;
> int b;
> void test()
> {
>b = __builtin_sub_overflow (y,z, &x);
> }
> 
> 
> Without the patch, combine rips apart a compare and subtract insn
> because it sees it as having cost zero and substitutes it with separate
> compare and subtract insns.

> The combine log before the patch shows:
> 
> allowing combination of insns 10 and 51
> original costs 0 + 8 = 0
> replacement costs 4 + 12 = 16

Yes, this is a good example of a case where your patch helps.  Thanks.

> So it is clearly deciding that the original costs are greater than the
> replacement costs.

No: it allows any combination with unknown cost (either old or new cost).
See combine_validate_cost.

> >> This patch addresses this problem by allowing insn_rtx_cost to ignore
> >> the condition setting part of a PARALLEL iff there is exactly one
> >> comparison set and one non-comparison set.  If the only set operation is
> >> a comparison we still use that as the basis of the insn cost.
> > 
> > I'll test this on a zillion archs, see what the effect is.
> > 
> > Have you considered costing general parallels as well?
> 
> I thought about it but concluded that there's no generically correct
> answer.  It might be the max of all the individual sets or it might be
> the sum, or it might be somewhere in between.  For example on ARM the
> load/store multiple operations are expressed as parallels, but their
> cost will depend on how many loads/stores happen in parallel within the
> hardware.
> 
> I think we'd need a new back-end hook to handle the other cases sensibly.

And in general make insn_rtx_cost do something more sane than just looking
at a set_src_cost, yeah.

The problem is changing any of this without regressing some targets.
Of course we are in stage 1 now ;-)


Segher


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 03:45:23PM +0100, Richard Earnshaw (lists) wrote:
> >> At present all such insns are treated as having unknown cost (ie 0) and
> >> combine assumes that such insns are infinitely more expensive than any
> >> other insn sequence with a non-zero cost.
> > 
> > That's not what combine does: it optimistically assumes any combination
> > with unknown costs is an improvement.
> 
> Actually the logic is
> 
>   int reject = old_cost > 0 && new_cost > old_cost;
> 
> So reject will never be true if old cost is zero.

Yes, exactly; and neither if new_cost is zero.  If any cost is unknown
combine just hopes for the best.


Segher


Re: [PATCH, contrib] Support multi-tool sum files in dg-cmp-results.sh

2017-06-19 Thread Mike Stump
On Jun 14, 2017, at 5:30 AM, Thomas Preudhomme  
wrote:
> 
> 2017-06-14  Thomas Preud'homme  
> 
>   * dg-cmp-results.sh: Keep test result lines rather than throwing
>   header and summary to support sum files with multiple tools.
> 
> Tested successfully on sum file with single tool with similar results
> and on sum file with multiple tools now showing a regression with patch
> proposed in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00875.html
> 
> Is this ok for trunk?

Ok.


[PATCH] Ping of ccmp.c (conditional compare) patch

2017-06-19 Thread Steve Ellcey

This is a re-ping of:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00021.html

It was pointed out to me that my original subject line [PATCH/AARCH64] might
be misleading since the patch is not to code in config/aarch64.  It is to
ccmp.c which is in the shared gcc directory.  That said, aarch64 is currently
the only architecture that uses this code because it is the only architecture
that sets targetm.gen_ccmp_first and targetm.gen_ccmp_next.

Steve Ellcey


Add dg-add-options feature stack_size

2017-06-19 Thread Tom de Vries

[ was: Re: [PATCH, testsuite] Add effective target stack_size ]

On 06/09/2017 05:25 PM, Mike Stump wrote:

On Jun 9, 2017, at 7:24 AM, Tom de Vries  wrote:

this patch adds effective target stack_size.
OK for trunk if x86_64 and nvptx testing succeeds?

Ok.


Hi,

I came across dg-add-options, and wondered if adding a dg-add-options 
feature stack_size is a better way to make STACK_SIZE available.


Info looks like this:
...
7.2.3.12 Other attributes

'stack_size'
 Target has limited stack size.  The stack size limit can be
 obtained using the STACK_SIZE macro defined by *note
 'dg-add-options' feature 'stack_size': stack_size_ao.

7.2.4 Features for 'dg-add-options'

'stack_size'
 Add the flags needed to define macro STACK_SIZE and set it to the
 stack size limit associated with the *note 'stack_size' effective
 target: stack_size_et.
...

Incomplete (updated just one testcase) and untested.

OK if complete and tested?

Thanks,
- Tom
Add dg-add-options feature stack_size

2017-06-19  Tom de Vries  

	* doc/sourcebuild.texi (Add Options, Features for dg-add-options): Add
	stack_size feature.
	(Effective-Target Keywords, Other attributes): Suggest using
	dg-add-options stack_size feature to get stack limit in stack_size
	effective target documentation.

	* lib/target-supports.exp (add_options_for_stack_size): New proc.
	* gcc.c-torture/execute/920501-7.c: Use dg-add-options stack_size.

---
 gcc/doc/sourcebuild.texi   | 15 ++-
 gcc/testsuite/gcc.c-torture/execute/920501-7.c |  2 +-
 gcc/testsuite/lib/target-supports.exp  | 11 +++
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e5f0da6..7f5c2cf 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2097,11 +2097,10 @@ Target supports section anchors.
 Target defaults to short enums.
 
 @item stack_size
-Target has limited stack size.  The stack size limit can be obtained using
-@code{[dg-effective-target-value stack_size]}.  For example:
-@smallexample
-/* @{ dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" @{ target @{ stack_size @} @} @} */
-@end smallexample
+@anchor{stack_size_et}
+Target has limited stack size.  The stack size limit can be obtained using the
+STACK_SIZE macro defined by @ref{stack_size_ao,,@code{dg-add-options} feature
+@code{stack_size}}.
 
 @item static
 Target supports @option{-static}.
@@ -2282,6 +2281,12 @@ compliance mode.
 @code{mips16} function attributes.
 Only MIPS targets support this feature, and only then in certain modes.
 
+@item stack_size
+@anchor{stack_size_ao}
+Add the flags needed to define macro STACK_SIZE and set it to the stack size
+limit associated with the @ref{stack_size_et,,@code{stack_size} effective
+target}.
+
 @item tls
 Add the target-specific flags needed to use thread-local storage.
 @end table
diff --git a/gcc/testsuite/gcc.c-torture/execute/920501-7.c b/gcc/testsuite/gcc.c-torture/execute/920501-7.c
index 5cced09..1396eeb 100644
--- a/gcc/testsuite/gcc.c-torture/execute/920501-7.c
+++ b/gcc/testsuite/gcc.c-torture/execute/920501-7.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target label_values } */
 /* { dg-require-effective-target trampolines } */
-/* { dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" { target { stack_size } } } */
+/* { dg-add-options stack_size } */
 
 #ifdef STACK_SIZE
 #define DEPTH ((STACK_SIZE) / 512 + 1)
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 31701c2..502986e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7249,6 +7249,17 @@ proc add_options_for_double_vectors { flags } {
 return $flags
 }
 
+# Add to FLAGS the flags needed to define the STACK_SIZE macro.
+
+proc add_options_for_stack_size { flags } {
+if [is-effective-target stack_size] {
+	set stack_size [dg-effective-target-value stack_size]
+	return "$flags -DSTACK_SIZE=$stack_size"
+}
+
+return $flags
+}
+
 # Return 1 if the target provides a full C99 runtime.
 
 proc check_effective_target_c99_runtime { } {


Re: [PATCH GCC][09/13]Simply cost model merges partitions with the same references

2017-06-19 Thread Richard Biener
On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng  wrote:
> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>>> Hi,
>>> Current primitive cost model merges partitions with data references sharing 
>>> the same
>>> base address.  I believe it's designed to maximize data reuse in 
>>> distribution, but
>>> that should be done by dedicated data reusing algorithm.  At this stage of 
>>> merging,
>>> we should be conservative and only merge partitions with the same 
>>> references.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Well, I'd say "conservative" is merging more, not less.  For example
>> splitting a[i+1] from a[i]
>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
>> merging.  Maybe
>> DR_INIT within a cacheline or so.
>>
>> How many extra distributions in say SPEC do you get from this change alone?
> Hi,
> I collected data for spec2006 only with/without this patch.  I am a
> bit surprised that it doesn't change the number of distributed loops.
>>
>> It shows also that having partition->reads_and_writes would be nice
>> ...  the code duplication
> Yeah, I merged read/write data references in previous patch, now this
> duplication is gone.  Update patch attached.  Is it OK?

+  gcc_assert (i < datarefs_vec.length ());
+  dr1 = datarefs_vec[i];

these asserts are superfluous -- vec::operator[] does them as well.

Ok if you remove them.

Richard.

> Thanks.
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (ref_base_address): Delete.
> (similar_memory_accesses): Rename ...
> (share_memory_accesses): ... to this.  Check if partitions access
> the same memory reference.
> (distribute_loop): Call share_memory_accesses.


[PATCH] Fix x86 ICE with -mtune=amdfam10 -mno-sse2 (PR target/81121)

2017-06-19 Thread Jakub Jelinek
Hi!

This testcase started to ICE when PR70873 fix changed the splitter:
@@ -5153,11 +5147,11 @@
 ;; slots when !TARGET_INTER_UNIT_MOVES_TO_VEC disables the general_regs
 ;; alternative in sse2_loadld.
 (define_split
-  [(set (match_operand:MODEF 0 "register_operand")
+  [(set (match_operand:MODEF 0 "sse_reg_operand")
(float:MODEF (match_operand:SI 1 "nonimmediate_operand")))]
-  "TARGET_SSE2 && TARGET_SSE_MATH
-   && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
-   && reload_completed && SSE_REG_P (operands[0])
+  "TARGET_USE_VECTOR_CONVERTS
+   && optimize_function_for_speed_p (cfun)
+   && reload_completed
&& (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
&& (!EXT_REX_SSE_REG_P (operands[0])
|| TARGET_AVX512VL)"
Having sse_reg_operand match the output operand does not imply
TARGET_SSE2 is enabled, but we need it for both the
  if (mode == V4SFmode)
emit_insn (gen_floatv4siv4sf2 (operands[3], operands[4]));
  else
emit_insn (gen_sse2_cvtdq2pd (operands[3], operands[4]));
instructions that we want to use in the splitter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
(or do you want TARGET_SSE2 first or right after
TARGET_USE_VECTOR_CONVERTS)?

2017-06-19  Jakub Jelinek  

PR target/81121
* config/i386/i386.md (TARGET_USE_VECTOR_CONVERTS float si->{sf,df}
splitter): Require TARGET_SSE2 in the condition.

* gcc.target/i386/pr81121.c: New test.

--- gcc/config/i386/i386.md.jj  2017-06-08 20:50:46.0 +0200
+++ gcc/config/i386/i386.md 2017-06-19 11:30:38.937491668 +0200
@@ -5294,6 +5294,7 @@ (define_split
&& optimize_function_for_speed_p (cfun)
&& reload_completed
&& (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
+   && TARGET_SSE2
&& (!EXT_REX_SSE_REG_P (operands[0])
|| TARGET_AVX512VL)"
   [(const_int 0)]
--- gcc/testsuite/gcc.target/i386/pr81121.c.jj  2017-06-19 11:36:06.545501324 +0200
+++ gcc/testsuite/gcc.target/i386/pr81121.c 2017-06-19 11:35:40.0 +0200
@@ -0,0 +1,10 @@
+/* PR target/81121 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=amdfam10 -mno-sse2" } */
+
+void
+foo (short *x, short *y)
+{
+  float a = 0;
+  y[0] = x[0] * a;
+}

Jakub


Re: [PATCH GCC][07/13]Preserve data references for whole distribution life time

2017-06-19 Thread Bin.Cheng
On Mon, Jun 19, 2017 at 4:16 PM, Richard Biener
 wrote:
> On Mon, Jun 19, 2017 at 3:34 PM, Bin.Cheng  wrote:
>> On Tue, Jun 13, 2017 at 12:14 PM, Richard Biener
>>  wrote:
>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng  wrote:
 Hi,
 This patch collects and preserves all data references in loop for whole
 distribution life time.  It will be used afterwards.

 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> +/* Vector of data references in the loop to be distributed.  */
>>> +static vec<data_reference_p> *datarefs_vec;
>>> +
>>> +/* Map of data reference in the loop to a unique id.  */
>>> +static hash_map<data_reference_p, int> *datarefs_map;
>>> +
>>>
>>> no need to make those pointers.  It's not a unique id but
>>> the index into the datarefs_vec vector, right?
>>>
>>> loop distribution doesn't yet use dr->aux so it would be nice
>>> to avoid the hash_map in favor of using that field.
>>>
>>> #define DR_INDEX (dr) ((uintptr_t)(dr)->aux)
>>>
>>> +  if (datarefs_vec->length () > 64)
>>>
>>> There is PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS)
>>> with a default value of 1000.  Please use that instead of magic numbers.
>>>
>>> +{
>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>> +   fprintf (dump_file,
>>> +"Loop %d not distributed: more than 64 memory 
>>> references.\n",
>>> +loop->num);
>>> +
>>> +  free_rdg (rdg);
>>> +  loop_nest->release ();
>>> +  delete loop_nest;
>>> +  free_data_refs (*datarefs_vec);
>>> +  delete datarefs_vec;
>>> +  return 0;
>>> +}
>>>
>>> auto_* were so nice ...
>> Hi Richard,
>> This is the updated patch.  It removes datarefs_map as well as checks
>> number of data references against the parameter.  Is it OK?
>
> ENOPATCH
Ah Sorry for that.

Thanks,
bin
>
>> Thanks,
>> bin
>> 2017-06-07  Bin Cheng  
>>
>> * tree-loop-distribution.c (params.h): Include header file.
>> (MAX_DATAREFS_NUM, DR_INDEX): New macro.
>> (datarefs_vec): New global var.
>> (create_rdg_vertices): Use datarefs_vec directly.
>> (free_rdg): Don't free data references.
>> (build_rdg): Update use.  Don't free data references.
>> (distribute_loop): Compute global variable for data references.
>> Bail out if there are too many data references.
From 4e0c23380e82465e6865dac017b13ac5791cba5d Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 9 Jun 2017 12:09:03 +0100
Subject: [PATCH 06/13] preserve-datarefs-20170609.txt

---
 gcc/tree-loop-distribution.c | 53 
 1 file changed, 39 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 8183090..a013556 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -63,12 +63,22 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "params.h"
 #include "tree-vectorizer.h"
 
 
+#define MAX_DATAREFS_NUM \
+	((unsigned) PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
+
 /* The loop (nest) to be distributed.  */
 static vec<loop_p> loop_nest;
 
+/* Vector of data references in the loop to be distributed.  */
+static vec<data_reference_p> datarefs_vec;
+
+/* Store index of data reference in aux field.  */
+#define DR_INDEX(dr)  ((uintptr_t) (dr)->aux)
+
 /* A Reduced Dependence Graph (RDG) vertex representing a statement.  */
 struct rdg_vertex
 {
@@ -339,8 +349,7 @@ create_rdg_cd_edges (struct graph *rdg, control_dependences *cd, loop_p loop)
if that failed.  */
 
 static bool
-create_rdg_vertices (struct graph *rdg, vec<gimple *> stmts, loop_p loop,
-		 vec<data_reference_p> *datarefs)
+create_rdg_vertices (struct graph *rdg, vec<gimple *> stmts, loop_p loop)
 {
   int i;
   gimple *stmt;
@@ -360,12 +369,12 @@ create_rdg_vertices (struct graph *rdg, vec<gimple *> stmts, loop_p loop,
   if (gimple_code (stmt) == GIMPLE_PHI)
 	continue;
 
-  unsigned drp = datarefs->length ();
-  if (!find_data_references_in_stmt (loop, stmt, datarefs))
+  unsigned drp = datarefs_vec.length ();
+  if (!find_data_references_in_stmt (loop, stmt, &datarefs_vec))
 	return false;
-  for (unsigned j = drp; j < datarefs->length (); ++j)
+  for (unsigned j = drp; j < datarefs_vec.length (); ++j)
 	{
-	  data_reference_p dr = (*datarefs)[j];
+	  data_reference_p dr = datarefs_vec[j];
 	  if (DR_IS_READ (dr))
 	RDGV_HAS_MEM_READS (v) = true;
 	  else
@@ -449,7 +458,7 @@ free_rdg (struct graph *rdg)
   if (v->data)
 	{
 	  gimple_set_uid (RDGV_STMT (v), -1);
-	  free_data_refs (RDGV_DATAREFS (v));
+	  (RDGV_DATAREFS (v)).release ();
 	  free (v->data);
 	}
 }
@@ -459,22 +468,20 @@ free_rdg (struct graph *rdg)
 
 /* Build the Reduced Dependence Graph (RDG) with one vertex per statement of
LOOP, and one edge per flow dependence or 

Re: [PATCH GCC][07/13]Preserve data references for whole distribution life time

2017-06-19 Thread Richard Biener
On Mon, Jun 19, 2017 at 3:34 PM, Bin.Cheng  wrote:
> On Tue, Jun 13, 2017 at 12:14 PM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch collects and preserves all data references in loop for whole
>>> distribution life time.  It will be used afterwards.
>>>
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> +/* Vector of data references in the loop to be distributed.  */
>> +static vec<data_reference_p> *datarefs_vec;
>> +
>> +/* Map of data reference in the loop to a unique id.  */
>> +static hash_map<data_reference_p, int> *datarefs_map;
>> +
>>
>> no need to make those pointers.  It's not a unique id but
>> the index into the datarefs_vec vector, right?
>>
>> loop distribution doesn't yet use dr->aux so it would be nice
>> to avoid the hash_map in favor of using that field.
>>
>> #define DR_INDEX (dr) ((uintptr_t)(dr)->aux)
>>
>> +  if (datarefs_vec->length () > 64)
>>
>> There is PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS)
>> with a default value of 1000.  Please use that instead of magic numbers.
>>
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +   fprintf (dump_file,
>> +"Loop %d not distributed: more than 64 memory 
>> references.\n",
>> +loop->num);
>> +
>> +  free_rdg (rdg);
>> +  loop_nest->release ();
>> +  delete loop_nest;
>> +  free_data_refs (*datarefs_vec);
>> +  delete datarefs_vec;
>> +  return 0;
>> +}
>>
>> auto_* were so nice ...
> Hi Richard,
> This is the updated patch.  It removes datarefs_map as well as checks
> number of data references against the parameter.  Is it OK?

ENOPATCH

> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (params.h): Include header file.
> (MAX_DATAREFS_NUM, DR_INDEX): New macro.
> (datarefs_vec): New global var.
> (create_rdg_vertices): Use datarefs_vec directly.
> (free_rdg): Don't free data references.
> (build_rdg): Update use.  Don't free data references.
> (distribute_loop): Compute global variable for data references.
> Bail out if there are too many data references.


Re: [PATCH, testsuite] Add effective target stack_size

2017-06-19 Thread Mike Stump
On Jun 19, 2017, at 2:21 AM, Christophe Lyon  wrote:
> 
> The attached patch removes the support for STACK_SIZE in the testcase
> as you suggested, and it works fine (cross-tested on aarch64/arm targets)
> 
> OK for trunk?

Ok.

RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
As some of you are likely aware, Qualys has just published fairly
detailed information on using stack/heap clashes as an attack vector.
Eric B, Michael M -- sorry I couldn't say more when I contacted you about
-fstack-check and some PPC specific stuff.  This has been under embargo
for the last month.


--


http://www.openwall.com/lists/oss-security/2017/06/19/1


Obviously various vulnerabilities pointed out in that advisory are being
mitigated, particularly those found within glibc.  But those are really
just scratching the surface of this issue.

At its core, this chained attack relies first upon using various
techniques to bring the stack and heap close together.  Then the
exploits rely on large stack allocations to "jump the guard".  Once the
guard has been jumped, the stack and heap have collided and all hell
breaks loose.

The "jump the guard" step can be mitigated with help from the compiler.
We just have to ensure that as we allocate chunks of stack space we
touch each allocated page.  That ensures that the guard page is hit.

This sounds a whole lot like -fstack-check and initially that's what
folks were hoping could be used to eliminate this class of problems.

--

Unfortunately, -fstack-check is actually not well suited for our purposes.

Some background.  -fstack-check was designed primarily for Ada's needs.
It assumes the whole program is compiled with -fstack-check and it is
designed to ensure there is enough stack space left so that if the
program hits the guard (say via infinite recursion) the program can
safely call into a signal handler and raise an exception.

To ensure there's always enough space to meet that design requirement,
-fstack-check probes stack space ahead of the actual need of the code.

The assumption that all code was compiled with -fstack-check allows for
elision of some stack probes as they are assumed to have been probed by
earlier callers in the call chain.  This elision is safe in an
environment where all callers use -fstack-check, but fatally flawed in a
mixed environment.

Most ports first probe by pages for whatever space is requested, then
after all probing is done, they actually allocate space.  This runs
afoul of valgrind in various unpleasant ways (including crashing
valgrind on two targets).

Only x86-linux currently uses a "moving sp" allocation and probing
strategy.  ie, it actually allocates space, then probes the space.

--

After much poking around I concluded that we really need to implement
allocation and probing via a "moving sp" strategy.   Probing into
unallocated areas runs afoul of valgrind, so that's a non-starter.

Allocating stack space, then probing the pages within the space is
vulnerable to async signal delivery between the allocation point and the
probe point.  If that occurs the signal handler could end up running on
a stack that has collided with the heap.

Ideally we would allocate and probe a page as an atomic unit (which is
feasible on PPC).  Alternatively, due to ISA restrictions, allocate a
page, then probe the page as distinct instructions.  The latter still
has a race, but we'd have to take the async signal in a single
instruction window.

A key point to remember is that you can never have an allocation
(potentially using more than one allocation site) which is larger than a
page without probing the page.

Furthermore, we can not assume that earlier functions in the call stack
were compiled with stack checking enabled.  Thus we can not make any
assumptions about what pages other functions in the callstack have
probed or not probed.

Finally, we need not ensure the ability to handle a signal at stack
overflow.  It is fine for the kernel to halt the process immediately if
it detects a reference to the guard page.


--

With all that in mind, we also want to be as efficient as possible and I
think we do pretty good on x86 and ppc.  On x86, the call instruction
itself stores into the stack and on ppc stack is only supposed to be
allocated via the store-with-base-register-modification instructions
which also store into *sp.

Those "implicit probes" allow us to greatly reduce the amount of probing
we do on those architectures.  If a function allocates less than a page
of space, no probing is needed -- this covers the vast majority of
functions.  Furthermore, if we allocate N pages + M bytes of residuals,
we need only explicitly probe the N pages, but not any of the residual
allocation.
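
To make that concrete, here is a rough sketch of the scheme for a frame
of SIZE = N * PAGE + M bytes (C-like pseudocode with made-up names, not
the actual target-specific prologue sequence):

  /* "Moving sp": allocate and probe one page at a time, so the distance
     from sp to the last probed address never exceeds a page.  */
  for (unsigned n = 0; n < N; n++)
    {
      sp -= PAGE;   /* Allocate one page.  */
      *sp = 0;      /* Probe it immediately; faults on the guard page.  */
    }
  sp -= M;          /* Residual (M < PAGE): no explicit probe needed where
                       an implicit probe (e.g. the store done by a call
                       instruction on x86) will touch the page.  */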

On glibc, we end up creating probes in ~1.5% of the functions on those
two architectures.  We could probably do even better on PPC, but we
currently assume 4k pages which is overly-conservative on that target.

aarch64 is significantly worse.  There are no implicit probes we can
exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
So we have to track the distance to the most recent probe and when that
distance grows too large, we have to emit a probe.  Of course we have to
make worst case assumptions at function entry.

s390 is 

Re: [PATCH GCC][08/13]Refactoring structure partition for distribution

2017-06-19 Thread Richard Biener
On Mon, Jun 19, 2017 at 3:37 PM, Bin.Cheng  wrote:
> On Wed, Jun 14, 2017 at 2:47 PM, Richard Biener
>  wrote:
>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch refactors struct partition for later distribution.  It records
>>> bitmap of data references in struct partition rather than vertices' data in
>>> partition dependence graph.  It simplifies code as well as enables following
>>> rewriting.
>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Ok.
> Hi,
> I updated patch by merging read/write data references together in
> struct partition.  This helps remove code duplication.  Is it OK?

Ok.

Richard.

> Thanks,
> bin
> 2017-06-07  Bin Cheng  
>
> * tree-loop-distribution.c (struct partition): New field recording
> its data reference.
> (partition_alloc, partition_free): Init and release data refs.
> (partition_merge_into): Merge data refs.
> (build_rdg_partition_for_vertex): Collect data refs for partition.
> (pg_add_dependence_edges): Change parameters from vector to bitmap.
> Update uses.
> (distribute_loop): Remove data refs from vertex data of partition
> graph.


Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Richard Earnshaw (lists)
On 19/06/17 16:09, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 03:45:23PM +0100, Richard Earnshaw (lists) wrote:
 At present all such insns are treated as having unknown cost (ie 0) and
 combine assumes that such insns are infinitely more expensive than any
 other insn sequence with a non-zero cost.
>>>
>>> That's not what combine does: it optimistically assumes any combination
>>> with unknown costs is an improvement.
>>
>> Actually the logic is
>>
>>   int reject = old_cost > 0 && new_cost > old_cost;
>>
>> So reject will never be true if old cost is zero.
> 
> Yes, exactly; and neither if new_cost is zero.  If any cost is unknown
> combine just hopes for the best.
> 
> 
> Segher
> 

Yeah, and I'm not suggesting we change the logic there (sorry if the
description was misleading).  Instead I'm proposing that we handle more
cases for parallels to not return zero.

R.


Re: [Patch ARM] Fix PR71778

2017-06-19 Thread James Greenhalgh
On Fri, Jun 16, 2017 at 11:07:41AM +0100, Kyrill Tkachov wrote:
> 
> On 16/06/17 10:07, James Greenhalgh wrote:
> >On Wed, Jun 14, 2017 at 11:21:30AM +0100, Kyrill Tkachov wrote:
> >
> >   <...>
> >
> >>That movv2di expander is the one in vec-common.md that ends up calling
> >>neon_make_constant. I wonder why const0_rtx passed its predicate check
> >>(that would require a V2DImode vector of zeroes rather than a const0_rtx).
> >>Perhaps the midend code at this point doesn't check the operand predicate.
> >>
> >>In the builtin expansion code that you quoted I wonder if we could
> >>fail
> >>more gracefully by returning CONST0_RTX (mode[argc]) to match the expected
> >>mode of the operand (we've already emitted an error, so we shouldn't care
> >>what RTL we emit as long as it doesn't cause an ICE).
> >   <...>
> >
> >>diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> >>index e503891..b8d59c6 100644
> >>--- a/gcc/config/arm/arm.c
> >>+++ b/gcc/config/arm/arm.c
> >>@@ -12124,6 +12124,11 @@ neon_make_constant (rtx vals)
> >>if (n_const == n_elts)
> >>const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
> >>  }
> >>+  else if (vals == const0_rtx)
> >>+/* Something invalid, perhaps from expanding an intrinsic
> >>+   which requires a constant argument, where a variable argument
> >>+   was passed.  */
> >>+ return const0_rtx;
> >>else
> >>  gcc_unreachable ();
> >>
> >>I'm not a fan of this as the function has a precondition that its argument 
> >>is
> >>a PARALLEL or a CONST_VECTOR and special-casing const0_rtx breaks that. I'd
> >>rather we tried fixing this closer to the error source.  Can you try the
> >>suggestion above instead please?
> >Your suggestion doesn't quite work, but this is pretty close to it. Rather
> >than try to guess at the correct mode for CONST0_RTX (we can't just use
> >mode[argc] as that will get you the scalar mode), we can just return target
> >directly. That will ensure we've given something valid back in the correct
> >mode, even if it is not all that useful.
> 
> Yeah, that actually looks better.
> 
> >Bootstrapped on arm-none-linux-gnueabihf. OK?
> 
> Ok.

Thanks.

The patch applies cleanly to gcc-7-branch and gcc-6-branch, both of which
I've bootstrapped and tested on arm-none-linux-gnueabihf without issue.

Is it OK for me to apply these backports and close out the PR (it is
marked as a 6/7 regression)?

Thanks,
James


> >---
> >gcc/
> >
> >2017-06-15  James Greenhalgh  
> >
> > PR target/71778
> > * config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET
> > if given a non-constant argument for an intrinsic which requires a
> > constant.
> >
> >gcc/testsuite/
> >
> >2017-06-15  James Greenhalgh  
> >
> > PR target/71778
> > * gcc.target/arm/pr71778.c: New.
> >
> 


Re: [PATCH 3/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Martin Sebor

On 06/11/2017 07:34 PM, Xi Ruoyao wrote:

This patch adds warning option -Wstring-plus-char for C/C++.



+void
+warn_if_string_plus_char (location_t loc, tree ptrtype, tree inttype)
+{
+  if (POINTER_TYPE_P (ptrtype)
+  && type_main_variant_is_char (TREE_TYPE (ptrtype))
+  && type_main_variant_is_char (inttype))
+warning_at (loc, OPT_Wstring_plus_char,
+"add %qT to string pointer %qT does not append "
+"to the string", inttype, ptrtype);

The text of the warning doesn't read like a grammatically correct
sentence.  ("Adding a to b" would be correct.)

That said, I wonder if it should also be made more accurate.
Based on c-c++-common/Wstring-plus-char.c for the snippet below

  char *a;
  const char *b;
  const char c = 'c';
  const char *d = a + c;

it will print

  warning: add 'char' to 'char *' does not append to the string

even though no string is apparent or needs to exist in the program
(a could point to an array of chars with no terminating NUL).
I see Clang prints something similar (modulo the bad grammar) but
I think it might be clearer if the warning instead read something
like:

  adding 'char' to 'char *' does not append to a string

or (if the warning were to trigger only for character constants
like in Clang):

  adding 'char' to 'char *' does not append 'c' to the first operand

i.e., if the warning also included the value of the character
constant.

Martin


Re: [Patch ARM] Fix PR71778

2017-06-19 Thread Kyrill Tkachov


On 19/06/17 17:16, James Greenhalgh wrote:

On Fri, Jun 16, 2017 at 11:07:41AM +0100, Kyrill Tkachov wrote:

On 16/06/17 10:07, James Greenhalgh wrote:

On Wed, Jun 14, 2017 at 11:21:30AM +0100, Kyrill Tkachov wrote:

   <...>


That movv2di expander is the one in vec-common.md that ends up calling
neon_make_constant. I wonder why const0_rtx passed its predicate check
(that would require a V2DImode vector of zeroes rather than a const0_rtx).
Perhaps the midend code at this point doesn't check the operand predicate.

In the builtin expansion code that you quoted I wonder if we could fail
more gracefully by returning CONST0_RTX (mode[argc]) to match the expected
mode of the operand (we've already emitted an error, so we shouldn't care
what RTL we emit as long as it doesn't cause an ICE).

   <...>


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e503891..b8d59c6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12124,6 +12124,11 @@ neon_make_constant (rtx vals)
if (n_const == n_elts)
const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
  }
+  else if (vals == const0_rtx)
+/* Something invalid, perhaps from expanding an intrinsic
+   which requires a constant argument, where a variable argument
+   was passed.  */
+ return const0_rtx;
else
  gcc_unreachable ();

I'm not a fan of this as the function has a precondition that its argument is
a PARALLEL or a CONST_VECTOR and special-casing const0_rtx breaks that. I'd
rather we tried fixing this closer to the error source.  Can you try the
suggestion above instead please?

Your suggestion doesn't quite work, but this is pretty close to it. Rather
than try to guess at the correct mode for CONST0_RTX (we can't just use
mode[argc] as that will get you the scalar mode), we can just return target
directly. That will ensure we've given something valid back in the correct
mode, even if it is not all that useful.

Yeah, that actually looks better.


Bootstrapped on arm-none-linux-gnueabihf. OK?

Ok.

Thanks.

The patch applies cleanly to gcc-7-branch and gcc-6-branch, both of which
I've bootstrapped and tested on arm-none-linux-gnueabihf without issue.

Is it OK for me to apply these backports and close out the PR (it is
marked as a 6/7 regression)?


Ok.
Thanks,
Kyrill


Thanks,
James



---
gcc/

2017-06-15  James Greenhalgh  

PR target/71778
* config/arm/arm-builtins.c (arm_expand_builtin_args): Return TARGET
if given a non-constant argument for an intrinsic which requires a
constant.

gcc/testsuite/

2017-06-15  James Greenhalgh  

PR target/71778
* gcc.target/arm/pr71778.c: New.





Re: [PATCH 6/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Martin Sebor

On 06/11/2017 07:39 PM, Xi Ruoyao wrote:

This patch adds document of -Wstring-plus-int and -Wstring-plus-char.


+@item -Wstring-plus-char
+@opindex Wstring-plus-char
+@opindex Wno-string-plus-char
+Warn for adding a character to a string pointer, which seems like a failed
+attempt to append to the string.  For example, this option will issue a
+warning for the code below.

The text above should be corrected for grammar:

  Warn when a character is added to a character pointer.  Such
  an addition may be an incorrect attempt to append the character
  to a string.

Similarly, the text below should be corrected (though as I mentioned
in my earlier response to one of the prior patches, I would prefer
to see the out-of-bounds warning(s) phrased in terms the (undefined)
effects of the addition and included in -Warray-bounds rather than
adding a new option based on assumptions about the intended effects,
and extended to all arrays of known bound rather than applied only
to string literals).

+@item -Wstring-plus-int
+@opindex Wstring-plus-int
+@opindex Wno-string-plus-int
+Warn for adding an integer to a string literal, which may forms a pointer
+out of the bound of the string.  The typical examples this warns about are
+@samp{"abc" + 'd'}, @samp{"abc" + getchar()} and @samp{"abc" + 5}, but
+not @samp{"abc" + 1}.

  Warn when an integer constant in excess of its upper bound is
  added to a string literal.

Martin


Re: [PATCH 0/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Martin Sebor

On 06/11/2017 07:26 PM, Xi Ruoyao wrote:

Hi,

I've implemented -Wstring-plus-int and -Wstring-plus-char (like their
counterpart in Clang) for GCC.


From the Clang patch(*) it only "warns when a character literal is
added (using '+') to a variable with type 'char *' (or any other
pointer to character type)."

Based on the tests in this patch, GCC will warn for non-literal
operands as well.  Is that intentional, and if so, what is your
rationale for the difference?

Martin

[*] 
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20131021/091671.html


Re: [PATCH 2/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Martin Sebor

On 06/11/2017 07:32 PM, Xi Ruoyao wrote:

This patch adds warning option -Wstring-plus-int for C/C++.

gcc/ChangeLog:

2017-06-12  Xi Ruoyao  

* c-family/c.opt: New option -Wstring-plus-int.
* c-family/c-common.c (pointer_int_sum): Checking for
-Wstring-plus-int.


This is a very useful warning but I would suggest to word it
in terms of what it actually does rather than what it might be
intended to do.  E.g., for

  const char *p = "123" + 7;

issue

  warning: offset 7 exceeds the upper bound 3 of the array

rather than

  warning: adding 'int' to a string does not append to the string

(I have trouble envisioning on what grounds someone might expect
the addition to have this effect.)

Given that the warning only triggers when the upper bound of
an array is exceeded I would also suggest to consider including
the warning in -Warray-bounds.  (With that, it would be useful
to also detect exceeding the upper bound of non-literal arrays
as well.)

Martin


[PATCH] Fix UB in tree-chkp.c

2017-06-19 Thread Jakub Jelinek
Hi!

bootstrap-ubsan shows a couple of:
../../gcc/tree-chkp.c:694:37: runtime error: shift exponent 63 is too large for 
32-bit type 'int'
errors.

1 << (TYPE_PRECISION (ptr_type_node) - 1)
should have been obviously
HOST_WIDE_INT_1U << (TYPE_PRECISION (ptr_type_node) - 1)
but even then, it is 1) unnecessarily complicated and expensive way
to create a pointer with just the MSB bit set and all other clear and
2) will not work if ptr_type_node has higher precision than HWI (just
theoretical possibility now)
For 1), e.g. fold_convert (ptr_type_node, integer_zero_node) is
better written as build_int_cst (ptr_type_node, 0), but still
we can actually avoid the fold_build_pointer_plus_hwi and folding
it altogether.
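
To make the UB concrete (a minimal sketch of my own, not GCC code; it
assumes 64-bit pointers and a 64-bit HOST_WIDE_INT):

  #include <cstdint>

  // The old expression boils down to 1 << 63: shifting a 32-bit int
  // by 63 bits, which is undefined behaviour.  The intended value --
  // MSB set, all other bits clear -- computed without UB:
  uint64_t msb_only () { return (uint64_t) 1 << 63; }
  // This is exactly the minimum signed value at that precision, i.e.
  // what wi::min_value (prec, SIGNED) yields without any shifting.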

Bootstrapped/regtested on x86_64-linux and i686-linux (both normal
and bootstrap-ubsan), ok for trunk?

2017-06-19  Jakub Jelinek  

* tree-chkp.c (chkp_get_hard_register_var_fake_base_address):
Rewritten to avoid overflow for > 32-bit pointers.

--- gcc/tree-chkp.c.jj  2017-06-12 12:41:55.0 +0200
+++ gcc/tree-chkp.c 2017-06-19 12:57:24.670478544 +0200
@@ -690,9 +690,8 @@ chkp_erase_completed_bounds (void)
 static tree
 chkp_get_hard_register_var_fake_base_address ()
 {
-  tree base = fold_convert (ptr_type_node, integer_zero_node);
-  unsigned HOST_WIDE_INT offset = 1 << (TYPE_PRECISION (ptr_type_node) - 1);
-  return fold_build_pointer_plus_hwi (base, offset);
+  int prec = TYPE_PRECISION (ptr_type_node);
+  return wide_int_to_tree (ptr_type_node, wi::min_value (prec, SIGNED));
 }
 
 /* If we check bounds for a hard register variable, we cannot


Jakub


Re: [PATCH] Fix UB in tree-ssa-structalias.c

2017-06-19 Thread Richard Biener
On June 19, 2017 7:46:03 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>Another easy to fix bug reported by bootstrap-ubsan.
>We check that rhsunitoffset fits into shwi, but even if it does,
>8x that might not, in which case we trigger UB.
>Fixed by doing the multiplication in unsigned HWI type to make it well
>defined.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux (both normal
>and bootstrap-ubsan), ok for trunk?

OK.

Richard.

>2017-06-19  Jakub Jelinek  
>
>   * tree-ssa-structalias.c (get_constraint_for_ptr_offset): Multiply
>   in UWHI to avoid undefined overflow.
>
>--- gcc/tree-ssa-structalias.c.jj  2017-05-24 11:59:06.0 +0200
>+++ gcc/tree-ssa-structalias.c 2017-06-19 14:10:50.989594911 +0200
>@@ -3087,7 +3087,7 @@ get_constraint_for_ptr_offset (tree ptr,
>   {
> /* Make sure the bit-offset also fits.  */
> HOST_WIDE_INT rhsunitoffset = soffset.to_shwi ();
>-rhsoffset = rhsunitoffset * BITS_PER_UNIT;
>+rhsoffset = rhsunitoffset * (unsigned HOST_WIDE_INT) BITS_PER_UNIT;
> if (rhsunitoffset != rhsoffset / BITS_PER_UNIT)
>   rhsoffset = UNKNOWN_OFFSET;
>   }
>
>   Jakub



Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Joseph Myers
On Mon, 19 Jun 2017, Jeff Law wrote:

> A key point to remember is that you can never have an allocation
> (potentially using more than one allocation site) which is larger than a
> page without probing the page.

There's a platform ABI issue here.  At least some kernel fixes for these 
stack issues, as I understand it, increase the size of the stack guard to 
more than a single page.  It would be possible to define the ABI to 
require such a larger guard for protection and so reduce the number of 
(non-alloca/VLA-using) functions that need probes generated, depending on 
whether a goal is to achieve security on kernels without such a fix.  
(Thinking in terms of how to get to enabling such probes by default.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jakub Jelinek
On Mon, Jun 19, 2017 at 05:50:56PM +, Joseph Myers wrote:
> On Mon, 19 Jun 2017, Jeff Law wrote:
> 
> > A key point to remember is that you can never have an allocation
> > (potentially using more than one allocation site) which is larger than a
> > page without probing the page.
> 
> There's a platform ABI issue here.  At least some kernel fixes for these 
> stack issues, as I understand it, increase the size of the stack guard to 
> more than a single page.  It would be possible to define the ABI to 
> require such a larger guard for protection and so reduce the number of 
> (non-alloca/VLA-using) functions that need probes generated, depending on 
> whether a goal is to achieve security on kernels without such a fix.  
> (Thinking in terms of how to get to enabling such probes by default.)

Note that the kernel-imposed stack guard page is just one case (even though
probably the most common one): POSIX threads allow specifying the guard
size for thread stacks too, and increasing the guard size too much in that
case is a bigger problem than doing it just for the initial thread.
Also, people can override it; if they use a 0 guard size, we can say it is
their problem that this kind of exploit is possible, but asking them to use
much larger guard sizes might be a problem for apps that create many
threads.
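
For reference, the per-thread knob in question is
pthread_attr_setguardsize; a minimal sketch (error handling omitted):

  #include <pthread.h>

  // Create a thread with an explicitly sized guard area.  A guard
  // size of 0 disables the guard entirely -- the opt-out case above.
  void start_with_guard (void *(*fn) (void *))
  {
    pthread_attr_t attr;
    pthread_attr_init (&attr);
    pthread_attr_setguardsize (&attr, 64 * 1024);  // e.g. a 64 KiB guard
    pthread_t t;
    pthread_create (&t, &attr, fn, nullptr);
    pthread_attr_destroy (&attr);
  }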

Jakub


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Richard Biener
On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek  wrote:
>On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
>> After much poking around I concluded that we really need to implement
>> allocation and probing via a "moving sp" strategy.   Probing into
>> unallocated areas runs afoul of valgrind, so that's a non-starter.
>> 
>> Allocating stack space, then probing the pages within the space is
>> vulnerable to async signal delivery between the allocation point and
>the
>> probe point.  If that occurs the signal handler could end up running
>on
>> a stack that has collided with the heap.
>> 
>> Ideally we would allocate and probe a page as an atomic unit (which
>is
>> feasible on PPC).  Alternatively, due to ISA restrictions, allocate a
>> page, then probe the page as distinct instructions.  The latter still
>> has a race, but we'd have to take the async signal in a single
>> instruction window.
>
>And if the allocation is only a page at a time, the single insn race
>window
>can be mitigated in the kernel (probe (read-only is fine) the word at
>the
>stack when setting up a signal frame for async signal).
>
>> So, time to open the discussion to questions & comments.
>> 
>> I've got patches I need to cleanup and post for comments that
>implement
>> this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO in good
>> shape.  There's an unhandled case for s390.  I've got evaluation
>still
>> to do on aarch64.
>
>In the patches Jeff is going to post, we have (at least for
>-fasynchronous-unwind-tables which is on by default on e.g. x86)
>precise unwind info even with the new stack check mode.
>ira.c currently has:
> /* We need the frame pointer to catch stack overflow exceptions if
>   the stack pointer is moving (as for the alloca case just above).  */
>   || (STACK_CHECK_MOVING_SP
>   && flag_stack_check
>   && flag_exceptions
>   && cfun->can_throw_non_call_exceptions)
>For alloca we have a frame pointer for other reasons, the question is
>if we really need this hunk even if we provided proper unwind info
>even for the Ada -fstack-check mode.  Or, if we provide proper unwind
>info
>for -fasynchronous-unwind-tables, if the above could not be also
>&& !flag_asynchronous_unwind_tables.  Eric, what exactly is the reason
>for the above, is it just lack of proper CFI notes, or something
>different?
>
>Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>while it is shorter, is it actually faster or as slow as movq $0,
>(%rsp)
>or movl $0, (%esp) ?

It at least has the chance of bypassing all of the store queue in CPUs, and
thus causing no cacheline allocation or triggering prefetching.

Not sure if any of that is done though.

Performance counters might tell.

Otherwise incrementing SP by 4095 and then pushing al would work as well (and
be about as short as the or).

Richard.

Richard.

>   Jakub



Re: [patch, libfortran] Speed up cshift for dim > 1

2017-06-19 Thread Thomas Koenig

Hi Dominique,


For the record, the following CSHIFT is still 4 times slower than the DO loop


I have looked into this a bit. The main reason is that, unlike for cshift0
(the variant without an array-valued shift), we do not generate individual
functions to call for the usual data types; we use memcpy with a size
determined at run time by looking at the array descriptor.  This is, of
course, quite slow.

So, the solution should probably be to generate functions like
cshift1_4_i4 and then call them. This would generate a bit of
bloat, but if people use this in a serious way, I think
this is OK.

This was already done for cshift0 a few years ago. What
we have there looks like (intrinsics/cshift0.c)

  type_size = GFC_DTYPE_TYPE_SIZE (array);

  switch (type_size)
    {
    case GFC_DTYPE_LOGICAL_1:
    case GFC_DTYPE_INTEGER_1:
    case GFC_DTYPE_DERIVED_1:
      cshift0_i1 ((gfc_array_i1 *)ret, (gfc_array_i1 *) array, shift, which);
      return;

    case GFC_DTYPE_LOGICAL_2:
    case GFC_DTYPE_INTEGER_2:
      cshift0_i2 ((gfc_array_i2 *)ret, (gfc_array_i2 *) array, shift, which);
      return;

so this is something that we could also emulate.

A bit of work, but nothing that looks un-doable.
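
As a hypothetical sketch only -- the function and parameter names are
assumed by analogy with cshift0 -- the dispatch could look like:

  type_size = GFC_DTYPE_TYPE_SIZE (array);

  switch (type_size)
    {
    case GFC_DTYPE_LOGICAL_4:
    case GFC_DTYPE_INTEGER_4:
      /* Typed copy loop instead of a run-time-sized memcpy.  */
      cshift1_4_i4 ((gfc_array_i4 *) ret, (gfc_array_i4 *) array,
                    h, pwhich);
      return;

    /* ... remaining type/kind combinations ... */
    }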

Regards

Thomas


Re: [PATCH] Fix UB in tree-chkp.c

2017-06-19 Thread Ilya Enkovich
2017-06-19 20:43 GMT+03:00 Jakub Jelinek :
> Hi!
>
> bootstrap-ubsan shows a couple of:
> ../../gcc/tree-chkp.c:694:37: runtime error: shift exponent 63 is too large 
> for 32-bit type 'int'
> errors.
>
> 1 << (TYPE_PRECISION (ptr_type_node) - 1)
> should have been obviously
> HOST_WIDE_INT_1U << (TYPE_PRECISION (ptr_type_node) - 1)
> but even then, it is 1) unnecessarily complicated and expensive way
> to create a pointer with just the MSB bit set and all other clear and
> 2) will not work if ptr_type_node has higher precision than HWI (just
> theoretical possibility now)
> For 1), e.g. fold_convert (ptr_type_node, integer_zero_node) is
> better written as build_int_cst (ptr_type_node, 0), but still
> we can actually avoid the fold_build_pointer_plus_hwi and folding
> it altogether.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (both normal
> and bootstrap-ubsan), ok for trunk?

OK. Thanks for the fix!

Ilya

>
> 2017-06-19  Jakub Jelinek  
>
> * tree-chkp.c (chkp_get_hard_register_var_fake_base_address):
> Rewritten to avoid overflow for > 32-bit pointers.
>
> --- gcc/tree-chkp.c.jj  2017-06-12 12:41:55.0 +0200
> +++ gcc/tree-chkp.c 2017-06-19 12:57:24.670478544 +0200
> @@ -690,9 +690,8 @@ chkp_erase_completed_bounds (void)
>  static tree
>  chkp_get_hard_register_var_fake_base_address ()
>  {
> -  tree base = fold_convert (ptr_type_node, integer_zero_node);
> -  unsigned HOST_WIDE_INT offset = 1 << (TYPE_PRECISION (ptr_type_node) - 1);
> -  return fold_build_pointer_plus_hwi (base, offset);
> +  int prec = TYPE_PRECISION (ptr_type_node);
> +  return wide_int_to_tree (ptr_type_node, wi::min_value (prec, SIGNED));
>  }
>
>  /* If we check bounds for a hard register variable, we cannot
>
>
> Jakub


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Richard Kenner
Out of curiosity, does the old Alpha/VMS stack-checking API meet the
requirements?  From what I recall, I think it does.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Florian Weimer
On 06/19/2017 07:50 PM, Joseph Myers wrote:
> There's a platform ABI issue here.  At least some kernel fixes for these 
> stack issues, as I understand it, increase the size of the stack guard to 
> more than a single page.  It would be possible to define the ABI to 
> require such a larger guard for protection and so reduce the number of 
> (non-alloca/VLA-using) functions that need probes generated, depending on 
> whether a goal is to achieve security on kernels without such a fix.  
> (Thinking in terms of how to get to enabling such probes by default.)

I think architectures such as aarch64 without implied stack probing as
part of the function call sequence would benefit most from an ABI
agreement (splitting the probing responsibility in some way between
caller and callee).  For architectures with some form of implied
probing, the complications from negotiating a guard region size between
GCC, kernel, glibc, and perhaps even applications (see Jakub's comment
about thread stacks) outweigh the performance gains.

Thanks,
Florian


Re: [PATCH 2/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Xi Ruoyao
On 2017-06-19 10:51 -0600, Martin Sebor wrote:
> On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
> > This patch adds warning option -Wstring-plus-int for C/C++.
> > 
> > gcc/ChangeLog:
> > 
> > 2017-06-12  Xi Ruoyao  
> > 
> > * c-family/c.opt: New option -Wstring-plus-int.
> > * c-family/c-common.c (pointer_int_sum): Checking for
> > -Wstring-plus-int.
> 
> This is a very useful warning but I would suggest to word it
> in terms of what it actually does rather than what it might be
> intended to do.  E.g., for
> 
>    const char *p = "123" + 7;
> 
> issue
> 
>    warning: offset 7 exceeds the upper bound 3 of the array
> 
> rather than
> 
>    warning: adding 'int' to a string does not append to the string
> 
> (I have trouble envisioning on what grounds someone might expect
> the addition to have this effect.)

How about something like `const char *p = "123" + getchar();` ?

I'd like this for -Wstring-plus-int=1:

warning: adding 'int' to a string does not append to the string
[-Wstring-plus-int=]
const char *p = "123" + 7;
  ^
note: offset 7 exceeds the size 4 of the string, using the result
may lead to undefined behaviour.

(Clang permits "123" + 4 since its result is well defined in the standard.
Maybe we could permit "123" + 3 only.)

For level 1 we only warn for such obvious mistakes. And for
-Wstring-plus-int=2:

warning: adding 'int' to a string does not append to the string
[-Wstring-plus-int=]
const char *p = "123" + getchar();
  ^
note: the offset may exceed the size of the string.

(Clang also warns even when it's impossible to know whether the offset
exceeds the bound.  That seems aggressive, so we could make it level 2.)

> Given that the warning only triggers when the upper bound of
> an array is exceeded I would also suggest to consider including
> the warning in -Warray-bounds.  (With that, it would be useful
> to also detect exceeding the upper bound of non-literal arrays
> as well.)

We can let -Warray-bounds enable -Wstring-plus-int=1, but not =2.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


C++ PATCH for c++/81073, constexpr and static var in statement-expression

2017-06-19 Thread Jason Merrill
The testcase successfully compiles, but then fails to link because
we've optimized away the declaration of the variable.  We catch this
in potential_constant_expression_1, but this path wasn't calling it.
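
A sketch of the shape at issue (not the actual testcase; it relies on
the GNU statement-expression extension):

  // The static local must keep its definition, since its address
  // escapes into the constant:
  constexpr const int *p = ({ static const int j = 42; &j; });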

Fixed on trunk by always calling that function, not just in templates.
With that change, I needed to adjust pce1 to not require that a
variable be initialized yet, so that we can check it within the
initializer.  To avoid that causing some missed errors,
decl_maybe_constant_var_p now considers the initializer if it is
already known.

Fixed on 7 branch more simply, by calling p_c_e from
cxx_eval_constant_expression.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 46761de0ab74a6983c931c13bfb78c095ae4f651
Author: Jason Merrill 
Date:   Sat Jun 17 00:00:21 2017 -0400

PR c++/81073 - constexpr and static var in statement-expression.

* typeck2.c (store_init_value): Always call
require_potential_constant_expression.
* pt.c (convert_nontype_argument): Likewise.
* constexpr.c (potential_constant_expression_1): Adjust message.
Use decl_maybe_constant_var_p instead of decl_constant_var_p.
* decl2.c (decl_maybe_constant_var_p): Consider initializer.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ae24e40..569a247 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -5212,10 +5212,11 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict,
   if (want_rval
  && !var_in_maybe_constexpr_fn (t)
  && !type_dependent_expression_p (t)
- && !decl_constant_var_p (t)
+ && !decl_maybe_constant_var_p (t)
  && (strict
  || !CP_TYPE_CONST_NON_VOLATILE_P (TREE_TYPE (t))
- || !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (t))
+ || (DECL_INITIAL (t)
+ && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (t)))
  && COMPLETE_TYPE_P (TREE_TYPE (t))
  && !is_really_empty_class (TREE_TYPE (t)))
 {
@@ -5540,21 +5541,21 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict,
{
  if (flags & tf_error)
error_at (DECL_SOURCE_LOCATION (tmp), "%qD declared "
- "% in % function", tmp);
+ "% in % context", tmp);
  return false;
}
  else if (CP_DECL_THREAD_LOCAL_P (tmp))
{
  if (flags & tf_error)
error_at (DECL_SOURCE_LOCATION (tmp), "%qD declared "
- "% in % function", tmp);
+ "% in % context", tmp);
  return false;
}
  else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp))
{
  if (flags & tf_error)
error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized "
- "variable %qD in % function", tmp);
+ "variable %qD in % context", tmp);
  return false;
}
}
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 72239ec..a475146 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -4145,10 +4145,19 @@ decl_maybe_constant_var_p (tree decl)
 /* A proxy isn't constant.  */
 return false;
   if (TREE_CODE (type) == REFERENCE_TYPE)
-/* References can be constant.  */
+/* References can be constant.  */;
+  else if (CP_TYPE_CONST_NON_VOLATILE_P (type)
+  && INTEGRAL_OR_ENUMERATION_TYPE_P (type))
+/* And const integers.  */;
+  else
+return false;
+
+  if (DECL_INITIAL (decl)
+  && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl))
+/* We know the initializer, and it isn't constant.  */
+return false;
+  else
 return true;
-  return (CP_TYPE_CONST_NON_VOLATILE_P (type)
- && INTEGRAL_OR_ENUMERATION_TYPE_P (type));
 }
 
 /* Complain that DECL uses a type with no linkage.  In C++98 mode this is
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e5238ad..69ca929 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6585,10 +6585,10 @@ convert_nontype_argument (tree type, tree expr, 
tsubst_flags_t complain)
  if (complain & tf_error)
{
  int errs = errorcount, warns = warningcount + werrorcount;
- if (processing_template_decl
- && !require_potential_constant_expression (expr))
-   return NULL_TREE;
- expr = cxx_constant_value (expr);
+ if (!require_potential_constant_expression (expr))
+   expr = error_mark_node;
+ else
+   expr = cxx_constant_value (expr);
  if (errorcount > errs || warningcount + werrorcount > warns)
inform (loc, "in template argument for type %qT ", type);
  if (expr == error_mark_node)
diff --git 

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jakub Jelinek
On Mon, Jun 19, 2017 at 11:45:13AM -0600, Jeff Law wrote:
> On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
> > 
> > Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
> > while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
> > or movl $0, (%esp) ?
> Florian raised this privately to me as well.  There's a couple of issues.
> 
> 1. Is there a performance penalty/gain for sub-word operations?  If not,
>we can improve things slightly there.  Even if it's performance
>neutral we can probably do better on code size.

CCing Uros and Honza here; I believe that on x86 there are penalties at
least for 2-byte operations, maybe for 1-byte ones, and then sometimes
stalls when you write or read in a different size from a recent write
or read.

> Thus I find myself rethinking whether this is a probing policy option or
> whether it should just be another variant of -fstack-check=.

Yeah, IMHO it is just another way of stack probing next to generic and
specific, and for users it would be easier to write -fstack-check=whatever
than -fstack-check -fstack-check-probe=whatever

Jakub


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Richard Biener
On June 19, 2017 8:00:19 PM GMT+02:00, Richard Biener 
 wrote:
>On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek 
>wrote:
>>On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
>>> After much poking around I concluded that we really need to
>implement
>>> allocation and probing via a "moving sp" strategy.   Probing into
>>> unallocated areas runs afoul of valgrind, so that's a non-starter.
>>> 
>>> Allocating stack space, then probing the pages within the space is
>>> vulnerable to async signal delivery between the allocation point and
>>the
>>> probe point.  If that occurs the signal handler could end up running
>>on
>>> a stack that has collided with the heap.
>>> 
>>> Ideally we would allocate and probe a page as an atomic unit (which
>>is
>>> feasible on PPC).  Alternatively, due to ISA restrictions, allocate
>a
>>> page, then probe the page as distinct instructions.  The latter
>still
>>> has a race, but we'd have to take the async signal in a single
>>> instruction window.
>>
>>And if the allocation is only a page at a time, the single insn race
>>window
>>can be mitigated in the kernel (probe (read-only is fine) the word at
>>the
>>stack when setting up a signal frame for async signal).
>>
>>> So, time to open the discussion to questions & comments.
>>> 
>>> I've got patches I need to cleanup and post for comments that
>>implement
>>> this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO in good
>>> shape.  There's an unhandled case for s390.  I've got evaluation
>>still
>>> to do on aarch64.
>>
>>In the patches Jeff is going to post, we have (at least for
>>-fasynchronous-unwind-tables which is on by default on e.g. x86)
>>precise unwind info even with the new stack check mode.
>>ira.c currently has:
>> /* We need the frame pointer to catch stack overflow exceptions
>if
>>   the stack pointer is moving (as for the alloca case just above). 
>*/
>>   || (STACK_CHECK_MOVING_SP
>>   && flag_stack_check
>>   && flag_exceptions
>>   && cfun->can_throw_non_call_exceptions)
>>For alloca we have a frame pointer for other reasons, the question is
>>if we really need this hunk even if we provided proper unwind info
>>even for the Ada -fstack-check mode.  Or, if we provide proper unwind
>>info
>>for -fasynchronous-unwind-tables, if the above could not be also
>>&& !flag_asynchronous_unwind_tables.  Eric, what exactly is the reason
>>for the above, is it just lack of proper CFI notes, or something
>>different?
>>
>>Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>while it is shorter, is it actually faster or as slow as movq $0,
>>(%rsp)
>>or movl $0, (%esp) ?
>
>It at least has the chance of bypassing all of the store queue in CPUs
>and thus causing no cacheline allocation or triggering prefetching.
>
>Not sure if any of that is done though.
>
>Performance counters might tell.
>
>Otherwise incrementing SP by 4095 and then pushing al would work as
>well (and be about as short as the or).

Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095 would 
solve the signal atomicity as well. Might be larger and somewhat interfere with 
CPUs stack engine.  Who knows...

Richard.

>Richard.
>
>Richard.
>
>>  Jakub



Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Florian Weimer
On 06/19/2017 08:02 PM, Richard Biener wrote:
> Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095 
> would solve the signal atomicity as well. Might be larger and somewhat 
> interfere with CPUs stack engine.  Who knows...

On x86-64, PUSH REG is just a single byte, so for sequences that have to
move SP and probe, it's the shortest possible sequence AFAIK.  NEG/NOT
can take an offsettable memory operand, but it's three bytes.

(I believe the use of ORQ in the current -fstack-check probes might be
an oversight.  For a start, the REX prefix seems completely unnecessary.)
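
(For concreteness, the encodings involved, assuming 64-bit mode --
listed here just for reference:

  push %rax           -> 50                        (1 byte)
  orl  $0x0, (%rsp)   -> 83 0c 24 00               (4 bytes)
  orq  $0x0, (%rsp)   -> 48 83 0c 24 00            (5 bytes)
  movl $0x0, (%rsp)   -> c7 04 24 00 00 00 00      (7 bytes)
  movq $0x0, (%rsp)   -> 48 c7 04 24 00 00 00 00   (8 bytes)

so dropping the REX.W prefix alone saves a byte per probe.)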

Thanks,
Florian


Re: Add dg-add-options feature stack_size

2017-06-19 Thread Mike Stump
On Jun 19, 2017, at 10:11 AM, Tom de Vries  wrote:
> 
> I came across dg-add-options, and wondered if adding a dg-add-options feature 
> stack_size is a better way to make STACK_SIZE available.

I don't have a strong opinion here, but it does look a tad simpler and nicer.
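
For reference, hypothetical usage in a test, assuming the feature ends
up named stack_size:

/* { dg-add-options stack_size } */
#ifdef STACK_SIZE
#define ARR_SIZE (STACK_SIZE / 2)
#else
#define ARR_SIZE 0x100000
#endif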

> OK if complete and tested?

Ok.

Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.

2017-06-19 Thread Segher Boessenkool
On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> Yeah, and I'm not suggesting we change the logic there (sorry if the
> description was misleading).  Instead I'm proposing that we handle more
> cases for parallels to not return zero.

Right.  My test run is half way through, will have results later --
your change looks good to me, but it is always surprising whether
better costs help or not, or even *hurt* good code generation (things
are just too tightly tuned to the current behaviour, so some things
may need retuning).


Segher


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
> 
> Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
> while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
> or movl $0, (%esp) ?
Florian raised this privately to me as well.  There's a couple of issues.

1. Is there a performance penalty/gain for sub-word operations?  If not,
   we can improve things slightly there.  Even if it's performance
   neutral we can probably do better on code size.

2. I would *prefer* if the probe actually changed the value, and the
   more destructive, the better :-0  It makes it easier to catch
   something that has gone wild.


These are pretty minor implementation details IMHO, but now is a good
time to revisit the probe style.


I'm mostly concerned about holes in the basic probing strategy, how
we're going to deal with the additional architectures (I can't imagine
we'll want to go through the pain of a custom implementation for each
target) and the UI.

On the last topic.  When we first started this work, it appeared like we
could make most targets with -fstack-check=specific support work
(possibly with some inefficiency) by just dropping the
probe-ahead-of-need aspects of the existing implementation.

ie, we'd drop the requirement for being able to run the signal handler
and stop probing 2 pages beyond the current stack requirements and
instead just probe up to what the current function needed.

This felt like a "probing policy" (ahead-of-need vs as-needed).

But when we ran into the issues with valgrind it became clear that we
really couldn't safely use the current port support for
-fstack-check=specific.

Thus I find myself rethinking whether this is a probing policy option or
whether it should just be another variant of -fstack-check=.

Jeff


[PATCH] Fix UB in tree-ssa-structalias.c

2017-06-19 Thread Jakub Jelinek
Hi!

Another easy to fix bug reported by bootstrap-ubsan.
We check that rhsunitoffset fits into shwi, but even if it does,
8x that might not, in which case we trigger UB.
Fixed by doing the multiplication in unsigned HWI type to make it well
defined.
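
The same pattern in isolation (a sketch of my own; assumes
BITS_PER_UNIT == 8 and a 64-bit HOST_WIDE_INT):

  #include <cstdint>

  int64_t unit_to_bit_offset (int64_t units)
  {
    // units * 8 in signed arithmetic overflows (UB) for huge |units|;
    // multiplying in unsigned wraps instead, and the existing
    // round-trip check then detects the overflow:
    int64_t bits = (int64_t) (units * (uint64_t) 8);
    if (units != bits / 8)
      return INT64_MIN;  // stand-in for UNKNOWN_OFFSET
    return bits;
  }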

Bootstrapped/regtested on x86_64-linux and i686-linux (both normal
and bootstrap-ubsan), ok for trunk?

2017-06-19  Jakub Jelinek  

* tree-ssa-structalias.c (get_constraint_for_ptr_offset): Multiply
in UWHI to avoid undefined overflow.

--- gcc/tree-ssa-structalias.c.jj   2017-05-24 11:59:06.0 +0200
+++ gcc/tree-ssa-structalias.c  2017-06-19 14:10:50.989594911 +0200
@@ -3087,7 +3087,7 @@ get_constraint_for_ptr_offset (tree ptr,
{
  /* Make sure the bit-offset also fits.  */
  HOST_WIDE_INT rhsunitoffset = soffset.to_shwi ();
- rhsoffset = rhsunitoffset * BITS_PER_UNIT;
+ rhsoffset = rhsunitoffset * (unsigned HOST_WIDE_INT) BITS_PER_UNIT;
  if (rhsunitoffset != rhsoffset / BITS_PER_UNIT)
rhsoffset = UNKNOWN_OFFSET;
}

Jakub


Forward list default default and move constructors

2017-06-19 Thread François Dumont

Hi

Here is the patch to default the default and move constructors of
std::forward_list. Putting a move constructor on _Fwd_list_node_base
helped limit the code impact of this patch. It doesn't have any side
effect, as the iterator types using this base type do not define any
move semantics.


I also took the time to optimize the move constructor taking an
allocator for the case where the allocator is always equal. It avoids
initializing an empty forward list for nothing.


I think it is fine, but could we have an ABI issue because of the
change in forward_list.tcc?
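
A quick check of the properties the defaulted constructors should now
expose (my own sketch, assuming the default allocator):

  #include <forward_list>
  #include <type_traits>

  static_assert(std::is_nothrow_default_constructible<
                  std::forward_list<int>>::value, "");
  static_assert(std::is_nothrow_move_constructible<
                  std::forward_list<int>>::value, "");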


* include/bits/forward_list.h
(_Fwd_list_node_base(_Fwd_list_node_base&&)): New.
(_Fwd_list_impl()): Add noexcept qualification.
(_Fwd_list_impl(_Fwd_list_impl&&)): New, default.
(_Fwd_list_impl(_Fwd_list_impl&&, _Node_alloc_type&&)): New.
(_Fwd_list_base()): Default.
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, true_type)): New.
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, false_type)): 
New.

(_Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a)): Use
latter.
(_Fwd_list_base(_Fwd_list_base&&)): Default.
(forward_list<>()): Default.
(forward_list<>(forward_list&&)): Default.
* include/bits/forward_list.tcc
(_Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, false_type)): 
New.

* testsuite/23_containers/forward_list/allocator/default_init.cc: New.

Tested under Linux x86_64, ok to commit?

François

diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index f319b7f..312cd9e 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -53,6 +53,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   struct _Fwd_list_node_base
   {
 _Fwd_list_node_base() = default;
+_Fwd_list_node_base(_Fwd_list_node_base&& __x) noexcept
+  : _M_next(__x._M_next)
+{ __x._M_next = nullptr; }
 
 _Fwd_list_node_base* _M_next = nullptr;
 
@@ -284,15 +287,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 _Fwd_list_node_base _M_head;
 
 	_Fwd_list_impl()
-: _Node_alloc_type(), _M_head()
+	  noexcept( noexcept(_Node_alloc_type()) )
+	: _Node_alloc_type()
 	{ }
 
 	_Fwd_list_impl(const _Node_alloc_type& __a)
-: _Node_alloc_type(__a), _M_head()
+	: _Node_alloc_type(__a)
+	{ }
+
+	_Fwd_list_impl(_Fwd_list_impl&&) = default;
+
+	_Fwd_list_impl(_Fwd_list_impl&& __fl, _Node_alloc_type&& __a)
+	: _Node_alloc_type(std::move(__a)), _M_head(std::move(__fl._M_head))
 	{ }
 
 	_Fwd_list_impl(_Node_alloc_type&& __a)
-	: _Node_alloc_type(std::move(__a)), _M_head()
+	: _Node_alloc_type(std::move(__a))
 	{ }
   };
 
@@ -311,20 +321,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _M_get_Node_allocator() const noexcept
   { return this->_M_impl; }
 
-  _Fwd_list_base()
-  : _M_impl() { }
+private:
+  _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a, std::true_type)
+	: _M_impl(std::move(__lst._M_impl), std::move(__a))
+  { }
+
+  _Fwd_list_base(_Fwd_list_base&&, _Node_alloc_type&&, std::false_type);
+
+public:
+  _Fwd_list_base() = default;
 
   _Fwd_list_base(_Node_alloc_type&& __a)
   : _M_impl(std::move(__a)) { }
 
-  _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a);
+  _Fwd_list_base(_Fwd_list_base&& __lst, _Node_alloc_type&& __a)
+  : _Fwd_list_base(std::move(__lst), std::move(__a),
+		   typename _Node_alloc_traits::is_always_equal{})
+  { }
 
-  _Fwd_list_base(_Fwd_list_base&& __lst)
-  : _M_impl(std::move(__lst._M_get_Node_allocator()))
-  {
-	this->_M_impl._M_head._M_next = __lst._M_impl._M_head._M_next;
-	__lst._M_impl._M_head._M_next = 0;
-  }
+  _Fwd_list_base(_Fwd_list_base&&) = default;
 
   ~_Fwd_list_base()
   { _M_erase_after(&_M_impl._M_head, 0); }
@@ -436,10 +451,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   /**
*  @brief  Creates a %forward_list with no elements.
*/
-  forward_list()
-  noexcept(is_nothrow_default_constructible<_Node_alloc_type>::value)
-  : _Base()
-  { }
+  forward_list() = default;
 
   /**
*  @brief  Creates a %forward_list with no elements.
@@ -532,15 +544,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /**
*  @brief  The %forward_list move constructor.
-   *  @param  __list  A %forward_list of identical element and allocator
-   *  types.
+   *  @param  A %forward_list of identical element and allocator types.
*
-   *  The newly-created %forward_list contains the exact contents of @a
-   *  __list. The contents of @a __list are a valid, but unspecified
-   *  %forward_list.
+   *  The newly-created %forward_list contains the exact contents of the
+   *  moved instance. The content of the moved instance is valid, but
+   *  unspecified %forward_list.
*/
-  forward_list(forward_list&& __list) 

C++ PATCH for c++/80562, ICE with C++17 constexpr if

2017-06-19 Thread Jason Merrill
We need to call instantiate_non_dependent_expr before
cxx_constant_value in a template.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 1645e51aeab6cea4e7206cb6a3520eaf383e47f6
Author: Jason Merrill 
Date:   Mon Jun 19 15:47:47 2017 -0400

PR c++/80562 - ICE with constexpr if.

* semantics.c (finish_if_stmt_cond): Call
instantiate_non_dependent_expr.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 5b5ec54..5fe772a 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -733,7 +733,10 @@ finish_if_stmt_cond (tree cond, tree if_stmt)
   if (IF_STMT_CONSTEXPR_P (if_stmt)
   && require_potential_rvalue_constant_expression (cond)
   && !value_dependent_expression_p (cond))
-cond = cxx_constant_value (cond, NULL_TREE);
+{
+  cond = instantiate_non_dependent_expr (cond);
+  cond = cxx_constant_value (cond, NULL_TREE);
+}
   finish_cond (&IF_COND (if_stmt), cond);
   add_stmt (if_stmt);
   THEN_CLAUSE (if_stmt) = push_stmt_list ();
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-if12.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-if12.C
new file mode 100644
index 000..1ed2c30
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-if12.C
@@ -0,0 +1,14 @@
+// PR c++/80562
+// { dg-options -std=c++1z }
+
+struct T {
+  constexpr auto foo() { return false; }
+};
+
+template <typename>
+constexpr auto bf(T t) {
+    if constexpr(t.foo()) {
+        return false;
+    }
+    return true;
+}


C++ PATCH for c++/80829, ICE with constexpr copy of base

2017-06-19 Thread Jason Merrill
The constexpr code uses the CONSTRUCTOR_NO_IMPLICIT_ZERO flag to track
partially-initialized aggregates, but we were failing to clear it on
base subobjects.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 2e3142bcd6fde9f9ac22928718e55584a6255286
Author: Jason Merrill 
Date:   Mon Jun 19 15:15:14 2017 -0400

PR c++/80829 - ICE with constexpr copy of base subobject.

* constexpr.c (clear_no_implicit_zero): New.
(cxx_eval_call_expression): Call it.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 569a247..5a57452 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1394,6 +1394,21 @@ cxx_eval_internal_function (const constexpr_ctx *ctx, tree t,
   return t;
 }
 
+/* Clean CONSTRUCTOR_NO_IMPLICIT_ZERO from CTOR and its sub-aggregates.  */
+
+static void
+clear_no_implicit_zero (tree ctor)
+{
+  if (CONSTRUCTOR_NO_IMPLICIT_ZERO (ctor))
+{
+  CONSTRUCTOR_NO_IMPLICIT_ZERO (ctor) = false;
+  tree elt; unsigned HOST_WIDE_INT idx;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), idx, elt)
+   if (TREE_CODE (elt) == CONSTRUCTOR)
+ clear_no_implicit_zero (elt);
+}
+}
+
 /* Subroutine of cxx_eval_constant_expression.
Evaluate the call expression tree T in the context of OLD_CALL expression
evaluation.  */
@@ -1697,7 +1712,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 
   /* The result of a constexpr function must be completely initialized.  */
   if (TREE_CODE (result) == CONSTRUCTOR)
-CONSTRUCTOR_NO_IMPLICIT_ZERO (result) = false;
+clear_no_implicit_zero (result);
 
   pop_cx_call_context ();
   return unshare_constructor (result);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-base5.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-base5.C
new file mode 100644
index 000..84700bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-base5.C
@@ -0,0 +1,15 @@
+// PR c++/80829
+// { dg-do compile { target c++11 } }
+
+struct A {
+  constexpr A(int a) : _a(a) {}
+  int _a;
+};
+
+struct B : public A {
+  constexpr B(int a) : A(a) {}
+};
+
+int main() {
+  constexpr A a = B(10);
+}


Re: [PATCH 2/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Martin Sebor

On 06/19/2017 11:28 AM, Xi Ruoyao wrote:

On 2017-06-19 10:51 -0600, Martin Sebor wrote:

On 06/11/2017 07:32 PM, Xi Ruoyao wrote:

This patch adds warning option -Wstring-plus-int for C/C++.

gcc/ChangeLog:

2017-06-12  Xi Ruoyao  

* c-family/c.opt: New option -Wstring-plus-int.
* c-family/c-common.c (pointer_int_sum): Checking for
-Wstring-plus-int.


This is a very useful warning but I would suggest to word it
in terms of what it actually does rather than what it might be
intended to do.  E.g., for

   const char *p = "123" + 7;

issue

   warning: offset 7 exceeds the upper bound 3 of the array

rather than

   warning: adding 'int' to a string does not append to the string

(I have trouble envisioning on what grounds someone might expect
the addition to have this effect.)


How about something like `const char *p = "123" + getchar();` ?


I'm not sure I correctly understand the question (or whether
it's meant in response to my comment in parentheses) but let
me clarify what I meant.

In my view, the group of C++ (and certainly C) programmers who
might expect "123" + i to append the string representation of
the integer result to a string literal isn't significant enough
to focus the warning on.

Whether or not the addition is valid depends on the value of
the integer operand.  There are three sets of cases where the
addition is or may be invalid:

1) the integer operand is an out of bounds constant
2) the integer operand's non-constant value or the lower bound
   of its range is known to be out of bounds,
3) the lower bound of the operand's range is in bounds but
   the upper bound is out of bounds (as in the getchar example).

(1) can be handled with lexical analysis alone (as in your patch)
but it's prone to a high rate of false negatives.  (3) can also
be handled by lexical analysis alone but it's prone to a high
rate of false positives.  (2) has no false positives but some
false negatives.  It can only be detected with optimization.
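
Concretely (hypothetical snippets; assume n's range is known only to
the optimizer):

   #include <cstdio>

   extern int n;                             // suppose VRP proves n >= 8
   const char *p1 = "123" + 7;               // (1) constant, out of bounds
   const char *p2 = "123" + n;               // (2) lower bound out of bounds
   const char *p3 = "123" + std::getchar (); // (3) range [-1, 255] partly out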

With that in mind the warning would serve a greater purpose
by being aimed more broadly and describing the nature of the
error: forming an invalid pointer.  I believe it would best
be implemented analogously to or even integrated into
-Warray-bounds.  I.e., I suggest covering set (2) above.



I'd like this for -Wstring-plus-int=1:

warning: adding 'int' to a string does not append to the string
[-Wstring-plus-int=]
const char *p = "123" + 7;
  ^
note: offset 7 exceeds the size 4 of the string, using the result
may lead to undefined behaviour.


The addition itself is undefined, regardless of whether or not
the result is used.



(Clang permits "123" + 4 since its result is well defined in standard.
Maybe we could permit "123" + 3 only.)


"123" is an array of 4 elements, with "123" + 4 pointing just past
the last (NUL) element.  It's valid to form a pointer past the last
element of an array and warning about it would likely be viewed as
a false positive (certainly if it were an out-of-bounds type of
warning).

Martin


[PATCH][AArch64] Mark symbols as constant

2017-06-19 Thread Wilco Dijkstra
aarch64_legitimate_constant_p currently returns false for symbols,
even though they are always valid constants.  This means LOSYM isn't
CSEd correctly.  If we return true, CSE works better, resulting in
smaller/faster code (0.3% smaller code on SPEC2006).

int x0 = 1, x1 = 2, x2 = 3;

int 
f (int x, int y)
{
  x += x1;
  if (x > 100)
y += x2;
  x += x0;
  return x + y;
}

Before:
adrpx3, .LANCHOR0
add x4, x3, :lo12:.LANCHOR0
ldr w2, [x3, #:lo12:.LANCHOR0]
add w0, w0, w2
cmp w0, 100
ble .L5
ldr w2, [x4, 8]
add w1, w1, w2
.L5:
add x3, x3, :lo12:.LANCHOR0
ldr w2, [x3, 4]
add w0, w0, w2
add w0, w0, w1
ret

After:
adrpx2, .LANCHOR0
add x3, x2, :lo12:.LANCHOR0
ldr w2, [x2, #:lo12:.LANCHOR0]
add w0, w0, w2
cmp w0, 100
ble .L5
ldr w2, [x3, 8]
add w1, w1, w2
.L5:
ldr w2, [x3, 4]
add w0, w0, w2
add w0, w0, w1
ret

Passes regress and bootstrap, OK for commit?

ChangeLog:
2017-06-19  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
Return true for symbols.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5ec6bbfcf484baa4005b8a88cb98d0d04f710877..4b7d961102e41ce927d89d458fc89ddfc2adcd6f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10111,6 +10111,9 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x)
   && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0))))
 return true;
 
+  if (SYMBOL_REF_P (x))
+return true;
+
   return aarch64_constant_address_p (x);
 }
 

[PR c++/81124] inline namespace checking

2017-06-19 Thread Nathan Sidwell
This fixes 81124, where we used ovl_iterate to iterate over a lookup 
result.  We should always use lkp_iterate in those circumstances.


However, regular lookup is not what we want here.  We don't want to 
follow using directives -- just look in the local inline hierarchy. 
Plus also ignore decls found by using declarations.


Finally, I could also fix 79766, by moving the excessive-qualification 
check to after finding the decl.  '::foo' is perfectly fine to discover 
foo in an inline child namespace.
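
A sketch of the kind of code involved (assumed shape -- see the new
testcases for the real ones):

  // PR 79766: a qualified name may legitimately refer to a decl in an
  // inline child namespace:
  inline namespace impl { void foo (); }
  void ::foo () { }  // finds impl::foo; no over-qualification error now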


nathan
--
Nathan Sidwell
2017-06-19  Nathan Sidwell  

	PR c++/81124
	PR c++/79766
	* name-lookup.c (set_decl_namespace): Don't follow using
	directives and ignore using decls.  Only check overly-explicit
	scope after discovering decl.

	* g++.dg/lookup/pr79766.C: New.
	* g++.dg/lookup/pr81124.C: New.
	* g++.dg/template/explicit6.C: Adjust.
	* g++.old-deja/g++.other/decl5.C: Adjust.

Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 249369)
+++ cp/name-lookup.c	(working copy)
@@ -4266,8 +4266,6 @@ set_global_binding (tree name, tree val)
 void
 set_decl_namespace (tree decl, tree scope, bool friendp)
 {
-  tree old;
-
   /* Get rid of namespace aliases.  */
   scope = ORIGINAL_NAMESPACE (scope);
 
@@ -4277,41 +4275,49 @@ set_decl_namespace (tree decl, tree scope, bool friendp)
 	   decl, scope);
   DECL_CONTEXT (decl) = FROB_CONTEXT (scope);
 
-  /* Writing "int N::i" to declare a variable within "N" is invalid.  */
-  if (scope == current_namespace)
-{
-  if (at_namespace_scope_p ())
-	error ("explicit qualification in declaration of %qD",
-	   decl);
-  return;
-}
+  /* See whether this has been declared in the namespace or inline
+ children.  */
+  tree old = NULL_TREE;
+  {
+name_lookup lookup (DECL_NAME (decl), LOOKUP_HIDDEN);
+if (!lookup.search_qualified (scope, /*usings=*/false))
+  /* No old declaration at all.  */
+  goto not_found;
+old = lookup.value;
+  }
 
-  /* See whether this has been declared in the namespace.  */
-  old = lookup_qualified_name (scope, DECL_NAME (decl), /*type*/false,
-			   /*complain*/true, /*hidden*/true);
-  if (old == error_mark_node)
-/* No old declaration at all.  */
-goto complain;
   /* If it's a TREE_LIST, the result of the lookup was ambiguous.  */
   if (TREE_CODE (old) == TREE_LIST)
 {
+ambiguous:
+  DECL_CONTEXT (decl) = FROB_CONTEXT (scope);
   error ("reference to %qD is ambiguous", decl);
   print_candidates (old);
   return;
 }
-  if (!OVL_P (decl))
+
+  if (!DECL_DECLARES_FUNCTION_P (decl))
 {
-  /* We might have found OLD in an inline namespace inside SCOPE.  */
-  if (TREE_CODE (decl) == TREE_CODE (old))
-	DECL_CONTEXT (decl) = DECL_CONTEXT (old);
   /* Don't compare non-function decls with decls_match here, since
 	 it can't check for the correct constness at this
-	 point. pushdecl will find those errors later.  */
+	 point.  pushdecl will find those errors later.  */
+
+  /* We might have found it in an inline namespace child of SCOPE.  */
+  if (TREE_CODE (decl) == TREE_CODE (old))
+	DECL_CONTEXT (decl) = DECL_CONTEXT (old);
+
+found:
+  /* Writing "N::i" to declare something directly in "N" is invalid.  */
+  if (CP_DECL_CONTEXT (decl) == current_namespace
+	  && at_namespace_scope_p ())
+	error ("explicit qualification in declaration of %qD", decl);
   return;
 }
+
   /* Since decl is a function, old should contain a function decl.  */
   if (!OVL_P (old))
-goto complain;
+goto not_found;
+
   /* We handle these in check_explicit_instantiation_namespace.  */
   if (processing_explicit_instantiation)
 return;
@@ -4325,53 +4331,48 @@ set_decl_namespace (tree decl, tree scop
  friends in any namespace.  */
   if (friendp && DECL_USE_TEMPLATE (decl))
 return;
-  if (OVL_P (old))
+
+  tree found;
+  found = NULL_TREE;
+
+  for (lkp_iterator iter (old); iter; ++iter)
 {
-  tree found = NULL_TREE;
+  if (iter.using_p ())
+	continue;
 
-  for (ovl_iterator iter (old); iter; ++iter)
-	{
-	  tree ofn = *iter;
-	  /* Adjust DECL_CONTEXT first so decls_match will return true
-	 if DECL will match a declaration in an inline namespace.  */
-	  DECL_CONTEXT (decl) = DECL_CONTEXT (ofn);
-	  if (decls_match (decl, ofn))
-	{
-	  if (found && !decls_match (found, ofn))
-		{
-		  DECL_CONTEXT (decl) = FROB_CONTEXT (scope);
-		  error ("reference to %qD is ambiguous", decl);
-		  print_candidates (old);
-		  return;
-		}
-	  found = ofn;
-	}
-	}
-  if (found)
+  tree ofn = *iter;
+
+  /* Adjust DECL_CONTEXT first so decls_match will return true
+	 if DECL will match a declaration in an inline namespace.  */
+  DECL_CONTEXT (decl) = DECL_CONTEXT (ofn);
+  if (decls_match (decl, ofn))
 	{
-	  if (!is_nested_namespace (scope, CP_DECL_CONTEXT (found), true))
-	goto complain;
-	  if 

Re: [PATCH 2/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Xi Ruoyao
On 2017-06-19 12:44 -0600, Martin Sebor wrote:
> On 06/19/2017 11:28 AM, Xi Ruoyao wrote:
> > On 2017-06-19 10:51 -0600, Martin Sebor wrote:
> > > On 06/11/2017 07:32 PM, Xi Ruoyao wrote:
> > > > This patch adds warning option -Wstring-plus-int for C/C++.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > 2017-06-12  Xi Ruoyao  
> > > > 
> > > > * c-family/c.opt: New option -Wstring-plus-int.
> > > > * c-family/c-common.c (pointer_int_sum): Checking for
> > > > -Wstring-plus-int.
> > > 
> > > This is a very useful warning but I would suggest to word it
> > > in terms of what it actually does rather than what it might be
> > > intended to do.  E.g., for
> > > 
> > >    const char *p = "123" + 7;
> > > 
> > > issue
> > > 
> > >    warning: offset 7 exceeds the upper bound 3 of the array
> > > 
> > > rather than
> > > 
> > >    warning: adding 'int' to a string does not append to the string
> > > 
> > > (I have trouble envisioning on what grounds someone might expect
> > > the addition to have this effect.)
> > 
> > How about something like `const char *p = "123" + getchar();` ?
> 
> I'm not sure I correctly understand the question (or whether
> it's meant in response to my comment in parentheses) but let
> me clarify what I meant.
> 
> In my view, the group of C++ (and certainly C) programmers who
> might expect "123" + i to append the string representation of
> the integer result to a string literal isn't significant enough
> to focus the warning on.
> 
> Whether or not the addition is valid depends on the value of
> the integer operand.  There are three sets of cases where the
> addition is or may be invalid:
> 
> 1) the integer operand is an out of bounds constant
> 2) the integer operand's non-constant value or the lower bound
> of its range is known to be out of bounds,
> 3) the lower bound of the operand's range is in bounds but
> the upper bound is out of bounds (as in the getchar example).
> 
> (1) can be handled with lexical analysis alone (as in you parch)
> but it's prone to a high rate of false negatives.  (3) can also
> be handled by lexical analysis alone but it's prone to a high
> rate of false positives.  (2) has no false positives but some
> false negatives.  It can only be detected with optimization.
> 
> With that in mind the warning would serve a greater purpose
> by being aimed more broadly and describing the nature of the
> error: forming an invalid pointer.  I believe it would best
> be implemented analogously to or even integrated into
> -Warray-bounds.  I.e., I suggest covering set (2) above.

Now I think I've been cheated by the GCC wiki, which states PR 62181
is an "easy-hack" :)

I'll try to improve -Warray-bounds.

> > 
> > I'd like this for -Wstring-plus-int=1:
> > 
> > warning: adding 'int' to a string does not append to the string
> > [-Wstring-plus-int=]
> > const char *p = "123" + 7;
> >   ^
> > note: offset 7 exceeds the size 4 of the string, using the result
> > may lead to undefined behaviour.
> 
> The addition itself is undefined, regardless of whether or not
> the result is used.
> 
> > 
> > (Clang permits "123" + 4 since its result is well defined in standard.
> > Maybe we could permit "123" + 3 only.)
> 
> "123" is an array of 4 elements, with "123" + 4 pointing just past
> the last (NUL) element.  It's valid to form a pointer past the last
> element of an array and warning about it would likely be viewed as
> a false positive (certainly if it were an out-of-bounds type of
> warning).
> 
> Martin
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jakub Jelinek
On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
> After much poking around I concluded that we really need to implement
> allocation and probing via a "moving sp" strategy.   Probing into
> unallocated areas runs afoul of valgrind, so that's a non-starter.
> 
> Allocating stack space, then probing the pages within the space is
> vulnerable to async signal delivery between the allocation point and the
> probe point.  If that occurs the signal handler could end up running on
> a stack that has collided with the heap.
> 
> Ideally we would allocate and probe a page as an atomic unit (which is
> feasible on PPC).  Alternatively, due to ISA restrictions, allocate a
> page, then probe the page as distinct instructions.  The latter still
> has a race, but we'd have to take the async signal in a single
> instruction window.

And if the allocation is only a page at a time, the single insn race window
can be mitigated in the kernel (probe (read-only is fine) the word at the
stack when setting up a signal frame for async signal).
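
For readers following along, a minimal C-level sketch of the "moving
sp" strategy described above (hypothetical model code; the real
implementation lives in the target prologue expanders and manipulates
the hardware stack pointer):

// Allocate at most one page at a time and probe it immediately, so
// the stack pointer never rests more than one page past a probed
// location.  PAGE_SIZE and the explicit "sp" pointer stand in for
// target details.
const unsigned long PAGE_SIZE = 4096;

void
allocate_frame (volatile char *&sp, unsigned long size)
{
  while (size >= PAGE_SIZE)
    {
      sp -= PAGE_SIZE;   // move sp by one page ...
      *sp = 0;           // ... and probe it; the async-signal race is
                         // confined to these two steps
      size -= PAGE_SIZE;
    }
  sp -= size;            // remainder stays within the guard page
}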

> So, time to open the discussion to questions & comments.
> 
> I've got patches I need to cleanup and post for comments that implement
> this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO in good
> shape.  There's an unhandled case for s390.  I've got evaluation still
> to do on aarch64.

In the patches Jeff is going to post, we have (at least for
-fasynchronous-unwind-tables which is on by default on e.g. x86)
precise unwind info even with the new stack check mode.
ira.c currently has:
   /* We need the frame pointer to catch stack overflow exceptions if
  the stack pointer is moving (as for the alloca case just above).  */
   || (STACK_CHECK_MOVING_SP
   && flag_stack_check
   && flag_exceptions
   && cfun->can_throw_non_call_exceptions)
For alloca we have a frame pointer for other reasons, the question is
if we really need this hunk even if we provided proper unwind info
even for the Ada -fstack-check mode.  Or, if we provide proper unwind info
for -fasynchronous-unwind-tables, if the above could not be also
&& !flag_asynchronous_unwind_tables.  Eric, what exactly is the reason
for the above, is it just lack of proper CFI notes, or something different?

Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
or movl $0, (%esp) ?

Jakub


Re: [PATCH 3/6] New warnings -Wstring-plus-{char, int} (PR c++/62181)

2017-06-19 Thread Xi Ruoyao
On 2017-06-19 10:30 -0600, Martin Sebor wrote:
> On 06/11/2017 07:34 PM, Xi Ruoyao wrote:
> > This patch adds warning option -Wstring-plus-char for C/C++.
> > 
> 
> +void
> +warn_if_string_plus_char (location_t loc, tree ptrtype, tree inttype)
> +{
> +  if (POINTER_TYPE_P (ptrtype)
> +  && type_main_variant_is_char (TREE_TYPE (ptrtype))
> +  && type_main_variant_is_char (inttype))
> +warning_at (loc, OPT_Wstring_plus_char,
> +"add %qT to string pointer %qT does not append "
> +"to the string", inttype, ptrtype);
> 
> The text of the warning doesn't read like a grammatically correct
> sentence.  ("Adding a to b" would be correct.)

Yes.  It's a typo.

> That said, I wonder if it should also be made more accurate.
> Based on c-c++-common/Wstring-plus-char.c for the snippet below
> 
>    char *a;
>    const char *b;
>    const char c = 'c';
>    const char *d = a + c;
> 
> it will print
> 
>    warning: add 'char' to 'char *' does not append to the string
> 
> even though no string is apparent or need to exist in the program
> (a could point to an array of chars with no terminating NUL).
> I see Clang prints something similar (modulo the bad grammar) but
> I think it might be clearer if the warning instead read something
> like:
> 
>    adding 'char' to 'char *' does not append to a string
> 
> or (if the warning were to trigger only for character constants
> like in Clang):
> 
>    adding 'char' to 'char *' does not append 'c' to the first operand

Clang 4.0 only warns for character constants.  But Clang 5.0
(pre-release) also warns for variables with type char.
Which option should we take?  Maybe -Wstring-plus-char={1,2} ?
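
Something like this, perhaps (a hypothetical illustration of the two
levels):

const char c = 'c';
const char *p = "abc" + 'c';  // level 1: character constant offset
const char *q = "abc" + c;    // level 2: any char-typed operand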

> i.e., if the warning also included the value of the character
> constant.
> 
> Martin
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)

2017-06-19 Thread Jakub Jelinek
Hi!

The following patch adds -fsanitize=pointer-overflow support,
which adds instrumentation (included in -fsanitize=undefined) that checks
that pointer arithmetic doesn't wrap.  If the offset of ptr p+ off, treated
as a signed value, is known to be non-negative, we check whether the result
is bigger (uintptr_t comparison) than ptr; if it is known to be negative in
ssizetype, we check whether the result is smaller than ptr; otherwise we
check at runtime whether (ssizetype) off < 0 and do the check based on that.
The patch checks both POINTER_PLUS_EXPR, as well as e.g. ADDR_EXPR of
handled components, and even handled components themselves (the exception
is a constant offset when the base is an automatic non-VLA decl, or a decl
that binds to the current function, where we can guarantee at compile time
that it will fit).
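
The check itself is roughly equivalent to this sketch (a hypothetical
helper, not the actual UBSAN_PTR expansion):

#include <cstdint>

// True if "ptr p+ off" wraps.  off is the sizetype offset
// reinterpreted as signed (ssizetype), matching the description above.
bool
ubsan_ptr_overflows (const char *ptr, intptr_t off)
{
  uintptr_t base = (uintptr_t) ptr;
  uintptr_t res = base + (uintptr_t) off;
  return off >= 0 ? res < base   // non-negative offset must not wrap
                  : res > base;  // negative offset must not wrap up
}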

Martin has said he'll write the sanopt part of optimization
(if UBSAN_PTR for some pointer is dominated by UBSAN_PTR for the same
pointer and the offset is constant in both cases and equal or absolute value
bigger and same sign in the dominating UBSAN_PTR, we can avoid the dominated
check).  

For the cases where there is a dereference (i.e. not ADDR_EXPR of the
handled component or POINTER_PLUS_EXPR), I wonder if we couldn't ignore
say constant offsets in range <-4096, 4096> or something similar, hoping
people don't have anything mapped at the page 0 and -pagesize in hosted
env.  Thoughts on that?

I've bootstrapped/regtested the patch on x86_64-linux and i686-linux
and additionally bootstrapped/regtested with bootstrap-ubsan on both too.
The latter revealed a couple of issues I'd like to discuss:

1) libcpp/symtab.c contains a couple of spots that reduce to:
#define DELETED ((char *) -1)
void bar (char *);
void
foo (char *p)
{
  if (p && p != DELETED)
    bar (p);
}
where we fold it early into if ((p p+ -1) <= (char *) -3)
and as the instrumentation is done during ubsan pass, if p is NULL,
we diagnose this as invalid pointer overflow from NULL to 0x*f.
Shall we change the folder so that during GENERIC folding it
actually does the addition and comparison in pointer_sized_int
instead (my preference), or shall I move the UBSAN_PTR instrumentation
earlier into the FEs (but then I still risk stuff is folded earlier)?
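
(With the first option, the folded test would be done in an unsigned
pointer-sized integer, where wrapping is well defined; a hypothetical
sketch of the resulting form:)

#include <cstdint>

#define DELETED ((char *) -1)
void bar (char *);

void
foo (char *p)
{
  // Same test as "p && p != DELETED", but no pointer arithmetic can
  // overflow: the subtraction happens in uintptr_t.
  if ((uintptr_t) p - 1 <= (uintptr_t) DELETED - 2)
    bar (p);
}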

2) libcpp/line-map.c has this:
static int
location_adhoc_data_update (void **slot, void *data)
{
  *((char **) slot) += *((int64_t *) data);
  return 1;
}
where the function (why int64_t always? we really need just intptr_t)
adjusts one pointer by the difference from an unrelated one (the result
of realloc).  That is UB and can actually trigger this sanitization if
the two regions are
far away from each other, e.g. on i686-linux:
../../libcpp/line-map.c:102:21: runtime error: pointer index expression with 
base 0x0899e308 overflowed to 0xf74c4ab8
../../libcpp/line-map.c:102:21: runtime error: pointer index expression with 
base 0x08add7c0 overflowed to 0xf74c9a08
../../libcpp/line-map.c:102:21: runtime error: pointer index expression with 
base 0x092ba308 overflowed to 0xf741cab8
../../libcpp/line-map.c:102:21: runtime error: pointer index expression with 
base 0x0a3757c0 overflowed to 0xf7453a08
Shall we perform the addition in uintptr_t instead to make it
implementation defined rather than UB?
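
That could look like the following hypothetical rewrite of the function
quoted above:

#include <cstdint>

static int
location_adhoc_data_update (void **slot, void *data)
{
  // Do the adjustment in uintptr_t: unsigned wraparound is well
  // defined, so relocating the pointer no longer relies on undefined
  // pointer arithmetic between unrelated objects.
  *((char **) slot)
    = (char *) ((uintptr_t) *((char **) slot)
                + (uintptr_t) *((int64_t *) data));
  return 1;
}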

3) not really related to this patch, but something I also saw during the
bootstrap-ubsan on i686-linux:
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 
- 2147475412 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 
- 2147478324 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 
- 2147451580 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 
- 2147465664 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147469348 
- 2147451544 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147482364 
- 2147475376 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483624 
- 2147475376 cannot be represented in type 'int'
../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483628 
- 2147451544 cannot be represented in type 'int'
../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: 
-2147426384 - 2147475376 cannot be represented in type 'int'
../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: 
-2147450216 - 2147451544 cannot be represented in type 'int'
The problem here is that we lower pointer subtraction, e.g.
long foo (char *p, char *q) { return q - p; }
as return (ptrdiff_t) ((ssizetype) q - (ssizetype) p);
and even for a valid testcase where we have an array across
the middle of the virtual address space, say the first one above
is (char *) 0x8000dfb0 - (char *) 0x7fffdfd4 subtraction, even if
there 

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 11:50 AM, Joseph Myers wrote:
> On Mon, 19 Jun 2017, Jeff Law wrote:
> 
>> A key point to remember is that you can never have an allocation
>> (potentially using more than one allocation site) which is larger than a
>> page without probing the page.
> 
> There's a platform ABI issue here.  At least some kernel fixes for these 
> stack issues, as I understand it, increase the size of the stack guard to 
> more than a single page.  It would be possible to define the ABI to 
> require such a larger guard for protection and so reduce the number of 
> (non-alloca/VLA-using) functions that need probes generated, depending on 
> whether a goal is to achieve security on kernels without such a fix.  
> (Thinking in terms of how to get to enabling such probes by default.)
On 32 bit platforms we don't have a lot of address space left, so we
have to be careful about creating too large of a guard.

On 64 bit platforms we have a lot more freedom and I suspect larger
guards, mandated by the ABI would be useful, if for no other reason than
allowing us to allocate more stack without probing.   A simple array of
PATH_MAX characters triggers probing right now.   I suspect (but didn't
bother to confirm) that PATH_MAX array are what causes git to have so
many large stacks.

Also if we look at something like ppc and aarch64, we've currently got
the PROBE_INTERVAL set to 4k.  But in reality they're using much larger
page sizes.  So we could improve things there as well.


jeff



Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jakub Jelinek
On Mon, Jun 19, 2017 at 01:04:57PM -0600, Jeff Law wrote:
> On 06/19/2017 11:50 AM, Joseph Myers wrote:
> > On Mon, 19 Jun 2017, Jeff Law wrote:
> > 
> >> A key point to remember is that you can never have an allocation
> >> (potentially using more than one allocation site) which is larger than a
> >> page without probing the page.
> > 
> > There's a platform ABI issue here.  At least some kernel fixes for these 
> > stack issues, as I understand it, increase the size of the stack guard to 
> > more than a single page.  It would be possible to define the ABI to 
> > require such a larger guard for protection and so reduce the number of 
> > (non-alloca/VLA-using) functions that need probes generated, depending on 
> > whether a goal is to achieve security on kernels without such a fix.  
> > (Thinking in terms of how to get to enabling such probes by default.)
> On 32 bit platforms we don't have a lot of address space left, so we
> have to be careful about creating too large of a guard.
> 
> On 64 bit platforms we have a lot more freedom and I suspect larger
> guards, mandated by the ABI would be useful, if for no other reason than
> allowing us to allocate more stack without probing.   A simple array of
> PATH_MAX characters triggers probing right now.   I suspect (but didn't
> bother to confirm) that PATH_MAX array are what causes git to have so
> many large stacks.
> 
> Also if we look at something like ppc and aarch64, we've currently got
> the PROBE_INTERVAL set to 4k.  But in reality they're using much larger
> page sizes.  So we could improve things there as well.

ppc can use 4K, 16K, 64K or 256K pages, aarch64 4K, 16K or 64K.
So, unless the ABI (or some ABI extension for Linux) says that the guard
page is at least 16K or 64K on these arches (and unless glibc changes the
default pthread_attr_getguardsize - currently defaults everywhere to 1
page), you can't rely on more than 4K there.

Jakub


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Joseph Myers
On Mon, 19 Jun 2017, Florian Weimer wrote:

> I think architectures such as aarch64 without implied stack probing as
> part of the function call sequence would benefit most from an ABI
> agreement (splitting the probing responsibility in some way between
> caller and callee).  For architectures with some form of implied

I'd expect that, regardless of architecture, if calls don't write to the 
stack, the caller has to save its own return address somewhere before 
making a call, which means writing the saved link register.  Is the 
problem case something like: the caller allocates stack space 
unconditionally, without writing to it, and then a particular case in the 
caller calls what it believes to be a noreturn function, or a function 
that it knows won't return in that particular case, so doesn't need to 
save the return address (although not saving return addresses when calling 
noreturn functions is problematic in practice when you want to backtrace 
from abort), so makes a call without ever having written anything to the 
stack (and then you chain many such calls to do large stack allocations, 
never writing to the stack, with each individual allocation being small)?  
Or is the concern simply that the caller might have been compiled without 
stack checking and you don't know *where* it wrote to the stack, even 
given that it must have saved its return address somewhere?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 12:15 PM, Florian Weimer wrote:
> On 06/19/2017 08:02 PM, Richard Biener wrote:
>> Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095 
>> would solve the signal atomicity as well. Might be larger and somewhat 
>> interfere with CPUs stack engine.  Who knows...
> 
> On x86-64, PUSH REG is just a single byte, so for sequences that have to
> move SP and probe, it's the shortest possible sequence AFAIK.  NEG/NOT
> can take an offsettable memory operand, but it's three bytes.
Right.  I think we want guidance from Honza & Uros on what the most
runtime efficient mechanisms are (or are likely to be, there's a certain
amount of guesswork that has to happen here), then we look at which are
the most code space efficient.  I'm personally willing to trade off some
unwinder table space if it gives us more compact code.

Jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Richard Kenner
> > Out of curiosity, does the old Alpha/VMS stack-checking API meet the
> > requirements?  From what I recall, I think it does.
> Unsure.  Is this documented somewhere?

It seems to be in

   http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04621389

starting at page 3-54.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 12:02 PM, Richard Biener wrote:
> On June 19, 2017 8:00:19 PM GMT+02:00, Richard Biener 
>  wrote:
>> On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek 
>> wrote:
>>> On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
 After much poking around I concluded that we really need to implement
 allocation and probing via a "moving sp" strategy.   Probing into
 unallocated areas runs afoul of valgrind, so that's a non-starter.

 Allocating stack space, then probing the pages within the space is
 vulnerable to async signal delivery between the allocation point and the
 probe point.  If that occurs the signal handler could end up running on
 a stack that has collided with the heap.

 Ideally we would allocate and probe a page as an atomic unit (which is
 feasible on PPC).  Alternatively, due to ISA restrictions, allocate a
 page, then probe the page as distinct instructions.  The latter still
 has a race, but we'd have to take the async signal in a single
 instruction window.
>>>
>>> And if the allocation is only a page at a time, the single insn race
>>> window
>>> can be mitigated in the kernel (probe (read-only is fine) the word at
>>> the
>>> stack when setting up a signal frame for async signal).
>>>
 So, time to open the discussion to questions & comments.

 I've got patches I need to cleanup and post for comments that implement
 this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO in good
 shape.  There's an unhandled case for s390.  I've got evaluation still
 to do on aarch64.
>>>
>>> In the patches Jeff is going to post, we have (at least for
>>> -fasynchronous-unwind-tables which is on by default on e.g. x86)
>>> precise unwind info even with the new stack check mode.
>>> ira.c currently has:
>>> /* We need the frame pointer to catch stack overflow exceptions if
>>>   the stack pointer is moving (as for the alloca case just above).  */
>>>   || (STACK_CHECK_MOVING_SP
>>>   && flag_stack_check
>>>   && flag_exceptions
>>>   && cfun->can_throw_non_call_exceptions)
>>> For alloca we have a frame pointer for other reasons, the question is
>>> if we really need this hunk even if we provided proper unwind info
>>> even for the Ada -fstack-check mode.  Or, if we provide proper unwind
>>> info
>>> for -fasynchronous-unwind-tables, if the above could not be also
>>> && !flag_asynchronous_unwind_tables.  Eric, what exactly is the reason
>>> for the above, is it just lack of proper CFI notes, or something
>>> different?
>>>
>>> Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>> while it is shorter, is it actually faster or as slow as movq $0,
>>> (%rsp)
>>> or movl $0, (%esp) ?
>>
>> It at least has the chance of bypassing all of the store queue in CPUs
>> and thus cause no cacheline allocation or trigger prefetching.
>>
>> Not sure if any of that is done though.
>>
>> Performance counters might tell.
>>
>> Otherwise incrementing SP by 4095 and then pushing al would work as
>> well (and be similarly short as the or).
> 
> Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095 
> would solve the signal atomicity as well. Might be larger and somewhat 
> interfere with CPUs stack engine.  Who knows...
Happy to rely on Honza or Uros for guidance on that.  Though we do have
to maintain proper stack alignment, right?

jeff


Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift

2017-06-19 Thread Andrew Pinski
On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh
 wrote:
> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
>> Hi,
>>   Currently for the following function:
>> int f(int a, int b)
>> {
>>   return a + (b <<7);
>> }
>>
>> GCC produces:
>> add w0, w0, w1, lsl 7
>> But for ThunderX 1, it is better if the instruction was split allowing
>> better scheduling to happen in most cases, the latency is the same.  I
>> get a small improvement in coremarks, ~1%.
>>
>> Currently the code does not take into account Arith_shift even though
>> the comment:
>>   /* Strip any extend, leave shifts behind as we will
>> cost them through mult_cost.  */
>> Say it does not strip out the shift, aarch64_strip_extend does and
>> always has since the back-end was added to GCC.
>>
>> Once I fixed the code around aarch64_strip_extend, I got a regression
>> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero
>> extends) are considered free so I needed to add a new tuning flag.
>>
>> Note I will get an even more improvement for ThunderX 2 CN99XX, but I
>> have not measured it yet as I have not made the change to
>> aarch64-cost-tables.h yet as I am waiting for approval of the renaming
>> patch first before submitting any of the cost table changes.  Also I
>> noticed this problem with this tuning first and then looked back at
>> what I needed to do for ThunderX 1.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu without any
>> regressions (both with and without --with-cpu=thunderx).
>
> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
> or free seems better. I have some other minor points below.


Ok, that seems like a good idea.  I used easy since that was the
wording our hardware folks had come up with.  I am changing the
comments to make clearer when this flag should be used.
I should have a new patch out by the end of today.

Thanks,
Andrew


>
>> Index: config/aarch64/aarch64-tuning-flags.def
>> ===
>> --- config/aarch64/aarch64-tuning-flags.def   (revision 243974)
>> +++ config/aarch64/aarch64-tuning-flags.def   (working copy)
>> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte
>>  pairs.   */
>>  AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
>>
>> +/* Logical shift left <=4 with/without zero extend are considered easy
>> +   extended, also zero extends without the shift. */
>
>
> I'm struggling to parse this comment. "also zero extends without the shift"
> is what is getting me. I'm also not certain I follow when I should set this
> flag. If all shifts are cheap/free on my platform, should I set this flag?
>
>> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND)
>> +
>>  #undef AARCH64_EXTRA_TUNING_OPTION
>
>
>> +
>> +/* Return true iff X is an easy shift without a sign extend. */
>> +
>
> Again I don't like calling <= 4 "easy", it feels imprecise.
>
> Thanks,
> James
>


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 01:45 PM, Jakub Jelinek wrote:
> On Mon, Jun 19, 2017 at 01:04:57PM -0600, Jeff Law wrote:
>> On 06/19/2017 11:50 AM, Joseph Myers wrote:
>>> On Mon, 19 Jun 2017, Jeff Law wrote:
>>>
 A key point to remember is that you can never have an allocation
 (potentially using more than one allocation site) which is larger than a
 page without probing the page.
>>>
>>> There's a platform ABI issue here.  At least some kernel fixes for these 
>>> stack issues, as I understand it, increase the size of the stack guard to 
>>> more than a single page.  It would be possible to define the ABI to 
>>> require such a larger guard for protection and so reduce the number of 
>>> (non-alloca/VLA-using) functions that need probes generated, depending on 
>>> whether a goal is to achieve security on kernels without such a fix.  
>>> (Thinking in terms of how to get to enabling such probes by default.)
>> On 32 bit platforms we don't have a lot of address space left, so we
>> have to be careful about creating too large of a guard.
>>
>> On 64 bit platforms we have a lot more freedom and I suspect larger
>> guards, mandated by the ABI would be useful, if for no other reason than
>> allowing us to allocate more stack without probing.   A simple array of
>> PATH_MAX characters triggers probing right now.   I suspect (but didn't
>> bother to confirm) that PATH_MAX array are what causes git to have so
>> many large stacks.
>>
>> Also if we look at something like ppc and aarch64, we've currently got
>> the PROBE_INTERVAL set to 4k.  But in reality they're using much larger
>> page sizes.  So we could improve things there as well.
> 
> ppc can use 4K, 16K, 64K or 256K pages, aarch64 4K, 16K or 64K.
> So, unless the ABI (or some ABI extension for Linux) says that the guard
> page is at least 16K or 64K on these arches (and unless glibc changes the
> default pthread_attr_getguardsize - currently defaults everywhere to 1
> page), you can't rely on more than 4K there.

While the hardware can use the smaller pages ISTM that we can (and
probably should) be clearer in the ABI.  The current pagesize exported
by the kernel on those targets is 16/32k IIRC.

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 11:51 AM, Jakub Jelinek wrote:
> On Mon, Jun 19, 2017 at 11:45:13AM -0600, Jeff Law wrote:
>> On 06/19/2017 11:29 AM, Jakub Jelinek wrote:
>>>
>>> Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>> while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
>>> or movl $0, (%esp) ?
>> Florian raised this privately to me as well.  There's a couple issues.
>>
>> 1. Is there a performance penalty/gain for sub-word operations?  If not,
>> we can improve things slightly there.  Even if it's performance
>> neutral we can probably do better on code size.
> 
> CCing Uros and Honza here, I believe there are at least on x86 penalties
> for 2-byte, maybe for 1-byte and then sometimes some stalls when you
> write or read in a different size from a recent write or read.
Obviously, I'll go with whatever Honza & Uros say is the most efficient.
 This stuff would be highly localized and is easily tweaked into
whatever final form we want.

> 
>> Thus I find myself rethinking is this a probing policy option or should
>> it just be another variant of -fstack-check=.
> 
> Yeah, IMHO it is just another way of stack probing next to generic and
> specific, and for users it would be easier to write -fstack-check=whatever
> than -fstack-check -fstack-check-probe=whatever
That's essentially where I'm leaning now.  The difficulty is in
selecting a name.  ISTM that -fstack-check=specific becomes horribly bad
though.  It should really be -fstack-check=ada or somesuch.

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 03:56 PM, Joseph Myers wrote:
> On Mon, 19 Jun 2017, Florian Weimer wrote:
> 
>> I think architectures such as aarch64 without implied stack probing as
>> part of the function call sequence would benefit most from an ABI
>> agreement (splitting the probing responsibility in some way between
>> caller and callee).  For architectures with some form of implied
> 
> I'd expect that, regardless of architecture, if calls don't write to the 
> stack, the caller has to save its own return address somewhere before 
> making a call, which means writing the saved link register.

True, but the callee doesn't know the offset where the caller saved the
return address.  In fact, different callers could have stored it at
different offsets.  AFAICT for these targets we just have to make a
worst case assumption about the caller.



> Is the
> problem case something like: the caller allocates stack space 
> unconditionally, without writing to it, and then a particular case in the 
> caller calls what it believes to be a noreturn function, or a function 
> that it knows won't return in that particular case, so doesn't need to 
> save the return address (although not saving return addresses when calling 
> noreturn functions is problematic in practice when you want to backtrace 
> from abort), so makes a call without ever having written anything to the 
> stack (and then you chain many such calls to do large stack allocations, 
> never writing to the stack, with each individual allocation being small)?  
Noreturn functions are a bit special.  In the interest of safety my
patches do two things.

1. Callee always probes *(sp+small offset).  This avoids problems if the
caller allocated space, but turned the call into a jump because it knew
the callee was no-return and thus it didn't need to tear down the
caller's frame.  GCC doesn't do this optimization anyway, but better
safe than sorry.

2. GCC will explicitly refuse to optimize a call to a noreturn function
into a jump.
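
To make the scenario concrete, a hypothetical chain of the kind Joseph
describes (illustrative only; the empty asm merely keeps each
never-written array allocated):

extern void fatal (void) __attribute__ ((noreturn));

__attribute__ ((noreturn)) void
step2 (void)
{
  char pad[1024];               // allocated but never written
  __asm__ ("" : : "r" (pad));   // keep the allocation in the frame
  fatal ();                     // believed-noreturn call; without
                                // rule 2 this could become a bare jump
}

__attribute__ ((noreturn)) void
step1 (void)
{
  char pad[1024];
  __asm__ ("" : : "r" (pad));
  step2 ();                     // chains another small, unwritten
                                // allocation; rule 1's probe of
                                // *(sp + small offset) is what still
                                // touches each new frame
}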





> Or is the concern simply that the caller might have been compiled without 
> stack checking and you don't know *where* it wrote to the stack, even 
> given that it must have saved its return address somewhere?
Right.  In a mixed environment, you don't know if the caller was
compiled with -fstack-check or not.  So unless the architecture does
something useful (stores the return pointer on the stack) or ABI
mandates something useful (*sp always contains outer frame), then you
have to make worst case assumptions.


Jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Jeff Law
On 06/19/2017 12:12 PM, Richard Kenner wrote:
> Out of curiosity, does the old Alpha/VMS stack-checking API meet the
> requirements?  From what I recall, I think it does.
Unsure.  Is this documented somewhere?

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-19 Thread Florian Weimer
On 06/20/2017 12:05 AM, Jeff Law wrote:
> On 06/19/2017 03:56 PM, Joseph Myers wrote:
>> On Mon, 19 Jun 2017, Florian Weimer wrote:
>>
>>> I think architectures such as aarch64 without implied stack probing as
>>> part of the function call sequence would benefit most from an ABI
>>> agreement (splitting the probing responsibility in some way between
>>> caller and callee).  For architectures with some form of implied
>>
>> I'd expect that, regardless of architecture, if calls don't write to the 
>> stack, the caller has to save its own return address somewhere before 
>> making a call, which means writing the saved link register.

> True, but the callee doesn't know the offset where the caller saved the
> return address.  In fact, different callers could have stored it at
> different offsets.  AFAICT for these targets we just have to make a
> worst case assumption about the caller.

There are also some weird corner cases like this one:

H. Baker, “CONS Should Not CONS Its Arguments, Part II: Cheney on the
M.T.A.” .

So I think some sort of convention is needed here.

Thanks,
Florian


Re: [PATCH][AArch64] Mark symbols as constant

2017-06-19 Thread Richard Earnshaw
On 19/06/17 19:59, Wilco Dijkstra wrote:
> Aarch64_legitimate_constant_p currently returns false for symbols,
> even though they are always valid constants.  This means LOSYM isn't
> CSEd correctly.  If we return true CSE works better, resulting in
> smaller/faster code (0.3% smaller code on SPEC2006).
> 
> int x0 = 1, x1 = 2, x2 = 3;
> 
> int 
> f (int x, int y)
> {
>   x += x1;
>   if (x > 100)
> y += x2;
>   x += x0;
>   return x + y;
> }
> 
> Before:
>   adrp x3, .LANCHOR0
>   add x4, x3, :lo12:.LANCHOR0
>   ldr w2, [x3, #:lo12:.LANCHOR0]
>   add w0, w0, w2
>   cmp w0, 100
>   ble .L5
>   ldr w2, [x4, 8]
>   add w1, w1, w2
> .L5:
>   add x3, x3, :lo12:.LANCHOR0
>   ldr w2, [x3, 4]
>   add w0, w0, w2
>   add w0, w0, w1
>   ret
> 
> After:
>   adrp x2, .LANCHOR0
>   add x3, x2, :lo12:.LANCHOR0
>   ldr w2, [x2, #:lo12:.LANCHOR0]
>   add w0, w0, w2
>   cmp w0, 100
>   ble .L5
>   ldr w2, [x3, 8]
>   add w1, w1, w2
> .L5:
>   ldr w2, [x3, 4]
>   add w0, w0, w2
>   add w0, w0, w1
>   ret
> 
> Passes regress and bootstrap, OK for commit?
> 
> ChangeLog:
> 2017-06-19  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
>   Return true for symbols.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 5ec6bbfcf484baa4005b8a88cb98d0d04f710877..4b7d961102e41ce927d89d458fc89ddfc2adcd6f
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10111,6 +10111,9 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx 
> x)
>&& aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0
>  return true;
>  
> +  if (SYMBOL_REF_P (x))
> +return true;
> +
>return aarch64_constant_address_p (x);
>  }
>  
> 

What testing has this had with -fpic?  I'm not convinced that this
assertion is true in that case?

R.


Re: [PATCH, rs6000] Fix vec_mulo and vec_mule instruction generation

2017-06-19 Thread Segher Boessenkool
Hi Carl,

On Fri, Jun 16, 2017 at 02:19:05PM -0700, Carl Love wrote:
> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add

Indent is broken on this line.

>   ALTIVEC_BUILTIN_VMULESW, ALTIVEC_BUILTIN_VMULEUW,
>   ALTIVEC_BUILTIN_VMULOSW, ALTIVEC_BUILTIN_VMULOUW enties.

Typo ("entries").

>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin(),
>   builtin_function_type()): Add needed ALTIVEC_BUILTIN_* case
>   statements.

No () please, just the names.

>   * config/rs6000/altivec.md (define_c_enum "unspec",
>   define_expand "vec_widen_umult_even_v4si",
>   define_expand "vec_widen_smult_even_v4si",
>   define_expand "vec_widen_umult_odd_v4si",
>   define_expand "vec_widen_smult_odd_v4si",
>   define_insn "altivec_vmuleuw", define_insn "altivec_vmulesw",
>   define_insn "altivec_vmulouw",  define_insn "altivec_vmulosw"): Add
>   support to generate vmuleuw, vmulesw, vmulouw, vmulosw instructions.

(UNSPEC_VMULEUW, UNSPEC_VMULESW, UNSPEC_VMULOUW, UNSPEC_VMULOSW):
New enum "unspec" values.
(vec_widen_umult_even_v4si, vec_widen_smult_even_v4si,
vec_widen_umult_odd_v4si, vec_widen_smult_odd_v4si, altivec_vmuleuw,
altivec_vmulesw, altivec_vmulouw, altivec_vmulosw): New patterns.

(Or similar.  Mention all new names added.  Usually for new things, just
saying "New." or "New frobnitz." is enough; the changelog does not describe
the design, or why you added something: it says just what changed).

> +(define_expand "vec_widen_umult_even_v4si"
> +  [(use (match_operand:V2DI 0 "register_operand" ""))
> +   (use (match_operand:V4SI 1 "register_operand" ""))
> +   (use (match_operand:V4SI 2 "register_operand" ""))]

You can leave off the default (empty) constraint strings, in expanders.

> +  "TARGET_ALTIVEC"
> +{
> +  if (VECTOR_ELT_ORDER_BIG)
> +emit_insn (gen_altivec_vmuleuw (operands[0], operands[1],
> operands[2]));

Your patch was wrapped in the mail.  Please try to prevent that, so that
other people can apply the patch to test out, etc.

> +(define_insn "altivec_vmuleuw"
> +  [(set (match_operand:V2DI 0 "register_operand" "=v")
> +(unspec:V2DI [(match_operand:V4SI 1 "register_operand" "v")
> +  (match_operand:V4SI 2 "register_operand" "v")]
> +  UNSPEC_VMULEUW))]

The one-but-last line should be indented with tabs instead of spaces, too.
(This happens a few time more later in the patch; please check it all).

Okay for trunk with those last nits fixed.  Thanks!


Segher


Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-19 Thread Segher Boessenkool
On Fri, Jun 16, 2017 at 05:55:35PM -0400, Michael Meissner wrote:
> Here is the latest patch that restricts the optimization to 64-bit (due to
> needing VSX small integers).  I've done a full bootstrap/make check on a 
> little
> endian power8 system, and a build without bootstrap and make check on a little
> endian power9 system.  Neither the power8 nor the power9 systems had any
> regressions.  I'm also running a test on a big endian power7 system for
> completeness.
> 
> Assuming the power7 test finishes without any regressions, can I check this
> patch into the trunk and later the GCC 7 branch.
> 
> The main change was to restrict the optimization to 64-bit PowerPC that have
> VSX small integer support turned on (default for 64-bit).  I did shorten the
> one line in the testsuite that you mentioned.

Okay for both.  Thanks!


Segher


> 2017-06-16  Michael Meissner  
> 
>   PR target/79799
>   * config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
>   for doing vector set of SFmode on ISA 3.0.
>   * config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
>   (vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
>   element.
>   (vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
>   SFmode value into a V4SF variable that was extracted from another
>   V4SF variable without converting the element to double precision
>   and back to single precision vector format.
>   (vsx_insert_extract_v4sf_p9_2): Likewise.
> 
> [gcc/testsuite]
> 2017-06-16  Michael Meissner  
> 
>   PR target/79799
>   * gcc.target/powerpc/pr79799-1.c: New test.
>   * gcc.target/powerpc/pr79799-2.c: Likewise.
>   * gcc.target/powerpc/pr79799-3.c: Likewise.
>   * gcc.target/powerpc/pr79799-4.c: Likewise.
>   * gcc.target/powerpc/pr79799-5.c: Likewise.


Re: [PATCH rs6000] Fix for commit 249311

2017-06-19 Thread Segher Boessenkool
On Fri, Jun 16, 2017 at 09:08:50PM -0700, Carl Love wrote:
> Commit r249311 had an error.  During the patch review the define expand
> for VFC_inst was changed to VF_sxddp.  I compiled and tested the source
> after making the change and it seemed fine.  However, I missed a couple
> of changes.  It seems that since I didn't remove all the binaries before
> recompiling the build tree still had the old definition in it.

Either we have some missing dependencies then, or (more likely) something
in your workflow didn't set modification dates correctly.

> I also
> found I had to move the VF_sxddp definition back to the file where it is
> used.  Need to make sure I do a clean build just to be sure before
> committing things.
> 
> I found the issue after pulling down a fresh tree and compiling when the
> build failed.  I have already applied the following change to the tree
> as I didn't want to leave a broken tree all weekend.

Thanks!

> Please let me know
> if there are any changes to this fix-up patch that you would like to see
> made and I will take care of it. 
> 
> Sorry about breaking things.

Heh, it happens :-)


Segher


Re: [PATCH, testsuite] Add effective target stack_size

2017-06-19 Thread Christophe Lyon
On 12 June 2017 at 16:28, Tom de Vries  wrote:
> On 06/12/2017 02:28 PM, Christophe Lyon wrote:
>>
>> Hi Tom,
>>
>> On 9 June 2017 at 17:25, Mike Stump  wrote:
>>>
>>> On Jun 9, 2017, at 7:24 AM, Tom de Vries  wrote:

 this patch adds effective target stack_size.
>>>
>>>
 OK for trunk if x86_64 and nvptx testing succeeds?
>>>
>>>
>>> Ok.
>>>
>>> The only last issue in this area that I know about is that there are a
>>> few more test cases that need up to 48 MB to run, the problem is that
>>> targets might have substantially less memory.  Stack size is one of the ways
>>> this problem can be exposed.  The failure to load case is or can be handled
>>> in other ways, but the dynamic allocation case I think is relatively poorly
>>> handled.  On my machine, I just punted by running on a virtual simulator
>>> that I pushed memory up to 48 MB and ignored the issue.  If anyone wants to
>>> try their hand at it, I'd be happy to review some patches.  For those on
>>> demand virtual memory systems, of course, the problem is invisible.  I
>>> didn't have any good ideas in this area.  Marking large memory test cases
>>> with size information, and then just trimming based upon size was my only
>>> thought.  Not exactly portable, as the exact size of any test case is of
>>> course target dependent; but, if we get close enough, it can provide enough
>>> of a solution I think.
>>>
>>> If people have better ideas in this area, even if you don't want to
>>> implement them, it'd be nice to hear about them.
>>
>>
>> After this commit (r249090), I've noticed that badalloc1.C fails at
>> execution on aarch64 and arm bare-metal targets.
>>
>> It is compiled with -DSTACK_SIZE=16384, maybe that's too small?
>
>
> I think that what's going on is the following:
> - your board description file for aarch64 and arm bare-metal sets
>   gcc,stack_size
> - before I committed the patch, STACK_SIZE was not defined when
>   compiling this testcase, because the activated .exp files do not
>   define it
> - after I committed the patch, STACK_SIZE started to be defined, and
>   the test started to fail
>

I think you are right.

> I'm not sure if this test was ever compiled with STACK_SIZE defined.
>
> Either way, the test-case uses the presence of STACK_SIZE, not the actual
> value, so changing the value of gcc,stack_size won't make a difference.
>
> Ideally you'd find out what the exact reason for the failure is, and update
> the test-case accordingly.
>
> The easiest thing we can do is to remove the STACK_SIZE setting in the
> test-case (and to avoid confusion, remove all the dead STACK_SIZE-enabled
> code), which returns the status quo of before the patch.
>

I tried to compile with -DSTACK_SIZE & execute the test on x86, and
the first call to malloc() (as defined in the testcase) aborts. This call occurs
before entering main() and tries to allocate size=72704, which is
way larger than arena_size = 256 + 8 * 128 (=1280). This is with a
shared libstdc++.

Linking with -static also implies using
-Wl,--allow-multiple-definition, and leads
to a failure to allocate size=5280.

I too wonder whether the test ever worked with STACK_SIZE defined?
(Yet, arena_size was updated when PR64535 was fixed)

The attached patch removes the support for STACK_SIZE in the testcase
as you suggested, and it works fine (cross-tested on aarch64/arm targets)

OK for trunk?

Thanks,

Christophe


> Thanks,
> - Tom
2017-06-19  Christophe Lyon  

gcc/testsuite/
* g++.old-deja/g++.eh/badalloc1.C: Remove code path for
-DSTACK_SIZE.
diff --git a/gcc/testsuite/g++.old-deja/g++.eh/badalloc1.C b/gcc/testsuite/g++.old-deja/g++.eh/badalloc1.C
index f63d5c6..b660e84 100644
--- a/gcc/testsuite/g++.old-deja/g++.eh/badalloc1.C
+++ b/gcc/testsuite/g++.old-deja/g++.eh/badalloc1.C
@@ -3,7 +3,6 @@
 // itself call malloc(), and will fail if there is no more
 // memory available.
 // { dg-do run { xfail { { xstormy16-*-* *-*-darwin[3-7]* } || vxworks_rtp } } }
-// { dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" { target { stack_size } } }
 // Copyright (C) 2000, 2002, 2003, 2010, 2012, 2014 Free Software Foundation, Inc.
 // Contributed by Nathan Sidwell 6 June 2000 
 
@@ -16,12 +15,6 @@ extern "C" void *memcpy(void *, const void *, size_t);
 // libstdc++ requires a large initialization time allocation for the
 // emergency EH allocation pool.  Add that to the arena size.
 
-// Assume that STACK_SIZE defined implies a system that does not have a
-// large data space either, and additionally that we're not linking against
-// a shared libstdc++ (which requires quite a bit more initialization space).
-#ifdef STACK_SIZE
-const int arena_size = 256 + 8 * 128;
-#else
 #if defined(__FreeBSD__) || defined(__sun__) || defined(__hpux__)
 // FreeBSD, Solaris and HP-UX require even more space at 

[testsuite] Remove reference to Solaris 2.[56]

2017-06-19 Thread Rainer Orth
I happened to notice that we have one last reference to long obsolete
Solaris versions in the testsuite.  Fixed like this, tested with the
appropriate runtest invocation on i386-pc-solaris2.12, installed on
mainline.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-06-19  Rainer Orth  

* g++.dg/other/unused1.C: Remove *-*-solaris2.[56]* from
dg-skip-if list.

# HG changeset patch
# Parent  815bc356f0b7743fb4cb5c9aa382bb7b1edc66d2
Remove reference to Solaris 2.[56]

diff --git a/gcc/testsuite/g++.dg/other/unused1.C b/gcc/testsuite/g++.dg/other/unused1.C
--- a/gcc/testsuite/g++.dg/other/unused1.C
+++ b/gcc/testsuite/g++.dg/other/unused1.C
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-g" } */
-/* { dg-skip-if "" { { hppa*-*-hpux* *-*-solaris2.[56]* powerpc-ibm-aix* } && { ! hppa*64*-*-* } } } */
+/* { dg-skip-if "" { { hppa*-*-hpux* powerpc-ibm-aix* } && { ! hppa*64*-*-* } } } */
 
 /* Make sure we didn't eliminate casted types because we thought they were
unused.  */


Add quality tracking for profile counter

2017-06-19 Thread Jan Hubicka
Hi,
this patch makes us track the quality of the profile.  This is useful
for disabling some aggressive optimizations when counts are known to be
unreliable.
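
For illustration, a consumer could look like this (a hypothetical
sketch assuming the usual GCC internal types; not part of this patch):

/* Decide whether a pass may trust the counts on this block; backs
   off whenever the tracked quality says the counter cannot be
   trusted.  */
static bool
counts_trustworthy_p (basic_block bb)
{
  return bb->count.initialized_p () && bb->count.reliable_p ();
}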

Bootstrapped/regtested x86_64-linux,
Honza
* profile-count.c (profile_count::dump): Dump quality.
(profile_count::differs_from_p): Update for unsigned val.
* profile-count.h (profile_count_quality): New enum.
(profile_count): Turn m_val to 62bit unsigned, add quality tracking.
Index: profile-count.c
===
--- profile-count.c (revision 249347)
+++ profile-count.c (working copy)
@@ -37,7 +37,15 @@ profile_count::dump (FILE *f) const
   if (!initialized_p ())
 fprintf (f, "uninitialized");
   else
-fprintf (f, "%" PRId64, m_val);
+{
+  fprintf (f, "%" PRId64, m_val);
+  if (m_quality == count_adjusted)
+   fprintf (f, "(adjusted)");
+  else if (m_quality == count_afdo)
+   fprintf (f, "(auto FDO)");
+  else if (m_quality == count_guessed)
+   fprintf (f, "(guessed)");
+}
 }
 
 void
@@ -51,7 +59,7 @@ profile_count::differs_from_p (profile_c
 {
   if (!initialized_p () || !other.initialized_p ())
 return false;
-  if (m_val - other.m_val < 100 && other.m_val - m_val < 100)
+  if (m_val - other.m_val < 100 || other.m_val - m_val < 100)
 return false;
   if (!other.m_val)
 return true;
@@ -64,6 +72,7 @@ profile_count::stream_in (struct lto_inp
 {
   profile_count ret;
   ret.m_val = streamer_read_gcov_count (ib);
+  ret.m_quality = (profile_count_quality) streamer_read_uhwi (ib);
   return ret;
 }
 
@@ -71,10 +80,12 @@ void
 profile_count::stream_out (struct output_block *ob)
 {
   streamer_write_gcov_count (ob, m_val);
+  streamer_write_uhwi (ob, m_quality);
 }
 
 void
 profile_count::stream_out (struct lto_output_stream *ob)
 {
   streamer_write_gcov_count_stream (ob, m_val);
+  streamer_write_uhwi_stream (ob, m_quality);
 }
Index: profile-count.h
===
--- profile-count.h (revision 249347)
+++ profile-count.h (working copy)
@@ -21,6 +21,22 @@ along with GCC; see the file COPYING3.
 #ifndef GCC_PROFILE_COUNT_H
 #define GCC_PROFILE_COUNT_H
 
+/* Quality of the profile count.  Because gengtype does not support enums
+   inside of classes, this is in global namespace.  */
+enum profile_count_quality {
+  /* Profile is based on static branch prediction heuristics.  It may or may
+ not reflect the reality.  */
+  count_guessed = 0,
+  /* Profile was determined by autofdo.  */
+  count_afdo = 1,
+  /* Profile was originally based on feedback but it was adjusted 
+ by code duplicating optimization.  It may not precisely reflect the
+ particular code path.  */
+  count_adjusted = 2,
+  /* Profile was read from profile feedback or determined by accurate static
+ method.  */
+  count_read = 3
+};
 
 /* The base value for branch probability notes and edge probabilities.  */
 #define REG_BR_PROB_BASE  10000
@@ -58,17 +74,21 @@ along with GCC; see the file COPYING3.
 
  */
 
-
 class GTY(()) profile_count
 {
-  /* Use int64_t to hold basic block counters.  Should be at least
+  /* Use 62bit to hold basic block counters.  Should be at least
  64bit.  Although a counter cannot be negative, we use a signed
  type to hold various extra stages.  */
 
-  int64_t m_val;
+  static const int n_bits = 62;
+  static const uint64_t max_count = ((uint64_t) 1 << n_bits) - 2;
+  static const uint64_t uninitialized_count = ((uint64_t) 1 << n_bits) - 1;
+
+  uint64_t m_val : n_bits;
+  enum profile_count_quality m_quality : 2;
 
   /* Assume numbers smaller than this to multiply.  This is set to make
- testsuite pass, in future we may implement precise multiples in higer
+ testsuite pass, in future we may implement precise multiplication in higher
  rangers.  */
   static const int64_t max_safe_multiplier = 131072;
 public:
@@ -87,7 +107,8 @@ public:
   static profile_count uninitialized ()
 {
   profile_count c;
-  c.m_val = -1;
+  c.m_val = uninitialized_count;
+  c.m_quality = count_guessed;
   return c;
 }
 
@@ -97,8 +118,9 @@ public:
   static profile_count from_gcov_type (gcov_type v)
 {
   profile_count ret;
-  gcc_checking_assert (v>=0);
+  gcc_checking_assert (v >= 0 && (uint64_t) v <= max_count);
   ret.m_val = v;
+  ret.m_quality = count_read;
   return ret;
 }
 
@@ -112,7 +134,7 @@ public:
   /* Return true if value has been initialized.  */
   bool initialized_p () const
 {
-  return m_val != -1;
+  return m_val != uninitialized_count;
 }
   /* Return true if value can be trusted.  */
   bool reliable_p () const
@@ -123,7 +145,7 @@ public:
   /* Basic operations.  */
   bool operator== (const profile_count &other) const
 {
-  return m_val == other.m_val;
+  return m_val == other.m_val && m_quality == other.m_quality;
 }
   

Re: [PATCH] PR libstdc++/81092 add std::wstring symbols and bump library version

2017-06-19 Thread Jonathan Wakely

On 19/06/17 11:18 +0200, Rainer Orth wrote:

Hi Jonathan,


On 14/06/17 19:13 +0100, Jonathan Wakely wrote:

There are two symbols defined in GCC 7.1's libstdc++.6.0.23 library
which are not exported on all targets (because I wrote "m" in the
linker script instead of "[jmy]").

This patch bumps the library version on gcc-7-branch to 6.0.24 and
exports the "[jy]" versions of the symbols with version the new
GLIBCXX_3.4.24 symbol version.

This requires bumping the version on trunk to 6.0.25 and moving the
new random_device::_M_get_entropy() symbol to GLIBCXX_3.4.25 (which
will be done by the patch in the following mail).


Here's the patch for trunk.

Target maintainers will need to regenerate the baseline symbols on
gcc-7-branch and trunk.


here are the Solaris patches, tested on {i386,sparc}-*-solaris2.1[012].
Ok for mainline and gcc-7 branch?


OK, thanks.

Thanks to Andreas and H.J. for their updates too.



Re: C/C++ PATCH to implement -Wmultistatement-macros (PR c/80116)

2017-06-19 Thread Marek Polacek
On Tue, Jun 13, 2017 at 03:29:32PM +, Joseph Myers wrote:
> On Tue, 13 Jun 2017, Marek Polacek wrote:
> 
> > * c-parser.c (c_parser_if_body): Set the location of the
> > body of the conditional after parsing all the labels.  Call
> > warn_for_multistatement_macros.
> > (c_parser_else_body): Likewise.
> > (c_parser_switch_statement): Likewise.
> > (c_parser_while_statement): Likewise.
> > (c_parser_for_statement): Likewise.
> > (c_parser_statement): Add a default argument.  Save the location
> > after labels have been parsed.
> > (c_parser_c99_block_statement): Likewise.
> 
> The gcc/c/ changes are OK.

Thanks.

David, do you have any more comments on the patch?

Marek


Re: Prevent infinite recursion between simplification and CSE in FRE

2017-06-19 Thread Marc Glisse

On Mon, 19 Jun 2017, Richard Biener wrote:


On Sat, Jun 17, 2017 at 9:35 AM, Marc Glisse  wrote:

Hello,

see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80887#c10 for the context.
FRE can go into an infinite recursion with some match.pd simplifications
(that have been temporarily reverted).

Limiting the depth of recursive calls is not a great solution, but it is
simple and I don't have a good idea for an alternate solution that does not
disable a lot of desirable optimizations.

There are many ways to write the limiter, I went with

  depth_limiter d;
  if (d > 100) return false;

but I could also have written the class so the use would look like

  depth_limiter d(100);
  if (!d) return false;

for instance.

100 was picked arbitrarily, I don't think it is worth having a param for it,
but we could certainly use a different value.
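
For reference, the limiter can be as simple as this (one possible
sketch, assuming single-threaded use):

// RAII depth counter: construction enters a recursion level and
// destruction leaves it, so "d > 100" tests the current depth.
class depth_limiter
{
  static int current_depth;
public:
  depth_limiter () { ++current_depth; }
  ~depth_limiter () { --current_depth; }
  operator int () const { return current_depth; }
};
int depth_limiter::current_depth = 0;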

Bootstrap and testsuite on powerpc64le-unknown-linux-gnu.


I looked into the PR and I can't see anything wrong with the sequence
of events (they are just unfortunate...).  Somehow it feels the fix should
be somewhere in the used mprts_hook because w/o this hook we cannot
run into this kind of recursion.


I would have used the depth trick in a function from FRE or SCCVN if I 
could, but the call stack had only the more general functions. I hadn't 
thought of resetting mprts_hook, that's a nice hack.



We can (and do, there's still at least one open PR ...) run into oscillations
between two simplifications and this also happens for GENERIC folding
and the patch catches this case as well.


Note that my patch was restricted to GIMPLE.


The consequence of stopping the recursion at an arbitrary point is
a missed optimization (in the PR there's no existing value we can
value-number to, so for that specific case it doesn't matter -- maybe
that's always the case with mprts_hook driven recursions).


If there are really cases where the simplification can cascade arbitrarily 
far, we may get a stack overflow from doing normal simplification. Without 
quite reaching a stack overflow, we might also be able to cause quadratic 
time complexity. Restricting the recursion depth (possibly to something 
rather large) seems in line with other caps used in gcc.



So the nice thing about the patch is that we catch all cases but the
bad thing is that we don't anymore ICE on trivially contradicting
patterns ...


Yes :-(


So the following is a SCCVN local recursion prevention - works on the
testcase.  Can you poke holes into it?

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 249358)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1648,8 +1648,21 @@ vn_lookup_simplify_result (code_helper r
  if (!rcode.is_tree_code ())
return NULL_TREE;
  vn_nary_op_t vnresult = NULL;
-  return vn_nary_op_lookup_pieces (TREE_CODE_LENGTH ((tree_code) rcode),
-  (tree_code) rcode, type, ops, &vnresult);
+  tree res = vn_nary_op_lookup_pieces (TREE_CODE_LENGTH ((tree_code) rcode),
+  (tree_code) rcode, type, ops, &vnresult);
+  /* We can end up endlessly recursing simplifications if the lookup above
+ presents us with a def-use chain that mirrors the original simplification.
+ See PR80887 for an example.  Limit successful lookup artificially
+ to 10 times if we are called as mprts_hook.  */
+  if (res && mprts_hook)
+{
+  static unsigned cnt;
+  if (cnt == 0)
+   cnt = 9;
+  else if (--cnt == 0)
+   mprts_hook = NULL;
+}
+  return res;
}


I don't see how cnt is getting reset. It looks like after 9 non-recursive 
simplifications, a depth 2 simplification will get arbitrarily disabled. 
Maybe cnt could be moved outside of the function and reset from 
vn_nary_build_or_lookup_1 (next to where we set mprts_hook)? This will not 
distinguish between a skinny tree of depth 10 and an almost-complete tree 
of depth 3, but that's probably not so important (we can always bump the 
limit of 10 a bit).


I'll think about it later, unless you get to it first.

(I wonder how much we would miss with the trivial "mprts_hook = NULL;" in 
place of your new block of code. Probably too much.)


--
Marc Glisse


Re: [PATCH] Initialize live_switch_vars for SWITCH_BODY == STATEMENT_LIST (PR sanitizer/80879).

2017-06-19 Thread Martin Liška
PING^2

On 06/06/2017 08:58 AM, Martin Liška wrote:
> PING^1
> 
> On 05/26/2017 01:05 PM, Martin Liška wrote:
>> Hello.
>>
>> Unfortunately I guarded use-after-scope to track live switch variables just
>> to BIND_EXPR. However the bind expression can be included in a 
>> STATEMENT_LIST.
>> That enables proper tracking and fixes the test added.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
>>
> 


