Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-04 Thread Jakub Jelinek
On Wed, Nov 04, 2015 at 08:58:32PM -0800, Cesar Philippidis wrote:
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index e3f55a7..4424596 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -14395,6 +14395,15 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, 
> bool allow_fields,
>   case OMP_CLAUSE_PRIORITY:
>   case OMP_CLAUSE_ORDERED:
>   case OMP_CLAUSE_HINT:
> + case OMP_CLAUSE_NUM_GANGS:
> + case OMP_CLAUSE_NUM_WORKERS:
> + case OMP_CLAUSE_VECTOR_LENGTH:
> + case OMP_CLAUSE_GANG:

GANG has two arguments, so you want to handle it differently, you need
to tsubst both arguments.

> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_VECTOR:
> + case OMP_CLAUSE_ASYNC:
> + case OMP_CLAUSE_WAIT:
> + case OMP_CLAUSE_TILE:

I don't think tile will work well this way, if the only argument is a
TREE_VEC, then I think you hit:
case TREE_VEC:
  /* A vector of template arguments.  */
  gcc_assert (!type);
  return tsubst_template_args (t, args, complain, in_decl);
which does something very much different from making a copy of the TREE_VEC
and calling tsubst_expr on each argument.
Thus, either you need to handle it manually here, or think about different
representation of OMP_CLAUSE_TILE?  It seems you allow at most one tile
clause, so perhaps you could split the single source tile clause into one
tile clause per expression in there (the only issue is that the FEs
emit the clauses in last to first order, so you'd need to nreverse the
tile clause list before adding it to the list of all clauses).

Otherwise it looks ok, except:

> diff --git a/gcc/testsuite/g++.dg/goacc/template-reduction.C 
> b/gcc/testsuite/g++.dg/goacc/template-reduction.C
> new file mode 100644
> index 000..668eeb3
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/goacc/template-reduction.C
> +++ b/gcc/testsuite/g++.dg/goacc/template.C

the testsuite coverage is orders of magnitude smaller than it should be.
Just look at the amount of OpenMP template tests (both compile time and
runtime):
grep template libgomp/testsuite/libgomp.c++/*[^4] | wc -l; grep -l template 
libgomp/testsuite/libgomp.c++/*[^4] | wc -l; grep template 
gcc/testsuite/g++.dg/gomp/* | wc -l; grep -l template 
gcc/testsuite/g++.dg/gomp/* | wc -l
629 # templates
45 # tests with templates
151 # templates
58 # tests with templates
and even that is really not sufficient.  From my experience, special care is
needed in template tests to test both non-type dependent and type-dependent
cases (e.g. some diagnostics is emitted already when parsing the templates
even when they won't be instantiated at all, other diagnostic is done during
instantiation), or for e.g. references there are interesting cases where
a reference to template parameter typename is used or where a reference to
some time is tsubsted into a template parameter typename.
E.g. you don't have any test coverage for the vector (num: ...)
or gang (static: *, num: type_dep)
or gang (static: type_dep1, type_dep2)
(which would show you the above issue with the gang clause), or sufficient
coverage for tile, etc.
Of course that coverage can be added incrementally.

Jakub


Re: Division Optimization in match and simplify

2015-11-04 Thread Marc Glisse

+/* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
+(for div (exact_div trunc_div)

Actually, it probably works for all integer divisions (floor_div, etc) 
since it is exact and thus does not depend on the rounding.


(sorry for giving the comments small piece by small piece, I write them as 
I think of them...)


+ (simplify
+  (div (convert? (bit_and @0 INTEGER_CST@1)) INTEGER_CST@2)
+  (if (integer_pow2p (@2)
+   && tree_int_cst_sign_bit (@2) == 0

I think the previous test tree_int_cst_sgn > 0 was better: for unsigned, 
it is ok if that bit is set, it is only for signed that we want to avoid 
-1/-1 which gives 1 (while a shift by 31 would give -1, except that 
wi::exact_log2 might return -1 so we would end up with a negative shift).


(actually, we could generate (unsigned)x>>31 in that case, but let's 
forget it for now)


+   && wi::add (@2, @1) == 0
+   && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (rshift (convert @0) { build_int_cst (integer_type_node, wi::exact_log2 
(@2)); }

(does this fit in 80 cols?)

Note that if we port the following to match.pd from wherever it currently 
is, the 'convert?' becomes much less useful on this transformation


(simplify
 (convert (bit_and:s @0 INTEGER_CST@1))
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (bit_and (convert @0) (convert @1

--
Marc Glisse


Re: Division Optimization in match and simplify

2015-11-04 Thread Hurugalawadi, Naveen
Hi,

Please find attached the modified patch as per review comments.

>> use :s on both inner rdiv in both patterns.  With that the two patterns are 
>> ok.
Done.

>> Omit the parens around REAL_CST@0
Done.

Regression tested on X86_64.

Thanks,
Naveendiff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ee9b349..88dbbdd 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -10163,54 +10163,9 @@ fold_binary_loc (location_t loc,
 	return fold_build2_loc (loc, RDIV_EXPR, type,
 			negate_expr (arg0),
 			TREE_OPERAND (arg1, 0));
-
-  /* Convert A/B/C to A/(B*C).  */
-  if (flag_reciprocal_math
-	  && TREE_CODE (arg0) == RDIV_EXPR)
-	return fold_build2_loc (loc, RDIV_EXPR, type, TREE_OPERAND (arg0, 0),
-			fold_build2_loc (loc, MULT_EXPR, type,
-	 TREE_OPERAND (arg0, 1), arg1));
-
-  /* Convert A/(B/C) to (A/B)*C.  */
-  if (flag_reciprocal_math
-	  && TREE_CODE (arg1) == RDIV_EXPR)
-	return fold_build2_loc (loc, MULT_EXPR, type,
-			fold_build2_loc (loc, RDIV_EXPR, type, arg0,
-	 TREE_OPERAND (arg1, 0)),
-			TREE_OPERAND (arg1, 1));
-
-  /* Convert C1/(X*C2) into (C1/C2)/X.  */
-  if (flag_reciprocal_math
-	  && TREE_CODE (arg1) == MULT_EXPR
-	  && TREE_CODE (arg0) == REAL_CST
-	  && TREE_CODE (TREE_OPERAND (arg1, 1)) == REAL_CST)
-	{
-	  tree tem = const_binop (RDIV_EXPR, arg0,
-  TREE_OPERAND (arg1, 1));
-	  if (tem)
-	return fold_build2_loc (loc, RDIV_EXPR, type, tem,
-TREE_OPERAND (arg1, 0));
-	}
-
   return NULL_TREE;
 
 case TRUNC_DIV_EXPR:
-  /* Optimize (X & (-A)) / A where A is a power of 2,
-	 to X >> log2(A) */
-  if (TREE_CODE (arg0) == BIT_AND_EXPR
-	  && !TYPE_UNSIGNED (type) && TREE_CODE (arg1) == INTEGER_CST
-	  && integer_pow2p (arg1) && tree_int_cst_sgn (arg1) > 0)
-	{
-	  tree sum = fold_binary_loc (loc, PLUS_EXPR, TREE_TYPE (arg1),
-  arg1, TREE_OPERAND (arg0, 1));
-	  if (sum && integer_zerop (sum)) {
-	tree pow2 = build_int_cst (integer_type_node,
-   wi::exact_log2 (arg1));
-	return fold_build2_loc (loc, RSHIFT_EXPR, type,
-TREE_OPERAND (arg0, 0), pow2);
-	  }
-	}
-
   /* Fall through */
   
 case FLOOR_DIV_EXPR:
diff --git a/gcc/match.pd b/gcc/match.pd
index f6c5c07..d11ad79 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -247,6 +247,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (!HONOR_SNANS (type))
   (negate @0)))
 
+(if (flag_reciprocal_math)
+ /* Convert (A/B)/C to A/(B*C)  */
+ (simplify
+  (rdiv (rdiv:s @0 @1) @2)
+   (rdiv @0 (mult @1 @2)))
+
+ /* Convert A/(B/C) to (A/B)*C  */
+ (simplify
+  (rdiv @0 (rdiv:s @1 @2))
+   (mult (rdiv @0 @1) @2)))
+
+/* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
+(for div (exact_div trunc_div)
+ (simplify
+  (div (convert? (bit_and @0 INTEGER_CST@1)) INTEGER_CST@2)
+  (if (integer_pow2p (@2)
+   && tree_int_cst_sign_bit (@2) == 0
+   && wi::add (@2, @1) == 0
+   && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (rshift (convert @0) { build_int_cst (integer_type_node, wi::exact_log2 (@2)); }
+
 /* If ARG1 is a constant, we can convert this to a multiply by the
reciprocal.  This does not have the same rounding properties,
so only do this if -freciprocal-math.  We can actually
@@ -464,6 +485,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (tem)
  (rdiv { tem; } @1)
 
+/* Convert C1/(X*C2) into (C1/C2)/X  */
+(simplify
+ (rdiv REAL_CST@0 (mult @1 REAL_CST@2))
+  (if (flag_reciprocal_math)
+   (with
+{ tree tem = const_binop (RDIV_EXPR, type, @0, @2); }
+(if (tem)
+ (rdiv { tem; } @1)
+
 /* Simplify ~X & X as zero.  */
 (simplify
  (bit_and:c (convert? @0) (convert? (bit_not @0)))


[gomp4] assorted trunk backports

2015-11-04 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which does the following:

 * reverts some of the fortran changes that Jakub rejected for
   trunk, specifically those involving resolve_omp_duplicate_list

 * updates how num_gangs, num_workers and vector_length are parsed by
   the c++ FE

 * changes how errors are reported for acc update when no clauses are
   specified and updates the test cases for c, c++ and fortran

 * shuffles around various code to match trunk in the fortran FE

We probably should update the way the c FE handles num_gangs,
num_workers and vector_length too, but I left it as-is since it works.
The c++ FE would have had problems with template support eventually
because it was doing type checking during scanning.

Cesar
2015-11-04  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_update): Update the error message for
	missing clauses.

	gcc/cp/
	* parser.c (cp_parser_oacc_simple_clause): New function.
	(cp_parser_oacc_clause_vector_length): Delete.
	(cp_parser_omp_clause_num_gangs): Delete.
	(cp_parser_omp_clause_num_workers): Delete.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_simple_clause for
	num_gangs, num_workers and vector_length.
	(cp_parser_oacc_update): Update the error message for missing
	clauses.

	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Update how default is parsed.
	(gfc_match_oacc_update): Update error message.
	(resolve_omp_clauses): Don't use resolve_omp_duplicate_list for
	omp clauses.
	(oacc_code_to_statement): Merge atomic placement from trunk.

	gcc/testsuite/
	* c-c++-common/goacc/update-1.c: Update expected error.
	* gfortran.dg/goacc/update.f95: Likewise.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 45e4248..fa70055 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -14060,7 +14060,7 @@ c_parser_oacc_update (c_parser *parser)
 {
   error_at (loc,
 		"%<#pragma acc update%> must contain at least one "
-		"% or % clause");
+		"% or % or % clause");
   return;
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1e2afdb..ce8bc6a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29652,6 +29652,39 @@ cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
   return c;
 }
 
+ /* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
+
+static tree
+cp_parser_oacc_single_int_clause (cp_parser *parser, omp_clause_code code,
+  const char *str, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+return list;
+
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
+
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+{
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
+  return list;
+}
+
+  check_no_duplicate_clause (list, code, str, loc);
+
+  tree c = build_omp_clause (loc, code);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -29923,45 +29956,6 @@ cp_parser_oacc_clause_tile (cp_parser *parser, tree list, location_t here)
   return c;
 }
 
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
-
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-{
-  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenACC 2.0
Parse wait clause or directive parameters.  */
 
@@ -30345,42 +30339,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to

Re: [PR64164] drop copyrename, integrate into expand

2015-11-04 Thread Alexandre Oliva
On Sep 23, 2015, Alexandre Oliva  wrote:

> @@ -2982,38 +2887,39 @@ assign_parm_setup_block (struct assign_parm_data_all 
> *all,
[snip]
> +  if (GET_CODE (reg) != CONCAT)
> +stack_parm = reg;
> +  else
> +/* This will use or allocate a stack slot that we'd rather
> +   avoid.  FIXME: Could we avoid it in more cases?  */
> +target_reg = reg;

It turns out that we can, and that helps fixing PR67753.  In the end, I
ended up using the ABI-reserved stack slot if there is one, but just
allocating an unsplit complex pseudo fixes all remaining cases that used
to require the allocation of a stack slot.  Yay!

As for pr67753 proper, we emitted the store of the PARALLEL entry_parm
into the stack parm only in the conversion seq, which was ultimately
emitted after the copy from stack_parm to target_reg that was supposed
to copy the value originally in entry_parm.  So we copied an
uninitialized stack slot, and the subsequent store in the conversion seq
was optimized out as dead.

This caused a number of regressions on hppa-linux-gnu.  The fix for this
is to arrange for the copy to target_reg to be emitted in the conversion
seq if the copy to stack_parm was.  I can't determine whether this fix
all reported regressions, but from visual inspection of the generated
code I'm pretty sure it fixes at least gcc.c-torture/execute/pr38969.c.


When we do NOT have an ABI-reserved stack slot, the store of the
PARALLEL entry_parm into the intermediate pseudo doesn't need to go in
the conversion seq (emit_group_store from a PARALLEL to a pseudo only
uses registers, according to another comment in function.c), so I've
simplified that case.


This was regstrapped on x86_64-linux-gnu, i686-linux-gnu,
ppc64-linux-gnu, ppc64el-linux-gnu, and cross-build-tested for all
targets for which I've tested the earlier patches in the patchset.
Ok to install?



[PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg

From: Alexandre Oliva 

In assign_parms_setup_block, the copy of args in PARALLELs from
entry_parm to stack_parm is deferred to the parm conversion insn seq,
but the copy from stack_parm to target_reg was inserted in the normal
copy seq, that is executed before the conversion insn seq.  Oops.

We could do away with the need for an actual stack_parm in general,
which would have avoided the need for emitting the copy to target_reg
in the conversion seq, but at least on pa, due to the need for stack
to copy between SI and SF modes, it seems like using the reserved
stack slot is beneficial, so I put in logic to use a pre-reserved
stack slot when there is one, and emit the copy to target_reg in the
conversion seq if stack_parm was set up there.

for  gcc/ChangeLog

PR rtl-optimization/67753
PR rtl-optimization/64164
* function.c (assign_parm_setup_block): Avoid allocating a
stack slot if we don't have an ABI-reserved one.  Emit the
copy to target_reg in the conversion seq if the copy from
entry_parm is in it too.  Don't use the conversion seq to copy
a PARALLEL to a REG or a CONCAT.
---
 gcc/function.c |   39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index aaf49a4..156c72b 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -2879,6 +2879,7 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
   rtx entry_parm = data->entry_parm;
   rtx stack_parm = data->stack_parm;
   rtx target_reg = NULL_RTX;
+  bool in_conversion_seq = false;
   HOST_WIDE_INT size;
   HOST_WIDE_INT size_stored;
 
@@ -2895,9 +2896,23 @@ assign_parm_setup_block (struct assign_parm_data_all 
*all,
   if (GET_CODE (reg) != CONCAT)
stack_parm = reg;
   else
-   /* This will use or allocate a stack slot that we'd rather
-  avoid.  FIXME: Could we avoid it in more cases?  */
-   target_reg = reg;
+   {
+ target_reg = reg;
+ /* Avoid allocating a stack slot, if there isn't one
+preallocated by the ABI.  It might seem like we should
+always prefer a pseudo, but converting between
+floating-point and integer modes goes through the stack
+on various machines, so it's better to use the reserved
+stack slot than to risk wasting it and allocating more
+for the conversion.  */
+ if (stack_parm == NULL_RTX)
+   {
+ int save = generating_concat_p;
+ generating_concat_p = 0;
+ stack_parm = gen_reg_rtx (mode);
+ generating_concat_p = save;
+   }
+   }
   data->stack_parm = NULL;
 }
 
@@ -2938,7 +2953,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
   mem = validize_mem (copy_rtx (stack_parm));
 
   /* Handle values in multiple non-contiguous locations.  */
-  if (GET_CODE (entry_parm) == PARALLEL)
+  if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (m

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 02:24 AM, Jakub Jelinek wrote:
> Have you verified pt.c does the right thing when instantiating the
> OMP_CLAUSE_TILE clause (I mean primarily the TREE_VEC in there)?
> There really should be testcases for that.

Here's a patch which adds template support for the oacc clauses. Is it
ok for trunk?

Cesar
2015-11-04  Cesar Philippidis  

	gcc/cp/
	* pt.c (tsubst_omp_clauses): Add support for OMP_CLAUSE_{NUM_GANGS,
	NUM_WORKERS,VECTOR_LENGTH,GANG,WORKER,VECTOR,ASYNC,WAIT,TILE,AUTO,
	INDEPENDENT,SEQ}. 
	(tsubst_expr): Add support for OMP_CLAUSE_{KERNELS,PARALLEL,LOOP}.

	gcc/testsuite/
	* g++.dg/goacc/template-reduction.C: New test.
	* g++.dg/goacc/template.C: New test.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e3f55a7..4424596 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14395,6 +14395,15 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, bool allow_fields,
 	case OMP_CLAUSE_PRIORITY:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_HINT:
+	case OMP_CLAUSE_NUM_GANGS:
+	case OMP_CLAUSE_NUM_WORKERS:
+	case OMP_CLAUSE_VECTOR_LENGTH:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_ASYNC:
+	case OMP_CLAUSE_WAIT:
+	case OMP_CLAUSE_TILE:
 	  OMP_CLAUSE_OPERAND (nc, 0)
 	= tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, 
 			   in_decl, /*integral_constant_expression_p=*/false);
@@ -14449,6 +14458,9 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, bool allow_fields,
 	case OMP_CLAUSE_THREADS:
 	case OMP_CLAUSE_SIMD:
 	case OMP_CLAUSE_DEFAULTMAP:
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_SEQ:
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -15197,6 +15209,15 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   }
   break;
 
+case OACC_KERNELS:
+case OACC_PARALLEL:
+  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false, false, args, complain,
+in_decl);
+  stmt = begin_omp_parallel ();
+  RECUR (OMP_BODY (t));
+  finish_omp_construct (TREE_CODE (t), stmt, tmp);
+  break;
+
 case OMP_PARALLEL:
   r = push_omp_privatization_clauses (OMP_PARALLEL_COMBINED (t));
   tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false, true,
@@ -15227,6 +15248,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 case CILK_FOR:
 case OMP_DISTRIBUTE:
 case OMP_TASKLOOP:
+case OACC_LOOP:
   {
 	tree clauses, body, pre_body;
 	tree declv = NULL_TREE, initv = NULL_TREE, condv = NULL_TREE;
@@ -15235,7 +15257,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 	int i;
 
 	r = push_omp_privatization_clauses (OMP_FOR_INIT (t) == NULL_TREE);
-	clauses = tsubst_omp_clauses (OMP_FOR_CLAUSES (t), false, true,
+	clauses = tsubst_omp_clauses (OMP_FOR_CLAUSES (t), false,
+  TREE_CODE (t) != OACC_LOOP,
   args, complain, in_decl);
 	if (OMP_FOR_INIT (t) != NULL_TREE)
 	  {
@@ -15305,9 +15328,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   pop_omp_privatization_clauses (r);
   break;
 
+case OACC_DATA:
 case OMP_TARGET_DATA:
 case OMP_TARGET:
-  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false, true,
+  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false,
+TREE_CODE (t) != OACC_DATA,
 args, complain, in_decl);
   keep_next_level (true);
   stmt = begin_omp_structured_block ();
@@ -15331,6 +15356,16 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   add_stmt (t);
   break;
 
+case OACC_ENTER_DATA:
+case OACC_EXIT_DATA:
+case OACC_UPDATE:
+  tmp = tsubst_omp_clauses (OMP_STANDALONE_CLAUSES (t), false, false,
+args, complain, in_decl);
+  t = copy_node (t);
+  OMP_STANDALONE_CLAUSES (t) = tmp;
+  add_stmt (t);
+  break;
+
 case OMP_ORDERED:
   tmp = tsubst_omp_clauses (OMP_ORDERED_CLAUSES (t), false, true,
 args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/goacc/template-reduction.C b/gcc/testsuite/g++.dg/goacc/template-reduction.C
new file mode 100644
index 000..668eeb3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/template-reduction.C
@@ -0,0 +1,104 @@
+// This error is temporary.  Remove when support is added for these clauses
+// in the middle end.
+// { dg-prune-output "sorry, unimplemented" }
+
+extern void abort ();
+
+const int n = 100;
+
+// Check explicit template copy map
+
+template T
+sum (T array[])
+{
+   T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, array[0:n])
+  for (int i = 0; i < n; i++)
+s += array[i];
+
+  return s;
+}
+
+// Check implicit template copy map
+
+template T
+sum ()
+{
+  T s = 0;
+  T array[n];
+
+  for (int i = 0; i < n; i++)
+array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s)
+  for (int i = 0; i < n; i++)
+s += array[i];
+
+  return s;
+}
+
+// Check present and async
+
+template T
+async_sum 

Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-04 Thread Jason Merrill

On 11/04/2015 07:15 PM, Martin Sebor wrote:

There was a lot of discussion of C++ aliasing rules at the recent
meeting; we really seem to be moving in the direction of being stricter
about which union member is active.  So I think we do want to diagnose
the new-expression above; the user should write new (&u) if that's what
they mean.


Okay. I changed that in the latest patch.


Adjust is negative when the offset to a buffer of known size is
negative. For example:

 char buf [sizeof (int)];
 new (&buf [1] - 1) int;


OK, so because we're looking at the expression from the outside in, we
first see the subtraction and adjust becomes -1, then we see the
array_ref and adjust returns to 0.  We still don't have a negative
adjust by the time we get to the quoted if/else.


I think I see what you mean. I've changed the type of the variables
and the computation to unsigned. That made it possible to eliminate
the final else and do some other cleanup. Attached is an updated
patch.


Hmm, I was suggesting that bytes_avail change to unsigned, but I don't 
think adjust should change; I believe that 0u - 1u is undefined due to 
overflow even though (-1u) and (unsigned)-1 are well defined.  Sorry for 
the muddled messages.  I think let's leave adjust signed and assert that 
it ends up non-negative.


Jason



Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-04 Thread Martin Sebor

On 11/04/2015 06:09 PM, Sandra Loosemore wrote:

On 11/04/2015 04:52 PM, Martin Sebor wrote:


gcc/ChangeLog:
2015-11-04  Martin Sebor  

* opts.c (print_filtered_help): Indicate when an optimization option
is disabled as a result of -O0.
* doc/invoke.texi: Further clarify the effect of -O options
on individual optimization options.


invoke.texi is a huge file.  In ChangeLogs, please use (node name) to
indicate which sections have been changed unless the changes really do
apply throughout the whole file.


@@ -1509,6 +1509,14 @@
 disabled or set to a specific value (assuming that the compiler
 knows this at the point where the @option{--help=} option is used).

+It's important to remember that when a given optimization option is
+enabled, either explicitly on the command line or implicitly, whether
+or not the optimization it controls will be performed during an


s/will be performed/is performed/


+Finally, the following example shows the difference in output for
+an option that, while technically enabled, is disabled as a consequence
+of the implicit @option{-O0} option, and for one that is disabled by
+default.  This distinction is typically only of interest to GCC
developers.


If the distinction is only interesting to developers, why does it need
an example in GCC's user documentation?  :-S


Thank you for the feedback. I'll incorporate it into the next version
of the patch.

The point of the example is to make it clear to users that despite
the different output the effect of both kinds of options on
optimization is the same (i.e., none). I made the output different
only because I expect the distinction between the state of the two
kinds of options to be of interest to GCC developers. Otherwise,
it could just say [disabled] for both.

Martin


Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-04 Thread Sandra Loosemore

On 11/04/2015 04:52 PM, Martin Sebor wrote:


gcc/ChangeLog:
2015-11-04  Martin Sebor  

* opts.c (print_filtered_help): Indicate when an optimization option
is disabled as a result of -O0.
* doc/invoke.texi: Further clarify the effect of -O options
on individual optimization options.


invoke.texi is a huge file.  In ChangeLogs, please use (node name) to 
indicate which sections have been changed unless the changes really do 
apply throughout the whole file.



@@ -1509,6 +1509,14 @@
 disabled or set to a specific value (assuming that the compiler
 knows this at the point where the @option{--help=} option is used).

+It's important to remember that when a given optimization option is
+enabled, either explicitly on the command line or implicitly, whether
+or not the optimization it controls will be performed during an


s/will be performed/is performed/


+Finally, the following example shows the difference in output for
+an option that, while technically enabled, is disabled as a consequence
+of the implicit @option{-O0} option, and for one that is disabled by
+default.  This distinction is typically only of interest to GCC developers.


If the distinction is only interesting to developers, why does it need 
an example in GCC's user documentation?  :-S


I don't have any other comments on the writing aspects of the patch, but 
others may want to comment on technical accuracy, etc.


-Sandra



Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-04 Thread Joseph Myers
On Wed, 4 Nov 2015, Martin Sebor wrote:

> Improving the compiler output is a good idea. The attached patch
> prints "[disabled by -O0]" instead of "[enabled]" when an optimization
> option is enabled by default but when optimization (i.e., -O1 or
> greater) is not enabled.

I don't think it's entirely accurate that all options marked as 
Optimization in *.opt are actually disabled by -O0.  Many are, but it 
depends on the actual logic controlling each optimization.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-04 Thread Martin Sebor

There was a lot of discussion of C++ aliasing rules at the recent
meeting; we really seem to be moving in the direction of being stricter
about which union member is active.  So I think we do want to diagnose
the new-expression above; the user should write new (&u) if that's what
they mean.


Okay. I changed that in the latest patch.


Adjust is negative when the offset to a buffer of known size is
negative. For example:

 char buf [sizeof (int)];
 new (&buf [1] - 1) int;


OK, so because we're looking at the expression from the outside in, we
first see the subtraction and adjust becomes -1, then we see the
array_ref and adjust returns to 0.  We still don't have a negative
adjust by the time we get to the quoted if/else.


I think I see what you mean. I've changed the type of the variables
and the computation to unsigned. That made it possible to eliminate
the final else and do some other cleanup. Attached is an updated
patch.

Tested on x86_64 by botstrapping C and C++ and running make check.

Martin
gcc ChangeLog
2015-11-04  Martin Sebor  

	PR c++/67942
* invoke.texi (-Wplacement-new): Document new option.
	* gcc/testsuite/g++.dg/warn/Wplacement-new-size.C: New test.

gcc/c-family ChangeLog
2015-11-04  Martin Sebor  

	PR c++/67942
* c.opt (-Wplacement-new): New option.

gcc/cp ChangeLog
2015-11-04  Martin Sebor  

	PR c++/67942
	* cp/init.c (warn_placement_new_too_small): New function.
	(build_new_1): Call it.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 47ba070..5e9d7a3 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -760,6 +760,10 @@ Wprotocol
 ObjC ObjC++ Var(warn_protocol) Init(1) Warning
 Warn if inherited methods are unimplemented

+Wplacement-new
+C++ Var(warn_placement_new) Init(1) Warning
+Warn for placement new expressions with undefined behavior
+
 Wredundant-decls
 C ObjC C++ ObjC++ Var(warn_redundant_decls) Warning
 Warn about multiple declarations of the same object
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1ed8f6c..e2285ec 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2269,6 +2269,199 @@ throw_bad_array_new_length (void)
   return build_cxx_call (fn, 0, NULL, tf_warning_or_error);
 }

+/* Attempt to verify that the argument, OPER, of a placement new expression
+   refers to an object sufficiently large for an object of TYPE or an array
+   of NELTS of such objects when NELTS is non-null, and issue a warning when
+   it does not.  SIZE specifies the size needed to construct the object or
+   array and captures the result of NELTS * sizeof (TYPE). (SIZE could be
+   greater when the array under construction requires a cookie to store
+   NELTS.  GCC's placement new expression stores the cookie when invoking
+   a user-defined placement new operator function but not the default one.
+   Placement new expressions with user-defined placement new operator are
+   not diagnosed since we don't know how they use the buffer (this could
+   be a future extension).  */
+static void
+warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
+{
+  location_t loc = EXPR_LOC_OR_LOC (oper, input_location);
+
+  /* The number of bytes to subtract from the size of the provided buffer
+ based on an offset into an array or an array element reference.
+ Although intermediate results may be negative (as in a[3] - 2),
+ the ultimate result cannot be and so the computation is done in
+ unsigned HOST_WIDE_INT.  */
+  unsigned HOST_WIDE_INT adjust = 0;
+  /* True when the size of the entire destination object should be used
+ to compute the possibly optimistic estimate of the available space.  */
+  bool use_obj_size = false;
+  /* True when the reference to the destination buffer is an ADDR_EXPR.  */
+  bool addr_expr = false;
+
+  STRIP_NOPS (oper);
+
+  /* Using a function argument or a (non-array) variable as an argument
+ to placement new is not checked since it's unknown what it might
+ point to.  */
+  if (TREE_CODE (oper) == PARM_DECL
+  || TREE_CODE (oper) == VAR_DECL
+  || TREE_CODE (oper) == COMPONENT_REF)
+return;
+
+  /* Evaluate any constant expressions.  */
+  size = fold_non_dependent_expr (size);
+
+  /* Handle the common case of array + offset expression when the offset
+ is a constant.  */
+  if (TREE_CODE (oper) == POINTER_PLUS_EXPR)
+{
+  /* If the offset is comple-time constant, use it to compute a more
+	 accurate estimate of the size of the buffer.  Otherwise, use
+	 the size of the entire array as an optimistic estimate (this
+	 may lead to false negatives).  */
+  const_tree adj = TREE_OPERAND (oper, 1);
+  if (CONSTANT_CLASS_P (adj))
+	adjust += tree_to_uhwi (adj);
+  else
+	use_obj_size = true;
+
+  oper = TREE_OPERAND (oper, 0);
+
+  STRIP_NOPS (oper);
+}
+
+  if (TREE_CODE (oper) == TARGET_EXPR)
+oper = TREE_OPERAND (oper, 1);
+  else if (TREE_CODE (oper) == ADDR_EXPR)
+{
+  addr_expr = true;
+  oper = TREE_OPERAND

Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-04 Thread Martin Sebor

On 11/03/2015 03:19 AM, Alexander Monakov wrote:

On Thu, 22 Oct 2015, Martin Sebor wrote:


[Sending to the right list this time]

The documentation of the -Q --help=optimizers options leads some
to expect that when options are reported as enabled imply the
corresponding optimization will take place.  (See the following
question on gcc-help:
https://gcc.gnu.org/ml/gcc-help/2015-10/msg00133.html)

The patch below tries to make it clear that that's not always
the case.


Hi,

The issue is due to optimization passes being skipped at -O0, and yet
corresponding optimization options not explicitely disabled.  The effect of -O
is an old source of confusion, and now the intro to "Optimization Options"
says,

 Most optimizations are only enabled if an -O level is set on the command
 line.  Otherwise they are disabled, even if individual optimization flags
 are specified.

(added with this patch:
https://gcc.gnu.org/ml/gcc-patches/2009-10/msg00739.html )


Thanks for the reference! Despite the improvement, this continues
to be a recurring problem. Users tend to miss the added text, maybe
because it's on a different HTML page than the --help option. That
certainly seemed to be the case in this post:

  https://gcc.gnu.org/ml/gcc-help/2015-10/msg00135.html

It's also possible that it's because the caveat is mentioned in
a context that doesn't match their use case. In the originally
referenced thread, the user wasn't looking to enable additional
optimizations. Rather, they were trying to see what (if any)
optimizations are enabled by default, with -O0.



As we observe, it's not visible enough, and I'm not sure saying that again in
the documentation (in a different section) is a good way to go.  Maybe we'd
warn for attempts to enable optimizations at -O0, but that's not trivial.
Perhaps go with Richard's suggestion in the end of this mail ("Thus, at the
end of --help-optimizers print ...")?
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00113.html


Improving the compiler output is a good idea. The attached patch
prints "[disabled by -O0]" instead of "[enabled]" when an optimization
option is enabled by default but when optimization (i.e., -O1 or
greater) is not enabled.

The patch also further clarifies the wording in the documentation
to help users avoid falling into the trap of assuming, based on
an incomplete reading of the manual, that some optimizations are
performed even at -O0.

Martin
gcc/ChangeLog:
2015-11-04  Martin Sebor  

	* opts.c (print_filtered_help): Indicate when an optimization option
	is disabled as a result of -O0.
	* doc/invoke.texi: Further clarify the effect of -O options
	on individual optimization options.

gcc/testsuite/ChangeLog:
2015-11-04  Martin Sebor  

	* testsuite/gcc.misc-tests/help.exp: Verify that optimization options
	are printed as disabled when -O0 is specified and enabled otherwise.

Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 229716)
+++ doc/invoke.texi	(working copy)
@@ -1509,6 +1509,14 @@
 disabled or set to a specific value (assuming that the compiler
 knows this at the point where the @option{--help=} option is used).

+It's important to remember that when a given optimization option is
+enabled, either explicitly on the command line or implicitly, whether
+or not the optimization it controls will be performed during an
+invocation of the compiler may depend on other options, most commonly
+one of the @option{-O} options.  This is because many options control
+various finer aspects of other more general optimizations that must
+be enabled in order for the former option to have any effect.
+
 Here is a truncated example from the ARM port of @command{gcc}:

 @smallexample
@@ -1524,7 +1532,7 @@
 are enabled at @option{-O2} by using:

 @smallexample
--Q -O2 --help=optimizers
+  % gcc -Q -O2 --help=optimizers
 @end smallexample

 Alternatively you can discover which binary optimizations are enabled
@@ -1531,11 +1539,22 @@
 by @option{-O3} by using:

 @smallexample
-gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
-gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
-diff /tmp/O2-opts /tmp/O3-opts | grep enabled
+  % gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
+  % gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
+  % diff /tmp/O2-opts /tmp/O3-opts | grep enabled
 @end smallexample

+Finally, the following example shows the difference in output for
+an option that, while technically enabled, is disabled as a consequence
+of the implicit @option{-O0} option, and for one that is disabled by
+default.  This distinction is typically only of interest to GCC developers.
+@smallexample
+  % gcc -Q --help=optimizers
+  The following options control optimizations:
+  -faggressive-loop-optimizations 	[disabled by -O0]
+  -falign-functions   		[disabled]
+@end smallexample
+
 @item -no-canonical-prefixes
 @opindex no-canonical-prefixes
 Do not expand any symbolic links, resolve references to @samp{/

Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-04 Thread Segher Boessenkool
Hi Kyrill,

On Mon, Nov 02, 2015 at 02:15:27PM +, Kyrill Tkachov wrote:
> This patch attempts to restrict combine from transforming ZERO_EXTEND and 
> SIGN_EXTEND operations into and-bitmask
> and weird SUBREG expressions when they appear inside MULT expressions. This 
> is because a MULT rtx containing these
> extend operations is usually a well understood widening multiply operation.

Right.  I have often wanted for combine to try plain substitution as well
as substitution and simplification: simplification (which uses nonzero_bits
etc.) often makes a mess of even moderately complex instructions.

But since we do not have that yet, and your 24% number is nicely impressive,
let's try to do no simplification for widening mults, as in your patch.

> With this patch on aarch64 I saw increased generation of smaddl and umaddl 
> instructions where
> before the combination of smull and add instructions were rejected because 
> the extend operands
> were being transformed into these weird equivalent expressions.
> 
> Overall, I see 24% more [su]maddl instructions being generated for SPEC2006 
> and with no performance regressions.
> 
> The testcase in the patch is the most minimal one I could get that 
> demonstrates the issue I'm trying to solve.
> 
> Does this approach look ok?

In broad lines it does.  Could you try this patch instead?  Not tested
etc. (other than building an aarch64 cross and your test case); I'll do
that tomorrow if you like the result.


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..3bf7ffb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int 
in_cond, int unique_copy)
  || GET_CODE (SET_DEST (x)) == PC))
fmt = "ie";
 
+  /* Substituting into the operands of a widening MULT is not likely
+to create RTL matching a machine insn.  */
+  if (code == MULT
+ && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+ && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+   return x;
+
   /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
 constant.  */
   if (fmt[0] == 'e')
@@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned 
HOST_WIDE_INT mask,
   /* ... fall through ...  */
 
 case MULT:
+  /* Substituting into the operands of a widening MULT is not likely to
+create RTL matching a machine insn.  */
+  if (code == MULT
+ && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+ && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+   return gen_lowpart_or_truncate (mode, x);
+
   /* For PLUS, MINUS and MULT, we need any bits less significant than the
 most significant bit in MASK since carries from those bits will
 affect the bits we are interested in.  */
-- 
1.9.3



Re: [ping] Fix PR debug/66728

2015-11-04 Thread Mike Stump

On Nov 4, 2015, at 12:50 PM, Richard Sandiford  
wrote:

> Mike Stump  writes:
>> Index: dwarf2out.c
>> ===
>> --- dwarf2out.c  (revision 229720)
>> +++ dwarf2out.c  (working copy)
>> @@ -15593,8 +15593,13 @@
>>   return true;
>> 
>> case CONST_WIDE_INT:
>> -  add_AT_wide (die, DW_AT_const_value,
>> -   std::make_pair (rtl, GET_MODE (rtl)));
>> +  {
>> +wide_int w1 = std::make_pair (rtl, MAX_MODE_INT);
>> +int prec = MIN (wi::min_precision (w1, UNSIGNED),
>> +(unsigned int)CONST_WIDE_INT_NUNITS (rtl) * 
>> HOST_BITS_PER_WIDE_INT);
>> +wide_int w = wide_int::from (w1, prec, UNSIGNED);
>> +add_AT_wide (die, DW_AT_const_value, w);
>> +  }
>>   return true;
>> 
>> case CONST_DOUBLE:
> 
> Setting the precision based on CONST_WIDE_INT_NUNITS means that
> we might end up with two different precisions for two values of
> the same variable.  E.g. for a 192-bit type, 1<<64 would be given
> a precision of 128 (because it needs two HWIs to store) but 1<<128
> would be given a precision of 192 (because it needs three HWIs to store).
> We could then abort when comparing them for equality, since equality
> needs both integers to have the same precision.  E.g. from same_dw_val_p:
> 
>case dw_val_class_wide_int:
>  return *v1->v.val_wide == *v2->v.val_wide;

Yeah, seems like we should have a v1.prec == v2.prec && on that.  The bad news, 
there are four of them that are like this.  The good news, 3 of them are 
location operands, and I don’t think they can hit for a very long time.  I 
think this is an oversight from the double_int version of the code where we 
just check the 128 bits for equality.  We can see if Richard wants to weigh in. 
 I think I’d just pre-approve the change, though, I think a helper to perform 
mixed equality testing would be the way to go as there are 4 of them, and I 
pretty sure they should all use the mixed version.  Though, maybe the location 
list versions are never mixed.  If they aren’t, then there is only 1 client, 
so, I’d just do the precision test inline.  Anyone able to comment on the 
location list aspect of this?

Re: [PATCH 1/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-04 Thread Evandro Menezes

Please, ignore the previous patch.  This is the intended patch.

Sorry.

--
Evandro Menezes

On 11/04/2015 05:18 PM, Evandro Menezes wrote:

This patch adds extra tuning information about AArch64 targets:

 * Maximum number of case values before resorting to a jump table
   The default values assumed independently of the specific backends
   may be rather low for modern processors, which sport quite efficient
   direct branch prediction, whereas indirect branch prediction is
   still typically not so efficient.  This value may be specifically
   set for a processor or left at zero to use the default values.
 * L1 cache line size
   The auto-prefetcher uses this information when emitting software
   prefetch insns.

Please, commit if it's alright.

Thank you,



diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 81792bc..ecf4685 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -195,6 +195,9 @@ struct tune_params
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
+  int max_case_values; /* Case values threshold; or 0 for the default.  */
+
+  int cache_line_size; /* Cache line size; or 0 for the default.  */
 
 /* An enum specifying how to take into account CPU autoprefetch capabilities
during instruction scheduling:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..e7f1c07 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -355,6 +355,8 @@ static const struct tune_params generic_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -378,6 +380,8 @@ static const struct tune_params cortexa53_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -401,6 +405,8 @@ static const struct tune_params cortexa57_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
 };
@@ -424,6 +430,8 @@ static const struct tune_params cortexa72_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -446,6 +454,8 @@ static const struct tune_params thunderx_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -468,6 +478,8 @@ static const struct tune_params xgene1_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -3242,6 +3254,20 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   return aarch64_tls_referenced_p (x);
 }
 
+/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
+
+static unsigned int
+aarch64_case_values_threshold (void)
+{
+  /* Use the specified limit for the number of cases before using jump
+ tables at higher optimization levels.  */
+  if (optimize > 2
+  && selected_cpu->tune->max_case_values != 0)
+return selected_cpu->tune->max_case_values;
+  else
+return default_case_values_threshold ();
+}
+
 /* Return true if register REGNO is a valid index register.
STRICT_P is true if REG_OK_STRICT is in effect.  */
 
@@ -7672,6 +7698,13 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			 opts->x_param_values,
 			 global_options_set.x_param_values);
 
+  /* Set the L1 cache line size.  */
+  if (selected_cpu->tune->cache_line_size != 0)
+maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
+			   selected_cpu->tune->cache_line_size,
+			   opts->x_param_values,
+			   global_options_set.x_param_values);
+
   aarch64_override_options_after_change_1 (opts);
 }
 
@@ -13385,6 +13418,7 @@ aarch64_promoted_type (const_tree t)
 return float_type_node;
   return NULL_TREE;
 }
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -13432,6 +1

[PATCH 1/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-04 Thread Evandro Menezes

This patch adds extra tuning information about AArch64 targets:

 * Maximum number of case values before resorting to a jump table
   The default values assumed independently of the specific backends
   may be rather low for modern processors, which sport quite efficient
   direct branch prediction, whereas indirect branch prediction is
   still typically not so efficient.  This value may be specifically
   set for a processor or left at zero to use the default values.
 * L1 cache line size
   The auto-prefetcher uses this information when emitting software
   prefetch insns.

Please, commit if it's alright.

Thank you,

--
Evandro Menezes


diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 0ab1ca8..66be417 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -43,7 +43,7 @@
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
-AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
+AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, "0x53", "0x001")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 81792bc..ecf4685 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -195,6 +195,9 @@ struct tune_params
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
+  int max_case_values; /* Case values threshold; or 0 for the default.  */
+
+  int cache_line_size; /* Cache line size; or 0 for the default.  */
 
 /* An enum specifying how to take into account CPU autoprefetch capabilities
during instruction scheduling:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..e7f1c07 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -355,6 +355,8 @@ static const struct tune_params generic_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -378,6 +380,8 @@ static const struct tune_params cortexa53_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -401,6 +405,8 @@ static const struct tune_params cortexa57_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
 };
@@ -424,6 +430,8 @@ static const struct tune_params cortexa72_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -446,6 +454,8 @@ static const struct tune_params thunderx_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -468,6 +478,8 @@ static const struct tune_params xgene1_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
@@ -3242,6 +3254,20 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   return aarch64_tls_referenced_p (x);
 }
 
+/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
+
+static unsigned int
+aarch64_case_values_threshold (void)
+{
+  /* Use the specified limit for the number of cases before using jump
+ tables at higher 

[PATCH 0/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-04 Thread Evandro Menezes
Following the suggestions to add the support for the Exynos M1 models, 
the following series of patches are broken down into:


 * add more target specific tuning data
 * add heuristics tuning
 * add the Exynos M1 cost model
 * add the Exynos M1 scheduling model

Thank you,

--
Evandro Menezes




Re: [PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-04 Thread Jim Wilson
On 11/04/2015 01:45 AM, Jiong Wang wrote:
> So as Jim Wilson commented on the bugzilla, instead of "return !strict",
> we need to only do the check if strict be true, and only does rejection
> which means return FALSE, for all other cases, we need to go through
> those normal checks below.

I was just about to submit the same patch myself; my testsuite run
finished last night.  This testsuite run was with a toolchain configured
"--with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard
--with-mode=thumb --enable-languages=c,c++".  I see 10 fewer unexpected
failures on the gcc testsuite with the patch, and no changes to the
other testsuite results.

Jim



Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 10:09 AM, Jason Merrill wrote:

> A single function is better, to avoid unnecessary code duplication.

Thanks. I've applied this patch to trunk.

Cesar

2015-11-04  Cesar Philippidis  

	gcc/cp/
	* (cp_parser_oacc_single_int_clause): New function.
	(cp_parser_oacc_clause_vector_length): Delete.
	(cp_parser_omp_clause_num_gangs): Delete.
	(cp_parser_omp_clause_num_workers): Delete.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_single_int_clause
	for num_gangs, num_workers and vector_length.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 12452e6..4f6cd2d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29590,6 +29590,39 @@ cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
   return c;
 }
 
+ /* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
+
+static tree
+cp_parser_oacc_single_int_clause (cp_parser *parser, omp_clause_code code,
+  const char *str, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+return list;
+
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
+
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+{
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
+  return list;
+}
+
+  check_no_duplicate_clause (list, code, str, loc);
+
+  tree c = build_omp_clause (loc, code);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -29713,45 +29746,6 @@ cp_parser_oacc_shape_clause (cp_parser *parser, omp_clause_code kind,
   return list;
 }
 
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
-
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-{
-  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenACC 2.0
Parse wait clause or directive parameters.  */
 
@@ -30130,42 +30124,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenMP 2.5:
num_threads ( expression ) */
 
@@ -30374,43 +30332,6 @@ cp_parser_omp_clause_defaultmap (cp_parser *parser, tree list,
   return list;
 }
 
-/* OpenACC:
-   num_workers ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_workers (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_WORKERS, "num_gangs",
-l

Re: [PATCH] Fix vms targets

2015-11-04 Thread Jeff Law

On 11/04/2015 08:51 AM, Andrew MacLeod wrote:

On 11/02/2015 12:28 AM, Jeff Law wrote:


The header reduction didn't seem to handle the vms targets correctly.
This reverts that part of Andrew's patch which allows the alpha,
alpha64 and ia64 vms targets to build again.

Installed on the trunk.  That covers all the fallout from standard
builds of config-list.mk that I'm aware of.


how weird.. really.  I ran all config-list.mk and saw no failures...
huh.   maybe because I only ran c/c++ instead of 'all' to save time? I'm
still surprised.
It was a link failure with a reference from dwarf2out.o that should have 
been satisfied by vmsdbgout.o IIRC.


As why I saw it and not you.  I have no clue as I'd expect it to have 
failed linking cc1 or cc1plus.


Jeff


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-11-04 Thread Florian Weimer
On 11/04/2015 02:13 PM, Jakub Jelinek wrote:
> On Mon, Jul 06, 2015 at 05:38:35PM +0100, Alan Lawrence wrote:
>> Trying to push these now (svn!), patch 2 is going first.
>>
>> I realize my second iteration of patch 1/2, dropped the testcases from the
>> first version. Okay to include those as per
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00215.html ?
> 
> FYI, it seems that (most likely) the PR65956 changes on gcc-5-branch
> broke libgnat ABI compatibility on arm - it seems that getsubs.adb
> from macrosub proglet (and others) are during make check compiled/linked
> with system gnatmake/gcc, but the program is run at runtime
> against the new libgnat-5.so.  If I run it manually against
> system libgnat, it works, otherwise it hangs, when Fill_Table from
> getsubs.adb calls Get_Line, and indeed it looks like the argument passing
> for Get_Line changed and on the callee side it thinks Item (which is 400
> chars string) has random (and in the hanging case negative) number of chars
> in it.

The patch looks at TYPE_MAIN_VARIANT without checking first if the type
has any qualifiers:

+  if (!AGGREGATE_TYPE_P (type))
+return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY;

I'm not sure if this is valid, and what happens here if the type refers
to a fat pointer type generated by the Ada front end.

Florian


Re: [PATCH], Add power9 support to GCC, patch #1

2015-11-04 Thread Michael Meissner
On Wed, Nov 04, 2015 at 03:15:53PM -0600, Segher Boessenkool wrote:
> Hi,
> 
> Some minor things...
> 
> On Tue, Nov 03, 2015 at 03:29:11PM -0500, Michael Meissner wrote:
> > * config/rs6000/rs6000.opt (-mfusion-toc): Add new switches for
> > ISA 3.0 (power9).
> 
> "-mtoc-fusion" sounds more natural, and is more in line with the other
> switches I think.

That's reasonable.  At present, -mfusion-toc is not documented, as it was
intended to be a debug switch.

David, do you have an opinion one way or the other?

> > +  /* ISA 2.08 vector instructions include ISA 2.07.  */
> 
> ISA 3.0

Thanks for catching that.  I missed a few places that were written earlier
before we decided the new ISA would be 3.0 instead of 2.8.  I'll make those
changes before submitting.

> > +@item -mmodulo
> > +@itemx -mno-modulo
> > +@opindex mmodulo
> > +@opindex mno-module
> > +Generate code that uses (does not use) the ISA 2.08 integer modulo
> > +instructions.  The @option{-mmodulo} option is enabled by default
> > +with the @option{-mcpu=power9} option.
> 
> Again.  I think it was just these two, but please check.
> 
> > +@item -mpower9-fusion
> > +@itemx -mno-power9-fusion
> > +@opindex mpower9-fusion
> > +@opindex mno-power9-fusion
> > +Generate code that keeps (does not keeps) some operations adjacent so
> > +that the instructions can be fused together on power9 and later
> > +processors.
> 
> > +@item -mpower9-vector
> > +@itemx -mno-power9-vector
> > +@opindex mpower9-vector
> > +@opindex mno-power9-vector
> > +Generate code that uses (does not use) the vector and scalar
> > +instructions that were added in version 2.07 of the PowerPC ISA.  Also
> > +enable the use of built-in functions that allow more direct access to
> > +the vector instructions.
> 
> 3.0 here as well?

I only found 3 references to 2.08 in the patch.

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Add power9 support to GCC, patch #1

2015-11-04 Thread Segher Boessenkool
Hi,

Some minor things...

On Tue, Nov 03, 2015 at 03:29:11PM -0500, Michael Meissner wrote:
>   * config/rs6000/rs6000.opt (-mfusion-toc): Add new switches for
>   ISA 3.0 (power9).

"-mtoc-fusion" sounds more natural, and is more in line with the other
switches I think.

> +  /* ISA 2.08 vector instructions include ISA 2.07.  */

ISA 3.0

> +@item -mmodulo
> +@itemx -mno-modulo
> +@opindex mmodulo
> +@opindex mno-module
> +Generate code that uses (does not use) the ISA 2.08 integer modulo
> +instructions.  The @option{-mmodulo} option is enabled by default
> +with the @option{-mcpu=power9} option.

Again.  I think it was just these two, but please check.

> +@item -mpower9-fusion
> +@itemx -mno-power9-fusion
> +@opindex mpower9-fusion
> +@opindex mno-power9-fusion
> +Generate code that keeps (does not keeps) some operations adjacent so
> +that the instructions can be fused together on power9 and later
> +processors.

> +@item -mpower9-vector
> +@itemx -mno-power9-vector
> +@opindex mpower9-vector
> +@opindex mno-power9-vector
> +Generate code that uses (does not use) the vector and scalar
> +instructions that were added in version 2.07 of the PowerPC ISA.  Also
> +enable the use of built-in functions that allow more direct access to
> +the vector instructions.

3.0 here as well?


Segher


Re: [PATCH c/c++] use explicit locations for some warnings in c-pragma.c

2015-11-04 Thread Manuel López-Ibáñez
On 4 November 2015 at 09:45, Mike Stump  wrote:
> in the top of the tree.  This is bad as the same line appears in a PASS: and 
> an XFAIL:.  Each test case should be unique.  Should it be updated to 64?

I think it is sufficient to change it to:

/* { dg-warning "24:missing" "wrong column" { xfail *-*-* }  2 } */

This dg-warning is there to show that the column number is wrong and
tell whoever fixes this that there is already a test that only needs
updating. Changing 24 to 64 defeats the purpose of having it in the
first place.

Cheers,

Manuel.


Remove obsolete openacc reduction code

2015-11-04 Thread Nathan Sidwell
I've committed this patch to trunk as obvious.  I'd missed some code obsoleted 
by the new reduction implementation.  This code was simply creating a splay 
tree, not adding anything to it and then processing the splay tree.  Nuked from 
orbit.


nathan
2015-11-04  Nathan Sidwell  

	* omp-low.c (struct omp_context): Remove reduction_map field.
	(lookup_oacc_reduction, maybe_lookup_oacc_reduction): Delete.
	(new_omp_context, delete_omp_context, scan_omp_target): Remove
	reduction_map handling.
	(lower_omp_target): Remove obsolete openacc reduction handling.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 229779)
+++ gcc/omp-low.c	(working copy)
@@ -169,11 +169,6 @@ struct omp_context
  construct.  In the case of a parallel, this is in the child function.  */
   tree block_vars;
 
-  /* A map of reduction pointer variables.  For accelerators, each
- reduction variable is replaced with an array.  Each thread, in turn,
- is assigned to a slot on that array.  */
-  splay_tree reduction_map;
-
   /* Label to which GOMP_cancel{,llation_point} and explicit and implicit
  barriers should jump to during omplower pass.  */
   tree cancel_label;
@@ -1090,23 +1085,6 @@ maybe_lookup_field (tree var, omp_contex
   return maybe_lookup_field ((splay_tree_key) var, ctx);
 }
 
-static inline tree
-lookup_oacc_reduction (const char *id, omp_context *ctx)
-{
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
-  return (tree) n->value;
-}
-
-static inline tree
-maybe_lookup_oacc_reduction (tree var, omp_context *ctx)
-{
-  splay_tree_node n = NULL;
-  if (ctx->reduction_map)
-n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
-  return n ? (tree) n->value : NULL_TREE;
-}
-
 /* Return true if DECL should be copied by pointer.  SHARED_CTX is
the parallel context if DECL is to be shared.  */
 
@@ -1667,7 +1645,6 @@ new_omp_context (gimple *stmt, omp_conte
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
   ctx->depth = outer_ctx->depth + 1;
-  ctx->reduction_map = outer_ctx->reduction_map;
 }
   else
 {
@@ -1740,13 +1717,6 @@ delete_omp_context (splay_tree_value val
 splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
 splay_tree_delete (ctx->sfield_map);
-  /* Reduction map is copied to nested contexts, so only delete it in the
- owner.  */
-  if (ctx->reduction_map
-  && gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
-  && is_gimple_omp_offloaded (ctx->stmt)
-  && is_gimple_omp_oacc (ctx->stmt))
-splay_tree_delete (ctx->reduction_map);
 
   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
  it produces corrupt debug information.  */
@@ -3077,10 +3047,6 @@ scan_omp_target (gomp_target *stmt, omp_
   TYPE_ARTIFICIAL (ctx->record_type) = 1;
   if (offloaded)
 {
-  if (is_gimple_omp_oacc (stmt))
-	ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers,
-	 0, 0);
-
   create_omp_child_function (ctx, false);
   gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
 }
@@ -14549,7 +14515,7 @@ lower_omp_target (gimple_stmt_iterator *
   tree child_fn, t, c;
   gomp_target *stmt = as_a  (gsi_stmt (*gsi_p));
   gbind *tgt_bind, *bind, *dep_bind = NULL;
-  gimple_seq tgt_body, olist, ilist, orlist, irlist, new_body;
+  gimple_seq tgt_body, olist, ilist, new_body;
   location_t loc = gimple_location (stmt);
   bool offloaded, data_region;
   unsigned int map_cnt = 0;
@@ -14602,9 +14568,6 @@ lower_omp_target (gimple_stmt_iterator *
 
   push_gimplify_context ();
 
-  irlist = NULL;
-  orlist = NULL;
-
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
 switch (OMP_CLAUSE_CODE (c))
   {
@@ -14900,13 +14863,8 @@ lower_omp_target (gimple_stmt_iterator *
 	ctx);
 		else
 		  x = build_sender_ref (ovar, ctx);
-		if (maybe_lookup_oacc_reduction (var, ctx))
-		  {
-		gcc_checking_assert (offloaded
-	 && is_gimple_omp_oacc (stmt));
-		gimplify_assign (x, var, &ilist);
-		  }
-		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+
+		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 			 && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
 			 && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
 			 && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
@@ -15553,11 +15511,9 @@ lower_omp_target (gimple_stmt_iterator *
 			tgt_bind ? gimple_bind_block (tgt_bind)
  : NULL_TREE);
   gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true);
-  gimple_bind_add_seq (bind, irlist);
   gimple_bind_add_seq (bind, ilist);
   gimple_bind_add_stmt (bind, stmt);
   gimple_bind_add_seq (bind, olist);
-  gimple_bind_add_seq (bind, orlist);
 
   pop_gimplify_context (NULL);
 


OpenaCC dimension checking

2015-11-04 Thread Nathan Sidwell
Now the core of the execution model is committed, I've applied this patch to 
enable the nvptx dimension checking.  We force the vector size to be 32 in all 
cases, and check the worker size is not above 32.  Warnings are given if the 
user specifies an unacceptable dimension.


This patch  exposed some unacceptable testcases which I've modified to specify 
an acceptable vector length.   I also took the opportunity to fix up their 
reduction variable copying, which is only working right now because we default 
to copy, rather than firstprvate (that's the next patch for review).


Also a new testcase to exercise the error handling.

Committed to trunk.

nathan
2015-11-04  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Add checking.

	libgomp/
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Fix dimensions
	and reduction copy.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: New.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 229771)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -3472,8 +3472,29 @@ nvptx_goacc_validate_dims (tree ARG_UNUS
 {
   bool changed = false;
 
-  /* TODO: Leave dimensions unaltered.  Reductions need
- porting before filtering dimensions makes sense.  */
+  /* The vector size must be 32, unless this is a SEQ routine.  */
+  if (fn_level <= GOMP_DIM_VECTOR
+  && dims[GOMP_DIM_VECTOR] != PTX_VECTOR_LENGTH)
+{
+  if (dims[GOMP_DIM_VECTOR] >= 0 && fn_level < 0)
+	warning_at (DECL_SOURCE_LOCATION (decl), 0,
+		dims[GOMP_DIM_VECTOR]
+		? "using vector_length (%d), ignoring %d"
+		: "using vector_length (%d), ignoring runtime setting",
+		PTX_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
+  dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
+  changed = true;
+}
+
+  /* Check the num workers is not too large.  */
+  if (dims[GOMP_DIM_WORKER] > PTX_WORKER_LENGTH)
+{
+  warning_at (DECL_SOURCE_LOCATION (decl), 0,
+		  "using num_workers (%d), ignoring %d",
+		  PTX_WORKER_LENGTH, dims[GOMP_DIM_WORKER]);
+  dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+  changed = true;
+}
 
   return changed;
 }
Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90
===
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90	(revision 229771)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-1.f90	(working copy)
@@ -5,7 +5,7 @@
 program reduction_1
   implicit none
 
-  integer, parameter:: n = 10, vl = 2
+  integer, parameter:: n = 10, vl = 32
   integer   :: i, vresult, result
   logical   :: lresult, lvresult
   integer, dimension (n) :: array
@@ -19,7 +19,7 @@ program reduction_1
 
   ! '+' reductions
 
-  !$acc parallel vector_length(vl) num_gangs(1)
+  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
   !$acc loop reduction(+:result)
   do i = 1, n
  result = result + array(i)
@@ -38,7 +38,7 @@ program reduction_1
 
   ! '*' reductions
 
-  !$acc parallel vector_length(vl) num_gangs(1)
+  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
   !$acc loop reduction(*:result)
   do i = 1, n
  result = result * array(i)
@@ -57,7 +57,7 @@ program reduction_1
 
   ! 'max' reductions
 
-  !$acc parallel vector_length(vl) num_gangs(1)
+  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
   !$acc loop reduction(max:result)
   do i = 1, n
  result = max (result, array(i))
@@ -76,7 +76,7 @@ program reduction_1
 
   ! 'min' reductions
 
-  !$acc parallel vector_length(vl) num_gangs(1)
+  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
   !$acc loop reduction(min:result)
   do i = 1, n
  result = min (result, array(i))
@@ -95,7 +95,7 @@ program reduction_1
 
   ! 'iand' reductions
 
-  !$acc parallel vector_length(vl) num_gangs(1)
+  !$acc parallel vector_length(vl) num_gangs(1) copy(result)
   !$acc loop reduction(iand:result)
   do i = 1, n
  result = iand (result, array(i))
@@ -114,7 +114,7 @@ program reduction_1
 
   ! 'ior' reductions
 
-  !$acc parallel

Re: [ping] Fix PR debug/66728

2015-11-04 Thread Richard Sandiford
Mike Stump  writes:
> Index: dwarf2out.c
> ===
> --- dwarf2out.c   (revision 229720)
> +++ dwarf2out.c   (working copy)
> @@ -15593,8 +15593,13 @@
>return true;
>  
>  case CONST_WIDE_INT:
> -  add_AT_wide (die, DW_AT_const_value,
> -std::make_pair (rtl, GET_MODE (rtl)));
> +  {
> + wide_int w1 = std::make_pair (rtl, MAX_MODE_INT);
> + int prec = MIN (wi::min_precision (w1, UNSIGNED),
> + (unsigned int)CONST_WIDE_INT_NUNITS (rtl) * 
> HOST_BITS_PER_WIDE_INT);
> + wide_int w = wide_int::from (w1, prec, UNSIGNED);
> + add_AT_wide (die, DW_AT_const_value, w);
> +  }
>return true;
>  
>  case CONST_DOUBLE:

Setting the precision based on CONST_WIDE_INT_NUNITS means that
we might end up with two different precisions for two values of
the same variable.  E.g. for a 192-bit type, 1<<64 would be given
a precision of 128 (because it needs two HWIs to store) but 1<<128
would be given a precision of 192 (because it needs three HWIs to store).
We could then abort when comparing them for equality, since equality
needs both integers to have the same precision.  E.g. from same_dw_val_p:

case dw_val_class_wide_int:
  return *v1->v.val_wide == *v2->v.val_wide;

Thanks,
Richard



Re: [PATCH 10/10] Compress short ranges into source_location

2015-11-04 Thread Dodji Seketeli
[...]

> diff --git a/libcpp/line-map.c b/libcpp/line-map.c

[...]

> +
> +  /* Any ordinary locations ought to be "pure" at this point: no
> + compressed ranges.  */
> +  linemap_assert (locus < RESERVED_LOCATION_COUNT
> +   || locus >= LINE_MAP_MAX_LOCATION_WITH_COLS
> +   || locus >= LINEMAPS_MACRO_LOWEST_LOCATION (set)
> +   || pure_location_p (set, locus));

Just for my own education, why aren't the tests

locus < RESERVED_LOCATION_COUNT
|| locus >= LINE_MAP_MAX_LOCATION_WITH_COLS
|| locus >= LINEMAPS_MACRO_LOWEST_LOCATION (set)

not part of pure_location_p() ?  I mean, would it make sense to say that
a locus that that satisfies that condition is pure?

By the way, I like this great piece of code of yours, kudos!

Cheers,

-- 
Dodji


[PATCH] remove parameter_rename_map

2015-11-04 Thread Sebastian Pop
This map was used in the transition to the new scop detection: with the new scop
detection, we do not need this map anymore.

   * graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_ast_expr_id):
   Remove use of parameter_rename_map.
   (copy_def): Remove.
   (copy_internal_parameters): Remove.
   (graphite_regenerate_ast_isl): Remove call to copy_internal_parameters.
   * sese.c (new_sese_info): Do not initialize parameter_rename_map.
   (free_sese_info): Do not free parameter_rename_map.
   (set_rename): Do not use parameter_rename_map.
   (rename_uses): Update call to set_rename.
   (graphite_copy_stmts_from_block): Do not use parameter_rename_map.
   * sese.h (parameter_rename_map_t): Remove.
   (struct sese_info_t): Remove field parameter_rename_map.
---
 gcc/graphite-isl-ast-to-gimple.c | 73 +---
 gcc/sese.c   | 44 ++--
 gcc/sese.h   |  5 ---
 3 files changed, 4 insertions(+), 118 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 975e106..628fc90 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -288,11 +288,7 @@ gcc_expression_from_isl_ast_expr_id (tree type,
  "Could not map isl_id to tree expression");
   isl_ast_expr_free (expr_id);
   tree t = res->second;
-  tree *val = region->parameter_rename_map->get(t);
-
-  if (!val)
-   val = &t;
-  return fold_convert (type, *val);
+  return fold_convert (type, t);
 }
 
 /* Converts an isl_ast_expr_int expression E to a GCC expression tree of
@@ -1089,70 +1085,6 @@ scop_to_isl_ast (scop_p scop, ivs_params &ip)
   return ast_isl;
 }
 
-/* Copy def from sese REGION to the newly created TO_REGION. TR is defined by
-   DEF_STMT. GSI points to entry basic block of the TO_REGION.  */
-
-static void
-copy_def (tree tr, gimple *def_stmt, sese_info_p region, sese_info_p to_region,
- gimple_stmt_iterator *gsi)
-{
-  if (!defined_in_sese_p (tr, region->region))
-return;
-
-  ssa_op_iter iter;
-  use_operand_p use_p;
-  FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
-{
-  tree use_tr = USE_FROM_PTR (use_p);
-
-  /* Do not copy parameters that have been generated in the header of the
-scop.  */
-  if (region->parameter_rename_map->get(use_tr))
-   continue;
-
-  gimple *def_of_use = SSA_NAME_DEF_STMT (use_tr);
-  if (!def_of_use)
-   continue;
-
-  copy_def (use_tr, def_of_use, region, to_region, gsi);
-}
-
-  gimple *copy = gimple_copy (def_stmt);
-  gsi_insert_after (gsi, copy, GSI_NEW_STMT);
-
-  /* Create new names for all the definitions created by COPY and
- add replacement mappings for each new name.  */
-  def_operand_p def_p;
-  ssa_op_iter op_iter;
-  FOR_EACH_SSA_DEF_OPERAND (def_p, copy, op_iter, SSA_OP_ALL_DEFS)
-{
-  tree old_name = DEF_FROM_PTR (def_p);
-  tree new_name = create_new_def_for (old_name, copy, def_p);
-  region->parameter_rename_map->put(old_name, new_name);
-}
-
-  update_stmt (copy);
-}
-
-static void
-copy_internal_parameters (sese_info_p region, sese_info_p to_region)
-{
-  /* For all the parameters which definitino is in the if_region->false_region,
- insert code on true_region (if_region->true_region->entry). */
-
-  int i;
-  tree tr;
-  gimple_stmt_iterator gsi = gsi_start_bb(to_region->region.entry->dest);
-
-  FOR_EACH_VEC_ELT (region->params, i, tr)
-{
-  // If def is not in region.
-  gimple *def_stmt = SSA_NAME_DEF_STMT (tr);
-  if (def_stmt)
-   copy_def (tr, def_stmt, region, to_region, &gsi);
-}
-}
-
 /* GIMPLE Loop Generator: generates loops from STMT in GIMPLE form for
the given SCOP.  Return true if code generation succeeded.
 
@@ -1192,9 +1124,6 @@ graphite_regenerate_ast_isl (scop_p scop)
 
   context_loop = region->region.entry->src->loop_father;
 
-  /* Copy all the parameters which are defined in the region.  */
-  copy_internal_parameters(if_region->false_region, if_region->true_region);
-
   translate_isl_ast_to_gimple t(region);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   split_edge (e);
diff --git a/gcc/sese.c b/gcc/sese.c
index c176b8a..644c87c 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -259,7 +259,6 @@ new_sese_info (edge entry, edge exit)
   SESE_LOOPS (region) = BITMAP_ALLOC (NULL);
   SESE_LOOP_NEST (region).create (3);
   SESE_PARAMS (region).create (3);
-  region->parameter_rename_map = new parameter_rename_map_t;
   region->bbs.create (3);
 
   return region;
@@ -275,8 +274,6 @@ free_sese_info (sese_info_p region)
 
   SESE_PARAMS (region).release ();
   SESE_LOOP_NEST (region).release ();
-  delete region->parameter_rename_map;
-  region->parameter_rename_map = NULL;
 
   XDELETE (region);
 }
@@ -370,8 +367,7 @@ get_rename (rename_map_type *rename_map, tree old_name)
 /* Register in RENAME_MAP th

Re: [ping] Fix PR debug/66728

2015-11-04 Thread Mike Stump
On Nov 4, 2015, at 4:15 AM, Richard Biener  wrote:
> I wonder if we'll manage to to get mode_for_size return BLKmode
> in case of an original mode that was not of a size multiple of
> HOST_BITS_PER_WIDE_INT (and that's host dependent even…).

> We probably should use smallest_mode_for_size on a precision
> derived from the value

Once we want to go stomping around the value, we might as well just do 
everything.  I prefer this version over one that has a call to assert.  I 
thought about creating a helper function, but since there is only 1 client for 
it, and since it was only 4 lines, I didn’t create one.  I’d propose deferring 
creation until we have more clients.  The generated dwarf remains fixed, as 
expected.

> Please use gcc_checking_assert here.

Done.  My test suite run with the assert did finish, and on x86_64 linux, it 
was never hit.

Any other issues you can spot?


Index: dwarf2out.c
===
--- dwarf2out.c (revision 229720)
+++ dwarf2out.c (working copy)
@@ -15593,8 +15593,13 @@
   return true;
 
 case CONST_WIDE_INT:
-  add_AT_wide (die, DW_AT_const_value,
-  std::make_pair (rtl, GET_MODE (rtl)));
+  {
+   wide_int w1 = std::make_pair (rtl, MAX_MODE_INT);
+   int prec = MIN (wi::min_precision (w1, UNSIGNED),
+   (unsigned int)CONST_WIDE_INT_NUNITS (rtl) * 
HOST_BITS_PER_WIDE_INT);
+   wide_int w = wide_int::from (w1, prec, UNSIGNED);
+   add_AT_wide (die, DW_AT_const_value, w);
+  }
   return true;
 
 case CONST_DOUBLE:
Index: rtl.h
===
--- rtl.h   (revision 229720)
+++ rtl.h   (working copy)
@@ -2086,6 +2086,7 @@
 inline unsigned int
 wi::int_traits ::get_precision (const rtx_mode_t &x)
 {
+  gcc_checking_assert (x.second != BLKmode && x.second != VOIDmode);
   return GET_MODE_PRECISION (x.second);
 }
 



Re: [PATCH] Add support for ARM embedded multilibs

2015-11-04 Thread Jasmin J.
Hello Ramana!

> There are usually features on the embedded-X_X-branch used to create
> releases that may not be on an FSF release branch.
Not on the embedded-5 branch and as far as I analysed it, all changes of
embedded-4.9 branch are now at Trunk.

>> I would like to ask if  you have a copyright assignment on file with the
>> FSF
> So, you don't have one ? In which case it may make more sense for Tejas to
> deal with this given he can handle it under ARM's copyright assignment. If
> you make changes to this without a copyright assignment on file and submit
> it, it may be difficult to review this from a copyright position.
So shall I communicate with Tejas instead of gcc-patches?

On the other hand I just read
" Small changes can be accepted without a copyright disclaimer or a copyright
  assignment on file. "
And at the end, it is the same code as Terry's patch. But I will investigate
the things you mentioned in your first answer, if they are really required and
if no simplify the patch.

> Changing the way in which you build GCC is a real change to GCC that affects
> many developers.
Does it really? It is enabled only, if you use the "--with-multilib-list"
configure option. Without the option, gcc is build exactly the same.
Without the patch "--with-multilib-list" even to allowed for any ARM target.

> I think mapping all the remaining -mcpu=cortex-a* cores and -mcpu=cortex-m*
> cores in there would be a sensible first step.
THX for the hint.

BR
   Jasmin


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-04 Thread Jason Merrill

On 11/04/2015 12:11 PM, Martin Sebor wrote:

On 11/02/2015 07:40 PM, Jason Merrill wrote:

On 10/26/2015 09:48 PM, Martin Sebor wrote:

+  while (TREE_CODE (oper) == NOP_EXPR)
+oper = TREE_OPERAND (oper, 0);


This is STRIP_NOPS.


+ to placement new is not checked since it's unknownwhat it might


Missing space.


+  else if (TREE_CODE (oper) == ADDR_EXPR) {


The brace should go on its own line.


+  /* A possibly optimistic estimate Number of bytes available


Maybe "of the number"?


Thanks for the review. I've fixed the issues above in the latest
patch (attached).




+  /* When the referenced object is a member of a union, use the
size
+ of the entire union as the size of the buffer.  */


Why?  If we're accessing one union member, we should limit the allowed
space to the size of that member.


I followed the more permissive approach taken by _FORTIFY_SOURCE
where the size of the whole object is used (on the assumption that
we will eventually want to adopt the same mechanism here). For
instance, given:

   union U { char c; int i; } u;

GCC doesn't diagnose:

   memset (&u.c, 0, sizeof u);

so I didn't expect we'd want the following diagnosed either:

   new (&u.c) U ();

But if you think it's preferable to use the size of the member
for this iteration of the placement new warning I can change it.
Can you confirm?


There was a lot of discussion of C++ aliasing rules at the recent 
meeting; we really seem to be moving in the direction of being stricter 
about which union member is active.  So I think we do want to diagnose 
the new-expression above; the user should write new (&u) if that's what 
they mean.



+  if (bytes_avail <= abs (adjust))
+bytes_avail = 0;
+  else if (0 <= adjust)
+bytes_avail -= adjust;
+  else
+bytes_avail += adjust;


If adjust is negative, I would think that we would have returned already
because we were dealing with an offset from a pointer of unknown value.


Adjust is negative when the offset to a buffer of known size is
negative. For example:

 char buf [sizeof (int)];
 new (&buf [1] - 1) int;


OK, so because we're looking at the expression from the outside in, we 
first see the subtraction and adjust becomes -1, then we see the 
array_ref and adjust returns to 0.  We still don't have a negative 
adjust by the time we get to the quoted if/else.



It also seems that you're being careful to avoid bytes_avail going
negative, so I wonder why you have it signed and bytes_need unsigned.


+  warning_at (EXPR_LOC_OR_LOC (orig_oper, input_location),


Let's remember this location early on so you don't need orig_oper.


Done.

Martin




Re: [c++-delayed-folding] Introduce convert_to_real_nofold

2015-11-04 Thread Jason Merrill

On 11/04/2015 12:34 PM, Marek Polacek wrote:

On Wed, Nov 04, 2015 at 10:31:44AM -0500, Jason Merrill wrote:

Then let's do that rather than introduce maybe_fold.


Like so?



Bootstrapped/regtested on x86_64-linux, ok for branch?


Yes, thanks.


diff --git gcc/convert.c gcc/convert.c
index ec6ff37..9355f2b 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -119,17 +119,19 @@ convert_to_pointer_nofold (tree type, tree expr)
  /* Convert EXPR to some floating-point type TYPE.

 EXPR must be float, fixed-point, integer, or enumeral;
-   in other cases error is called.  */
+   in other cases error is called.  If FOLD_P is true, try to fold
+   the expression.  */

-tree
-convert_to_real (tree type, tree expr)
+static tree
+convert_to_real_1 (tree type, tree expr, bool fold_p)
  {
enum built_in_function fcode = builtin_mathfn_code (expr);
tree itype = TREE_TYPE (expr);
+  location_t loc = EXPR_LOCATION (expr);

if (TREE_CODE (expr) == COMPOUND_EXPR)
  {
-  tree t = convert_to_real (type, TREE_OPERAND (expr, 1));
+  tree t = convert_to_real_1 (type, TREE_OPERAND (expr, 1), fold_p);
if (t == TREE_OPERAND (expr, 1))
return expr;
return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
@@ -237,14 +239,13 @@ convert_to_real (tree type, tree expr)
  || TYPE_MODE (newtype) == TYPE_MODE (float_type_node)))
{
  tree fn = mathfn_built_in (newtype, fcode);
-
  if (fn)
- {
-   tree arg = fold (convert_to_real (newtype, arg0));
-   expr = build_call_expr (fn, 1, arg);
-   if (newtype == type)
- return expr;
- }
+   {
+ tree arg = convert_to_real_1 (newtype, arg0, fold_p);
+ expr = build_call_expr (fn, 1, arg);
+ if (newtype == type)
+   return expr;
+   }
}
}
default:
@@ -263,9 +264,11 @@ convert_to_real (tree type, tree expr)
  if (!flag_rounding_math
  && FLOAT_TYPE_P (itype)
  && TYPE_PRECISION (type) < TYPE_PRECISION (itype))
-   return build1 (TREE_CODE (expr), type,
-  fold (convert_to_real (type,
- TREE_OPERAND (expr, 0;
+   {
+ tree arg = convert_to_real_1 (type, TREE_OPERAND (expr, 0),
+   fold_p);
+ return build1 (TREE_CODE (expr), type, arg);
+   }
  break;
/* Convert (outertype)((innertype0)a+(innertype1)b)
   into ((newtype)a+(newtype)b) where newtype
@@ -301,8 +304,10 @@ convert_to_real (tree type, tree expr)
  || newtype == dfloat128_type_node)
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+convert_to_real_1 (newtype, arg0,
+   fold_p),
+convert_to_real_1 (newtype, arg1,
+   fold_p));
  if (newtype == type)
return expr;
  break;
@@ -341,8 +346,10 @@ convert_to_real (tree type, tree expr)
  && !excess_precision_type (newtype
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+convert_to_real_1 (newtype, arg0,
+   fold_p),
+convert_to_real_1 (newtype, arg1,
+   fold_p));
  if (newtype == type)
return expr;
}
@@ -373,20 +380,39 @@ convert_to_real (tree type, tree expr)

  case COMPLEX_TYPE:
return convert (type,
- fold_build1 (REALPART_EXPR,
-  TREE_TYPE (TREE_TYPE (expr)), expr));
+ maybe_fold_build1_loc (fold_p, loc, REALPART_EXPR,
+TREE_TYPE (TREE_TYPE (expr)),
+expr));

  case POINTER_TYPE:
  case REFERENCE_TYPE:
error ("pointer value used where a floating point value was expected");
-  return convert_to_real (type, integer_zero_node);
+  return convert_to_real_1 (type, integer_zero_node, fold_p);

  default:
error 

Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-11-04 Thread Jason Merrill

On 10/30/2015 05:21 PM, Cesar Philippidis wrote:

On 10/30/2015 10:05 AM, Jakub Jelinek wrote:

On Fri, Oct 30, 2015 at 07:42:39AM -0700, Cesar Philippidis wrote:



Another thing is what Jason as C++ maintainer wants, it is nice to get rid
of some code redundancies, on the other side the fact that there is one
function per non-terminal in the grammar is also quite nice property.
I know I've violated this a few times too.



That name had some legacy from the c FE in gomp-4_0-branch which the
function was inherited from. On one hand, it doesn't make sense to allow
negative integer values for those clauses, but at the same time, those
values aren't checked during scanning. Maybe it should be renamed
cp_parser_oacc_single_int_clause instead?


That is better.


If you like, I could make a more general
cp_parser_omp_generic_expression that has a scan_list argument so that
it can be used for both general expressions and assignment-expressions.
That way it can be used for both omp and oacc clauses of the form 'foo (
expression )'.


No, that will only confuse readers of the parser.  After all, the code to
parse an expression argument of a clause is not that large.
So, either cp_parser_oacc_single_int_clause or just keeping the old separate
parsing functions, just remove the cruft from those (testing the type,
using cp_parser_condition instead of cp_parser_assignment_expression) is ok
with me.  Please ping Jason on what he prefers from those two.


Jason, what's your preference here? Should I create a single function to
parser num_gangs, num_workers and vector_length since they all accept
the same type of argument or should I just correct the existing
functions as I did in the attached patch? Either one would be specific
to openacc.


A single function is better, to avoid unnecessary code duplication.

Jason




Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 02:24 AM, Jakub Jelinek wrote:
> On Tue, Nov 03, 2015 at 02:16:59PM -0800, Cesar Philippidis wrote:

>> +
>> +  do
>> +{
>> +  if (c_parser_next_token_is (parser, CPP_MULT))
>> +{
>> +  c_parser_consume_token (parser);
>> +  expr = integer_minus_one_node;
>> +}
>> +  else
> 
> Is this right?  If it is either * or (assignment) expression, then
> I'd expect to parse only CPP_MULT followed by CPP_CLOSE_PAREN
> or CPP_COMMA that way (C parser has 2 tokens look-ahead, so it should be
> fine), so that
> tile (a + b + c, *)
> is parsed as
> (a + b + c); -1
> and so is
> tile (*, a + b)
> as
> -1; (a + b)
> while
> tile (*a, *b)
> is
> *a; *b.
> 
> Guess the gang clause parsing that went into the trunk already has the
> same bug,
> gang (static: *)
> or
> gang (static: *, num: 5)
> should be special, but
> gang (static: *ptr)
> should be
> gang (static: (*ptr))

Thanks for catching that. I'll fix the tile and shape scanners in both
front ends.

>> diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
>> index c73dcd0..14d006b 100644
>> --- a/gcc/cp/semantics.c
>> +++ b/gcc/cp/semantics.c
>> @@ -6704,9 +6704,51 @@ finish_omp_clauses (tree clauses, bool allow_fields, 
>> bool declare_simd)
>>  case OMP_CLAUSE_DEFAULTMAP:
>>  case OMP_CLAUSE__CILK_FOR_COUNT_:
>>  case OMP_CLAUSE_AUTO:
>> +case OMP_CLAUSE_INDEPENDENT:
>>  case OMP_CLAUSE_SEQ:
>>break;
>>  
>> +case OMP_CLAUSE_TILE:
>> +  {
>> +tree list = OMP_CLAUSE_TILE_LIST (c);
>> +
>> +while (list)
>> +  {
>> +t = TREE_VALUE (list);
>> +
>> +if (t == error_mark_node)
>> +  remove = true;
>> +else if (!type_dependent_expression_p (t)
>> + && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>> +  {
>> +error ("% value must be integral");
>> +remove = true;
>> +  }
>> +else
>> +  {
>> +t = mark_rvalue_use (t);
>> +if (!processing_template_decl)
>> +  {
>> +t = maybe_constant_value (t);
>> +if (TREE_CODE (t) == INTEGER_CST
>> +&& tree_int_cst_sgn (t) != 1
>> +&& t != integer_minus_one_node)
>> +  {
>> +warning_at (OMP_CLAUSE_LOCATION (c), 0,
>> +"% value must be positive");
>> +t = integer_one_node;
>> +  }
>> +  }
>> +t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
>> +  }
>> +
>> +/* Update list item.  */
>> +TREE_VALUE (list) = t;
>> +list = TREE_CHAIN (list);
>> +  }
>> +  }
>> +  break;
> 
> Have you verified pt.c does the right thing when instantiating the
> OMP_CLAUSE_TILE clause (I mean primarily the TREE_VEC in there)?
> There really should be testcases for that.

I don't think we have any support for templates in trunk yet. Should I
add it to this patch, or should I address that in a follow up patch?

By the way, template support for num_gangs, num_worker and vector_length
depend on this patch
 because the
c++ front end is currently incorrectly trying to do type checking as it
scans those clauses.

>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index 03203c0..08b192d 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -6997,7 +6997,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
>> *pre_p,
>>  
>>  case OMP_CLAUSE_DEVICE_RESIDENT:
>>  case OMP_CLAUSE_USE_DEVICE:
>> -case OMP_CLAUSE_INDEPENDENT:
>>remove = true;
>>break;
>>  
>> @@ -7007,6 +7006,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
>> *pre_p,
>>  case OMP_CLAUSE_COLLAPSE:
>>  case OMP_CLAUSE_AUTO:
>>  case OMP_CLAUSE_SEQ:
>> +case OMP_CLAUSE_INDEPENDENT:
>>  case OMP_CLAUSE_MERGEABLE:
>>  case OMP_CLAUSE_PROC_BIND:
>>  case OMP_CLAUSE_SAFELEN:
>> @@ -7014,6 +7014,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
>> *pre_p,
>>  case OMP_CLAUSE_NOGROUP:
>>  case OMP_CLAUSE_THREADS:
>>  case OMP_CLAUSE_SIMD:
>> +case OMP_CLAUSE_TILE:
>>break;
> 
> No gimplification of the expressions in the tile clause?

Ultimately we're just ignoring it, but for completeness' sake I fixed this.

I'm still testing this patch, but something like this OK for trunk?

cesar
2015-11-04  Cesar Philippidis  
	Thomas Schwinge  
	James Norris  

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Add support for
	OMP_CLAUSE_TILE.  Update handling of OMP_CLAUSE_INDEPENDENT.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Add support for OMP_CLAUSE_TILE.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
	* tree-pretty-print

Re: [PATCH] [ARM] PR61551 RFC: Improve costs for NEON addressing modes

2015-11-04 Thread Charles Baylis
On 4 November 2015 at 08:05, Ramana Radhakrishnan
 wrote:
> Hi Charles,
>
> Sorry I missed this completely in my inbox.
>
> On 31/10/15 03:34, Charles Baylis wrote:
>> Hi Ramana,
>>
>> [revisiting https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01593.html]
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551
>>
>> This patch is an initial attempt to rework the ARM rtx costs to better
>> handle the costs of various addressing modes, in particular to remove
>> the incorrect large costs associated with post-indexed addressing in
>> NEON memory operations.
>>
>> This patch introduces per-core tables for the costs of using different
>> addressing modes for different access modes. I have retained the
>> original code so that the calculated costs can be compared. Currently,
>> the tables replicate the costs calculated by the original code, and a
>> debug assert is left in place.
>>
>> Obviously, a fair amount of clean up is needed before this can be
>> applied, but I would like a quick comment on the general approach to
>> check that I haven't completely missed the point before continuing.
>
> No you haven't missed the point - this is the direction I wanted this taken 
> in though not expecting this degree of detail.

OK, Thanks :)

>> +struct cbmem_cost_table
>> +{
>> +  enum access_type
>> +  {
>> +REG,
>> +POST_INCDEC,
>> +PRE_INCDEC,
>> +/*PRE_MODIFY,*/
>> +POST_MODIFY,
>> +PLUS,
>> +ACCESS_TYPE_LAST = PLUS
>> +  };
>> +  const int si[ACCESS_TYPE_LAST + 1];
>> +  const int di[ACCESS_TYPE_LAST + 1];
>> +  const int cdi[ACCESS_TYPE_LAST + 1];
>> +  const int sf[ACCESS_TYPE_LAST + 1];
>> +  const int df[ACCESS_TYPE_LAST + 1];
>> +  const int cdf[ACCESS_TYPE_LAST + 1];
>> +  const int blk[ACCESS_TYPE_LAST + 1];
>> +  const int vec64[ACCESS_TYPE_LAST + 1];
>> +  const int vec128[ACCESS_TYPE_LAST + 1];
>> +  const int vec192[ACCESS_TYPE_LAST + 1];
>> +  const int vec256[ACCESS_TYPE_LAST + 1];
>> +  const int vec384[ACCESS_TYPE_LAST + 1];
>> +  const int vec512[ACCESS_TYPE_LAST + 1];
>> +};
>> +
>>
>> After that, I will clean up the coding style, check for impact on the
>> AArch64 backend, remove the debug code and in a separate patch improve
>> the tuning for the vector modes.
>
> I think adding additional costs for zero / sign extension of registers would 
> be appropriate for the AArch64 backend. Further more I think Alan recently 
> had patches to change the use of vector modes to BLKmode in the AArch64 
> backend, so some of the vector costing might become interesting.

The aarch64 already has a mechanism for doing costs for those
operations in aarch64_address_cost(). Using BLKmode will certainly
make this difficult.

> If you can start turning this around quickly I'd like to keep the review 
> momentum going but it will need time and effort from a number of parties to 
> get this working. This is however likely to be a high impact change on the 
> backends as this is an invasive change and I'm not sure if it will meet the 
> Stage3 cutoff point.

I'll see what I can do. In the short term, the only part of the cost
model I want changed is the excessive costs for the pre/post-indexed
addressing on vector modes.

>> From b10c6dd7af1f5b9821946783ba9d96b08c751f2b Mon Sep 17 00:00:00 2001
>> From: Charles Baylis 
>> Date: Wed, 28 Oct 2015 18:48:16 +
>> Subject: [PATCH] WIP
>>
>> Change-Id: If349ffd7dbbe13a814be4a0d022382ddc8270973
>> ---
>>  gcc/config/arm/aarch-common-protos.h |  28 ++
>>  gcc/config/arm/aarch-cost-tables.h   | 328 +
>>  gcc/config/arm/arm.c | 677 
>> ++-
>>  3 files changed, 1023 insertions(+), 10 deletions(-)
>>
>> diff --git a/gcc/config/arm/aarch-common-protos.h 
>> b/gcc/config/arm/aarch-common-protos.h
>> index 348ae74..dae42d7 100644
>> --- a/gcc/config/arm/aarch-common-protos.h
>> +++ b/gcc/config/arm/aarch-common-protos.h
>> @@ -130,6 +130,33 @@ struct vector_cost_table
>>const int alu;
>>  };
>>
>> +struct cbmem_cost_table
>> +{
>> +  enum access_type
>> +  {
>> +REG,
>> +POST_INCDEC,
>> +PRE_INCDEC,
>> +/*PRE_MODIFY,*/
>> +POST_MODIFY,
>> +PLUS,
>> +ACCESS_TYPE_LAST = PLUS
>> +  };
>> +  const int si[ACCESS_TYPE_LAST + 1];
>> +  const int di[ACCESS_TYPE_LAST + 1];
>> +  const int cdi[ACCESS_TYPE_LAST + 1];
>> +  const int sf[ACCESS_TYPE_LAST + 1];
>> +  const int df[ACCESS_TYPE_LAST + 1];
>> +  const int cdf[ACCESS_TYPE_LAST + 1];
>> +  const int blk[ACCESS_TYPE_LAST + 1];
>> +  const int vec64[ACCESS_TYPE_LAST + 1];
>> +  const int vec128[ACCESS_TYPE_LAST + 1];
>> +  const int vec192[ACCESS_TYPE_LAST + 1];
>> +  const int vec256[ACCESS_TYPE_LAST + 1];
>> +  const int vec384[ACCESS_TYPE_LAST + 1];
>> +  const int vec512[ACCESS_TYPE_LAST + 1];
>> +};
>
>
>
>
> I was considering a single table for scalar integer , scalar fp and vector 
> modes mapping scalar fp and vector modes down to scalar integer modes in case 
> of soft float mode or in the absence of a vec

Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Jakub Jelinek
On Wed, Nov 04, 2015 at 09:39:50AM -0800, Cesar Philippidis wrote:
> On 11/04/2015 09:15 AM, Thomas Schwinge wrote:
> 
> >> --- a/gcc/fortran/trans-openmp.c
> >> +++ b/gcc/fortran/trans-openmp.c
> > 
> >> @@ -3449,16 +3478,28 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
> >>  sizeof (construct_clauses));
> >>loop_clauses.collapse = construct_clauses.collapse;
> >> [...]
> >> -  construct_clauses.collapse = 0;
> > 
> > Again I'm confused by this, why this is being removed, as earlier in
> > .
> 
> I'm not sure why, but gfc_trans_omp_do needs it. It's probably an openmp
> thing. If you look at gfc_trans_omp_do, you'll see that two sets of
> clauses are passed into it. code->ext.omp_clauses corresponds to the
> combined construct clauses and do_clauses are the filtered out ones.
> 
> So in order to get collapse to work as expected in combined loops, I
> can't zero out construct_clauses.collapse.

The middle end requires that for composite constructs (i.e. where one
source collapsed loop is split among multiple levels of parallelism)
all those loop constructs contain the same collapse clause, and that all
but the innermost loop construct contain NULLs on the init/cond/step etc.
vectors.

Jakub


Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Thomas Schwinge
Hi!

On Wed, 4 Nov 2015 18:21:36 +0100, Jakub Jelinek  wrote:
> On Wed, Nov 04, 2015 at 06:15:14PM +0100, Thomas Schwinge wrote:
> > > --- a/gcc/fortran/openmp.c
> > > +++ b/gcc/fortran/openmp.c
> > 
> > > @@ -3028,6 +3015,22 @@ resolve_omp_clauses (gfc_code *code, 
> > > gfc_omp_clauses *omp_clauses,
> > >   n->sym->mark = 1;
> > >  }
> > >  
> > > +  /* OpenACC reductions.  */
> > > +  if (openacc)
> > > +{
> > > +  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
> > > + n->sym->mark = 0;
> > 
> > Maybe I'm just confugsed, but if setting all these to zero here...
> > 
> > > +
> > > +  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
> > > + {
> > > +   if (n->sym->mark)
> > > + gfc_error ("Symbol %qs present on multiple clauses at %L",
> > > +n->sym->name, &n->where);
> > > +   else
> > > + n->sym->mark = 1;
> > 
> > ... won't this just always run into the "else" branch?
> 
> The point is to check if some symbol isn't present multiple times in
> reduction clause(s).  So the first loop clears the flag as it could have
> arbitrary values, and the second loop will diagnose an error if n->sym
> is present multiple times in the list.  reduction(+: a, b, a) and the like.
> In C/C++ FE we use bitmaps for this, in Fortran FE we have mark
> field for those purposes.

Thanks for the explanation -- I'd missed the fact that several "n" might
refer to the same "sym"...  ;-)


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 09:15 AM, Thomas Schwinge wrote:

>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
> 
>> @@ -3449,16 +3478,28 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>>sizeof (construct_clauses));
>>loop_clauses.collapse = construct_clauses.collapse;
>> [...]
>> -  construct_clauses.collapse = 0;
> 
> Again I'm confused by this, why this is being removed, as earlier in
> .

I'm not sure why, but gfc_trans_omp_do needs it. It's probably an openmp
thing. If you look at gfc_trans_omp_do, you'll see that two sets of
clauses are passed into it. code->ext.omp_clauses corresponds to the
combined construct clauses and do_clauses are the filtered out ones.

So in order to get collapse to work as expected in combined loops, I
can't zero out construct_clauses.collapse.

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
> 
> I suggest you also merge the existing
> gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 into your new test case
> file (consistent naming, with the other combined-directives* files).

OK, but it depends on what type of things combined_loop.f90 is checking.
If it's scanning gimple, it may have to be a separate file.

>> @@ -0,0 +1,152 @@
>> +! Exercise combined OpenACC directives.
>> +
>> +! { dg-do compile }
>> +! { dg-options "-fopenacc -fdump-tree-gimple" }
>> +
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> What's still unimplemented here?  Please add a comment, or put the
> dg-prune-output directive next to the offending OpenACC directive, so
> we'll be sure to remove it later on.

I was still seeing those sorry messages. I'll put a comment on them.

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
>> @@ -0,0 +1,363 @@
>> +! { dg-do compile }
>> +! { dg-additional-options "-fmax-errors=100" }
>> +
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> Likewise.
> 
>> +! { dg-prune-output "Error: work-sharing region" }
> 
> What's the intention of this?  If we're expecting this error, place
> dg-error directives where they belong?

Trunk is missing some acc loop nesting verification code in omp-low.c
that's present in gomp4. I'm not sure who's going to port that to trunk.
I'll add a comment in this test to remove it with the sorry messages
when appropriate.

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-6.f95
>> @@ -0,0 +1,80 @@
>> +! { dg-do compile }
>> +! { dg-additional-options "-fmax-errors=100" }
>> +
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> Likewise.
> 
>> +! { dg-prune-output "Error: work-sharing region" }
> 
> Likewise.
> 
>> --- a/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
>> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
>> @@ -3,6 +3,9 @@
>>  
>>  ! test for tree-dump-original and spaces-commas
>>  
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> Likewise.
> 
>> +! { dg-prune-output "Error: work-sharing region" }
> 
> Likewise.
> 
>> --- a/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
>> +++ b/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
>> @@ -37,4 +37,3 @@ end program test
>>  
>>  ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 
>> "original" } } 
>>  ! { dg-final { scan-tree-dump-times "private\\(v\\)" 1 "original" } } 
>> -! { dg-final { scan-tree-dump-times "firstprivate\\(w\\)" 1 "original" } } 
> 
> Which of your source code changes does this change related to?

I think Nathan made this change because he found a bug in the test or
something. I just included this test because trunk should be capable to
handle it now.

Cesar



Re: [c++-delayed-folding] Introduce convert_to_real_nofold

2015-11-04 Thread Marek Polacek
On Wed, Nov 04, 2015 at 10:31:44AM -0500, Jason Merrill wrote:
> Then let's do that rather than introduce maybe_fold.

Like so?

Bootstrapped/regtested on x86_64-linux, ok for branch?

diff --git gcc/convert.c gcc/convert.c
index ec6ff37..9355f2b 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -119,17 +119,19 @@ convert_to_pointer_nofold (tree type, tree expr)
 /* Convert EXPR to some floating-point type TYPE.
 
EXPR must be float, fixed-point, integer, or enumeral;
-   in other cases error is called.  */
+   in other cases error is called.  If FOLD_P is true, try to fold
+   the expression.  */
 
-tree
-convert_to_real (tree type, tree expr)
+static tree
+convert_to_real_1 (tree type, tree expr, bool fold_p)
 {
   enum built_in_function fcode = builtin_mathfn_code (expr);
   tree itype = TREE_TYPE (expr);
+  location_t loc = EXPR_LOCATION (expr);
 
   if (TREE_CODE (expr) == COMPOUND_EXPR)
 {
-  tree t = convert_to_real (type, TREE_OPERAND (expr, 1));
+  tree t = convert_to_real_1 (type, TREE_OPERAND (expr, 1), fold_p);
   if (t == TREE_OPERAND (expr, 1))
return expr;
   return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
@@ -237,14 +239,13 @@ convert_to_real (tree type, tree expr)
  || TYPE_MODE (newtype) == TYPE_MODE (float_type_node)))
{
  tree fn = mathfn_built_in (newtype, fcode);
-
  if (fn)
- {
-   tree arg = fold (convert_to_real (newtype, arg0));
-   expr = build_call_expr (fn, 1, arg);
-   if (newtype == type)
- return expr;
- }
+   {
+ tree arg = convert_to_real_1 (newtype, arg0, fold_p);
+ expr = build_call_expr (fn, 1, arg);
+ if (newtype == type)
+   return expr;
+   }
}
}
default:
@@ -263,9 +264,11 @@ convert_to_real (tree type, tree expr)
  if (!flag_rounding_math
  && FLOAT_TYPE_P (itype)
  && TYPE_PRECISION (type) < TYPE_PRECISION (itype))
-   return build1 (TREE_CODE (expr), type,
-  fold (convert_to_real (type,
- TREE_OPERAND (expr, 0;
+   {
+ tree arg = convert_to_real_1 (type, TREE_OPERAND (expr, 0),
+   fold_p);
+ return build1 (TREE_CODE (expr), type, arg);
+   }
  break;
/* Convert (outertype)((innertype0)a+(innertype1)b)
   into ((newtype)a+(newtype)b) where newtype
@@ -301,8 +304,10 @@ convert_to_real (tree type, tree expr)
  || newtype == dfloat128_type_node)
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+convert_to_real_1 (newtype, arg0,
+   fold_p),
+convert_to_real_1 (newtype, arg1,
+   fold_p));
  if (newtype == type)
return expr;
  break;
@@ -341,8 +346,10 @@ convert_to_real (tree type, tree expr)
  && !excess_precision_type (newtype
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+convert_to_real_1 (newtype, arg0,
+   fold_p),
+convert_to_real_1 (newtype, arg1,
+   fold_p));
  if (newtype == type)
return expr;
}
@@ -373,20 +380,39 @@ convert_to_real (tree type, tree expr)
 
 case COMPLEX_TYPE:
   return convert (type,
- fold_build1 (REALPART_EXPR,
-  TREE_TYPE (TREE_TYPE (expr)), expr));
+ maybe_fold_build1_loc (fold_p, loc, REALPART_EXPR,
+TREE_TYPE (TREE_TYPE (expr)),
+expr));
 
 case POINTER_TYPE:
 case REFERENCE_TYPE:
   error ("pointer value used where a floating point value was expected");
-  return convert_to_real (type, integer_zero_node);
+  return convert_to_real_1 (type, integer_zero_node, fold_p);
 
 default:
   error ("aggregate value used where a float was expected");
-  return convert_

Re: OpenACC dimension range propagation optimization

2015-11-04 Thread Nathan Sidwell

On 11/04/15 05:26, Richard Biener wrote:

On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell  wrote:

Richard,



this all seems a little bit fragile and relying on implementation details?
Is the attribute always present?  Is the call argument always a constant
that fits in a HOST_WIDE_INT (or even int here)?  Are there always enough
'dims' in the tree list?  Is the 'dim' value always an INTEGER_CST that
fits a HOST_WIDE_INT (or even an int here)?




If so I'd like to see helper functions to hide these implementation details
from generic code like this.


Like this?

I've added two helper functions to omp-low.c, one to  get the internal fn arg 
number and the other to get a dimension value, given an axis number.  omp-low 
seemed the most appropriate point -- that's  where the dimension processing is, 
and the generation of these internal fn calls.


(Bernd, I'll fixup the dimension folding patch to use these calls before 
applying it.)


ok?

nathan
2015-11-04  Nathan Sidwell  

	* target.def (goacc.dim_limit): New hook.
	* targhooks.h (default_goacc_dim_limit): Declare.
	* doc/tm.texi.in (TARGET_GOACC_DIM_LIMIT): Add.
	* doc/tm.texi: Rebuilt.
	* omp-low.h (get_oacc_fn_dim_size, get_oacc_ifn_dim_arg): Declare.
	* omp-low.c (get_oacc_fn_dim_size, get_oacc_ifn_dim_arg): New.
	(default_goacc_dim_limit): New.
	* config/nvptx/nvptx.c (PTX_VECTOR_LENGTH, PTX_WORKER_LENGTH): New.
	(nvptx_goacc_dim_limit) New.
	(TARGET_GOACC_DIM_LIMIT): Override.
	* tree-vrp.c: Include omp-low.h, target.h.
	(extract_range_basic): Add handling for IFN_GOACC_DIM_SIZE &
	IFN_GOACC_DIM_POS.

Index: omp-low.c
===
--- omp-low.c	(revision 229757)
+++ omp-low.c	(working copy)
@@ -12095,6 +12095,41 @@ get_oacc_fn_attrib (tree fn)
   return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
 }
 
+/* Extract an oacc execution dimension from FN.  FN must be an
+   offloaded function or routine that has already had its execution
+   dimensions lowered to the target-specific values.  */
+
+int
+get_oacc_fn_dim_size (tree fn, int axis)
+{
+  tree attrs = get_oacc_fn_attrib (fn);
+  
+  gcc_assert (axis < GOMP_DIM_MAX);
+
+  tree dims = TREE_VALUE (attrs);
+  while (axis--)
+dims = TREE_CHAIN (dims);
+
+  int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+
+  return size;
+}
+
+/* Extract the dimension axis from an IFN_GOACC_DIM_POS or
+   IFN_GOACC_DIM_SIZE call.  */
+
+int
+get_oacc_ifn_dim_arg (const gimple *stmt)
+{
+  gcc_checking_assert (gimple_call_internal_fn (stmt) == IFN_GOACC_DIM_SIZE
+		   || gimple_call_internal_fn (stmt) == IFN_GOACC_DIM_POS);
+  tree arg = gimple_call_arg (stmt, 0);
+  HOST_WIDE_INT axis = TREE_INT_CST_LOW (arg);
+
+  gcc_checking_assert (axis >= 0 && axis < GOMP_DIM_MAX);
+  return (int) axis;
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -19383,6 +19418,18 @@ default_goacc_validate_dims (tree ARG_UN
   return changed;
 }
 
+/* Default dimension bound is unknown on accelerator and 1 on host. */
+
+int
+default_goacc_dim_limit (int ARG_UNUSED (axis))
+{
+#ifdef ACCEL_COMPILER
+  return 0;
+#else
+  return 1;
+#endif
+}
+
 namespace {
 
 const pass_data pass_data_oacc_device_lower =
Index: omp-low.h
===
--- omp-low.h	(revision 229757)
+++ omp-low.h	(working copy)
@@ -31,6 +31,8 @@ extern bool make_gimple_omp_edges (basic
 extern void omp_finish_file (void);
 extern tree omp_member_access_dummy_var (tree);
 extern tree get_oacc_fn_attrib (tree);
+extern int get_oacc_ifn_dim_arg (const gimple *);
+extern int get_oacc_fn_dim_size (tree, int);
 
 extern GTY(()) vec *offload_funcs;
 extern GTY(()) vec *offload_vars;
Index: targhooks.h
===
--- targhooks.h	(revision 229757)
+++ targhooks.h	(working copy)
@@ -110,6 +110,7 @@ extern void default_destroy_cost_data (v
 
 /* OpenACC hooks.  */
 extern bool default_goacc_validate_dims (tree, int [], int);
+extern int default_goacc_dim_limit (int);
 extern bool default_goacc_fork_join (gcall *, const int [], bool);
 
 /* These are here, and not in hooks.[ch], because not all users of
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 229757)
+++ doc/tm.texi	(working copy)
@@ -5777,6 +5777,11 @@ true, if changes have been made.  You mu
 provide dimensions larger than 1.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_GOACC_DIM_LIMIT (int @var{axis})
+This hook should return the maximum size of a particular dimension,
+or zero if unbounded.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_FORK_JOIN (gcall *@var{call}, const int *@var{dims}, bool @var{is_fork})
 This hook can be used to convert IFN_GOACC_FORK and IFN_GOACC_JOIN
 function calls to target-specific gimple, or indicate whether they
Index: doc/tm.texi.in
==

Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Jakub Jelinek
On Wed, Nov 04, 2015 at 06:20:06PM +0100, Thomas Schwinge wrote:
> Hi Jakub!
> 
> On Wed, 4 Nov 2015 11:30:28 +0100, Jakub Jelinek  wrote:
> > >  gfc_match_oacc_update (void)
> > >  {
> > >gfc_omp_clauses *c;
> > > +  locus here = gfc_current_locus;
> > > +
> > >if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES, false, false, true)
> > >!= MATCH_YES)
> > >  return MATCH_ERROR;
> > >  
> > > +  if (!c->lists[OMP_LIST_MAP])
> > > +{
> > > +  gfc_error ("% must contain at least one "
> > > +  "% or % clause at %L", &here);
> > 
> > There is no host/self clause I'd guess, so you should spell those
> > separately.
> 
> That's the same language as used in the C and C++ front ends; the host
> and self clauses are synonymous.

Then it should be %/% at least, or better
% or %.
Both in Fortran and in the C/C++ FEs.

Jakub


Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Jakub Jelinek
On Wed, Nov 04, 2015 at 06:15:14PM +0100, Thomas Schwinge wrote:
> > --- a/gcc/fortran/openmp.c
> > +++ b/gcc/fortran/openmp.c
> 
> > @@ -3028,6 +3015,22 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
> > *omp_clauses,
> > n->sym->mark = 1;
> >  }
> >  
> > +  /* OpenACC reductions.  */
> > +  if (openacc)
> > +{
> > +  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
> > +   n->sym->mark = 0;
> 
> Maybe I'm just confugsed, but if setting all these to zero here...
> 
> > +
> > +  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
> > +   {
> > + if (n->sym->mark)
> > +   gfc_error ("Symbol %qs present on multiple clauses at %L",
> > +  n->sym->name, &n->where);
> > + else
> > +   n->sym->mark = 1;
> 
> ... won't this just always run into the "else" branch?

The point is to check if some symbol isn't present multiple times in
reduction clause(s).  So the first loop clears the flag as it could have
arbitrary values, and the second loop will diagnose an error if n->sym
is present multiple times in the list.  reduction(+: a, b, a) and the like.
In C/C++ FE we use bitmaps for this, in Fortran FE we have mark
field for those purposes.

Jakub


Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Thomas Schwinge
Hi Jakub!

On Wed, 4 Nov 2015 11:30:28 +0100, Jakub Jelinek  wrote:
> >  gfc_match_oacc_update (void)
> >  {
> >gfc_omp_clauses *c;
> > +  locus here = gfc_current_locus;
> > +
> >if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES, false, false, true)
> >!= MATCH_YES)
> >  return MATCH_ERROR;
> >  
> > +  if (!c->lists[OMP_LIST_MAP])
> > +{
> > +  gfc_error ("% must contain at least one "
> > +"% or % clause at %L", &here);
> 
> There is no host/self clause I'd guess, so you should spell those
> separately.

That's the same language as used in the C and C++ front ends; the host
and self clauses are synonymous.

$ git grep -C2 host/self upstream/trunk -- gcc/c*/
upstream/trunk:gcc/c/c-parser.c-  error_at (loc,
upstream/trunk:gcc/c/c-parser.c-"%<#pragma acc update%> 
must contain at least one "
upstream/trunk:gcc/c/c-parser.c:"% or 
% clause");
upstream/trunk:gcc/c/c-parser.c-  return;
upstream/trunk:gcc/c/c-parser.c-}
--
upstream/trunk:gcc/cp/parser.c-  error_at (pragma_tok->location,
upstream/trunk:gcc/cp/parser.c- "%<#pragma acc update%> must 
contain at least one "
upstream/trunk:gcc/cp/parser.c: "% or % 
clause");
upstream/trunk:gcc/cp/parser.c-  return NULL_TREE;
upstream/trunk:gcc/cp/parser.c-}


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-04 Thread Martin Sebor

On 11/02/2015 07:40 PM, Jason Merrill wrote:

On 10/26/2015 09:48 PM, Martin Sebor wrote:

+  while (TREE_CODE (oper) == NOP_EXPR)
+oper = TREE_OPERAND (oper, 0);


This is STRIP_NOPS.


+ to placement new is not checked since it's unknownwhat it might


Missing space.


+  else if (TREE_CODE (oper) == ADDR_EXPR) {


The brace should go on its own line.


+  /* A possibly optimistic estimate Number of bytes available


Maybe "of the number"?


Thanks for the review. I've fixed the issues above in the latest
patch (attached).




+  /* When the referenced object is a member of a union, use the size
+ of the entire union as the size of the buffer.  */


Why?  If we're accessing one union member, we should limit the allowed
space to the size of that member.


I followed the more permissive approach taken by _FORTIFY_SOURCE
where the size of the whole object is used (on the assumption that
we will eventually want to adopt the same mechanism here). For
instance, given:

  union U { char c; int i; } u;

GCC doesn't diagnose:

  memset (&u.c, 0, sizeof u);

so I didn't expect we'd want the following diagnosed either:

  new (&u.c) U ();

But if you think it's preferable to use the size of the member
for this iteration of the placement new warning I can change it.
Can you confirm?




+  if (bytes_avail <= abs (adjust))
+bytes_avail = 0;
+  else if (0 <= adjust)
+bytes_avail -= adjust;
+  else
+bytes_avail += adjust;


If adjust is negative, I would think that we would have returned already
because we were dealing with an offset from a pointer of unknown value.


Adjust is negative when the offset to a buffer of known size is
negative. For example:

char buf [sizeof (int)];
new (&buf [1] - 1) int;



It also seems that you're being careful to avoid bytes_avail going
negative, so I wonder why you have it signed and bytes_need unsigned.


+  warning_at (EXPR_LOC_OR_LOC (orig_oper, input_location),


Let's remember this location early on so you don't need orig_oper.


Done.

Martin
gcc ChangeLog
2015-11-04  Martin Sebor  

	PR c++/67942
* invoke.texi (-Wplacement-new): Document new option.
	* gcc/testsuite/g++.dg/warn/Wplacement-new-size.C: New test.

gcc/c-family ChangeLog
2015-11-04  Martin Sebor  

	PR c++/67942
* c.opt (-Wplacement-new): New option.

gcc/cp ChangeLog
2015-11-04  Martin Sebor  

	PR c++/67942
	* cp/init.c (warn_placement_new_too_small): New function.
	(build_new_1): Call it.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 47ba070..5e9d7a3 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -760,6 +760,10 @@ Wprotocol
 ObjC ObjC++ Var(warn_protocol) Init(1) Warning
 Warn if inherited methods are unimplemented

+Wplacement-new
+C++ Var(warn_placement_new) Init(1) Warning
+Warn for placement new expressions with undefined behavior
+
 Wredundant-decls
 C ObjC C++ ObjC++ Var(warn_redundant_decls) Warning
 Warn about multiple declarations of the same object
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1ed8f6c..ebf283d 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2269,6 +2269,200 @@ throw_bad_array_new_length (void)
   return build_cxx_call (fn, 0, NULL, tf_warning_or_error);
 }

+/* Attempt to verify that the argument, OPER, of a placement new expression
+   refers to an object sufficiently large for an object of TYPE or an array
+   of NELTS of such objects when NELTS is non-null, and issue a warning when
+   it does not.  SIZE specifies the size needed to construct the object or
+   array and captures the result of NELTS * sizeof (TYPE). (SIZE could be
+   greater when the array under construction requires a cookie to store
+   NELTS.  GCC's placement new expression stores the cookie when invoking
+   a user-defined placement new operator function but not the default one.
+   Placement new expressions with user-defined placement new operator are
+   not diagnosed since we don't know how they use the buffer (this could
+   be a future extension).  */
+static void
+warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
+{
+  location_t loc = EXPR_LOC_OR_LOC (oper, input_location);
+
+  /* The number of bytes to add or subtract from the size of the provided
+ buffer based on an offset into an array or an array element reference.  */
+  HOST_WIDE_INT adjust = 0;
+  /* True when the size of the entire destination object should be used
+ to compute the possibly optimistic estimate of the available space.  */
+  bool use_obj_size = false;
+  /* True when the reference to the destination buffer is an ADDR_EXPR.  */
+  bool addr_expr = false;
+
+  STRIP_NOPS (oper);
+
+  /* Using a function argument or a (non-array) variable as an argument
+ to placement new is not checked since it's unknown what it might
+ point to.  */
+  if (TREE_CODE (oper) == PARM_DECL
+  || TREE_CODE (oper) == VAR_DECL
+  || TREE_CODE (oper) == COMPONENT_REF)
+return;
+
+  /* Evaluate 

Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Thomas Schwinge
Hi Cesar!

On Tue, 3 Nov 2015 19:06:50 -0800, Cesar Philippidis  
wrote:
> This patch updates the fortran front end so that it supports the acc
> loop clauses in a similar manner to both the c and c++ front ends in
> addition to addressing a couple of other loose ends.

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c

> @@ -3028,6 +3015,22 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
> *omp_clauses,
>   n->sym->mark = 1;
>  }
>  
> +  /* OpenACC reductions.  */
> +  if (openacc)
> +{
> +  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
> + n->sym->mark = 0;

Maybe I'm just confugsed, but if setting all these to zero here...

> +
> +  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
> + {
> +   if (n->sym->mark)
> + gfc_error ("Symbol %qs present on multiple clauses at %L",
> +n->sym->name, &n->where);
> +   else
> + n->sym->mark = 1;

... won't this just always run into the "else" branch?

> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c

> @@ -3449,16 +3478,28 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
> sizeof (construct_clauses));
>loop_clauses.collapse = construct_clauses.collapse;
> [...]
> -  construct_clauses.collapse = 0;

Again I'm confused by this, why this is being removed, as earlier in
.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90

I suggest you also merge the existing
gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 into your new test case
file (consistent naming, with the other combined-directives* files).

> @@ -0,0 +1,152 @@
> +! Exercise combined OpenACC directives.
> +
> +! { dg-do compile }
> +! { dg-options "-fopenacc -fdump-tree-gimple" }
> +
> +! { dg-prune-output "sorry, unimplemented" }

What's still unimplemented here?  Please add a comment, or put the
dg-prune-output directive next to the offending OpenACC directive, so
we'll be sure to remove it later on.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
> @@ -0,0 +1,363 @@
> +! { dg-do compile }
> +! { dg-additional-options "-fmax-errors=100" }
> +
> +! { dg-prune-output "sorry, unimplemented" }

Likewise.

> +! { dg-prune-output "Error: work-sharing region" }

What's the intention of this?  If we're expecting this error, place
dg-error directives where they belong?

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-6.f95
> @@ -0,0 +1,80 @@
> +! { dg-do compile }
> +! { dg-additional-options "-fmax-errors=100" }
> +
> +! { dg-prune-output "sorry, unimplemented" }

Likewise.

> +! { dg-prune-output "Error: work-sharing region" }

Likewise.

> --- a/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
> @@ -3,6 +3,9 @@
>  
>  ! test for tree-dump-original and spaces-commas
>  
> +! { dg-prune-output "sorry, unimplemented" }

Likewise.

> +! { dg-prune-output "Error: work-sharing region" }

Likewise.

> --- a/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
> +++ b/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
> @@ -37,4 +37,3 @@ end program test
>  
>  ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 
> "original" } } 
>  ! { dg-final { scan-tree-dump-times "private\\(v\\)" 1 "original" } } 
> -! { dg-final { scan-tree-dump-times "firstprivate\\(w\\)" 1 "original" } } 

Which of your source code changes does this change related to?


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [OpenACC] declare directive

2015-11-04 Thread James Norris

Hi Thomas,

On 11/04/2015 10:49 AM, Thomas Schwinge wrote:

Hi Jim!

On Tue, 3 Nov 2015 10:31:32 -0600, James Norris  
wrote:

On 10/27/2015 03:18 PM, James Norris wrote:

  This patch adds the processing of OpenACC declare directive in C
  and C++. (Note: Support in Fortran is already in trunk.)


..., and a patch adjusting some Fortran front end things is awaiting
review,
.


  I've revised the patch since I originally submitted it for review
  (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02967.html). The
  revision is due to Jakub and et al OpenMP 4.5 work in the area of
  'omp declare target'. I now exploit that functionality and have
  revised the patch accordingly.


Oh, wow, you could remove a lot of code!


Yes, I missed that patch when it entered into the code base. My bad.



Just a superficial review on your patch; patch re-ordered a bit for
review.


--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def



+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_UINT, BT_VOID, BT_PTR, BT_INT, BT_UINT)



--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def



+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_UINT, BT_VOID, BT_PTR, BT_INT, BT_UINT)



--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def



+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_STATIC, "GOACC_register_static",
+  BT_FN_VOID_PTR_INT_UINT, ATTR_NOTHROW_LIST)



--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map



+   GOACC_register_static;


I think these changes can be dropped -- assuming you have not
unintentionally dropped the GOACC_register_static function/usage in your
v2 patch.


Will fix.




--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c



@@ -830,6 +830,7 @@ const struct attribute_spec c_common_attribute_table[] =



+  { "oacc declare",   0, -1, true,  false, false, NULL, false },


As far as I can tell, nothing is setting this attribute anymore in your
v2 patch, so I guess this and all handling code ("lookup_attribute",
"remove_attribute") can also be dropped?


Will fix.




--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c



  /* OpenACC 2.0:
+   # pragma acc declare oacc-data-clause[optseq] new-line
+*/
+
+#define OACC_DECLARE_CLAUSE_MASK   \
+   ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)\
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)   \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)\
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR) \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT)   \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK)   \


For uniformity, please use/add a new alias "PRAGMA_OACC_CLAUSE_* =
PRAGMA_OMP_CLAUSE_LINK" instead of using PRAGMA_OMP_CLAUSE_* here, and
also in c_parser_oacc_data_clause, c_parser_oacc_all_clauses, and in the
C++ front end.


Will fix.




+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)   \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY)   \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN) \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)\
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE) )
+
+static void
+c_parser_oacc_declare (c_parser *parser)
+{



--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c



+static tree
+cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
+{
+  [...]
+  tree prev_attr = lookup_attribute ("oacc declare",
+DECL_ATTRIBUTES (decl));


Per my comment above, this would always be NULL_TREE.  The C front end is
different?


Will fix.




+
+  if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_LINK)
+   id = get_identifier ("omp declare target link");
+  else
+id = get_identifier ("omp declare target");
+
+  if (prev_attr)
+   {
+ tree p = TREE_VALUE (prev_attr);
+ tree cl = TREE_VALUE (p);
+
+ if (!devres && OMP_CLAUSE_MAP_KIND (cl) != GOMP_MAP_DEVICE_RESIDENT)
+   {
+ error_at (loc, "variable %qD used more than once with "
+   "%<#pragma acc declare%>", decl);
+ inform (OMP_CLAUSE_LOCATION (TREE_VALUE (p)),
+ "previous directive was here");
+ error = true;
+ continue;
+   }
+   }



--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15314,6 +15314,17 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl,
add_stmt (t);
break;

+case OACC_DECLARE:
+  t = copy_node (t);
+  tmp = tsubst_omp_clauses (OACC_DECLARE_CLAUSES (t), false, false,
+   args, complain, in_decl);
+  OACC_DECLARE_CL

Re: [gomp4,committed] Handle recursive restrict in function parameter

2015-11-04 Thread Tom de Vries

On 03/11/15 14:58, Tom de Vries wrote:

This patch adds handling of all the restrict qualifiers in the type of a
function parameter.



And committed to gomp-4_0-branch.


I've reverted this patch, and backported the version from trunk.

Committed as attached.

Thanks,
- Tom
[PATCH 1/2] Revert "Handle recursive restrict in function parameter"

2015-11-04  Tom de Vries  

	revert:
	2015-10-03  Tom de Vries  

	* tree-ssa-structalias.c (struct fieldoff): Add restrict_var field.
	(push_fields_onto_fieldstack): Add and handle handle_param parameter.
	(create_variable_info_for_1): Add and handle
	handle_param parameter.  Add extra arg to call to
	push_fields_onto_fieldstack.  Handle restrict pointer fields.
	(create_variable_info_for): Call create_variable_info_for_1 with extra
	arg.
	(make_param_constraints): Drop restrict_name parameter.  Ignore
	vi->only_restrict_pointers.
	(intra_create_variable_infos): Call create_variable_info_for_1 with
	extra arg.  Remove restrict handling.  Call make_param_constraints with
	one less arg.

	* gcc.dg/tree-ssa/restrict-7.c: New test.
	* gcc.dg/tree-ssa/restrict-8.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c | 12 
 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c | 17 --
 gcc/tree-ssa-structalias.c | 90 --
 3 files changed, 36 insertions(+), 83 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
deleted file mode 100644
index f7a68c7..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
+++ /dev/null
@@ -1,12 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-fre1" } */
-
-int
-f (int *__restrict__ *__restrict__ *__restrict__ a, int *b)
-{
-  *b = 1;
-  ***a  = 2;
-  return *b;
-}
-
-/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
deleted file mode 100644
index b0ab164..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
+++ /dev/null
@@ -1,17 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-fre1" } */
-
-struct s
-{
-  int *__restrict__ *__restrict__ pp;
-};
-
-int
-f (struct s s, int *b)
-{
-  *b = 1;
-  **s.pp = 2;
-  return *b;
-}
-
-/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 074285c..f4c875f 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -320,7 +320,6 @@ static varinfo_t first_or_preceding_vi_for_offset (varinfo_t,
 		   unsigned HOST_WIDE_INT);
 static varinfo_t lookup_vi_for_tree (tree);
 static inline bool type_can_have_subvars (const_tree);
-static void make_param_constraints (varinfo_t);
 
 /* Pool of variable info structures.  */
 static object_allocator variable_info_pool
@@ -407,7 +406,6 @@ new_var_info (tree t, const char *name, bool add_id)
   return ret;
 }
 
-static varinfo_t create_variable_info_for_1 (tree, const char *, bool, bool);
 
 /* A map mapping call statements to per-stmt variables for uses
and clobbers specific to the call.  */
@@ -5210,8 +5208,6 @@ struct fieldoff
   unsigned may_have_pointers : 1;
 
   unsigned only_restrict_pointers : 1;
-
-  varinfo_t restrict_var;
 };
 typedef struct fieldoff fieldoff_s;
 
@@ -5306,12 +5302,11 @@ field_must_have_pointers (tree t)
OFFSET is used to keep track of the offset in this entire
structure, rather than just the immediately containing structure.
Returns false if the caller is supposed to handle the field we
-   recursed for.  If HANDLE_PARAM is set, we're handling part of a function
-   parameter.  */
+   recursed for.  */
 
 static bool
 push_fields_onto_fieldstack (tree type, vec *fieldstack,
-			 HOST_WIDE_INT offset, bool handle_param)
+			 HOST_WIDE_INT offset)
 {
   tree field;
   bool empty_p = true;
@@ -5337,7 +5332,7 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 	|| TREE_CODE (field_type) == UNION_TYPE)
 	  push = true;
 	else if (!push_fields_onto_fieldstack
-		(field_type, fieldstack, offset + foff, handle_param)
+		(field_type, fieldstack, offset + foff)
 		 && (DECL_SIZE (field)
 		 && !integer_zerop (DECL_SIZE (field
 	  /* Empty structures may have actual size, like in C++.  So
@@ -5358,8 +5353,7 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 	if (!pair
 		&& offset + foff != 0)
 	  {
-		fieldoff_s e = {0, offset + foff, false, false, false, false,
-NULL};
+		fieldoff_s e = {0, offset + foff, false, false, false, false};
 		pair = fieldstack->safe_push (e);
 	  }
 
@@ -5393,19 +5387,6 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 		  = (!has_unknown_size
 		 && POINTER_TYPE_P (field_type)
 		 && TYPE_RESTRICT (field_type));
-		if (handle_param
-

Re: [2/3] OpenACC reductions

2015-11-04 Thread Nathan Sidwell

On 11/04/15 08:27, Bernd Schmidt wrote:


Adjust and applied, thanks!

nathan

2015-11-04  Nathan Sidwell  
	Cesar Philippidis  

	* config/nvptx/nvptx.c: Include gimple headers.
	(worker_red_size, worker_red_align, worker_red_name,
	worker_red_sym): New.
	(nvptx_option_override): Initialize worker reduction buffer.
	(nvptx_file_end): Write out worker reduction buffer var.
	(nvptx_expand_shuffle, nvptx_expand_worker_addr,
	nvptx_expand_cmp_swap): New builtin expanders.
	(enum nvptx_builtins): New.
	(nvptx_builtin_decls): New.
	(nvptx_builtin_decl, nvptx_init_builtins, nvptx_expand_builtin): New
	(PTX_VECTOR_LENGTH, PTX_WORKER_LENGTH): New.
	(nvptx_get_worker_red_addr, nvptx_generate_vector_shuffle,
	nvptx_lockless_update): New helpers.
	(nvptx_goacc_reduction_setup, nvptx_goacc_reduction_init,
	nvptx_goacc_reduction_fini, nvptx_goacc_reduction_teaddown): New.
	(nvptx_goacc_reduction): New.
	(TARGET_INIT_BUILTINS, TARGET_EXPAND_BUILTIN,
	TARGET_BUILTIN_DECL): Override.
	(TARGET_GOACC_REDUCTION): Override.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 229766)
+++ config/nvptx/nvptx.c	(working copy)
@@ -57,6 +57,15 @@
 #include "omp-low.h"
 #include "gomp-constants.h"
 #include "dumpfile.h"
+#include "internal-fn.h"
+#include "gimple-iterator.h"
+#include "stringpool.h"
+#include "tree-ssa-operands.h"
+#include "tree-ssanames.h"
+#include "gimplify.h"
+#include "tree-phinodes.h"
+#include "cfgloop.h"
+#include "fold-const.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -88,16 +97,23 @@ struct tree_hasher : ggc_cache_ptr_hash<
 static GTY((cache)) hash_table *declared_fndecls_htab;
 static GTY((cache)) hash_table *needed_fndecls_htab;
 
-/* Size of buffer needed to broadcast across workers.  This is used
-   for both worker-neutering and worker broadcasting.   It is shared
-   by all functions emitted.  The buffer is placed in shared memory.
-   It'd be nice if PTX supported common blocks, because then this
-   could be shared across TUs (taking the largest size).  */
+/* Buffer needed to broadcast across workers.  This is used for both
+   worker-neutering and worker broadcasting.  It is shared by all
+   functions emitted.  The buffer is placed in shared memory.  It'd be
+   nice if PTX supported common blocks, because then this could be
+   shared across TUs (taking the largest size).  */
 static unsigned worker_bcast_size;
 static unsigned worker_bcast_align;
 #define worker_bcast_name "__worker_bcast"
 static GTY(()) rtx worker_bcast_sym;
 
+/* Buffer needed for worker reductions.  This has to be distinct from
+   the worker broadcast array, as both may be live concurrently.  */
+static unsigned worker_red_size;
+static unsigned worker_red_align;
+#define worker_red_name "__worker_red"
+static GTY(()) rtx worker_red_sym;
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -128,6 +144,9 @@ nvptx_option_override (void)
 
   worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, worker_bcast_name);
   worker_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
+  worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, worker_red_name);
+  worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 }
 
 /* Return the mode to be used when declaring a ptx object for OBJ.
@@ -3246,8 +3265,203 @@ nvptx_file_end (void)
 	   worker_bcast_align,
 	   worker_bcast_name, worker_bcast_size);
 }
+
+  if (worker_red_size)
+{
+  /* Define the reduction buffer.  */
+
+  worker_red_size = ((worker_red_size + worker_red_align - 1)
+			 & ~(worker_red_align - 1));
+  
+  fprintf (asm_out_file, "// BEGIN VAR DEF: %s\n", worker_red_name);
+  fprintf (asm_out_file, ".shared .align %d .u8 %s[%d];\n",
+	   worker_red_align,
+	   worker_red_name, worker_red_size);
+}
+}
+
+/* Expander for the shuffle builtins.  */
+
+static rtx
+nvptx_expand_shuffle (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+return target;
+  
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+  if (!REG_P (src))
+src = copy_to_mode_reg (mode, src);
+
+  rtx idx = expand_expr (CALL_EXPR_ARG (exp, 1),
+			 NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx op = expand_expr (CALL_EXPR_ARG  (exp, 2),
+			NULL_RTX, SImode, EXPAND_NORMAL);
+  
+  if (!REG_P (idx) && GET_CODE (idx) != CONST_INT)
+idx = copy_to_mode_reg (SImode, idx);
+
+  rtx pat = nvptx_gen_shuffle (target, src, idx, INTVAL (op));
+  if (pat)
+emit_insn (pat);
+
+  return target;
+}
+
+/* Worker reduction address expander.  */
+
+static rtx
+nvptx_expand_worker_addr (tree exp, rtx target,
+			  machine_mode ARG_UNUSED (mode), int ignore)
+{
+  if (ignore)
+return target;
+
+  unsigned align = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 2));
+  if (align > worker_red_align)
+worker_red_align = align;
+
+  unsigned offset = TRE

Re: [PATCH] Pass manager: add support for termination of pass list

2015-11-04 Thread Martin Liška
On 11/04/2015 03:46 PM, Richard Biener wrote:
> On Wed, Nov 4, 2015 at 11:38 AM, Martin Liška  wrote:
>> On 11/03/2015 03:44 PM, Richard Biener wrote:
>>> On Tue, Nov 3, 2015 at 3:13 PM, Martin Liška  wrote:
 On 11/03/2015 02:46 PM, Richard Biener wrote:
> On Fri, Oct 30, 2015 at 1:53 PM, Martin Liška  wrote:
>> On 10/30/2015 01:13 PM, Richard Biener wrote:
>>> So I suggest to do the push/pop of cfun there.
>>> do_per_function_toporder can be made static btw.
>>>
>>> Richard.
>>
>> Right, I've done that and it works (bootstrap has been currently 
>> running),
>> feasible for HSA branch too.
>>
>> tree-pass.h:
>>
>> /* Declare for plugins.  */
>> extern void do_per_function_toporder (void (*) (function *, void *), 
>> void *);
>>
>> Attaching the patch that I'm going to test.
>
> Err.
>
> +  cgraph_node::get (current_function_decl)->release_body ();
> +
> +  current_function_decl = NULL;
> +  set_cfun (NULL);
>
> I'd have expected
>
>   tree fn = cfun->decl;
>   pop_cfun ();
>   gcc_assert (!cfun);
>   cgraph_node::get (fn)->release_body ();
>
> here.

 Yeah, that works, but we have to add following hunk:

 diff --git a/gcc/function.c b/gcc/function.c
 index aaf49a4..4718fe1 100644
 --- a/gcc/function.c
 +++ b/gcc/function.c
 @@ -4756,6 +4756,13 @@ push_cfun (struct function *new_cfun)
  void
  pop_cfun (void)
  {
 +  if (cfun_stack.is_empty ())
 +{
 +  set_cfun (NULL);
 +  current_function_decl = NULL_TREE;
 +  return;
 +}
 +
struct function *new_cfun = cfun_stack.pop ();
/* When in_dummy_function, we do have a cfun but current_function_decl 
 is
   NULL.  We also allow pushing NULL cfun and subsequently changing
>>>
>>> Why again?  cfun should be set via push_cfun here so what's having cfun == 
>>> NULL
>>> at the pop_cfun point?  Or rather, what code used set_cfun () instead
>>> of push_cfun ()?
>>>

 If you are fine with that, looks we've fixed all issues related to the 
 change, right?
 Updated version of the is attached.

 Martin

>
>> Martin
>>

>>
>> Hi Richard.
>>
>> There's the patch we talked about yesterday. It contains a few modification 
>> that
>> were necessary to make it working:
>>
>> 1) cgraph_node::expand_thunk uses just set_cfun (NULL) & 
>> current_function_decl = NULL because
>>
>>   bb = then_bb = else_bb = return_bb
>> = init_lowered_empty_function (thunk_fndecl, true, count);
>>
>> in last branch of expand_thunk which calls allocate_struct_function.
>>
>> 2) i386.c and rs6000.c call init_function_start, as I move preamble to 
>> callers, I did the same
>> for these two functions
>>
>> I've been running regression test and bootstrap.
> 
> Looks good to me.
> 
> Thanks,
> Richard.
> 
>> Martin

Thank you for review, installed as r229764. Final version of the patch
(Changelog entry has been added) is attached.

Martin
>From 825439772cd4d90c7253999f18457f49d23b37b4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 3 Nov 2015 16:28:36 +0100
Subject: [PATCH] Pass manager: add support for termination of pass list

gcc/ChangeLog:

2015-11-04  Martin Liska  

	* cgraphunit.c (cgraph_node::expand_thunk): Call
	allocate_struct_function before init_function_start.
	(cgraph_node::expand): Use push_cfun and pop_cfun.
	* config/i386/i386.c (ix86_code_end): Call
	allocate_struct_function before init_function_start.
	* config/rs6000/rs6000.c (rs6000_code_end): Likewise.
	* function.c (init_function_start): Move preamble to all
	callers.
	* passes.c (do_per_function_toporder): Use push_cfun and pop_cfun.
	(execute_one_pass): Handle newly added TODO_discard_function.
	(execute_pass_list_1): Terminate if cfun equals to NULL.
	(execute_pass_list): Do not push and pop cfun, expect that
	cfun is set.
	* tree-pass.h (TODO_discard_function): Define.
---
 gcc/cgraphunit.c   | 10 ++
 gcc/config/i386/i386.c |  1 +
 gcc/config/rs6000/rs6000.c |  1 +
 gcc/function.c |  5 -
 gcc/passes.c   | 32 
 gcc/tree-pass.h|  3 +++
 6 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 43d3185..f73d9a7 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1618,6 +1618,7 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
   fn_block = make_node (BLOCK);
   BLOCK_VARS (fn_block) = a;
   DECL_INITIAL (thunk_fndecl) = fn_block;
+  allocate_struct_function (thunk_fndecl, false);
   init_function_start (thunk_fndecl);
   cfun->is_thunk = 1;
   insn_locations_init ();
@@ -1632,7 +1633,6 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
   insn_locations_finaliz

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-11-04 Thread Jan Hubicka
> > Are these supposed to be fixed by Richard's change to not use
> > useless_type_conversion for VCE, or is it another issue?
> 
> Richard's change not to use useless_type_conversion for VCE was causing 
> additional GIMPLE verification failures so I didn't pursue; I can try again,
> but all the known regressions are now fixed thanks to Richard's latest change 
> to useless_type_conversion_p itself.

I see, you re-instantiated the TYPE_CANONICAL check for aggregates instead.  I
guess it is most practical way to go right now even though it would be really 
nice
to separate this from TBAA machinery.
At the moment LTO doesn't do globbing where calling conventions should care.
One such case is the globing of array containing char and char which is required
by Fortran standard, but that IMO is a defect in standard - if types are passed
differently by target ABI one can't expect them to be fuly interoperable as 
Fortran
would like.

Thank you very much for looking into this!
Honza
> 
> -- 
> Eric Botcazou


Re: [OpenACC] declare directive

2015-11-04 Thread Thomas Schwinge
Hi Jim!

On Tue, 3 Nov 2015 10:31:32 -0600, James Norris  
wrote:
> On 10/27/2015 03:18 PM, James Norris wrote:
> >  This patch adds the processing of OpenACC declare directive in C
> >  and C++. (Note: Support in Fortran is already in trunk.)

..., and a patch adjusting some Fortran front end things is awaiting
review,
.

>  I've revised the patch since I originally submitted it for review
>  (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02967.html). The
>  revision is due to Jakub and et al OpenMP 4.5 work in the area of
>  'omp declare target'. I now exploit that functionality and have
>  revised the patch accordingly.

Oh, wow, you could remove a lot of code!

Just a superficial review on your patch; patch re-ordered a bit for
review.

> --- a/gcc/builtin-types.def
> +++ b/gcc/builtin-types.def

> +DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_UINT, BT_VOID, BT_PTR, BT_INT, 
> BT_UINT)

> --- a/gcc/fortran/types.def
> +++ b/gcc/fortran/types.def

> +DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_UINT, BT_VOID, BT_PTR, BT_INT, 
> BT_UINT)

> --- a/gcc/omp-builtins.def
> +++ b/gcc/omp-builtins.def

> +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_STATIC, "GOACC_register_static",
> +BT_FN_VOID_PTR_INT_UINT, ATTR_NOTHROW_LIST)

> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map

> + GOACC_register_static;

I think these changes can be dropped -- assuming you have not
unintentionally dropped the GOACC_register_static function/usage in your
v2 patch.

> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c

> @@ -830,6 +830,7 @@ const struct attribute_spec c_common_attribute_table[] =

> +  { "oacc declare",   0, -1, true,  false, false, NULL, false },

As far as I can tell, nothing is setting this attribute anymore in your
v2 patch, so I guess this and all handling code ("lookup_attribute",
"remove_attribute") can also be dropped?

> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c

>  /* OpenACC 2.0:
> +   # pragma acc declare oacc-data-clause[optseq] new-line
> +*/
> +
> +#define OACC_DECLARE_CLAUSE_MASK \
> + ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)\
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT) \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK) \

For uniformity, please use/add a new alias "PRAGMA_OACC_CLAUSE_* =
PRAGMA_OMP_CLAUSE_LINK" instead of using PRAGMA_OMP_CLAUSE_* here, and
also in c_parser_oacc_data_clause, c_parser_oacc_all_clauses, and in the
C++ front end.

> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT) \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY) \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)   \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)  \
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE) )
> +
> +static void
> +c_parser_oacc_declare (c_parser *parser)
> +{

> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c

> +static tree
> +cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
> +{
> +  [...]
> +  tree prev_attr = lookup_attribute ("oacc declare",
> +  DECL_ATTRIBUTES (decl));

Per my comment above, this would always be NULL_TREE.  The C front end is
different?

> +
> +  if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_LINK)
> + id = get_identifier ("omp declare target link");
> +  else
> +id = get_identifier ("omp declare target");
> +
> +  if (prev_attr)
> + {
> +   tree p = TREE_VALUE (prev_attr);
> +   tree cl = TREE_VALUE (p);
> +
> +   if (!devres && OMP_CLAUSE_MAP_KIND (cl) != GOMP_MAP_DEVICE_RESIDENT)
> + {
> +   error_at (loc, "variable %qD used more than once with "
> + "%<#pragma acc declare%>", decl);
> +   inform (OMP_CLAUSE_LOCATION (TREE_VALUE (p)),
> +   "previous directive was here");
> +   error = true;
> +   continue;
> + }
> + }

> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -15314,6 +15314,17 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> complain, tree in_decl,
>add_stmt (t);
>break;
>  
> +case OACC_DECLARE:
> +  t = copy_node (t);
> +  tmp = tsubst_omp_clauses (OACC_DECLARE_CLAUSES (t), false, false,
> + args, complain, in_decl);
> +  OACC_DECLARE_CLAUSES (t) = tmp;
> +  tmp = tsubst_omp_clauses (OACC_DECLARE_RETURN_CLAUSES (t), false, 
> false,
> +  

[PATCH v3 2/2] [PR debug/67192] Further fix C loops' back-jump location

2015-11-04 Thread Andreas Arnez
After parsing an unconditional "while"- or "for"-loop, the C front-end
generates a backward-goto statement and implicitly sets its location to
the current input_location.  But in some cases the parser peeks ahead
first, such that input_location already points to the line after the
loop and the generated backward-goto gets the wrong line number.

One way this can occur is with a loop body consisting of an "if"
statement, because then the parser peeks for an optional "else" before
finishing the loop.

Another way occurred after r223098 ("Implement
-Wmisleading-indentation"), even with a loop body enclosed in braces.
This was because the check for misleading indentation always peeks ahead
one token as well.

This patch avoids the use of input_location and sets the location of the
backward-goto to the start of the loop body instead, or, if there is no
loop body, to the start of the loop.

gcc/c/ChangeLog:

PR debug/67192
* c-typeck.c (c_finish_loop): For unconditional loops, set the
location of the backward-goto to the start of the loop body.

gcc/testsuite/ChangeLog:

PR debug/67192
* gcc.dg/guality/pr67192.c (f3, f4): New functions.
(main): Invoke them.
---
 gcc/c/c-typeck.c   | 10 ++
 gcc/testsuite/gcc.dg/guality/pr67192.c | 26 ++
 2 files changed, 36 insertions(+)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 2363b9b..e4c3720 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -9878,6 +9878,16 @@ c_finish_loop (location_t start_locus, tree cond, tree 
incr, tree body,
exit = fold_build3_loc (input_location,
COND_EXPR, void_type_node, cond, exit, t);
}
+  else
+   {
+ /* For the backward-goto's location of an unconditional loop
+use the beginning of the body, or, if there is none, the
+top of the loop.  */
+ location_t loc = EXPR_LOCATION (expr_first (body));
+ if (loc == UNKNOWN_LOCATION)
+   loc = start_locus;
+ SET_EXPR_LOCATION (exit, loc);
+   }
 
   add_stmt (top);
 }
diff --git a/gcc/testsuite/gcc.dg/guality/pr67192.c 
b/gcc/testsuite/gcc.dg/guality/pr67192.c
index f6382ef..946e68f 100644
--- a/gcc/testsuite/gcc.dg/guality/pr67192.c
+++ b/gcc/testsuite/gcc.dg/guality/pr67192.c
@@ -39,15 +39,41 @@ f2 (void)
   do_it (); /* { dg-final { gdb-test 39 "cnt" "10" } } */
 }
 
+__attribute__((noinline, noclone)) static void
+f3 (void)
+{
+  for (;; do_it())
+if (last ())
+  break;
+  do_it (); /* { dg-final { gdb-test 48 "cnt" "15" } } */
+}
+
+__attribute__((noinline, noclone)) static void
+f4 (void)
+{
+  while (1) /* { dg-final { gdb-test 54 "cnt" "15" } } */
+if (last ())
+  break;
+else
+  do_it ();
+  do_it (); /* { dg-final { gdb-test 59 "cnt" "20" } } */
+}
+
 void (*volatile fnp1) (void) = f1;
 void (*volatile fnp2) (void) = f2;
+void (*volatile fnp3) (void) = f3;
+void (*volatile fnp4) (void) = f4;
 
 int
 main ()
 {
   asm volatile ("" : : "r" (&fnp1) : "memory");
   asm volatile ("" : : "r" (&fnp2) : "memory");
+  asm volatile ("" : : "r" (&fnp3) : "memory");
+  asm volatile ("" : : "r" (&fnp4) : "memory");
   fnp1 ();
   fnp2 ();
+  fnp3 ();
+  fnp4 ();
   return 0;
 }
-- 
2.3.0



[PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location

2015-11-04 Thread Andreas Arnez
Since r223098 ("Implement -Wmisleading-indentation") the backward-jump
generated for a C while- or for-loop can get the wrong line number.
This is because the check for misleading indentation peeks ahead one
token, advancing input_location to after the loop, and then
c_finish_loop() creates the back-jump and calls add_stmt(), which
assigns input_location to the statement by default.

This patch swaps the check for misleading indentation with the finishing
of the loop, such that input_location still has the right value at the
time of any invocations of add_stmt().  Note that this does not fully
cover all cases where the back-jump gets the wrong location.

gcc/c/ChangeLog:

PR debug/67192
* c-parser.c (c_parser_while_statement): Finish the loop before
parsing ahead for misleading indentation.
(c_parser_for_statement): Likewise.

gcc/testsuite/ChangeLog:

PR debug/67192
* gcc.dg/guality/pr67192.c: New test.
---
 gcc/c/c-parser.c   | 13 +
 gcc/testsuite/gcc.dg/guality/pr67192.c | 53 ++
 2 files changed, 60 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr67192.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ec88c65..0c5e4e9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -5435,13 +5435,13 @@ c_parser_while_statement (c_parser *parser, bool ivdep)
 = get_token_indent_info (c_parser_peek_token (parser));
 
   body = c_parser_c99_block_statement (parser);
+  c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
+  add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
 
   token_indent_info next_tinfo
 = get_token_indent_info (c_parser_peek_token (parser));
   warn_for_misleading_indentation (while_tinfo, body_tinfo, next_tinfo);
 
-  c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
-  add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
   c_break_label = save_break;
   c_cont_label = save_cont;
 }
@@ -5725,15 +5725,16 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
 
   body = c_parser_c99_block_statement (parser);
 
-  token_indent_info next_tinfo
-= get_token_indent_info (c_parser_peek_token (parser));
-  warn_for_misleading_indentation (for_tinfo, body_tinfo, next_tinfo);
-
   if (is_foreach_statement)
 objc_finish_foreach_loop (loc, object_expression, collection_expression, 
body, c_break_label, c_cont_label);
   else
 c_finish_loop (loc, cond, incr, body, c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99 || c_dialect_objc 
()));
+
+  token_indent_info next_tinfo
+= get_token_indent_info (c_parser_peek_token (parser));
+  warn_for_misleading_indentation (for_tinfo, body_tinfo, next_tinfo);
+
   c_break_label = save_break;
   c_cont_label = save_cont;
 }
diff --git a/gcc/testsuite/gcc.dg/guality/pr67192.c 
b/gcc/testsuite/gcc.dg/guality/pr67192.c
new file mode 100644
index 000..f6382ef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr67192.c
@@ -0,0 +1,53 @@
+/* PR debug/67192 */
+/* { dg-do run } */
+/* { dg-options "-g -Wmisleading-indentation" } */
+
+volatile int cnt = 0;
+
+__attribute__((noinline, noclone)) static int
+last (void)
+{
+  return ++cnt % 5 == 0;
+}
+
+__attribute__((noinline, noclone)) static void
+do_it (void)
+{
+  asm volatile ("" : : "r" (&cnt) : "memory");
+}
+
+__attribute__((noinline, noclone)) static void
+f1 (void)
+{
+  for (;; do_it())
+{
+  if (last ())
+   break;
+}
+  do_it (); /* { dg-final { gdb-test 27 "cnt" "5" } } */
+}
+
+__attribute__((noinline, noclone)) static void
+f2 (void)
+{
+  while (1)
+{
+  if (last ())
+   break;
+  do_it ();
+}
+  do_it (); /* { dg-final { gdb-test 39 "cnt" "10" } } */
+}
+
+void (*volatile fnp1) (void) = f1;
+void (*volatile fnp2) (void) = f2;
+
+int
+main ()
+{
+  asm volatile ("" : : "r" (&fnp1) : "memory");
+  asm volatile ("" : : "r" (&fnp2) : "memory");
+  fnp1 ();
+  fnp2 ();
+  return 0;
+}
-- 
2.3.0



[PATCH v3 0/2] Fix C loops' back-jump location

2015-11-04 Thread Andreas Arnez
Another iteration of trying to fix the regression caused by r223098
("Implement -Wmisleading-indentation").  Patch #1 is the same as v1,
except for some minor changes to the test case.  Patch #2 fixes some
additional cases where the back-jump's location was set wrongly, and
it removes the dependency on input_location for this purpose.

Tested on s390x without regressions.

Previous versions:
 * v1: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01132.html
 * v2: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02393.html

Andreas Arnez (2):
  [PR debug/67192] Fix C loops' back-jump location
  [PR debug/67192] Further fix C loops' back-jump location

 gcc/c/c-parser.c   | 13 +++---
 gcc/c/c-typeck.c   | 10 +
 gcc/testsuite/gcc.dg/guality/pr67192.c | 79 ++
 3 files changed, 96 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr67192.c

-- 
2.3.0



Re: [PATCH] Fix vms targets

2015-11-04 Thread Andrew MacLeod

On 11/02/2015 12:28 AM, Jeff Law wrote:


The header reduction didn't seem to handle the vms targets correctly. 
This reverts that part of Andrew's patch which allows the alpha, 
alpha64 and ia64 vms targets to build again.


Installed on the trunk.  That covers all the fallout from standard 
builds of config-list.mk that I'm aware of.


how weird.. really.  I ran all config-list.mk and saw no failures... 
huh.   maybe because I only ran c/c++ instead of 'all' to save time?  
I'm still surprised.


Andrew


Re: [PATCH], PowerPC IEEE 128-bit patch #7 (revised #2), Subpatch #17

2015-11-04 Thread David Edelsohn
On Tue, Oct 27, 2015 at 12:44 PM, Michael Meissner
 wrote:
> This patch adds a test to make sure __float128 and __ibm128 are not allowed to
> be combined in binary operations.  I re-ran the test suite on power8 little
> endian, and this test passed. Once the preceeding 16 patches are applied to 
> the
> tree, is this patch ok to commit into trunk?
>
> 2015-10-27  Michael Meissner  
>
> * gcc.target/powerpc/float128-mix.c: New test for IEEE 128-bit
> floating point.

Okay.

Thanks, David


Re: [c++-delayed-folding] Introduce convert_to_real_nofold

2015-11-04 Thread Jason Merrill

On 11/04/2015 09:03 AM, Marek Polacek wrote:

On Wed, Nov 04, 2015 at 10:32:52AM +0100, Richard Biener wrote:

On Tue, Nov 3, 2015 at 5:53 PM, Marek Polacek  wrote:

The last piece for convert.c.  Since convert_to_real uses fold ()
rather than fold_buildN, I defined a new macro to keep the code
more compact.

With this committed, convert.c should be dealt with.  If there's
anything else I could help with, please let me know.

Bootstrapped/regtested on x86_64-linux, ok for branch?


I wonder what happens (on trunk) when you just remove the fold () calls.


Nothing much, at least the following patch passes testing.


Then let's do that rather than introduce maybe_fold.

Jason



Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-04 Thread James Greenhalgh

On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
> >+ i = tmp_i; <- Should be cleaned up
>
> Maybe reword as "Subsequent passes are expected to clean up the
> extra moves", otherwise it sounds like a TODO item.
>
> >+   read back in anotyher SET, as might occur in a swap idiom or
>
> Typo.
>
> >+  if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> >+{
> >+  /* The write to targets[i] is only live until the read
> >+ here.  As the condition codes match, we can propagate
> >+ the set to here.  */
> >+   new_val = SET_SRC (single_set (unmodified_insns[i]));
> >+}
>
> Shouldn't use braces around single statements (also goes for the
> surrounding for loop).
>
> >+  /* We must have at least one real insn to convert, or there will
> >+ be trouble!  */
> >+  unsigned count = 0;
>
> The comment seems a bit strange in this context - I think it's left
> over from the earlier version?
>
> As far as I'm concerned this is otherwise ok.

Thanks,

I've updated the patch with those issues addressed. As the cost model was
controversial in an earlier revision, I'll leave this on list for 24 hours
and, if nobody jumps in to object, commit it tomorrow.

I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
just to check that I got the braces right, with no issues.

Thanks,
James

---
gcc/

2015-11-04  James Greenhalgh  

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
(noce_convert_multiple_sets): Likewise.
(noce_process_if_block): Call them.

gcc/testsuite/

2015-11-04  James Greenhalgh  

* gcc.dg/ifcvt-4.c: New.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f23d9afd..1c33283 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3016,6 +3016,244 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+ if (x > y)
+   { i = a; j = b; k = c; }
+
+   Make it:
+
+ tmp_i = (x > y) ? a : i;
+ tmp_j = (x > y) ? b : j;
+ tmp_k = (x > y) ? c : k;
+ i = tmp_i;
+ j = tmp_j;
+ k = tmp_k;
+
+   Subsequent passes are expected to clean up the extra moves.
+
+   Look for special cases such as writes to one register which are
+   read back in another SET, as might occur in a swap idiom or
+   similar.
+
+   These look like:
+
+   if (x > y)
+ i = a;
+ j = i;
+
+   Which we want to rewrite to:
+
+ tmp_i = (x > y) ? a : i;
+ tmp_j = (x > y) ? tmp_i : j;
+ i = tmp_i;
+ j = tmp_j;
+
+   We can catch these when looking at (SET x y) by keeping a list of the
+   registers we would have targeted before if-conversion and looking back
+   through it for an overlap with Y.  If we find one, we rewire the
+   conditional set to use the temporary we introduced earlier.
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+ overlap.  */
+  vec temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec unmodified_insns = vNULL;
+  int count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+{
+  /* Skip over non-insns.  */
+  if (!active_insn_p (insn))
+	continue;
+
+  rtx set = single_set (insn);
+  gcc_checking_assert (set);
+
+  rtx target = SET_DEST (set);
+  rtx temp = gen_reg_rtx (GET_MODE (target));
+  rtx new_val = SET_SRC (set);
+  rtx old_val = target;
+
+  /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+  for (int i = count - 1; i >= 0; --i)
+	if (reg_overlap_mentioned_p (new_val, targets[i]))
+	  {
+	/* Catch a "swap" style idiom.  */
+	if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+	  /* The write to targets[i] is only live until the read
+		 here.  As the condition codes match, we can propagate
+		 the set to here.  */
+	  new_val = SET_SRC (single_set (unmodified_insns[i]));
+	else
+	  new_val = temporaries[i];
+	break;
+	  }
+
+  /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */

Re: [PATCH 10/9] ENABLE_CHECKING refactoring: remove remaining occurrences

2015-11-04 Thread Mikhail Maltsev
On 11/03/2015 02:35 AM, Jeff Law wrote:
> This is good fore the trunk too.  Please install.
> 
> Thanks!
> 
> jeff

Committed as r229758.

-- 
Regards,
Mikhail Maltsev


[Patch, fortran] PR68196 [4.9/5/6 Regression] ICE on function result with procedure pointer component

2015-11-04 Thread Paul Richard Thomas
Dear All,

The patch for these PRs is fully explained by the the comments and/or
changelogs. PR66465 has no connection with PR68196, other than Damian
asking if it is connected!

Bootstrapped and regtested on x86_64/FC21 - OK for trunk and a few
weeks later 4.9 and 5 branches?

Cheers

Paul

2015-11-04  Paul Thomas  

PR fortran/68196
* class.c (has_finalizer_component): Prevent infinite recursion
through this function if the derived type and that of its
component are the same.
* trans-types.c (gfc_get_derived_type): Do the same for proc
pointers by ignoring the explicit interface for the component.

PR fortran/66465
* check.c (same_type_check): If either of the expressions is
BT_PROCEDURE, use the typespec from the symbol, rather than the
expression.

2015-11-04  Paul Thomas  

PR fortran/68196
* gfortran.dg/proc_ptr_47.f90: New test.

PR fortran/66465
* gfortran.dg/pr66465.f90: New test.
Index: gcc/fortran/check.c
===
*** gcc/fortran/check.c (revision 229571)
--- gcc/fortran/check.c (working copy)
*** less_than_bitsize2 (const char *arg1, gf
*** 399,405 
  static bool
  same_type_check (gfc_expr *e, int n, gfc_expr *f, int m)
  {
!   if (gfc_compare_types (&e->ts, &f->ts))
  return true;
  
gfc_error ("%qs argument of %qs intrinsic at %L must be the same type "
--- 399,413 
  static bool
  same_type_check (gfc_expr *e, int n, gfc_expr *f, int m)
  {
!   gfc_typespec *ets = &e->ts;
!   gfc_typespec *fts = &f->ts;
! 
!   if (e->ts.type == BT_PROCEDURE && e->symtree->n.sym)
! ets = &e->symtree->n.sym->ts;
!   if (f->ts.type == BT_PROCEDURE && f->symtree->n.sym)
! fts = &f->symtree->n.sym->ts;
! 
!   if (gfc_compare_types (ets, fts))
  return true;
  
gfc_error ("%qs argument of %qs intrinsic at %L must be the same type "
Index: gcc/fortran/class.c
===
*** gcc/fortran/class.c (revision 229571)
--- gcc/fortran/class.c (working copy)
*** has_finalizer_component (gfc_symbol *der
*** 843,849 
--- 843,853 
  && c->ts.u.derived->f2k_derived->finalizers)
return true;
  
+   /* Stop infinite recursion through this function by inhibiting
+calls when the derived type and that of the component are
+the same.  */
if (c->ts.type == BT_DERIVED
+ && !gfc_compare_derived_types (derived, c->ts.u.derived)
  && !c->attr.pointer && !c->attr.allocatable
  && has_finalizer_component (c->ts.u.derived))
return true;
Index: gcc/fortran/trans-types.c
===
*** gcc/fortran/trans-types.c   (revision 229571)
--- gcc/fortran/trans-types.c   (working copy)
*** gfc_get_derived_type (gfc_symbol * deriv
*** 2366,2371 
--- 2366,2372 
gfc_component *c;
gfc_dt_list *dt;
gfc_namespace *ns;
+   tree tmp;
  
if (derived->attr.unlimited_polymorphic
|| (flag_coarray == GFC_FCOARRAY_LIB
*** gfc_get_derived_type (gfc_symbol * deriv
*** 2517,2524 
   node as DECL_CONTEXT of each FIELD_DECL.  */
for (c = derived->components; c; c = c->next)
  {
!   if (c->attr.proc_pointer)
field_type = gfc_get_ppc_type (c);
else if (c->ts.type == BT_DERIVED || c->ts.type == BT_CLASS)
  field_type = c->ts.u.derived->backend_decl;
else
--- 2518,2536 
   node as DECL_CONTEXT of each FIELD_DECL.  */
for (c = derived->components; c; c = c->next)
  {
!   /* Prevent infinite recursion, when the procedure pointer type is
!the same as derived, by forcing the procedure pointer component to
!be built as if the explicit interface does not exist.  */
!   if (c->attr.proc_pointer
! && ((c->ts.type != BT_DERIVED && c->ts.type != BT_CLASS)
!  || (c->ts.u.derived
!  && !gfc_compare_derived_types (derived, c->ts.u.derived
field_type = gfc_get_ppc_type (c);
+   else if (c->attr.proc_pointer && derived->backend_decl)
+   {
+ tmp = build_function_type_list (derived->backend_decl, NULL_TREE);
+ field_type = build_pointer_type (tmp);
+   }
else if (c->ts.type == BT_DERIVED || c->ts.type == BT_CLASS)
  field_type = c->ts.u.derived->backend_decl;
else
Index: gcc/testsuite/gfortran.dg/pr66465.f90
===
*** gcc/testsuite/gfortran.dg/pr66465.f90   (revision 0)
--- gcc/testsuite/gfortran.dg/pr66465.f90   (working copy)
***
*** 0 
--- 1,23 
+ ! { dg-do compile }
+ !
+ ! Tests the fix for PR66465, in which the arguments of the call to
+ ! ASSOCIATED were falsly detected to have different type/kind.
+ !
+ ! Contributed by Damian Rouson  
+ !
+   interface
+  real function HandlerInterfa

Re: [gomp4, committed] Implement -foffload-alias

2015-11-04 Thread Tom de Vries

On 04/11/15 09:47, Thomas Schwinge wrote:

+/* Check that the loop has been split off into a function.  */
>+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo._omp_fn.0" 1 
"optimized" } } */

For C we get:

 ;; Function foo._omp_fn.0 (foo._omp_fn.0, funcdef_no=12, decl_uid=2534, 
cgraph_uid=14, symbol_order=14)

..., so that matches, but for C++ we get:

 ;; Function foo(unsigned int*, unsigned int*, unsigned int*) [clone 
._omp_fn.0] (_ZL3fooPjS_S_._omp_fn.0, funcdef_no=12, decl_uid=2416, 
cgraph_uid=14, symbol_order=14)

..., which doesn't match, so this directive FAILs.



Hi Thomas,

thanks for noticing.

Fixed as attached.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Fixup goacc/kernels-loop-offload-alias-none.c

2015-11-04  Tom de Vries  

	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: Fix
	foo._omp_fn.0 function name scanning.
---
 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
index bb96330..79d8daa 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
@@ -49,7 +49,7 @@ main (void)
 }
 
 /* Check that the loop has been split off into a function.  */
-/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
 
 /* { dg-final { scan-tree-dump-times "clique 1 base 1" 3 "alias" } } */
 /* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "alias" } } */
-- 
1.9.1



Re: [PATCH v2] [PR debug/67192] Fix C loops' back-jump location

2015-11-04 Thread Andreas Arnez
On Thu, Oct 29 2015, Andreas Arnez wrote:

> On Thu, Oct 29 2015, Bernd Schmidt wrote:
>> [...]
>> i.e. the breakpoint on the code inside the loop is reached before the
>> while statement itself. This may be the expected behaviour with your
>> patch, but I'm not sure it's really desirable for debugging.
>
> [...]
> Maybe we could also improve the behavior of breaking on "while (1)" by
> generating a NOP for it?  Or by using the first loop body's token
> instead?

I've basically tried the latter, and it seems to work pretty well.  It
solves all the issues discussed in this mail thread, and I haven't found
any other issues with it.  I'll post the patch separately.

>> I'd suggest you commit your original patch to fix the
>> misleading-indent problem while we sort this out.
>
> I can certainly do that.  But note that the original patch does not
> solve the misleading-indent regression caused for f2() in the new
> version of pr67192.c.  Thus the PR is not really fixed by it.

I've slightly changed the test case for that one, so I'll repost it as
well before committing it.

--
Andreas



Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-04 Thread Martin Sebor

I share your concerns, but I'm also sympathetic to the changes that
the Taller Technologies team are trying to make, to allow libstdc++ to
be more useful in exception-free systems.

At the very least the patch to doc/xml/manual/configure.xml must
document that this option enables behaviour that violates the standard
and so produces a non-conforming implementation. It should probably
also explain how to use the resulting implementation, so that users on
embedded systems who might want to enable this know how to write a
suitable new-handler.


If there is a problem with programs too easily getting themselves
into an infinite loop because of improperly written new handlers
it might be worth proposing a change to C++ to make the minimum
number of iterations the loop must make before giving up
implementation-defined. That way this could be handled the same
way regardless of the environment.

Martin



Re: [gomp4] fortran cleanups and c/c++ loop parsing backport

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 03:39 AM, Thomas Schwinge wrote:

> On Tue, 27 Oct 2015 11:36:10 -0700, Cesar Philippidis 
>  wrote:
>>   * Proposed fortran cleanups and enhanced error reporting changes:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html
> 
> ... has now been applied to trunk, in an altered version, so we now got
> some divergence between trunk and gomp-4_0-branch to resolve.

I've been concentrating on trunk lately. I'll backport these changes
when my other fortran changes go in
(https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00287.html).

Cesar



Re: [PATCH] Pass manager: add support for termination of pass list

2015-11-04 Thread Richard Biener
On Wed, Nov 4, 2015 at 11:38 AM, Martin Liška  wrote:
> On 11/03/2015 03:44 PM, Richard Biener wrote:
>> On Tue, Nov 3, 2015 at 3:13 PM, Martin Liška  wrote:
>>> On 11/03/2015 02:46 PM, Richard Biener wrote:
 On Fri, Oct 30, 2015 at 1:53 PM, Martin Liška  wrote:
> On 10/30/2015 01:13 PM, Richard Biener wrote:
>> So I suggest to do the push/pop of cfun there.
>> do_per_function_toporder can be made static btw.
>>
>> Richard.
>
> Right, I've done that and it works (bootstrap has been currently running),
> feasible for HSA branch too.
>
> tree-pass.h:
>
> /* Declare for plugins.  */
> extern void do_per_function_toporder (void (*) (function *, void *), void 
> *);
>
> Attaching the patch that I'm going to test.

 Err.

 +  cgraph_node::get (current_function_decl)->release_body ();
 +
 +  current_function_decl = NULL;
 +  set_cfun (NULL);

 I'd have expected

   tree fn = cfun->decl;
   pop_cfun ();
   gcc_assert (!cfun);
   cgraph_node::get (fn)->release_body ();

 here.
>>>
>>> Yeah, that works, but we have to add following hunk:
>>>
>>> diff --git a/gcc/function.c b/gcc/function.c
>>> index aaf49a4..4718fe1 100644
>>> --- a/gcc/function.c
>>> +++ b/gcc/function.c
>>> @@ -4756,6 +4756,13 @@ push_cfun (struct function *new_cfun)
>>>  void
>>>  pop_cfun (void)
>>>  {
>>> +  if (cfun_stack.is_empty ())
>>> +{
>>> +  set_cfun (NULL);
>>> +  current_function_decl = NULL_TREE;
>>> +  return;
>>> +}
>>> +
>>>struct function *new_cfun = cfun_stack.pop ();
>>>/* When in_dummy_function, we do have a cfun but current_function_decl is
>>>   NULL.  We also allow pushing NULL cfun and subsequently changing
>>
>> Why again?  cfun should be set via push_cfun here so what's having cfun == 
>> NULL
>> at the pop_cfun point?  Or rather, what code used set_cfun () instead
>> of push_cfun ()?
>>
>>>
>>> If you are fine with that, looks we've fixed all issues related to the 
>>> change, right?
>>> Updated version of the is attached.
>>>
>>> Martin
>>>

> Martin
>
>>>
>
> Hi Richard.
>
> There's the patch we talked about yesterday. It contains a few modification 
> that
> were necessary to make it working:
>
> 1) cgraph_node::expand_thunk uses just set_cfun (NULL) & 
> current_function_decl = NULL because
>
>   bb = then_bb = else_bb = return_bb
> = init_lowered_empty_function (thunk_fndecl, true, count);
>
> in last branch of expand_thunk which calls allocate_struct_function.
>
> 2) i386.c and rs6000.c call init_function_start, as I move preamble to 
> callers, I did the same
> for these two functions
>
> I've been running regression test and bootstrap.

Looks good to me.

Thanks,
Richard.

> Martin


Re: OpenACC dimension range propagation optimization

2015-11-04 Thread Richard Biener
On Wed, Nov 4, 2015 at 2:56 PM, Nathan Sidwell  wrote:
> On 11/04/15 05:26, Richard Biener wrote:
>>
>> On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell  wrote:
>>>
>>> Richard,
>>> this patch implements VRP for the 2 openacc axis internal fns I've added.
>>> We know the position within  a dimension cannot exceed that dimensions
>>> extend. Further, if the extend is dynamic, the target backend may well
>>> know
>>> there's a hardware-mandated maximum value.
>>>
>>> Hence, added a new target hook to allow the backend to specify that upper
>>> bound, and added smarts to extract_range_basic to process the two
>>> internal
>>> functions.
>>>
>>> Incidentally, this was the bit I was working on at the cauldron, which
>>> caused me to work on the min/max range combining.
>>>
>>> ok for trunk?
>>
>>
>> + /* Optimizing these two internal functions helps the loop
>> +optimizer eliminate outer comparisons.  Size is [1,N]
>> +and pos is [0,N-1].  */
>> + {
>> +   bool is_pos = ifn_code == IFN_GOACC_DIM_POS;
>> +   tree attr = get_oacc_fn_attrib (current_function_decl);
>> +   tree arg = gimple_call_arg (stmt, 0);
>> +   int axis = TREE_INT_CST_LOW (arg);
>> +   tree dims = TREE_VALUE (attr);
>> +
>> +   for (int ix = axis; ix--;)
>> + dims = TREE_CHAIN (dims);
>> +   int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
>> +
>> +   if (!size)
>> + /* If it's dynamic, the backend might know a hardware
>> +limitation.  */
>> + size = targetm.goacc.dim_limit (axis);
>>
>> this all seems a little bit fragile and relying on implementation details?
>> Is the attribute always present?
>
>
> Yes.  The ifns are inserted by a pass (execute_oacc_device_lower) that only
> operates on such functions. (I considered adding  an assert, but figured the
> resulting segfault  would be loud enough)
>
>> Is the call argument always a constant
>> that fits in a HOST_WIDE_INT (or even int here)?
>
>
> Yes.
>
>
> Are there always enough
>>
>> 'dims' in the tree list?
>
>
> Yes.  The oacc_device_lower pass has already validated and canonicalized the
> dimensions.
>
>
> Is the 'dim' value always an INTEGER_CST that
>>
>> fits a HOST_WIDE_INT (or even an int here)?
>
>
> Yes.  That's part of the validation.  see set_oacc_fn_attrib (omp-low.c)
>
>> I hope all these constraints (esp. 'int' fitting) are verified by the
>> parser.
>
>
> It's an internal function not visible the user.  Generation is entirely
> within the compiler
>
>> If so I'd like to see helper functions to hide these implementation
>> details
>> from generic code like this.
>
>
> ok.
>
>>
>> You miss to provide the known lower bound to VRP when size is 0
>> in the end.  Just unconditioonally do
>>
>>tree type = TREE_TYPE (gimple_call_lhs (stmt));
>>set_value_range (vr, VR_RANGE,
>> build_int_cst (type, is_pos ? 0 : 1),
>> size
>> ? build_int_cst (type, size - is_pos)
>> : vrp_val_max (type), NULL);
>
>
> I'm confused.  If size is zero, we never execute that path, and IIUC
> therefore never specify a range.  What you suggest looks like an additional
> improvement though.  Is that what you meant?

Yes.

> nathan


[PATCH] PR driver/67613 - spell suggestions for misspelled command line options

2015-11-04 Thread David Malcolm
This patch adds hints to the option-not-found error in the driver,
using the Levenshtein distance implementation posted here:
"[PATCH 0/2] Levenshtein-based suggestions (v3)"
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03379.html

It splits out the identifier-based implementation into a new
spellcheck-tree.c, keeping the core in spellcheck.c, since the tree
checking code needed for IDENTIFIER_POINTER etc isn't available in
the driver.

Bootstrapped®rtested on top of the other patch kit (with nit fixes)
on x86_64-pc-linux-gnu; adds 5 PASS results to gcc.sum.

OK for trunk as a followup to that patch kit?

gcc/ChangeLog:
PR driver/67613
* Makefile.in (GCC_OBJS): Add spellcheck.o.
(OBJS): Add spellcheck-tree.o.
* gcc.c: Include "spellcheck.h".
(suggest_option): New function.
(driver::handle_unrecognized_options): Call suggest_option to
provide a hint about misspelled options.
* spellcheck.c: Update file comment.
(levenshtein_distance): Convert 4-param implementation from static
to extern scope.  Remove note about unit tests from leading
comment for const char * implementation.  Move tree
implementation to...
* spellcheck-tree.c: New file.
* spellcheck.h (levenshtein_distance):  Add 4-param decl.

gcc/testsuite/ChangeLog:
PR driver/67613
* gcc/testsuite/gcc.dg/spellcheck-options-1.c: New file.
* gcc/testsuite/gcc.dg/spellcheck-options-2.c: New file.
---
 gcc/Makefile.in |  3 +-
 gcc/gcc.c   | 51 -
 gcc/spellcheck-tree.c   | 39 ++
 gcc/spellcheck.c| 21 ++--
 gcc/spellcheck.h|  4 +++
 gcc/testsuite/gcc.dg/spellcheck-options-1.c |  4 +++
 gcc/testsuite/gcc.dg/spellcheck-options-2.c |  5 +++
 7 files changed, 107 insertions(+), 20 deletions(-)
 create mode 100644 gcc/spellcheck-tree.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-1.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-2.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9fb643e..a57bd40 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1151,7 +1151,7 @@ CXX_TARGET_OBJS=@cxx_target_objs@
 FORTRAN_TARGET_OBJS=@fortran_target_objs@
 
 # Object files for gcc many-languages driver.
-GCC_OBJS = gcc.o gcc-main.o ggc-none.o
+GCC_OBJS = gcc.o gcc-main.o ggc-none.o spellcheck.o
 
 c-family-warn = $(STRICT_WARN)
 
@@ -1395,6 +1395,7 @@ OBJS = \
simplify-rtx.o \
sparseset.o \
spellcheck.o \
+   spellcheck-tree.o \
sreal.o \
stack-ptr-mod.o \
statistics.o \
diff --git a/gcc/gcc.c b/gcc/gcc.c
index bbc9b23..ef0fb4e 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -42,6 +42,7 @@ compilation is specified by a string called a "spec".  */
 #include "opts.h"
 #include "params.h"
 #include "filenames.h"
+#include "spellcheck.h"
 
 
 
@@ -7598,6 +7599,45 @@ driver::maybe_putenv_OFFLOAD_TARGETS () const
   offload_targets = NULL;
 }
 
+/* Helper function for driver::handle_unrecognized_options.
+
+   Given an unrecognized option BAD_OPT (without the leading dash),
+   locate the closest reasonable matching option (again, without the
+   leading dash), or NULL.  */
+
+static const char *
+suggest_option (const char *bad_opt)
+{
+  const cl_option *best_option = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+
+  for (unsigned int i = 0; i < cl_options_count; i++)
+{
+  edit_distance_t dist = levenshtein_distance (bad_opt,
+  cl_options[i].opt_text + 1);
+  if (dist < best_distance)
+   {
+ best_distance = dist;
+ best_option = &cl_options[i];
+   }
+}
+
+  if (!best_option)
+return NULL;
+
+  /* If more than half of the letters were misspelled, the suggestion is
+ likely to be meaningless.  */
+  if (best_option)
+{
+  unsigned int cutoff = MAX (strlen (bad_opt),
+strlen (best_option->opt_text + 1)) / 2;
+  if (best_distance > cutoff)
+   return NULL;
+}
+
+  return best_option->opt_text + 1;
+}
+
 /* Reject switches that no pass was interested in.  */
 
 void
@@ -7605,7 +7645,16 @@ driver::handle_unrecognized_options () const
 {
   for (size_t i = 0; (int) i < n_switches; i++)
 if (! switches[i].validated)
-  error ("unrecognized command line option %<-%s%>", switches[i].part1);
+  {
+   const char *hint = suggest_option (switches[i].part1);
+   if (hint)
+ error ("unrecognized command line option %<-%s%>;"
+" did you mean %<-%s%>?",
+switches[i].part1, hint);
+   else
+ error ("unrecognized command line option %<-%s%>",
+switches[i].part1);
+  }
 }
 
 /* Handle the various -print-* options, returning 0 if the driver
d

Re: [PATCH 3/6] Share code from fold_array_ctor_reference with fold.

2015-11-04 Thread Alan Lawrence
> s/explicitely/explicitly/  And remove the '*' from the 2nd and 3rd lines
> of the comment.
>
> It looks like get_ctor_element_at_index has numerous formatting
> problems.  In particular you didn't indent the braces across the board
> properly.  Also check for tabs vs spaces issues please.

Yes, you are quite right, I'm not quite sure how those crept in. (Well, the
'explicitely' I am sure was a copy-paste error but the others!). Apologies...

I already committed the offending code as r229605, but here's a patch to clean
it up - does it look ok now?

Thanks, Alan
---
 gcc/fold-const.c | 58 
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ee9b349..e977b49 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -11855,16 +11855,16 @@ get_array_ctor_element_at_index (tree ctor, 
offset_int access_index)
   offset_int low_bound = 0;
 
   if (TREE_CODE (TREE_TYPE (ctor)) == ARRAY_TYPE)
-  {
-tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
-if (domain_type && TYPE_MIN_VALUE (domain_type))
 {
-  /* Static constructors for variably sized objects makes no sense.  */
-  gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
-  index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
-  low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
+  tree domain_type = TYPE_DOMAIN (TREE_TYPE (ctor));
+  if (domain_type && TYPE_MIN_VALUE (domain_type))
+   {
+ /* Static constructors for variably sized objects makes no sense.  */
+ gcc_assert (TREE_CODE (TYPE_MIN_VALUE (domain_type)) == INTEGER_CST);
+ index_type = TREE_TYPE (TYPE_MIN_VALUE (domain_type));
+ low_bound = wi::to_offset (TYPE_MIN_VALUE (domain_type));
+   }
 }
-  }
 
   if (index_type)
 access_index = wi::ext (access_index, TYPE_PRECISION (index_type),
@@ -11880,29 +11880,29 @@ get_array_ctor_element_at_index (tree ctor, 
offset_int access_index)
   tree cfield, cval;
 
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
-  {
-/* Array constructor might explicitely set index, or specify range
- * or leave index NULL meaning that it is next index after previous
- * one.  */
-if (cfield)
 {
-  if (TREE_CODE (cfield) == INTEGER_CST)
-   max_index = index = wi::to_offset (cfield);
+  /* Array constructor might explicitly set index, or specify a range,
+or leave index NULL meaning that it is next index after previous
+one.  */
+  if (cfield)
+   {
+ if (TREE_CODE (cfield) == INTEGER_CST)
+   max_index = index = wi::to_offset (cfield);
+ else
+   {
+ gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
+ index = wi::to_offset (TREE_OPERAND (cfield, 0));
+ max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
+   }
+   }
   else
-  {
-   gcc_assert (TREE_CODE (cfield) == RANGE_EXPR);
-   index = wi::to_offset (TREE_OPERAND (cfield, 0));
-   max_index = wi::to_offset (TREE_OPERAND (cfield, 1));
-  }
-}
-else
-{
-  index += 1;
-  if (index_type)
-   index = wi::ext (index, TYPE_PRECISION (index_type),
-TYPE_SIGN (index_type));
-   max_index = index;
-}
+   {
+ index += 1;
+ if (index_type)
+   index = wi::ext (index, TYPE_PRECISION (index_type),
+TYPE_SIGN (index_type));
+ max_index = index;
+   }
 
 /* Do we have match?  */
 if (wi::cmpu (access_index, index) >= 0
-- 
1.9.1



Re: [PATCH 9/9] ENABLE_CHECKING refactoring: C family front ends

2015-11-04 Thread Mikhail Maltsev
On 11/03/2015 02:34 AM, Jeff Law wrote:
> 
>> @@ -14279,8 +14272,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t
>> complain, tree in_decl)
>> return tsubst_binary_right_fold (t, args, complain, in_decl);
>>
>>   default:
>> -  /* We shouldn't get here, but keep going if !ENABLE_CHECKING.  */
>> -  gcc_checking_assert (false);
>> +  /* We shouldn't get here, but keep going if !flag_checking.  */
>> +  if (!flag_checking)
>> +gcc_unreachable ();
>> return t;
>>   }
>>   }
> I think this condition was reversed, right?
> 
> Please fix that and install on the trunk.
> 
> Thanks again!
> 
> jeff

Fixed and committed as r229756.

-- 
Regards,
Mikhail Maltsev


Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-04 Thread Ramana Radhakrishnan


On 03/11/15 17:09, James Greenhalgh wrote:
> On Tue, Nov 03, 2015 at 04:36:45PM +, Ramana Radhakrishnan wrote:
>> Hi,
>>
>>  Now that PR63304 is fixed and we have an option to address
>> any part of the memory using adrp / add or adrp / ldr instructions
>> it makes sense to switch out literal pools into their own
>> mergeable sections by default.
>>
>> This would mean that by default we could now start getting
>> the benefits of constant sharing across the board, potentially
>> improving code size. The other advantage of doing so, for the
>> security conscious is that this prevents intermingling of literal
>> pools with code.
>>
>> Wilco's kindly done some performance measurements and suggests that
>> there is not really a performance regression in doing this.
>> I've looked at the code size for SPEC2k6 today at -Ofast and
>> in general there is a good code size improvement as expected
>> by sharing said constants.
>>
>> Tested on aarch64-none-elf with no regressions and bootstrapped 
>> and regression tested in my tree for a number of days now.
>>
>> Ok to commit ?
> 
> OK with the comment nits below fixed up.
> 
>>* config/aarch64/aarch64.c (aarch64_select_rtx_section): Switch
>> to default section handling for non PC relative literal loads.
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 5c8604f..9d709e5 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5244,13 +5244,22 @@ aarch64_use_blocks_for_constant_p (machine_mode mode 
>> ATTRIBUTE_UNUSED,
>>return false;
>>  }
>>  
>> +/* Force all constant pool entries into the current function section.
> 
> Is this comment accurate now? I think it only applied to -mcmodel=large
> but maybe I'm misunderstanding?
> 
>> +   In the large model we cannot reliably address all the address space
>> +   thus for now, inline this with the text.  */
>>  static section *
>> +aarch64_select_rtx_section (machine_mode mode,
>> +rtx x,
>> +unsigned HOST_WIDE_INT align)
>> +{
>> +  /* Force all constant pool entries into the current function section.
>> + In the large model we cannot reliably address all the address space
>> + thus for now, inline this with the text.  */
>> +  if (!aarch64_nopcrelative_literal_loads
>> +  || aarch64_cmodel == AARCH64_CMODEL_LARGE)
>> +return function_section (current_function_decl);
> 
> This is just a copy paste of the text above (and probably the better place
> for it).
> 
> I think we'd want a more general comment at the top of the function,
> then this can stay.
> 

True and I've just been reading more of the backend - We could now start using 
blocks for constant pools as well. So let's do that.

How does something like this look ?

Tested on aarch64-none-elf - no regressions.

2015-11-04  Ramana Radhakrishnan  

* config/aarch64/aarch64.c
(aarch64_can_use_per_function_literal_pools_p): New.
(aarch64_use_blocks_for_constant_p): Adjust declaration
and use aarch64_can_use_function_literal_pools_p.
(aarch64_select_rtx_section): Update.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..4f5e069 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5235,24 +5235,37 @@ aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
   return 0;
 }
 
+/* Constant pools are per function only when PC relative
+   literal loads are true or we are in the large memory
+   model.  */
+static inline bool
+aarch64_can_use_per_function_literal_pools_p (void)
+{
+  return (!aarch64_nopcrelative_literal_loads
+ || aarch64_cmodel == AARCH64_CMODEL_LARGE);
+}
+
 static bool
-aarch64_use_blocks_for_constant_p (machine_mode mode ATTRIBUTE_UNUSED,
-  const_rtx x ATTRIBUTE_UNUSED)
+aarch64_use_blocks_for_constant_p (machine_mode, const_rtx)
 {
   /* We can't use blocks for constants when we're using a per-function
  constant pool.  */
-  return false;
+  return !aarch64_can_use_per_function_literal_pools_p ();
 }
 
+/* Select appropriate section for constants depending
+   on where we place literal pools.  */
+
 static section *
-aarch64_select_rtx_section (machine_mode mode ATTRIBUTE_UNUSED,
-   rtx x ATTRIBUTE_UNUSED,
-   unsigned HOST_WIDE_INT align ATTRIBUTE_UNUSED)
+aarch64_select_rtx_section (machine_mode mode,
+   rtx x,
+   unsigned HOST_WIDE_INT align)
 {
-  /* Force all constant pool entries into the current function section.  */
-  return function_section (current_function_decl);
-}
+  if (aarch64_can_use_per_function_literal_pools_p ())
+return function_section (current_function_decl);
 
+  return default_elf_select_rtx_section (mode, x, align);
+}
 
 /* Costs.  */
 


[committed] Handle recursive restrict in function parameter

2015-11-04 Thread Tom de Vries

On 04/11/15 10:28, Richard Biener wrote:

I think I can postpone the creation of the heapvar till where you suggest in
>create_variable_info_for_1, but I'd still need a means
>to communicate the TREE_TYPE (field_type) from push_fields_onto_fieldstack to
>create_variable_info_for_1.
>
>A simple implementation would be a new field:
>...
>@@ -5195,6 +5197,8 @@ struct fieldoff
>unsigned may_have_pointers : 1;
>
>unsigned only_restrict_pointers : 1;
>+
>+ tree restrict_pointed_type;
>};
>...
>Which AFAIU will change fieldoff size.

It's ok to change fieldoff size if there is a reason to;)

Patch is ok along this line.


Updated patch accordingly.

Bootstrapped and reg-tested on x86_64.

Committed to trunk as attached.

Thanks,
- Tom
Handle recursive restrict in function parameter

2015-11-04  Tom de Vries  

	PR tree-optimization/67742
	* tree-ssa-structalias.c (struct fieldoff): Add restrict_pointed_type
	field.
	(push_fields_onto_fieldstack): Handle restrict_pointed_type field.
	(create_variable_info_for_1): Add and handle handle_param parameter.
	Add restrict handling.
	(create_variable_info_for): Call create_variable_info_for_1 with extra
	arg.
	(make_param_constraints): Drop restrict_name parameter.  Ignore
	vi->only_restrict_pointers.
	(intra_create_variable_infos): Call create_variable_info_for_1 with
	extra arg.  Remove restrict handling.  Call make_param_constraints with
	one less arg.

	* gcc.dg/tree-ssa/restrict-7.c: New test.
	* gcc.dg/tree-ssa/restrict-8.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c | 12 +
 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c | 17 +++
 gcc/tree-ssa-structalias.c | 78 +-
 3 files changed, 74 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
new file mode 100644
index 000..f7a68c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int
+f (int *__restrict__ *__restrict__ *__restrict__ a, int *b)
+{
+  *b = 1;
+  ***a  = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
new file mode 100644
index 000..b0ab164
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct s
+{
+  int *__restrict__ *__restrict__ pp;
+};
+
+int
+f (struct s s, int *b)
+{
+  *b = 1;
+  **s.pp = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 98b5f16..52a35f6 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -307,6 +307,7 @@ static varinfo_t first_or_preceding_vi_for_offset (varinfo_t,
 		   unsigned HOST_WIDE_INT);
 static varinfo_t lookup_vi_for_tree (tree);
 static inline bool type_can_have_subvars (const_tree);
+static void make_param_constraints (varinfo_t);
 
 /* Pool of variable info structures.  */
 static object_allocator variable_info_pool
@@ -393,7 +394,6 @@ new_var_info (tree t, const char *name, bool add_id)
   return ret;
 }
 
-
 /* A map mapping call statements to per-stmt variables for uses
and clobbers specific to the call.  */
 static hash_map *call_stmt_vars;
@@ -5195,6 +5195,8 @@ struct fieldoff
   unsigned may_have_pointers : 1;
 
   unsigned only_restrict_pointers : 1;
+
+  tree restrict_pointed_type;
 };
 typedef struct fieldoff fieldoff_s;
 
@@ -5340,7 +5342,8 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 	if (!pair
 		&& offset + foff != 0)
 	  {
-		fieldoff_s e = {0, offset + foff, false, false, false, false};
+		fieldoff_s e
+		  = {0, offset + foff, false, false, false, false, NULL_TREE};
 		pair = fieldstack->safe_push (e);
 	  }
 
@@ -5374,6 +5377,8 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 		  = (!has_unknown_size
 		 && POINTER_TYPE_P (field_type)
 		 && TYPE_RESTRICT (field_type));
+		if (e.only_restrict_pointers)
+		  e.restrict_pointed_type = TREE_TYPE (field_type);
 		fieldstack->safe_push (e);
 	  }
 	  }
@@ -5642,10 +5647,11 @@ check_for_overlaps (vec fieldstack)
 
 /* Create a varinfo structure for NAME and DECL, and add it to VARMAP.
This will also create any varinfo structures necessary for fields
-   of DECL.  */
+   of DECL.  DECL is a function parameter if HANDLE_PARAM is set.  */
 
 static varinfo_t
-create_variable_info_for_1 (tree decl, const char *name, bool add_id)
+create_variable_info_for_1 (tree decl, const char *name, bool add_id,
+			bool handle_param)
 {
   varinfo_t vi, newvi;
   tree decl_type = TREE_TYPE (decl);
@@ -5721,6 

[trivial, committed] Use decl_type in create_variable_info_for_1

2015-11-04 Thread Tom de Vries

Hi,

this patch uses the the decl_type variable more consistently in 
create_variable_info_for_1.


Bootstrapped and reg-tested on x86_64.

Committed to trunk as trivial.

Thanks,
- Tom
Use decl_type in create_variable_info_for_1

2015-11-04  Tom de Vries  

	* tree-ssa-structalias.c (create_variable_info_for_1): Use decl_type
	variable.
---
 gcc/tree-ssa-structalias.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index ded5a1e..98b5f16 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5718,8 +5718,8 @@ create_variable_info_for_1 (tree decl, const char *name, bool add_id)
   vi->fullsize = tree_to_uhwi (declsize);
   vi->size = vi->fullsize;
   vi->is_full_var = true;
-  if (POINTER_TYPE_P (TREE_TYPE (decl))
-	  && TYPE_RESTRICT (TREE_TYPE (decl)))
+  if (POINTER_TYPE_P (decl_type)
+	  && TYPE_RESTRICT (decl_type))
 	vi->only_restrict_pointers = 1;
   fieldstack.release ();
   return vi;
-- 
1.9.1



Re: [2/3] OpenACC reductions

2015-11-04 Thread Nathan Sidwell

On 11/04/15 08:27, Bernd Schmidt wrote:

On 11/02/2015 05:35 PM, Nathan Sidwell wrote:





There are two such switch statements, and it's possible to write this more
compactly:
   if (!INTEGRAL_MODE_P (...))
 code = VIEW_CONVERT_EXPR;
   if (GET_MODE_SIZE (...) == 8)
 fn = CMP_SWAPLL;
Not required, you can decide which you like better.


thanks, not noticed that (the switch was originally more complex)

nathan


Re: [1/3] OpenACC reductions

2015-11-04 Thread Jakub Jelinek
On Wed, Nov 04, 2015 at 08:58:26AM -0500, Nathan Sidwell wrote:
> On 11/04/15 05:31, Jakub Jelinek wrote:
> >On Tue, Nov 03, 2015 at 11:01:57AM -0500, Nathan Sidwell wrote:
> >>On 11/03/15 10:46, Jakub Jelinek wrote:
> >>>On Mon, Nov 02, 2015 at 11:18:37AM -0500, Nathan Sidwell wrote:
> This is the core execution bits of OpenACC reductions.
> 
> We have a new internal fn 'IFN_GOACC_REDUCTION' and a new target hook
> goacc.reduction, to lower it on the target compiler.
> >>>
> >>>So, let me start with a few questions:
> >>>1) does OpenACC allow UDRs or only the built-in reductions?  If it
> >>>does not allow UDRs, do you have it covered by testcases that you
> >>>disallow parsing of them (e.g. when you have
> >>
> >>no UDR reductions.  Will check test cases for that.
> >
> >BTW, what about min/max reductions for C/C++?  Those were added in OpenMP
> >3.1, so perhaps OpenACC copied them.
> 
> OpenACC has min/max, and this is exercised on gomp4.  we'll get to porting
> more testcases after this rush is done, ok?  (Or is there something specific
> about min/max?)

No, just wanted to know what you need to disable in the reduction clause
parsing...
For e.g. C it might be enough to add
if (!openacc)
{
and } around:
reduc_id = c_parser_peek_token (parser)->value;
break;

Jakub


Re: [c++-delayed-folding] Introduce convert_to_real_nofold

2015-11-04 Thread Marek Polacek
On Wed, Nov 04, 2015 at 10:32:52AM +0100, Richard Biener wrote:
> On Tue, Nov 3, 2015 at 5:53 PM, Marek Polacek  wrote:
> > The last piece for convert.c.  Since convert_to_real uses fold ()
> > rather than fold_buildN, I defined a new macro to keep the code
> > more compact.
> >
> > With this committed, convert.c should be dealt with.  If there's
> > anything else I could help with, please let me know.
> >
> > Bootstrapped/regtested on x86_64-linux, ok for branch?
> 
> I wonder what happens (on trunk) when you just remove the fold () calls.

Nothing much, at least the following patch passes testing.

Bootstrapped/regtested on x86_64-linux.

2015-11-04  Marek Polacek  

* convert.c (convert_to_real): Remove calls to fold.

diff --git gcc/convert.c gcc/convert.c
index 113c11f..8ef1949 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -211,7 +211,7 @@ convert_to_real (tree type, tree expr)
 
  if (fn)
  {
-   tree arg = fold (convert_to_real (newtype, arg0));
+   tree arg = convert_to_real (newtype, arg0);
expr = build_call_expr (fn, 1, arg);
if (newtype == type)
  return expr;
@@ -235,8 +235,7 @@ convert_to_real (tree type, tree expr)
  && FLOAT_TYPE_P (itype)
  && TYPE_PRECISION (type) < TYPE_PRECISION (itype))
return build1 (TREE_CODE (expr), type,
-  fold (convert_to_real (type,
- TREE_OPERAND (expr, 0;
+  convert_to_real (type, TREE_OPERAND (expr, 0)));
  break;
/* Convert (outertype)((innertype0)a+(innertype1)b)
   into ((newtype)a+(newtype)b) where newtype
@@ -272,8 +271,8 @@ convert_to_real (tree type, tree expr)
  || newtype == dfloat128_type_node)
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+convert_to_real (newtype, arg0),
+convert_to_real (newtype, arg1));
  if (newtype == type)
return expr;
  break;
@@ -312,8 +311,8 @@ convert_to_real (tree type, tree expr)
  && !excess_precision_type (newtype
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+convert_to_real (newtype, arg0),
+convert_to_real (newtype, arg1));
  if (newtype == type)
return expr;
}

Marek


Re: [1/3] OpenACC reductions

2015-11-04 Thread Nathan Sidwell

On 11/04/15 05:31, Jakub Jelinek wrote:

On Tue, Nov 03, 2015 at 11:01:57AM -0500, Nathan Sidwell wrote:

On 11/03/15 10:46, Jakub Jelinek wrote:

On Mon, Nov 02, 2015 at 11:18:37AM -0500, Nathan Sidwell wrote:

This is the core execution bits of OpenACC reductions.

We have a new internal fn 'IFN_GOACC_REDUCTION' and a new target hook
goacc.reduction, to lower it on the target compiler.


So, let me start with a few questions:
1) does OpenACC allow UDRs or only the built-in reductions?  If it
does not allow UDRs, do you have it covered by testcases that you
disallow parsing of them (e.g. when you have


no UDR reductions.  Will check test cases for that.


BTW, what about min/max reductions for C/C++?  Those were added in OpenMP
3.1, so perhaps OpenACC copied them.


OpenACC has min/max, and this is exercised on gomp4.  we'll get to porting  more 
testcases after this rush is done, ok?  (Or is there something specific about 
min/max?)



nathan



Re: [2/3] OpenACC reductions

2015-11-04 Thread Nathan Sidwell

On 11/04/15 05:01, Jakub Jelinek wrote:

On Mon, Nov 02, 2015 at 11:35:34AM -0500, Nathan Sidwell wrote:

2015-11-02  Nathan Sidwell  
Cesar Philippidis  

* config/nvptx/nvptx.c: Include gimple headers.
(worker_red_size, worker_red_align, worker_red_name,

...

I think you can approve this yourself, or do you want Bernd to review it?


I was posting for completeness, and Bernd often spots things I missed.

thanks all!

nathan


Re: [PATCH] Better error messages for merge-conflict markers (v3)

2015-11-04 Thread Bernd Schmidt

On 10/30/2015 04:16 PM, David Malcolm wrote:

The idea is to more gracefully handle merger conflict markers
in the source code being compiled.  Specifically, in the C and
C++ frontends, if we're about to emit an error, see if the
source code is at a merger conflict marker, and if so, emit
a more specific message, so that the user gets this:

foo.c:1:1: error: source file contains patch conflict marker
  <<< HEAD
  ^

It's something of a "fit and finish" cosmetic item, but these
things add up.


This seems like fairly low impact but also low cost, so I'm fine with it 
in principle. I wonder whether the length of the marker is the same 
across all versions of patch (and VC tools)?



+static bool
+c_parser_peek_conflict_marker (c_parser *parser, enum cpp_ttype tok1_kind)
+{
+  c_token *token2 = c_parser_peek_2nd_token (parser);
+  if (token2->type != tok1_kind)
+return false;
+  c_token *token3 = c_parser_peek_nth_token (parser, 3);
+  if (token3->type != tok1_kind)
+return false;
+  c_token *token4 = c_parser_peek_nth_token (parser, 4);
+  if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind))
+return false;
+  return true;
+}


Just thinking out loud - I guess it would be too much to hope for to 
share lexers between frontends so that we need only one copy of this?



+extern short some_var; /* this line would lead to a warning */


Would or does? I don't see anything suppressing it?

There seems to be no testcase verifying what happens if the marker is 
not at the start of the line (IMO it should not be interpreted as a marker).


It would be good to have buy-in from the frontend maintainers (Joseph 
commented on v1 and as far as I can see you've addressed his feedback). 
If you do not hear back from them by the end of the week, I'll approve 
it if the start-of-line thing is sorted.



Bernd


Re: OpenACC dimension range propagation optimization

2015-11-04 Thread Nathan Sidwell

On 11/04/15 05:26, Richard Biener wrote:

On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell  wrote:

Richard,
this patch implements VRP for the 2 openacc axis internal fns I've added.
We know the position within  a dimension cannot exceed that dimensions
extend. Further, if the extend is dynamic, the target backend may well know
there's a hardware-mandated maximum value.

Hence, added a new target hook to allow the backend to specify that upper
bound, and added smarts to extract_range_basic to process the two internal
functions.

Incidentally, this was the bit I was working on at the cauldron, which
caused me to work on the min/max range combining.

ok for trunk?


+ /* Optimizing these two internal functions helps the loop
+optimizer eliminate outer comparisons.  Size is [1,N]
+and pos is [0,N-1].  */
+ {
+   bool is_pos = ifn_code == IFN_GOACC_DIM_POS;
+   tree attr = get_oacc_fn_attrib (current_function_decl);
+   tree arg = gimple_call_arg (stmt, 0);
+   int axis = TREE_INT_CST_LOW (arg);
+   tree dims = TREE_VALUE (attr);
+
+   for (int ix = axis; ix--;)
+ dims = TREE_CHAIN (dims);
+   int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+
+   if (!size)
+ /* If it's dynamic, the backend might know a hardware
+limitation.  */
+ size = targetm.goacc.dim_limit (axis);

this all seems a little bit fragile and relying on implementation details?
Is the attribute always present?


Yes.  The ifns are inserted by a pass (execute_oacc_device_lower) that only 
operates on such functions. (I considered adding  an assert, but figured the 
resulting segfault  would be loud enough)



Is the call argument always a constant
that fits in a HOST_WIDE_INT (or even int here)?


Yes.


Are there always enough

'dims' in the tree list?


Yes.  The oacc_device_lower pass has already validated and canonicalized the 
dimensions.



Is the 'dim' value always an INTEGER_CST that

fits a HOST_WIDE_INT (or even an int here)?


Yes.  That's part of the validation.  see set_oacc_fn_attrib (omp-low.c)


I hope all these constraints (esp. 'int' fitting) are verified by the parser.


It's an internal function not visible the user.  Generation is entirely within 
the compiler



If so I'd like to see helper functions to hide these implementation details
from generic code like this.


ok.



You miss to provide the known lower bound to VRP when size is 0
in the end.  Just unconditioonally do

   tree type = TREE_TYPE (gimple_call_lhs (stmt));
   set_value_range (vr, VR_RANGE,
build_int_cst (type, is_pos ? 0 : 1),
size
? build_int_cst (type, size - is_pos)
: vrp_val_max (type), NULL);


I'm confused.  If size is zero, we never execute that path, and IIUC therefore 
never specify a range.  What you suggest looks like an additional improvement 
though.  Is that what you meant?


nathan


Re: [PATCH] Make CCP effectively remove UNDEFINED defs

2015-11-04 Thread Richard Biener
On Wed, 4 Nov 2015, Marc Glisse wrote:

> On Wed, 4 Nov 2015, Richard Biener wrote:
> 
> > On Wed, 4 Nov 2015, Richard Biener wrote:
> > 
> > > On Wed, 4 Nov 2015, Marc Glisse wrote:
> > > 
> > > > On Wed, 4 Nov 2015, Richard Biener wrote:
> > > > 
> > > > > The following patch makes sure that CCP when it computes a lattice
> > > > > value to UNDEFINED ends up replacing uses with default defs (and thus
> > > > > removes such UNDEFINED producing defs).  This optimizes the testcase
> > > > > below to
> > > > > 
> > > > >  :
> > > > >  return _6(D);
> > > > > 
> > > > > in the first CCP.  Note this patch isn't mainly for the optimization
> > > > > it does but for making it easier to discover cases where CCP gets
> > > > > UNDEFINED wrong (similar to VRP2 re-using the range-info that is
> > > > > still live on SSA names - that's to catch bogus range-info).
> > > > > 
> > > > > If the experiment works out (read: bootstrap / test completes
> > > > > successfully) I'm going to enhance VRP and will also look at
> > > > > how value-numbering could do a similar job.
> > > > 
> > > > Cool, the VN part might help with
> > > > https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01556.html
> > > > :-)
> > > 
> > > Actually it "breaks" gcc.dg/tree-ssa/pr44133.c as it transforms
> > > 
> > >   :
> > >   _2 = l_1(D) + -22;
> > >   _4 = _2 + s$i_5(D);
> > >   return _4;
> > > 
> > > into
> > > 
> > >   :
> > >   return _6(D);
> > > 
> > > turning the warning of an uninitialized use of s$i_5(D) (aka s.i)
> > > into a warning of an uninitialized use of '({anonymous})'.
> > > 
> > > So I wonder if a better approach is to handle UNDEFINED differently
> > > during propagation (now that we can propagate copies) and make
> > > the value of _4 not UNDEFINED but a copy of (one of) the operand
> > > that makes it undefined.  Thus we'd turn the above into
> > > 
> > >   :
> > >   return s$i_5(D);
> > > 
> > > there is the chance that the warnings resulting from this will
> > > be confusing (warning for an uninitialized use of sth far away
> > > from its actual use).  Of course the above propagation is also
> > > cheaper to implement (apart from the choice you have when
> > > seeing x_1(D) + y_2(D) - which one do you pick and retain
> > > the warning for?)
> > 
> > Ok, it's actually _not_ easy.  The operand might not have the
> > correct type after all and we'll introduce a not fully propagated
> > lattice that way (the 'copy' value would have a lattice value
> > of UNDEFINED).
> 
> If you create a new variable (that _6 is a default value for), you could also
> give it the same name (gcc may append .1) and location as s.i, even if the
> type differs. It wouldn't be perfect (might be too much of a hack for gcc),
> but should be more readable than '({anonymous})'. Hmm, I now realize that it
> doesn't really help, because you would have to maintain somehow the
> information of which variable's name to copy, and that's what you are saying
> is hard :-( Sorry for the useless comment.

No problem.  I'm holding off on this for now (similar to how I did on
optimistically propagating copies ...).  The uninit var stuff is bad
and I have no time to sort this out before stage1 ends.

Anyway, the following fixes the testcases I found with the patch ;)

Will commit after testing.

Richard.

2015-11-04  Richard Biener  

* gcc.dg/tree-ssa/loadpre2.c: Avoid undefined behavior due to
uninitialized variables.
* gcc.dg/tree-ssa/loadpre21.c: Likewise.
* gcc.dg/tree-ssa/loadpre22.c: Likewise.
* gcc.dg/tree-ssa/loadpre23.c: Likewise.
* gcc.dg/tree-ssa/loadpre24.c: Likewise.
* gcc.dg/tree-ssa/loadpre25.c: Likewise.
* gcc.dg/tree-ssa/loadpre4.c: Likewise.
* gcc.dg/ipa/inlinehint-2.c: Likewise.
* gcc.dg/ipa/pure-const-2.c: Likewise.
* gcc.dg/tree-ssa/loop-1.c: Likewise.
* gcc.dg/tree-ssa/loop-23.c: Likewise.
* gcc.dg/tree-ssa/pr22051-2.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-3.c: Likewise.
* gcc.dg/tree-ssa/ssa-sccvn-3.c: Likewise.
* gcc.dg/vect/pr30858.c: Likewise.
* gcc.dg/vect/pr33866.c: Likewise.
* gcc.dg/vect/pr37027.c: Likewise.
* c-c++-common/ubsan/null-10.c: Likewise.
* gcc.target/i386/incoming-8.c: Likewise.

Index: gcc/testsuite/c-c++-common/ubsan/null-10.c
===
--- gcc/testsuite/c-c++-common/ubsan/null-10.c  (revision 229752)
+++ gcc/testsuite/c-c++-common/ubsan/null-10.c  (working copy)
@@ -2,10 +2,12 @@
 /* { dg-options "-fsanitize=null -w" } */
 /* { dg-shouldfail "ubsan" } */
 
+short x;
+
 int
 main (void)
 {
-  short *p = 0, *u;
+  short *p = 0, *u = &x;
   *(u + *p) = 23;
   return  0;
 }
Index: gcc/testsuite/gcc.dg/ipa/inlinehint-2.c
===
--- gcc/testsuite/gcc.dg/ipa/inlinehint-2.c (revision 229752)
+++ gcc/testsuite/gcc.dg/ipa/inlinehint-2.c (working copy)
@@

Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-04 Thread Kyrill Tkachov


On 04/11/15 11:37, Kyrill Tkachov wrote:


On 02/11/15 22:31, Jeff Law wrote:

On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:

Hi all,

This patch attempts to restrict combine from transforming ZERO_EXTEND
and SIGN_EXTEND operations into and-bitmask
and weird SUBREG expressions when they appear inside MULT expressions.
This is because a MULT rtx containing these
extend operations is usually a well understood widening multiply operation.
However, if the costs for simple zero or sign-extend moves happen to
line up in a particular way,
expand_compound_operation will end up mangling a perfectly innocent
extend+mult+add rtx like:
  (set (reg/f:DI 393)
 (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
 (sign_extend:DI (reg:SI 606)))
  (reg:DI 600)))

into:
  (set (reg/f:DI 393)
 (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
 (const_int 202 [0xca]))
 (sign_extend:DI (reg/v:SI 425 [ selected ])))
  (reg:DI 600)))

Going to leave the review side of this for Segher.

If you decide to go forward, there's a section in md.texi WRT canonicalization of these 
RTL codes that probably would need updating. Just search for "canonicalization" 
section and read down a ways.



You mean document a canonical form for these operations as (mult (extend op1) 
(extend op2)) ?



Looking at the issue more deeply I think the concrete problem is the code in 
expand_compound_operation:

  /* If we reach here, we want to return a pair of shifts.  The inner
 shift is a left shift of BITSIZE - POS - LEN bits.  The outer
 shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
 logical depending on the value of UNSIGNEDP.

 If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
 converted into an AND of a shift.

 We must check for the case where the left shift would have a negative
 count.  This can happen in a case like (x >> 31) & 255 on machines
 that can't shift by a constant.  On those machines, we would first
 combine the shift with the AND to produce a variable-position
 extraction.  Then the constant of 31 would be substituted in
 to produce such a position.  */

  modewidth = GET_MODE_PRECISION (GET_MODE (x));
  if (modewidth >= pos + len)
{
  machine_mode mode = GET_MODE (x);
  tem = gen_lowpart (mode, XEXP (x, 0));
  if (!tem || GET_CODE (tem) == CLOBBER)
return x;
  tem = simplify_shift_const (NULL_RTX, ASHIFT, mode,
  tem, modewidth - pos - len);
  tem = simplify_shift_const (NULL_RTX, unsignedp ? LSHIFTRT : ASHIFTRT,
  mode, tem, modewidth - len);
}
  else if (unsignedp && len < HOST_BITS_PER_WIDE_INT)
tem = simplify_and_const_int (NULL_RTX, GET_MODE (x),
  simplify_shift_const (NULL_RTX, LSHIFTRT,
GET_MODE (x),
XEXP (x, 0), pos),
  ((unsigned HOST_WIDE_INT) 1 << len) - 1);
  else
/* Any other cases we can't handle.  */
return x;


this code ends up unconditionally transforming (zero_extend:DI (reg:SI 125)) 
into:
(and:DI (subreg:DI (reg:SI 125) 0)
(const_int 1 [0x1]))

which then gets substituted as an operand of the mult and nothing matches after 
that.
archaeology shows this code has been there since at least 1992. I guess my 
question is
why do we do this unconditionally? Should we be checking whether the 
zero_extend form
is cheaper than whatever simplify_shift_const returns?
I don't think what expand_compound_operatio is trying to do here is 
canonicalisation...

Thanks,
Kyrill




Jeff








Re: [2/3] OpenACC reductions

2015-11-04 Thread Bernd Schmidt

On 11/02/2015 05:35 PM, Nathan Sidwell wrote:


+/* Size of buffer needed for worker reductions.  This has to be


Maybe "description" rather than "Size" since there's really four 
variables we're covering with the comment.



+  worker_red_size = (worker_red_size + worker_red_align - 1)
+   & ~(worker_red_align - 1);


Formatting. Wrap the entire multi-line expression in parentheses to get 
editors to align the & operator.



+static rtx
+nvptx_expand_cmp_swap (tree exp, rtx target,
+  machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))


Add a comment. You're using ATTRIBUTE_UNUSED and ARG_UNUSED in this 
patch, it would be good to be consistent - I'm still not sure which 
style is preferred after the switch to C++, so as far as I'm concerned 
just pick one.



+{
+#define DEF(ID, NAME, T)   \
+  (nvptx_builtin_decls[NVPTX_BUILTIN_ ## ID] = \
+   add_builtin_function ("__builtin_nvptx_" NAME,\
+build_function_type_list T,\
+NVPTX_BUILTIN_ ## ID, BUILT_IN_MD, NULL, NULL))


I think the assignment operator should start the line like all others, 
but other code in gcc is pretty inconsistent in that department.



+static tree
+nvptx_get_worker_red_addr (tree type, tree offset)


Add a comment.


+  switch (TYPE_MODE (TREE_TYPE (var)))
+{
+case SFmode:
+  code = VIEW_CONVERT_EXPR;
+  /* FALLTHROUGH */
+case SImode:
+  break;
+
+case DFmode:
+  code = VIEW_CONVERT_EXPR;
+  /* FALLTHROUGH  */
+case DImode:
+  type = long_long_unsigned_type_node;
+  fn = NVPTX_BUILTIN_CMP_SWAPLL;
+  break;
+
+default:
+  gcc_unreachable ();
+}


There are two such switch statements, and it's possible to write this 
more compactly:

  if (!INTEGRAL_MODE_P (...))
code = VIEW_CONVERT_EXPR;
  if (GET_MODE_SIZE (...) == 8)
fn = CMP_SWAPLL;
Not required, you can decide which you like better.

Otherwise I have no objections.


Bernd


Re: Division Optimization in match and simplify

2015-11-04 Thread Richard Biener
On Wed, Nov 4, 2015 at 1:45 PM, Marc Glisse  wrote:
> On Wed, 4 Nov 2015, Richard Biener wrote:
>
>>> I don't really remember what the tests !TYPE_UNSIGNED (type) and
>>> tree_int_cst_sgn are for in the other pattern, but since you are only
>>> moving
>>> the transformation...
>>
>>
>> +/* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
>> +(for div (exact_div trunc_div)
>> + (simplify
>> +  (div (bit_and @0 INTEGER_CST@1) INTEGER_CST@2)
>> +  (if (!TYPE_UNSIGNED (type) && integer_pow2p (@2)
>> +   && tree_int_cst_sgn (@2) > 0
>> +   && wi::add (@2, @1) == 0)
>> +   (rshift @0 { build_int_cst (integer_type_node, wi::exact_log2 (@2));
>> }
>>
>> the TYPE_UNSIGNED test is because right shift of negative values is
>> undefined,
>
>
> tree.def: "Shift means logical shift if done on an unsigned type, arithmetic
> shift if done on a signed type."
> To me, this implies that right shift of negative values is well-defined
> inside gcc.

Ah, it was _left_ shift of negative values that ubsan complains about.

> Also, the test allows *only signed types*, not unsigned.

Doh.  Ok, so maybe that's because the tree_int_cst_sgn test would
"not work" otherwise (just use tree_int_cst_sign_bit (@2) == 0).

>
>> so is a shift with a negative value.  I believe we can safely handle
>> conversions here
>> just like fold-const.c does with
>>
>> (div (convert? (bit_and @0 INTEGER_CST@1) INTEGER_CST@2)
>> (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>  ...
>>
>> With that the pattern looks ok to me.
>
>
> As long as it comes with (convert @0) in the result... I think the
> fold-const.c pattern is lucky that (int)(x&-4u) gets folded to
> (int)x&-4, or it might ICE for ((int)(x&-4u))/4.

I think the types match ok but I might have looked too quickly (again).

Richard.

> --
> Marc Glisse


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-11-04 Thread Jakub Jelinek
On Mon, Jul 06, 2015 at 05:38:35PM +0100, Alan Lawrence wrote:
> Trying to push these now (svn!), patch 2 is going first.
> 
> I realize my second iteration of patch 1/2, dropped the testcases from the
> first version. Okay to include those as per
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00215.html ?

FYI, it seems that (most likely) the PR65956 changes on gcc-5-branch
broke libgnat ABI compatibility on arm - it seems that getsubs.adb
from macrosub proglet (and others) are during make check compiled/linked
with system gnatmake/gcc, but the program is run at runtime
against the new libgnat-5.so.  If I run it manually against
system libgnat, it works, otherwise it hangs, when Fill_Table from
getsubs.adb calls Get_Line, and indeed it looks like the argument passing
for Get_Line changed and on the callee side it thinks Item (which is 400
chars string) has random (and in the hanging case negative) number of chars
in it.

Jakub


[PATCH/RFC/RFA] Machine modes for address printing (all targets)

2015-11-04 Thread Julian Brown
Hi,

Depending on assembler syntax and supported addressing modes, several
targets need to know the machine mode for a memory access when printing
an address (i.e. for automodify addresses that need to know the size
of their access), but it is not available with the current
TARGET_PRINT_OPERAND_ADDRESS hook. This leads to an ugly corner in the
operand output mechanism, where address printing gets split between
different parts of a backend, or some other hack (e.g. a global
variable) is used to communicate the machine mode to the address
printing hook.

Using a global variable also leads to a latent (?) bug on at least
AArch64: attempts to use the 'a' operand printing code cause final.c
to call output_address (in turn invoking the PRINT_OPERAND_ADDRESS
macro) *without* first setting the magic global
aarch64_memory_reference_mode, which means a stale value will be used
instead.

The full list of targets that use some form of workaround for the lack
of machine mode in the address printing hook is (E&OE):

aarch64: uses magic global.
arc: pre/post inc/dec handled in print_operand.
arm: uses magic global.
c6x
epiphany: offsets handled in print_operand.
m32r: hard-wires 4 for access size.
nds32
tilegx: uses magic global.
tilepro: uses magic global.

That's not all targets by any means, but may be enough to warrant a
change in the interface. I propose that:

* The output_address function should have a machine_mode argument
  added. Bare addresses (e.g. the 'a' case in final.c) should pass
  "VOIDmode" for this argument.

* Other callers of output_address -- actually all in backends -- can
  pass the machine mode for the memory access in question.

* The TARGET_PRINT_OPERAND_ADDRESS hook shall also have a machine_mode
  argument added. The legacy PRINT_OPERAND_ADDRESS hook can be left
  alone. (The documentation for the operand-printing hooks needs fixing
  too, incidentally.)

The attached patch makes this change, fairly mechanically. This removes
(most of) the magic globals for address printing, but I haven't tried
to refactor the targets that use other hacks to print correct
auto-modify addresses (that can be done by their respective
maintainers, hopefully, and should result in a nice cleanup).

Unfortunately I can't hope to test all the targets affected, though the
subset of targets that it's relatively easy for me to build, build
fine. I also ran regression tests for AArch64.

OK to apply, or any comments, or any further testing required?

Thanks,

Julian

ChangeLog

gcc/
* final.c (output_asm_insn): Pass VOIDmode to output_address.
(output_address): Add MODE argument. Pass to print_operand_address
hook.
* targhooks.c (default_print_operand_address): Add MODE argument.
* targhooks.h (default_print_operand_address): Update prototype.
* output.h (output_address): Update prototype.
* target.def (print_operand_address): Add MODE argument.
* config/vax/vax.c (print_operand_address): Pass VOIDmode to
output_address.
(print_operand): Pass access mode to output_address.
* config/mcore/mcore.c (mcore_print_operand_address): Add MODE
argument.
(mcore_print_operand): Update calls to mcore_print_operand_address.
* config/fr30/fr30.c (fr30_print_operand): Pass VOIDmode to
output_address.
* config/lm32/lm32.c (lm32_print_operand): Pass mode in calls to
output_address.
* config/tilegx/tilegx.c (output_memory_reference_mode): Remove
global.
(tilegx_print_operand): Don't set above global. Update calls to
output_address.
(tilegx_print_operand_address): Add MODE argument. Use instead of
output_memory_reference_mode global.
* config/frv/frv.c (frv_print_operand_address): Add MODE argument.
* config/mn10300/mn10300.c (mn10300_print_operand): Pass mode to
output_address.
* config/cris/cris.c (cris_print_operand_address): Add MODE
argument.
(cris_print_operand): Pass mode to output_address calls.
* config/spu/spu.c (print_operand): Pass mode to output_address
calls.
* config/aarch64/aarch64.h (aarch64_print_operand)
(aarch64_print_operand_address): Remove prototypes.
* config/aarch64/aarch64.c (aarch64_memory_reference_mode): Delete
global.
(aarch64_print_operand): Make static. Update calls to
output_address.
(aarch64_print_operand_address): Add MODE argument. Use instead of
aarch64_memory_reference_mode global.
(TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define target
hooks.
* config/aarch64/aarch64.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS):
Delete macro definitions.
* config/pa/pa.c (pa_print_operand): Pass mode in output_address
calls.
* config/xtensa/xtensa.c (print_operand): Pass mode in
output_address calls.
* config/h8300/h8300.c (h8300_print_operand_address): Add MODE
argument.
(h83000_print_operand): Update calls to h8300_print_operand_address
and output_address.
* config/ia64/ia64.c (ia64_print_operand_ad

Re: Division Optimization in match and simplify

2015-11-04 Thread Marc Glisse

On Wed, 4 Nov 2015, Richard Biener wrote:


I don't really remember what the tests !TYPE_UNSIGNED (type) and
tree_int_cst_sgn are for in the other pattern, but since you are only moving
the transformation...


+/* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
+(for div (exact_div trunc_div)
+ (simplify
+  (div (bit_and @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (!TYPE_UNSIGNED (type) && integer_pow2p (@2)
+   && tree_int_cst_sgn (@2) > 0
+   && wi::add (@2, @1) == 0)
+   (rshift @0 { build_int_cst (integer_type_node, wi::exact_log2 (@2)); }

the TYPE_UNSIGNED test is because right shift of negative values is undefined,


tree.def: "Shift means logical shift if done on an unsigned type, arithmetic shift 
if done on a signed type."
To me, this implies that right shift of negative values is well-defined
inside gcc.
Also, the test allows *only signed types*, not unsigned.


so is a shift with a negative value.  I believe we can safely handle
conversions here
just like fold-const.c does with

(div (convert? (bit_and @0 INTEGER_CST@1) INTEGER_CST@2)
(if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 ...

With that the pattern looks ok to me.


As long as it comes with (convert @0) in the result... I think the
fold-const.c pattern is lucky that (int)(x&-4u) gets folded to
(int)x&-4, or it might ICE for ((int)(x&-4u))/4.

--
Marc Glisse


OpenACC declare directive updates

2015-11-04 Thread James Norris


This patch updates the processing of OpenACC declare directive for
Fortran in the following areas:

1) module support
2) device_resident and link clauses
3) clause checking
4) directive generation

Commentary on the changes is included as an attachment (NOTES).

All of the code is in the gomp-4_0-branch.

Regtested on x86_64-linux.

Thanks!
Jim
Background

The declare directive is used to allocate device memory for the
entire scope of a variable / array within a program, function,
or subroutine. Consider the following example.

module vars
  integer b
  !$acc declare device_resident (b)
  integer c
  !$acc declare link (c)
end module vars

program main
  use vars
  integer, parameter :: N = 8
  integer :: a(N)

  a(:) = 2
  c = 12

  !$acc parallel copy (a) present (b) copyin(c)
  do i = 1, N
b = a(i)
c = b
a(i) = c + i
  end do
  !$acc end parallel

end program

In the example, 'b' will be allocated on the device at the outset
of device activity and be available for the duration. Whereas the
allocation of 'c' will be delayed until the parallel region is
entered. The device memory for 'c' will be deallocated upon exit
of the parallel region.

Fortran front-end

The changes are concentrated into four (4) areas.

1) module support
The neccesary functionality has been added to handle the
reading in and writing out of the appropriate attributes
for the declare directive. Additional functionality has
been added at read in time to setup the required declare
handling.

2) device_resident and link clauses
Add the functionality necessary to process the link and
device_resident clauses.

3) clause checking
The clause checking has been cleaned up to better check
for duplicates and correctness.

4) directive generation

Prior to handling the fortran execution body a code
body is created to handle the clause(s) that accompany
the declare directive(s). Each clause is examined and
determined whether the clause need to be modified to 
perform an action at the beginning of the module, function,
subroutine, or program. Furthermore, an additional
clause may be added to the list to perform an action
at the time the function or subroutine returns.

Once all the clauses have been handled, the code body
is added to the chain.

libgomp

TODO

Testing

New compile and runtime tests have been added. Also some have
been modified.
2015-10-29  James Norris  
Cesar Philippidis  

gcc/fortran/
* dump-parse-tree.c (show_namespace): Reimplement.
* f95-lang.c (gfc_attribute_table): New entry.
* gfortran.h (struct symbol_attribute): New fields.
(enum gfc_omp_map_map): Add OMP_MAP_DEVICE_RESIDENT and OMP_MAP_LINK.
(OMP_LIST_LINK): New enum.
(struct gfc_oacc_declare): New structure.
(gfc_get_oacc_declare): New definition.
(struct gfc_namespace): Change type.
(enum gfc_exec_op): Add EXEC_OACC_DECLARE.
(struct gfc_code): New field.
* module.c (enum ab_attribute): Add AB_OACC_DECLARE_CREATE,
AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK
(attr_bits): Add new initializers.
(mio_symbol_attribute): Handle new atributes.
* openmp.c (gfc_free_oacc_declare_clauses): New.
(OMP_CLAUSE_LINK): New definition.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSES): Add OMP_CLAUSE_LINK
(gfc_match_oacc_declare): Add checking and module handling.
(resolve_omp_duplicate_list, resolve_oacc_declare_map): New.
(gfc_resolve_oacc_declare): Use duplicate detection.
* parse.c (case_decl): Add ST_OACC_DECLARE.
(parse_spec): Remove handling.
(parse_progunit): Remove handling.
* parse.h (struct gfc_state_data): Change type.
* resolve.c (gfc_resolve_blocks): Handle EXEC_OACC_DECLARE.
* st.c (gfc_free_statement): Handle EXEC_OACC_DECLARE.
* symbol.c (check_conflict): Add conflict checks.
(gfc_add_oacc_declare_create, gfc_add_oacc_declare_copyin, 
gfc_add_oacc_declare_deviceptr, gfc_add_oacc_declare_device_resident):
New functions.
(gfc_copy_attr): Handle new symbols.
* trans-decl.c (add_attri

Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads

2015-11-04 Thread Arnaud Charlet
> > > > Your ChangeLog entry is not in the proper format, see sections 6.8.1
> > > > and
> > > > 6.8.2 from http://www.gnu.org/prep/standards/standards.html
> > > > 
> > > > The diff itself is OK.
> > > 
> > > Ok, fixed this. See the new diff below.
> > 
> > This is now OK, you can go ahead and commit it.
> > 
> 
> Attached are the 2 final patches. One for trunk and the other one for both the
> gcc-5-branch and the gcc-4_9-branch.
> I don't have write access, so one of you would need to commit it.

I'll let Joel take care of that, and verify that you have your paperwork with
the FSF for copyright assignment in place (as per
https://gcc.gnu.org/contribute.html)

Arno


Re: [PATCH] [ARM] neon-testgen.ml typo

2015-11-04 Thread Ramana Radhakrishnan
On Fri, Oct 30, 2015 at 2:42 PM, Christophe Lyon
 wrote:
> On 30 October 2015 at 15:33, Ramana Radhakrishnan
>  wrote:
>>
>>
>> On 29/10/15 17:23, Jim Wilson wrote:
>>> I noticed a comment typo in this file while using grep to look for
>>> other stuff.  The typo is easy to fix.
>>>
>>> I tried running neon-testgen.ml to verify, but it is apparently no
>>> longer valid ocaml, as it doesn't work with the ocamlc 4.01.0 I have
>>> on Ubuntu 14.04.  I get a syntax error.  Someone who knows ocaml will
>>> have to fix this.  Meanwhile, the patch to fix the typo should still
>>> be OK, as this is a separate problem.
>>>
>>> Jim
>>>
>>
>> This is OK.
>>
>> I'd really like neon-testgen.ml and the tests in gcc.target/arm/neon to be 
>> removed if all the intrinsics are now tested from Christophe's work in 
>> getting his advsimd tests integrated. Where are we on that ?
>>
>
> The tests I added cover all ARMv7 intrinsics. I have converted all my tests.

Good and thank you for doing that.

>
> There were a few additions to support some aarch64 specific intrinsics.
>
> However, most of the tests in gcc.target/arm/neon contain scan-asm
> directives which mine don't.
> My tests do check functionality (they are executable, comparing their
> results against expected values).

I don't think the scan-asm in those tests gives you anything useful at
O0 with undefined behaviour in the testcases. In any case for the more
esoteric intrinsics (i.e. the ones that do saturation etc..) having
the run time tests is a better test. I do not see this as being useful
any more in terms of the testing coverage this provides.

A patch to remove neon-testgen.ml and the tests in gcc.target/arm/neon
is pre-approved.


regards
Ramana



>
> Christophe.
>
>> regards
>> Ramana


Re: [ping] Fix PR debug/66728

2015-11-04 Thread Richard Biener
On Wed, Nov 4, 2015 at 12:57 PM, Mike Stump  wrote:
> On Nov 4, 2015, at 1:43 AM, Richard Biener  wrote:
>> I think you should limit the effect of this patch to the dwarf2out use
>> as the above doesn't make sense to me.
>
> Since dwarf is so special, and since other clients already do something sort 
> of like this anyway, it isn’t unreasonable to make the client be responsible 
> for picking a sensible mode, and asserting if they fail to.  This also 
> transfers the cost of the special case code out to the one client that needs 
> it, and avoids that cost for all the other clients.
>
> The new patch is below for your consideration.
>
> Ok?
>
>> Ideally we'd have an assert that you don't create a rtx_mode_t with
>> VOIDmode or BLKmode.
>
> Added.
>
>> Handling the CONST_WIDE_INT in dwarf2out.c the same as we did before
>> (with CONST_DOUBLE)
>> looks sensible as far of fixing a regression (I assume the diff to the
>> dwarf results in essentially the
>> same DWARF as what was present before wide-int).
>
> Yes, the dwarf is the same.
>
> Index: dwarf2out.c
> ===
> --- dwarf2out.c (revision 229720)
> +++ dwarf2out.c (working copy)
> @@ -15593,8 +15593,15 @@
>return true;
>
>  case CONST_WIDE_INT:
> -  add_AT_wide (die, DW_AT_const_value,
> -  std::make_pair (rtl, GET_MODE (rtl)));
> +  {
> +   machine_mode mode = GET_MODE (rtl);
> +   if (mode == VOIDmode)
> + mode = mode_for_size (CONST_WIDE_INT_NUNITS (rtl)
> +   * HOST_BITS_PER_WIDE_INT,
> +   MODE_INT, 0);
> +   add_AT_wide (die, DW_AT_const_value,
> +std::make_pair (rtl, mode));

I wonder if we'll manage to to get mode_for_size return BLKmode
in case of an original mode that was not of a size multiple of
HOST_BITS_PER_WIDE_INT (and that's host dependent even...).

We probably should use smallest_mode_for_size on a precision
derived from the value (ignore leading ones and zeros or so, exact
details need to be figured out).  Eventually hide this detail in a
smallest_mode_for_const_wide_int () helper.

> +  }
>return true;
>
>  case CONST_DOUBLE:
> Index: rtl.h
> ===
> --- rtl.h   (revision 229720)
> +++ rtl.h   (working copy)
> @@ -2086,6 +2086,7 @@
>  inline unsigned int
>  wi::int_traits ::get_precision (const rtx_mode_t &x)
>  {
> +  gcc_assert (x.second != BLKmode && x.second != VOIDmode);

Please use gcc_checking_assert here.

Richard.

>return GET_MODE_PRECISION (x.second);
>  }
>
>


Re: [PATCH] Make CCP effectively remove UNDEFINED defs

2015-11-04 Thread Marc Glisse

On Wed, 4 Nov 2015, Richard Biener wrote:


On Wed, 4 Nov 2015, Richard Biener wrote:


On Wed, 4 Nov 2015, Marc Glisse wrote:


On Wed, 4 Nov 2015, Richard Biener wrote:


The following patch makes sure that CCP when it computes a lattice
value to UNDEFINED ends up replacing uses with default defs (and thus
removes such UNDEFINED producing defs).  This optimizes the testcase
below to

 :
 return _6(D);

in the first CCP.  Note this patch isn't mainly for the optimization
it does but for making it easier to discover cases where CCP gets
UNDEFINED wrong (similar to VRP2 re-using the range-info that is
still live on SSA names - that's to catch bogus range-info).

If the experiment works out (read: bootstrap / test completes
successfully) I'm going to enhance VRP and will also look at
how value-numbering could do a similar job.


Cool, the VN part might help with
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01556.html
:-)


Actually it "breaks" gcc.dg/tree-ssa/pr44133.c as it transforms

  :
  _2 = l_1(D) + -22;
  _4 = _2 + s$i_5(D);
  return _4;

into

  :
  return _6(D);

turning the warning of an uninitialized use of s$i_5(D) (aka s.i)
into a warning of an uninitialized use of '({anonymous})'.

So I wonder if a better approach is to handle UNDEFINED differently
during propagation (now that we can propagate copies) and make
the value of _4 not UNDEFINED but a copy of (one of) the operand
that makes it undefined.  Thus we'd turn the above into

  :
  return s$i_5(D);

there is the chance that the warnings resulting from this will
be confusing (warning for an uninitialized use of sth far away
from its actual use).  Of course the above propagation is also
cheaper to implement (apart from the choice you have when
seeing x_1(D) + y_2(D) - which one do you pick and retain
the warning for?)


Ok, it's actually _not_ easy.  The operand might not have the
correct type after all and we'll introduce a not fully propagated
lattice that way (the 'copy' value would have a lattice value
of UNDEFINED).


If you create a new variable (that _6 is a default value for), you could 
also give it the same name (gcc may append .1) and location as s.i, even 
if the type differs. It wouldn't be perfect (might be too much of a hack 
for gcc), but should be more readable than '({anonymous})'. Hmm, I now 
realize that it doesn't really help, because you would have to maintain 
somehow the information of which variable's name to copy, and that's what 
you are saying is hard :-( Sorry for the useless comment.


--
Marc Glisse


Re: Division Optimization in match and simplify

2015-11-04 Thread Richard Biener
On Wed, Nov 4, 2015 at 12:18 PM, Marc Glisse  wrote:
> On Wed, 4 Nov 2015, Hurugalawadi, Naveen wrote:
>
 I thought we were mostly using the 'convert?'
 and tree_nop_conversion_p on integers

Yes, on floats they shouldn't be used.

>>
>> Done. Cleared all instances of convert which are not required.
>> However, I am still confused about the use of "convert" in match
>> and simplify.
>
>
> It could be that I am wrong, you'll have to wait for Richard's comments
> anyway. The one place where a convert could be useful is:
> (div (convert? (bit_and @0 INTEGER_CST@1)) INTEGER_CST@2)
> but I didn't check if we want it (it would probably at least require
> (convert @0) instead of plain @0 in the result so the division and the shift
> are done with the same signedness).
>
 So all patterns looking at arg[01] are handling nop-conversions on their
 outermost operands while those looking at op[01] do not.
>>
>>
>> These patterns are looking at arg[01] rather than op[01] ?
>
>
> Might be a case where it doesn't really matter which one you look at, and
> the easiest one in fold-const may not be the same as in match.pd.
>
> +/* Convert (A/B)/C to A/(B*C)  */
> +(simplify
> + (rdiv (rdiv @0 @1) @2)
> +  (if (flag_reciprocal_math)
>
> I would indent this line the same as the one above.

Yep.

> +   (rdiv @0 (mult @1 @2
> +
> +/* Convert A/(B/C) to (A/B)*C  */
> +(simplify
> + (rdiv @0 (rdiv @1 @2))
> +  (if (flag_reciprocal_math)
> +   (mult (rdiv @0 @1) @2)))
>
> I believe you might want a :s on the inner rdiv (?).
> Note that you can factor out the test on flag_reciprocal_math so you write
> it only once for 2 patterns.

Please use :s on both inner rdiv in both patterns.  With that the two patterns
are ok.

+/* Convert C1/(X*C2) into (C1/C2)/X  */
+(simplify
+ (rdiv (REAL_CST@0) (mult @1 REAL_CST@2))

Omit the parens around REAL_CST@0

+  (if (flag_reciprocal_math)
+   (with
+{ tree tem = const_binop (RDIV_EXPR, type, @0, @2); }
+(if (tem)
+ (rdiv { tem; } @1)

with that the pattern is ok.

>
> I don't really remember what the tests !TYPE_UNSIGNED (type) and
> tree_int_cst_sgn are for in the other pattern, but since you are only moving
> the transformation...

+/* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
+(for div (exact_div trunc_div)
+ (simplify
+  (div (bit_and @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (!TYPE_UNSIGNED (type) && integer_pow2p (@2)
+   && tree_int_cst_sgn (@2) > 0
+   && wi::add (@2, @1) == 0)
+   (rshift @0 { build_int_cst (integer_type_node, wi::exact_log2 (@2)); }

the TYPE_UNSIGNED test is because right shift of negative values is undefined,
so is a shift with a negative value.  I believe we can safely handle
conversions here
just like fold-const.c does with

 (div (convert? (bit_and @0 INTEGER_CST@1) INTEGER_CST@2)
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  ...

With that the pattern looks ok to me.

Richard.


>
> --
> Marc Glisse


Re: [ping] Fix PR debug/66728

2015-11-04 Thread Mike Stump
On Nov 4, 2015, at 1:43 AM, Richard Biener  wrote:
> I think you should limit the effect of this patch to the dwarf2out use
> as the above doesn't make sense to me.

Since dwarf is so special, and since other clients already do something sort of 
like this anyway, it isn’t unreasonable to make the client be responsible for 
picking a sensible mode, and asserting if they fail to.  This also transfers 
the cost of the special case code out to the one client that needs it, and 
avoids that cost for all the other clients.

The new patch is below for your consideration.

Ok?

> Ideally we'd have an assert that you don't create a rtx_mode_t with
> VOIDmode or BLKmode.

Added.

> Handling the CONST_WIDE_INT in dwarf2out.c the same as we did before
> (with CONST_DOUBLE)
> looks sensible as far of fixing a regression (I assume the diff to the
> dwarf results in essentially the
> same DWARF as what was present before wide-int).

Yes, the dwarf is the same.

Index: dwarf2out.c
===
--- dwarf2out.c (revision 229720)
+++ dwarf2out.c (working copy)
@@ -15593,8 +15593,15 @@
   return true;
 
 case CONST_WIDE_INT:
-  add_AT_wide (die, DW_AT_const_value,
-  std::make_pair (rtl, GET_MODE (rtl)));
+  {
+   machine_mode mode = GET_MODE (rtl);
+   if (mode == VOIDmode)
+ mode = mode_for_size (CONST_WIDE_INT_NUNITS (rtl)
+   * HOST_BITS_PER_WIDE_INT,
+   MODE_INT, 0);
+   add_AT_wide (die, DW_AT_const_value,
+std::make_pair (rtl, mode));
+  }
   return true;
 
 case CONST_DOUBLE:
Index: rtl.h
===
--- rtl.h   (revision 229720)
+++ rtl.h   (working copy)
@@ -2086,6 +2086,7 @@
 inline unsigned int
 wi::int_traits ::get_precision (const rtx_mode_t &x)
 {
+  gcc_assert (x.second != BLKmode && x.second != VOIDmode);
   return GET_MODE_PRECISION (x.second);
 }
 



Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads

2015-11-04 Thread Jan Sommer
Am Tuesday 03 November 2015, 20:13:50 schrieb Arnaud Charlet:
> > > Your ChangeLog entry is not in the proper format, see sections 6.8.1 and
> > > 6.8.2 from http://www.gnu.org/prep/standards/standards.html
> > > 
> > > The diff itself is OK.
> > 
> > Ok, fixed this. See the new diff below.
> 
> This is now OK, you can go ahead and commit it.
> 

Attached are the 2 final patches. One for trunk and the other one for both the 
gcc-5-branch and the gcc-4_9-branch.
I don't have write access, so one of you would need to commit it.

Thanks,

   JanIndex: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog	(Revision 229739)
+++ gcc/ada/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2015-11-03  Jan Sommer 
+
+	* s-oscons-tmplt.c: Generate pthread constants for RTEMS
+	* s-osinte-rtems.ads: Declare pthread structs as opaque types in Ada
+	Fixes PR ada/68169
+
 2015-10-09  Eric Botcazou  
 
 	* gcc-interface/Make-lang.in: Make sure that GNAT1_OBJS and not just
Index: gcc/ada/s-oscons-tmplt.c
===
--- gcc/ada/s-oscons-tmplt.c	(Revision 229739)
+++ gcc/ada/s-oscons-tmplt.c	(Arbeitskopie)
@@ -157,7 +157,7 @@ pragma Style_Checks ("M32766");
 # include <_types.h>
 #endif
 
-#ifdef __linux__
+#if defined (__linux__) || defined (__rtems__)
 # include 
 # include 
 #endif
@@ -1456,7 +1456,8 @@ CND(CLOCK_THREAD_CPUTIME_ID, "Thread CPU clock")
 CNS(CLOCK_RT_Ada, "")
 #endif
 
-#if defined (__APPLE__) || defined (__linux__) || defined (DUMMY)
+#if defined (__APPLE__) || defined (__linux__) || defined (__rtems__) || \
+  defined (DUMMY)
 /*
 
--  Sizes of pthread data types
@@ -1499,7 +1500,7 @@ CND(PTHREAD_RWLOCKATTR_SIZE, "pthread_rwlockattr_t
 CND(PTHREAD_RWLOCK_SIZE, "pthread_rwlock_t")
 CND(PTHREAD_ONCE_SIZE,   "pthread_once_t")
 
-#endif /* __APPLE__ || __linux__ */
+#endif /* __APPLE__ || __linux__ || __rtems__*/
 
 /*
 
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads	(Revision 229739)
+++ gcc/ada/s-osinte-rtems.ads	(Arbeitskopie)
@@ -51,6 +51,8 @@
 --  It is designed to be a bottom-level (leaf) package.
 
 with Interfaces.C;
+with System.OS_Constants;
+
 package System.OS_Interface is
pragma Preelaborate;
 
@@ -60,6 +62,7 @@ package System.OS_Interface is
subtype rtems_id   is Interfaces.C.unsigned;
 
subtype intis Interfaces.C.int;
+   subtype char   is Interfaces.C.char;
subtype short  is Interfaces.C.short;
subtype long   is Interfaces.C.long;
subtype unsigned   is Interfaces.C.unsigned;
@@ -68,7 +71,6 @@ package System.OS_Interface is
subtype unsigned_char  is Interfaces.C.unsigned_char;
subtype plain_char is Interfaces.C.plain_char;
subtype size_t is Interfaces.C.size_t;
-
---
-- Errno --
---
@@ -76,11 +78,11 @@ package System.OS_Interface is
function errno return int;
pragma Import (C, errno, "__get_errno");
 
-   EAGAIN: constant := 11;
-   EINTR : constant := 4;
-   EINVAL: constant := 22;
-   ENOMEM: constant := 12;
-   ETIMEDOUT : constant := 116;
+   EAGAIN: constant := System.OS_Constants.EAGAIN;
+   EINTR : constant := System.OS_Constants.EINTR;
+   EINVAL: constant := System.OS_Constants.EINVAL;
+   ENOMEM: constant := System.OS_Constants.ENOMEM;
+   ETIMEDOUT : constant := System.OS_Constants.ETIMEDOUT;
 
-
-- Signals --
@@ -448,6 +450,7 @@ package System.OS_Interface is
   ss_low_priority : int;
   ss_replenish_period : timespec;
   ss_initial_budget   : timespec;
+  sched_ss_max_repl   : int;
end record;
pragma Convention (C, struct_sched_param);
 
@@ -621,43 +624,34 @@ private
end record;
pragma Convention (C, timespec);
 
-   CLOCK_REALTIME :  constant clockid_t := 1;
-   CLOCK_MONOTONIC : constant clockid_t := 4;
+   CLOCK_REALTIME :  constant clockid_t := System.OS_Constants.CLOCK_REALTIME;
+   CLOCK_MONOTONIC : constant clockid_t := System.OS_Constants.CLOCK_MONOTONIC;
 
+   subtype char_array is Interfaces.C.char_array;
+
type pthread_attr_t is record
-  is_initialized  : int;
-  stackaddr   : System.Address;
-  stacksize   : int;
-  contentionscope : int;
-  inheritsched: int;
-  schedpolicy : int;
-  schedparam  : struct_sched_param;
-  cputime_clocked_allowed : int;
-  detatchstate: int;
+  Data : char_array (1 .. OS_Constants.PTHREAD_ATTR_SIZE);
end record;
pragma Convention (C, pthread_attr_t);
+   for pthread_attr_t'Alignment use Interfaces.C.double'Alignment;
 
type pthread_condattr_t is record
-  flags   : int;
-  process_shared  : int;
+  Data : char_array (1 .. OS_Constants.PTHREAD_CONDATTR_SIZE);
end record;
pragma Convention (C, pthread_condattr_t);
+   for pth

Re: [gomp4] fortran cleanups and c/c++ loop parsing backport

2015-11-04 Thread Thomas Schwinge
Hi Cesar!

On Tue, 27 Oct 2015 11:36:10 -0700, Cesar Philippidis  
wrote:
>   * Proposed fortran cleanups and enhanced error reporting changes:
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html

... has now been applied to trunk, in an altered version, so we now got
some divergence between trunk and gomp-4_0-branch to resolve.

> I'll apply this patch to gomp-4_0-branch shortly. I know that I should
> have broken this patch down into smaller patches, but it was already
> arranged as one big patch in my source tree.

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c

> @@ -1473,8 +1468,8 @@ gfc_match_oacc_declare (void)
> if (n->u.map_op != OMP_MAP_FORCE_ALLOC
> && n->u.map_op != OMP_MAP_FORCE_TO)
>   {
> -   gfc_error ("Invalid clause in module with "
> -  "$!ACC DECLARE at %C");
> +   gfc_error ("Invalid clause in module with $!ACC DECLARE at %L",
> +  &n->where);
> return MATCH_ERROR;
>   }
>  
> @@ -1483,29 +1478,29 @@ gfc_match_oacc_declare (void)
>  
>if (ns->proc_name->attr.oacc_function)
>   {
> -   gfc_error ("Invalid declare in routine with " "$!ACC DECLARE at %C");
> +   gfc_error ("Invalid declare in routine with $!ACC DECLARE at %C");
> return MATCH_ERROR;
>   }

Shouldn't the "%C" -> "%L (&n->where)" substitution also be done for that
one?  If yes, please do so; if no, please add a source code comment, why.

>  
>if (s->attr.in_common)
>   {
> -   gfc_error ("Unsupported: variable in a common block with "
> -  "$!ACC DECLARE at %C");
> +   gfc_error ("Variable in a common block with $!ACC DECLARE at %L",
> +  &n->where);
> return MATCH_ERROR;
>   }
>  
>if (s->attr.use_assoc)
>   {
> -   gfc_error ("Unsupported: variable is USE-associated with "
> -  "$!ACC DECLARE at %C");
> +   gfc_error ("Variable is USE-associated with $!ACC DECLARE at %L",
> +  &n->where);
> return MATCH_ERROR;
>   }
>  
>if ((s->attr.dimension || s->attr.codimension)
> && s->attr.dummy && s->as->type != AS_EXPLICIT)
>   {
> -   gfc_error ("Unsupported: assumed-size dummy array with "
> -  "$!ACC DECLARE at %C");
> +   gfc_error ("Assumed-size dummy array with $!ACC DECLARE at %L",
> +  &n->where);
> return MATCH_ERROR;
>   }

> -/* Returns true if clause in list 'list' is compatible with any of
> -   of the clauses in lists [0..list-1].  E.g., a reduction variable may
> -   appear in both reduction and private clauses, so this function
> -   will return true in this case.  */
> +/* Check if a variable appears in multiple clauses.  */
>  
> -static bool
> -oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
> -gfc_symbol *sym, bool openacc)
> +static void
> +resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
> + int list)

If I understand correctly, that has not been approved for trunk, so we
should revert this on gomp-4_0-branch, too?

>  {
>gfc_omp_namelist *n;
> +  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
>  
> -  if (!openacc)
> -return false;
> +  /* OpenACC reduction clauses are compatible with everything.  We only
> + need to check if a reduction variable is used more than once.  */
> +  if (openacc && list == OMP_LIST_REDUCTION)
> +{
> +  hash_set reductions;
>  
> -  if (list != OMP_LIST_REDUCTION)
> -return false;
> +  for (n = clause_list; n; n = n->next)
> + {
> +   if (reductions.contains (n->sym))
> + gfc_error (error_msg, n->sym->name, &n->where);
> +   else
> + reductions.add (n->sym);
> + }
>  
> -  for (n = clauses->lists[OMP_LIST_FIRST]; n; n = n->next)
> -if (n->sym == sym)
> -  return true;
> +  return;
> +}
>  
> -  return false;
> +  /* Ensure that variables are only used in one clause.  */
> +  for (n = clause_list; n; n = n->next)
> +{
> +  if (n->sym->mark)
> + gfc_error (error_msg, n->sym->name, &n->where);
> +  else
> + n->sym->mark = 1;
> +}
>  }

> @@ -4966,64 +4927,27 @@ gfc_resolve_oacc_declare (gfc_namespace *ns)
>  {
>loc = oc->where;

As far as I can tell, given the following changes, the "loc" local
variable is now unused, so should be removed?

>  
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -list <= OMP_LIST_DEVICE_RESIDENT; list++)
> - for (n = oc->clauses->lists[list]; n; n = n->next)
> -   {
> - n->sym->mark = 0;
> - if (n->sym->attr.flavor == FL_PARAMETER)
> -   gfc_error ("PARAMETER object %qs is not allowed at %L",
> -  n->sym->name, &loc);
> -   }
> -
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> - list <= OMP_LIST_DEVICE_RESIDENT;

Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-04 Thread Kyrill Tkachov

Ji Jrgg,

On 02/11/15 22:31, Jeff Law wrote:

On 11/02/2015 07:15 AM, Kyrill Tkachov wrote:

Hi all,

This patch attempts to restrict combine from transforming ZERO_EXTEND
and SIGN_EXTEND operations into and-bitmask
and weird SUBREG expressions when they appear inside MULT expressions.
This is because a MULT rtx containing these
extend operations is usually a well understood widening multiply operation.
However, if the costs for simple zero or sign-extend moves happen to
line up in a particular way,
expand_compound_operation will end up mangling a perfectly innocent
extend+mult+add rtx like:
  (set (reg/f:DI 393)
 (plus:DI (mult:DI (sign_extend:DI (reg/v:SI 425 [ selected ]))
 (sign_extend:DI (reg:SI 606)))
  (reg:DI 600)))

into:
  (set (reg/f:DI 393)
 (plus:DI (mult:DI (and:DI (subreg:DI (reg:SI 606) 0)
 (const_int 202 [0xca]))
 (sign_extend:DI (reg/v:SI 425 [ selected ])))
  (reg:DI 600)))

Going to leave the review side of this for Segher.

If you decide to go forward, there's a section in md.texi WRT canonicalization of these 
RTL codes that probably would need updating. Just search for "canonicalization" 
section and read down a ways.



You mean document a canonical form for these operations as (mult (extend op1) 
(extend op2)) ?



Jeff






Re: Division Optimization in match and simplify

2015-11-04 Thread Marc Glisse

On Wed, 4 Nov 2015, Hurugalawadi, Naveen wrote:


I thought we were mostly using the 'convert?'
and tree_nop_conversion_p on integers


Done. Cleared all instances of convert which are not required.
However, I am still confused about the use of "convert" in match
and simplify.


It could be that I am wrong, you'll have to wait for Richard's comments 
anyway. The one place where a convert could be useful is:

(div (convert? (bit_and @0 INTEGER_CST@1)) INTEGER_CST@2)
but I didn't check if we want it (it would probably at least require 
(convert @0) instead of plain @0 in the result so the division and the 
shift are done with the same signedness).



So all patterns looking at arg[01] are handling nop-conversions on their
outermost operands while those looking at op[01] do not.


These patterns are looking at arg[01] rather than op[01] ?


Might be a case where it doesn't really matter which one you look at, and 
the easiest one in fold-const may not be the same as in match.pd.


+/* Convert (A/B)/C to A/(B*C)  */
+(simplify
+ (rdiv (rdiv @0 @1) @2)
+  (if (flag_reciprocal_math)

I would indent this line the same as the one above.

+   (rdiv @0 (mult @1 @2
+
+/* Convert A/(B/C) to (A/B)*C  */
+(simplify
+ (rdiv @0 (rdiv @1 @2))
+  (if (flag_reciprocal_math)
+   (mult (rdiv @0 @1) @2)))

I believe you might want a :s on the inner rdiv (?).
Note that you can factor out the test on flag_reciprocal_math so you write 
it only once for 2 patterns.


I don't really remember what the tests !TYPE_UNSIGNED (type) and 
tree_int_cst_sgn are for in the other pattern, but since you are only 
moving the transformation...


--
Marc Glisse


Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-04 Thread Bernd Schmidt

On 10/30/2015 07:03 PM, James Greenhalgh wrote:

+ i = tmp_i; <- Should be cleaned up


Maybe reword as "Subsequent passes are expected to clean up the extra 
moves", otherwise it sounds like a TODO item.



+   read back in anotyher SET, as might occur in a swap idiom or


Typo.


+ if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+   {
+ /* The write to targets[i] is only live until the read
+here.  As the condition codes match, we can propagate
+the set to here.  */
+  new_val = SET_SRC (single_set (unmodified_insns[i]));
+   }


Shouldn't use braces around single statements (also goes for the 
surrounding for loop).



+  /* We must have at least one real insn to convert, or there will
+ be trouble!  */
+  unsigned count = 0;


The comment seems a bit strange in this context - I think it's left over 
from the earlier version?


As far as I'm concerned this is otherwise ok.


Bernd


Re: [PATCH][4.9/5 Backport] PR rtl-optimization/67037 Use copy_rtx when necessary

2015-11-04 Thread Grazvydas Ignotas
Can anyone commit this, please?

On Thu, Oct 29, 2015 at 12:48 AM, Grazvydas Ignotas  wrote:
> Hi,
>
> This is the 4.9 and GCC 5 backport of patch from PR67037 that's already in 
> trunk.
> I've build it on 4.9 and confirmed that it works.
>
> Grazvydas
>
> Backport from mainline
> 2015-09-30  Bernd Edlinger  
>
> PR rtl-optimization/67037
> * lra-constraints.c (process_addr_reg): Use copy_rtx when necessary.
>
> testsuite:
> 2015-09-30  Bernd Edlinger  
>
> PR rtl-optimization/67037
> * gcc.c-torture/execute/pr67037.c: New test.
> ---
>  gcc/lra-constraints.c |  2 +-
>  gcc/testsuite/gcc.c-torture/execute/pr67037.c | 49 
> +++
>  2 files changed, 50 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr67037.c
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index ae8f3cd..919b127 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -1203,7 +1203,7 @@ process_addr_reg (rtx *loc, rtx *before, rtx *after, 
> enum reg_class cl)
>if (after != NULL)
>  {
>start_sequence ();
> -  lra_emit_move (reg, new_reg);
> +  lra_emit_move (before_p ? copy_rtx (reg) : reg, new_reg);
>emit_insn (*after);
>*after = get_insns ();
>end_sequence ();
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67037.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr67037.c
> new file mode 100644
> index 000..3119d32
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr67037.c
> @@ -0,0 +1,49 @@
> +long (*extfunc)();
> +
> +static inline void lstrcpynW( short *d, const short *s, int n )
> +{
> +unsigned int count = n;
> +
> +while ((count > 1) && *s)
> +{
> +count--;
> +*d++ = *s++;
> +}
> +if (count) *d = 0;
> +}
> +
> +int __attribute__((noinline,noclone))
> +badfunc(int u0, int u1, int u2, int u3,
> +  short *fsname, unsigned int fsname_len)
> +{
> +static const short ntfsW[] = {'N','T','F','S',0};
> +char superblock[2048+3300];
> +int ret = 0;
> +short *p;
> +
> +if (extfunc())
> +return 0;
> +p = (void *)extfunc();
> +if (p != 0)
> +goto done;
> +
> +extfunc(superblock);
> +
> +lstrcpynW(fsname, ntfsW, fsname_len);
> +
> +ret = 1;
> +done:
> +return ret;
> +}
> +
> +static long f()
> +{
> +return 0;
> +}
> +
> +int main()
> +{
> +short buf[6];
> +extfunc = f;
> +return !badfunc(0, 0, 0, 0, buf, 6);
> +}
> --
> 1.9.1
>


  1   2   >