Re: [PATCH] Make cp_fold more consistent in its output caching

2016-05-31 Thread Patrick Palka
On Tue, May 31, 2016 at 3:00 PM, Patrick Palka  wrote:
> Some code paths of cp_fold return early instead of falling through to
> the end of the function and so in these cases we fail to cache the
> return value of the function into the fold_cache.
>
> This patch splits cp_fold into two functions, with the wrapper function
> being responsible for caching the output of the worker function.  Does
> this look OK to commit after bootstrap and regtest?

On second thought, this change is not really beneficial: all the
cases where we return early from cp_fold are very cheap to re-compute,
so it would be pointless to cache them.
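For reference, the caching shape the patch proposed — a thin wrapper that memoizes a worker function — can be sketched in plain C. All names and the direct-mapped cache here are invented for illustration; GCC's actual fold_cache is a hash_map keyed on tree nodes:

```c
#include <stddef.h>

#define CACHE_SIZE 256

static int cache_key[CACHE_SIZE];
static int cache_val[CACHE_SIZE];
static int cache_used[CACHE_SIZE];

static int
fold_worker (int x)		/* plays the role of cp_fold_1 */
{
  return x * 2 + 1;
}

int
fold_cached (int x)		/* plays the role of the cp_fold wrapper */
{
  size_t slot = (size_t) x % CACHE_SIZE;
  if (cache_used[slot] && cache_key[slot] == x)
    return cache_val[slot];

  int r = fold_worker (x);

  /* Cache the input -> output mapping...  */
  cache_key[slot] = x;
  cache_val[slot] = r;
  cache_used[slot] = 1;

  /* ...and, as in the patch, also map the result to itself so an
     already-folded value is not folded again.  */
  if (r != x)
    {
      size_t rslot = (size_t) r % CACHE_SIZE;
      cache_key[rslot] = r;
      cache_val[rslot] = r;
      cache_used[rslot] = 1;
    }
  return r;
}
```

The second put is what the `fold_cache->put (x, x)` line in the diff does: a folded result looks up to itself, so re-folding is a no-op.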

>
> gcc/cp/ChangeLog:
>
> * cp-gimplify.c (cp_fold_1): Split out from ...
> (cp_fold): ... here.  Cache the output of cp_fold_1 into the
> fold_cache.
> ---
>  gcc/cp/cp-gimplify.c | 33 +++--
>  1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
> index 7b63db4..aaf5f48 100644
> --- a/gcc/cp/cp-gimplify.c
> +++ b/gcc/cp/cp-gimplify.c
> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>  static tree cp_genericize_r (tree *, int *, void *);
>  static tree cp_fold_r (tree *, int *, void *);
>  static void cp_genericize_tree (tree*);
> +static tree cp_fold_1 (tree);
>  static tree cp_fold (tree);
>
>  /* Local declarations.  */
> @@ -1934,11 +1935,7 @@ clear_fold_cache (void)
>  static tree
>  cp_fold (tree x)
>  {
> -  tree op0, op1, op2, op3;
> -  tree org_x = x, r = NULL_TREE;
> -  enum tree_code code;
> -  location_t loc;
> -  bool rval_ops = true;
> +  tree org_x = x;
>
>if (!x || x == error_mark_node)
>  return x;
> @@ -1957,6 +1954,27 @@ cp_fold (tree x)
>if (tree *cached = fold_cache->get (x))
>  return *cached;
>
> +  x = cp_fold_1 (x);
> +
> +  fold_cache->put (org_x, x);
> +  /* Prevent that we try to fold an already folded result again.  */
> +  if (x != org_x)
> +fold_cache->put (x, x);
> +
> +  return x;
> +}
> +
> +/* Worker function for cp_fold.  */
> +
> +static tree
> +cp_fold_1 (tree x)
> +{
> +  tree op0, op1, op2, op3;
> +  tree org_x = x, r = NULL_TREE;
> +  enum tree_code code;
> +  location_t loc;
> +  bool rval_ops = true;
> +
>code = TREE_CODE (x);
>switch (code)
>  {
> @@ -2319,11 +2337,6 @@ cp_fold (tree x)
>return org_x;
>  }
>
> -  fold_cache->put (org_x, x);
> -  /* Prevent that we try to fold an already folded result again.  */
> -  if (x != org_x)
> -fold_cache->put (x, x);
> -
>return x;
>  }
>
> --
> 2.9.0.rc0.29.gabd6606
>


[PATCH], Add PowerPC ISA 3.0 MTVSRDD support

2016-05-31 Thread Michael Meissner
This patch adds support for issuing the MTVSRDD instruction on 64-bit ISA 3.0 systems
when the compiler has a 64-bit value in a GPR and wants to create a vector that has
that 64-bit value in both halves of the 128-bit register.
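The splat this instruction implements corresponds to the following GCC vector-extension C, mirroring the new testcase. This is a minimal sketch; with ISA 3.0 enabled the point of the patch is that this compiles to a single MTVSRDD rather than a load or permute sequence:

```c
/* Duplicate a 64-bit GPR value into both halves of a 128-bit vector.  */
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
splat_di (long long a)
{
  return (v2di) { a, a };
}
```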

In addition, I simplified the alternatives, eliminating the use of separate
alternatives for a preferred register class and a normal register class.  This
is similar to the change I did for the 128-bit moves previously.

I have done bootstraps on a little endian power8 system with no regressions,
and I have a run on a big endian power7 system in progress.  Assuming the
power7 system does not introduce any new errors, is this patch ok to install on
trunk?  I also plan to install the patch on the 6.2 branch when the other
ISA 3.0 patches are back ported.

[gcc]
2016-05-31  Michael Meissner  

* config/rs6000/vsx.md (vsx_splat_, V2DI/V2DF): Simplify
alternatives, eliminating preferred register class.  Add support
for the MTVSRDD instruction in ISA 3.0.
(vsx_splat_v4si_internal): Use splat_input_operand instead of
reg_or_indexed_operand.
(vsx_splat_v4sf_internal): Likewise.

[gcc/testsuite]
2016-05-31  Michael Meissner  

* gcc.target/powerpc/p9-splat-4.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236935)
+++ gcc/config/rs6000/vsx.md(.../gcc/config/rs6000) (working copy)
@@ -2384,18 +2384,15 @@ (define_expand "vsx_mergeh_"
 
 ;; V2DF/V2DI splat
 (define_insn "vsx_splat_"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" 
"=wd,wd,wd,?,?,?")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=,,we")
(vec_duplicate:VSX_D
-(match_operand: 1 "splat_input_operand" 
",f,Z,,,Z")))]
+(match_operand: 1 "splat_input_operand" ",Z,r")))]
   "VECTOR_MEM_VSX_P (mode)"
   "@
xxpermdi %x0,%x1,%x1,0
-   xxpermdi %x0,%x1,%x1,0
lxvdsx %x0,%y1
-   xxpermdi %x0,%x1,%x1,0
-   xxpermdi %x0,%x1,%x1,0
-   lxvdsx %x0,%y1"
-  [(set_attr "type" "vecperm,vecperm,vecload,vecperm,vecperm,vecload")])
+   mtvsrdd %x0,%1,%1"
+  [(set_attr "type" "vecperm,vecload,mftgpr")])
 
 ;; V4SI splat (ISA 3.0)
 ;; When SI's are allowed in VSX registers, add XXSPLTW support
@@ -2414,7 +2411,7 @@ (define_expand "vsx_splat_"
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
(vec_duplicate:V4SI
-(match_operand:SI 1 "reg_or_indexed_operand" "r,Z")))]
+(match_operand:SI 1 "splat_input_operand" "r,Z")))]
   "TARGET_P9_VECTOR"
   "@
mtvsrws %x0,%1
@@ -2425,7 +2422,7 @@ (define_insn "*vsx_splat_v4si_internal"
 (define_insn_and_split "*vsx_splat_v4sf_internal"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,wa")
(vec_duplicate:V4SF
-(match_operand:SF 1 "reg_or_indexed_operand" "Z,wy,r")))]
+(match_operand:SF 1 "splat_input_operand" "Z,wy,r")))]
   "TARGET_P9_VECTOR"
   "@
lxvwsx %x0,%y1

Index: gcc/testsuite/gcc.target/powerpc/p9-splat-4.c
===
--- gcc/testsuite/gcc.target/powerpc/p9-splat-4.c   
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)
 (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-splat-4.c   
(.../gcc/testsuite/gcc.target/powerpc)  (revision 236937)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+
+#include 
+
+vector long long foo (long long a) { return (vector long long) { a, a }; }
+
+/* { dg-final { scan-assembler "mtvsrdd" } } */


[PATCH], PR target/71186, Fix PowerPC splat error

2016-05-31 Thread Michael Meissner
Anton Blanchard found that in some cases, setting V4SImode and V8HImode vector
registers to all 0's or all 1's could cause the compiler to issue an 'insn does
not satisfy its constraints' error message.  The reason is that 0/-1 are treated
specially, since we can generate them using normal ISA 2.06/2.07 instructions.
I added an alternative to allow these constants for ISA 3.0.
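The failing case boils down to all-zeros/all-ones vector constants like the following sketch (function names invented for illustration); before the fix, the nosplit xxspltib pattern had no alternative whose constraints accepted them:

```c
/* V4SImode all-zeros and all-ones constants of the kind that
   triggered the constraint failure.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si all_zeros (void) { return (v4si) { 0, 0, 0, 0 }; }
v4si all_ones (void)  { return (v4si) { -1, -1, -1, -1 }; }
```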

I did a bootstrap and make check with no regressions on a little endian power8,
and I have a similar run going on a big endian power7 system.  Assuming the
power7 system also had no regressions, is this ok to install in the trunk, and
back ported to GCC 6.2 when the previous ISA 3.0 patches get back ported.

[gcc]
2016-05-31  Michael Meissner  

PR target/71186
* config/rs6000/vsx.md (xxspltib__nosplit): Add alternatives
for loading up all 0's or all 1's.

[gcc/testsuite]
2016-05-31  Michael Meissner  

PR target/71186
* gcc.target/powerpc/pr71186.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236935)
+++ gcc/config/rs6000/vsx.md(.../gcc/config/rs6000) (working copy)
@@ -776,8 +776,8 @@ (define_insn "xxspltib_v16qi"
   [(set_attr "type" "vecperm")])
 
 (define_insn "xxspltib__nosplit"
-  [(set (match_operand:VSINT_842 0 "vsx_register_operand" "=wa")
-   (match_operand:VSINT_842 1 "xxspltib_constant_nosplit" "wE"))]
+  [(set (match_operand:VSINT_842 0 "vsx_register_operand" "=wa,wa")
+   (match_operand:VSINT_842 1 "xxspltib_constant_nosplit" "jwM,wE"))]
   "TARGET_P9_VECTOR"
 {
   rtx op1 = operands[1];
Index: gcc/testsuite/gcc.target/powerpc/pr71186.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71186.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71186.c  (revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+
+static unsigned short x[(16384/sizeof(unsigned short))] __attribute__ 
((aligned (16)));
+static unsigned short y[(16384/sizeof(unsigned short))] __attribute__ 
((aligned (16)));
+static unsigned short a;
+
+void obfuscate(void *a, ...);
+
+static void __attribute__((noinline)) do_one(void)
+{
+ unsigned long i;
+
+ obfuscate(x, y, &a);
+
+ for (i = 0; i < (16384/sizeof(unsigned short)); i++)
+  y[i] = a * x[i];
+
+ obfuscate(x, y, &a);
+}
+
+int main(void)
+{
+ unsigned long i;
+
+ for (i = 0; i < 100; i++)
+  do_one();
+
+ return 0;
+}


Separate loop exit predictor from extra exit predictor

2016-05-31 Thread Jan Hubicka
Hi,
predict_extra_loop_exits contains a poor man's jump threading for loop exit
conditionals, pattern matching the following:
   if (foo() || global > 10)
 break;

   This will be translated into:

   BB3:
 loop header...
   BB4:
 if foo() goto BB6 else goto BB5
   BB5:
 if global > 10 goto BB6 else goto BB7
   BB6:
 goto BB7
   BB7:
 iftmp = (PHI 0(BB5), 1(BB6))
 if iftmp == 1 goto BB8 else goto BB3
   BB8:
 outside of the loop...
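As a compilable instance of the pattern being matched, with the opaque foo() call modeled by a counter parameter (both parameters are hypothetical stand-ins for illustration):

```c
/* A loop whose single exit edge is fed by two conditions, as in the
   CFG sketch above: the loop breaks either when "foo" fires or when
   global exceeds 10.  Returns the number of completed iterations.  */
static int
run (int global, int foo_true_on)
{
  int call = 0, iters = 0;
  for (;;)
    {
      int foo = (++call == foo_true_on);	/* models foo ()  */
      if (foo || global > 10)
	break;
      iters++;
    }
  return iters;
}
```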

Here the loop exit heuristic will predict "if iftmp == 1" as unlikely, but that
branch will later be jump threaded, resulting in an inconsistent profile.

This patch just makes this predictor separate from the loop exit predictor,
as they are two different things.  This way we get separate estimates of
their effectiveness.

Ideally we should jump thread this before profile is built.

Bootstrapped/regtested x86_64-linux, committed.

Honza
* g++.dg/predict-loop-exit-1.C: Update template for new predictor name.
* g++.dg/predict-loop-exit-2.C: Update template for new predictor name.
* g++.dg/predict-loop-exit-3.C: Update template for new predictor name.

* predict.def (PRED_LOOP_EXTRA_EXIT): Define.
* predict.c (predict_iv_comparison): Also check PRED_LOOP_EXTRA_EXIT.
(predict_extra_loop_exits): Use PRED_LOOP_EXTRA_EXIT instead of
PRED_LOOP_EXIT.
Index: predict.def
===
--- predict.def (revision 236965)
+++ predict.def (working copy)
@@ -92,6 +92,11 @@ DEF_PREDICTOR (PRED_LOOP_BRANCH, "loop b
 DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (91),
   PRED_FLAG_FIRST_MATCH)
 
+/* Edge causing loop to terminate by computing value used by later conditional.
+   */
+DEF_PREDICTOR (PRED_LOOP_EXTRA_EXIT, "extra loop exit", HITRATE (91),
+  PRED_FLAG_FIRST_MATCH)
+
 /* Pointers are usually not NULL.  */
 DEF_PREDICTOR (PRED_POINTER, "pointer", HITRATE (85), 0)
 DEF_PREDICTOR (PRED_TREE_POINTER, "pointer (on trees)", HITRATE (85), 0)
Index: predict.c
===
--- predict.c   (revision 236965)
+++ predict.c   (working copy)
@@ -1245,7 +1245,8 @@ predict_iv_comparison (struct loop *loop
 
   if (predicted_by_p (bb, PRED_LOOP_ITERATIONS_GUESSED)
   || predicted_by_p (bb, PRED_LOOP_ITERATIONS)
-  || predicted_by_p (bb, PRED_LOOP_EXIT))
+  || predicted_by_p (bb, PRED_LOOP_EXIT)
+  || predicted_by_p (bb, PRED_LOOP_EXTRA_EXIT))
 return;
 
   stmt = last_stmt (bb);
@@ -1418,7 +1419,7 @@ predict_iv_comparison (struct loop *loop
The edge BB7->BB8 is loop exit because BB8 is outside of the loop.
From the dataflow, we can infer that BB4->BB6 and BB5->BB6 are also loop
exits. This function takes BB7->BB8 as input, and finds out the extra loop
-   exits to predict them using PRED_LOOP_EXIT.  */
+   exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
 
 static void
 predict_extra_loop_exits (edge exit_edge)
@@ -1474,12 +1475,12 @@ predict_extra_loop_exits (edge exit_edge
continue;
   if (EDGE_COUNT (e->src->succs) != 1)
{
- predict_paths_leading_to_edge (e, PRED_LOOP_EXIT, NOT_TAKEN);
+ predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
  continue;
}
 
   FOR_EACH_EDGE (e1, ei, e->src->preds)
-   predict_paths_leading_to_edge (e1, PRED_LOOP_EXIT, NOT_TAKEN);
+   predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
 }
 }
 
Index: testsuite/g++.dg/predict-loop-exit-1.C
===
--- testsuite/g++.dg/predict-loop-exit-1.C  (revision 236965)
+++ testsuite/g++.dg/predict-loop-exit-1.C  (working copy)
@@ -9,4 +9,5 @@ void test() {
   return;
 }
 
+/* { dg-final { scan-tree-dump-times "extra loop exit heuristics:" 2 
"profile_estimate"} } */
 /* { dg-final { scan-tree-dump-times "loop exit heuristics:" 3 
"profile_estimate"} } */
Index: testsuite/g++.dg/predict-loop-exit-2.C
===
--- testsuite/g++.dg/predict-loop-exit-2.C  (revision 236965)
+++ testsuite/g++.dg/predict-loop-exit-2.C  (working copy)
@@ -9,4 +9,5 @@ void test() {
   return;
 }
 
+/* { dg-final { scan-tree-dump-times "extra loop exit heuristics:" 1 
"profile_estimate"} } */
 /* { dg-final { scan-tree-dump-times "loop exit heuristics:" 2 
"profile_estimate"} } */
Index: testsuite/g++.dg/predict-loop-exit-3.C
===
--- testsuite/g++.dg/predict-loop-exit-3.C  (revision 236965)
+++ testsuite/g++.dg/predict-loop-exit-3.C  (working copy)
@@ -9,4 +9,5 @@ void test() {
   return;
 }
 
+/* { dg-final { scan-tree-dump-times "extra loop exit heuristics:" 2 
"profile_estimate"} } */
 /* { dg-final { scan-tree-dump-times "loop exit heuristics:" 3 
"profile_estimate"} } */


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-05-31 Thread Martin Sebor

On 05/31/2016 03:50 PM, Jakub Jelinek wrote:

On Sun, May 01, 2016 at 10:39:48AM -0600, Martin Sebor wrote:

--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7878,50 +7878,101 @@ fold_builtin_unordered_cmp (location_t loc, tree 
fndecl, tree arg0, tree arg1,

  static tree
  fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
-tree arg0, tree arg1, tree arg2)
+tree arg0, tree arg1, tree arg2 = NULL_TREE)
  {
enum internal_fn ifn = IFN_LAST;
-  tree type = TREE_TYPE (TREE_TYPE (arg2));
-  tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2);
+  enum tree_code opcode;
+
+  /* For the "generic" overloads, the first two arguments can have different
+ types and the last argument determines the target type to use to check
+ for overflow.  The names of each of the other overloads determines
+ the target type and so the last argument can be omitted.  */
+  tree type = NULL_TREE;
+
switch (fcode)
  {
  case BUILT_IN_ADD_OVERFLOW:
+  type = type ? type : TREE_TYPE (TREE_TYPE (arg2));
  case BUILT_IN_SADD_OVERFLOW:
+  type = type ? type : integer_type_node;
  case BUILT_IN_SADDL_OVERFLOW:
+  type = type ? type : long_integer_type_node;
  case BUILT_IN_SADDLL_OVERFLOW:
+  type = type ? type : long_long_integer_type_node;
  case BUILT_IN_UADD_OVERFLOW:
+  type = type ? type : unsigned_type_node;
  case BUILT_IN_UADDL_OVERFLOW:
+  type = type ? type : long_unsigned_type_node;
  case BUILT_IN_UADDLL_OVERFLOW:
+  type = type ? type : long_long_unsigned_type_node;
ifn = IFN_ADD_OVERFLOW;
+  opcode = PLUS_EXPR;


...

I fail to understand this.  You set type to NULL_TREE, and then (not in any
kind of loop, nor any label you could goto to) do:
   type = type ? type : xxx;
That is necessarily equal to type = xxx;, isn't it?


Each label falls through to the next, so the test prevents
the subsequent labels from overwriting the value assigned
to type by the label that matched the switch expression.
I'm not exactly enamored with these repetitive tests but
the only alternative that occurred to me was to duplicate
the whole switch statement, and this seemed like the lesser
of the two evils.  If you have a suggestion for something
better I'd be happy to change it.
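The idiom is easier to see in isolation. In this toy sketch (not GCC's actual code), each fall-through case assigns type only if no earlier case already did, so only the label that matched the switch expression wins:

```c
#include <string.h>

const char *
pick_type (int fcode)
{
  const char *type = 0;
  switch (fcode)
    {
    case 0:
      type = type ? type : "int";	/* FALLTHRU */
    case 1:
      type = type ? type : "long";	/* FALLTHRU */
    case 2:
      type = type ? type : "long long";
      break;
    default:
      return "unknown";
    }
  return type;
}
```

Entering at case 1, the first assignment never runs, so the test in case 2 sees a non-null type and leaves it alone.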


+  /* When the last argument is a NULL pointer and the first two arguments
+ are constant, attempt to fold the built-in call into a constant
+ expression indicating whether or not it detected an overflow but
+ without storing the result.  */
+  if ((!arg2 || zerop (arg2))


As I said in another mail, IMNSHO you don't want to change the clang
compatibility builtins, therefore arg2 should never be NULL.


Allowing the last argument to be null is the subject of
the enhancement request in c/68120 (one of the two PRs
driving this enhancement).  The argument can only be null
for these overloads and not the type-generic ones because
the latter use the type to which the pointer points to
determine the type in which to do the arithmetic.  Without
this change, C or C++ 98 users wouldn't be able to use the
built-ins in constant expressions (C++ 98 doesn't allow
casting a null pointer in those contexts).
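For contrast, with the existing type-generic built-in the overflow flag is already available at run time through a writable result object; what the c/68120 enhancement adds is evaluating that flag in a constant expression without one. A sketch of the current run-time shape, using only the long-standing __builtin_add_overflow:

```c
/* Returns 1 if a + b overflows int, 0 otherwise; the sum itself is
   computed into RESULT but discarded here.  */
int
add_overflows (int a, int b)
{
  int result;
  return __builtin_add_overflow (a, b, &result);
}
```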


As it is a pointer, shouldn't the above be integer_zerop instead?


I suspect I didn't use integer_zerop because the argument
is a pointer and not an integer, but I'll change it before
I commit the patch.


Regarding the varargs vs. named args only different diagnostics, if you
think it should change, then IMHO you should submit as independent patch
a change to the diagnostics wording, so that the wording is the same, rather
than changing functions from fixed arguments to varargs (/ typegeneric).


I'm not sure I know what part of the patch you're referring
to.  Are you suggesting to submit a separate patch for the
change from "not enough arguments" to "too few arguments"
below?

-   "not enough arguments to function %qE", fndecl);
+   "too few arguments to function %qE", fndecl);

Martin


Re: Remove the unused OMP_CLAUSE_DEVICE_RESIDENT

2016-05-31 Thread Cesar Philippidis
On 05/31/2016 08:45 AM, Thomas Schwinge wrote:

> While working on something else, I came across the following.  Cesar, can
> you please verify that this is really dead code in the Fortran front end,
> which currently is the only producer of OMP_CLAUSE_DEVICE_RESIDENT?

You're correct. Declare should be using map clauses for device resident
and link. I think I ran into this last week when I backported Jakub's
recent gfc_match_omp_clauses changes to gomp4.

> Also, I noticed that the Fortran front end never creates OMP_CLAUSE_MAP
> with GOMP_MAP_LINK or GOMP_MAP_DEVICE_RESIDENT -- how is OpenACC declare
> working at all?  Cesar, if this is not a simple question to answer,
> please file a GCC PR, for somebody to have a look later.

Declare might not be working in Fortran, or at least not all of its
clauses. I'll add this to my todo list. Right now I'm working on a
subarray problem.

Cesar



Re: [PATCH] c++/60760 - arithmetic on null pointers should not be allowed in constant expressions

2016-05-31 Thread Martin Sebor

On 05/17/2016 01:44 PM, Jason Merrill wrote:

On 05/12/2016 06:34 PM, Martin Sebor wrote:

Attached is a resubmission of the patch for c++/60760 originally
submitted late in the 6.0 cycle along with a patch for c++/67376.
Since c++/60760 was not a regression, it was decided that it
would be safer to defer the fix until after the 6.1.0 release.

While retesting this patch I was happy to notice that it also
fixes another bug: c++/71091 - constexpr reference bound to a null
pointer dereference accepted.


I'm not sure why we need to track nullptr_p through everything.  Can't
we set *non_constant_p instead in the places where it's problematic, as
in cxx_eval_binary_expression?


That would have certainly been my preference over adding another
argument to a subset of the functions, making their declarations
inconsistent.  Unfortunately, I couldn't come up with a simpler
way to make it work.

FWIW, my first inclination was to make the nullptr_p flag part
of constexpr_ctx and move the other xxx_p arguments there, to
simplify the interface, until I noticed that all the functions
take a const constexpr_ctx*, preventing them from modifying it.
It seemed like too much of a change to fix a bug in stage 3, but
I'd be comfortable making it in stage 1 if you think it's
worthwhile (though my preference would be to do it in a separate
commit, after fixing this bug).



I understand that the complication comes because of needing to allow

constexpr int *p = &*(int*)0;

but I don't see how cxx_eval_component_reference could come up with a
constant value for the referent of a null pointer, so we already reject

struct A { int i; };
constexpr A* p = nullptr;
constexpr int i = p->i;

In cxx_eval_indirect_ref, we could check !lval and reject the
constant-expression at that point.


The new code in cxx_eval_component_reference diagnoses the following
problem that's not detected otherwise:

  struct S { const S *s; };

  constexpr S s = { 0 };

  constexpr const void *p = &s.s->s;

Martin


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-05-31 Thread Jakub Jelinek
On Sun, May 01, 2016 at 10:39:48AM -0600, Martin Sebor wrote:
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -7878,50 +7878,101 @@ fold_builtin_unordered_cmp (location_t loc, tree 
> fndecl, tree arg0, tree arg1,
>  
>  static tree
>  fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
> -  tree arg0, tree arg1, tree arg2)
> +  tree arg0, tree arg1, tree arg2 = NULL_TREE)
>  {
>enum internal_fn ifn = IFN_LAST;
> -  tree type = TREE_TYPE (TREE_TYPE (arg2));
> -  tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2);
> +  enum tree_code opcode;
> +
> +  /* For the "generic" overloads, the first two arguments can have different
> + types and the last argument determines the target type to use to check
> + for overflow.  The names of each of the other overloads determines
> + the target type and so the last argument can be omitted.  */
> +  tree type = NULL_TREE;
> +
>switch (fcode)
>  {
>  case BUILT_IN_ADD_OVERFLOW:
> +  type = type ? type : TREE_TYPE (TREE_TYPE (arg2));
>  case BUILT_IN_SADD_OVERFLOW:
> +  type = type ? type : integer_type_node;
>  case BUILT_IN_SADDL_OVERFLOW:
> +  type = type ? type : long_integer_type_node;
>  case BUILT_IN_SADDLL_OVERFLOW:
> +  type = type ? type : long_long_integer_type_node;
>  case BUILT_IN_UADD_OVERFLOW:
> +  type = type ? type : unsigned_type_node;
>  case BUILT_IN_UADDL_OVERFLOW:
> +  type = type ? type : long_unsigned_type_node;
>  case BUILT_IN_UADDLL_OVERFLOW:
> +  type = type ? type : long_long_unsigned_type_node;
>ifn = IFN_ADD_OVERFLOW;
> +  opcode = PLUS_EXPR;

...

I fail to understand this.  You set type to NULL_TREE, and then (not in any
kind of loop, nor any label you could goto to) do:
  type = type ? type : xxx;
That is necessarily equal to type = xxx;, isn't it?
>break;
>  case BUILT_IN_SUB_OVERFLOW:
> +  type = type ? type : TREE_TYPE (TREE_TYPE (arg2));
>  case BUILT_IN_SSUB_OVERFLOW:
> +  type = type ? type : integer_type_node;
>  case BUILT_IN_SSUBL_OVERFLOW:
> +  type = type ? type : long_integer_type_node;
>  case BUILT_IN_SSUBLL_OVERFLOW:
> +  type = type ? type : long_long_integer_type_node;
>  case BUILT_IN_USUB_OVERFLOW:
> +  type = type ? type : unsigned_type_node;
>  case BUILT_IN_USUBL_OVERFLOW:
> +  type = type ? type : long_unsigned_type_node;
>  case BUILT_IN_USUBLL_OVERFLOW:
> +  type = type ? type : long_long_unsigned_type_node;
>ifn = IFN_SUB_OVERFLOW;
> +  opcode = MINUS_EXPR;
>break;
>  case BUILT_IN_MUL_OVERFLOW:
> +  type = type ? type : TREE_TYPE (TREE_TYPE (arg2));
>  case BUILT_IN_SMUL_OVERFLOW:
> +  type = type ? type : integer_type_node;
>  case BUILT_IN_SMULL_OVERFLOW:
> +  type = type ? type : long_integer_type_node;
>  case BUILT_IN_SMULLL_OVERFLOW:
> +  type = type ? type : long_long_integer_type_node;
>  case BUILT_IN_UMUL_OVERFLOW:
> +  type = type ? type : unsigned_type_node;
>  case BUILT_IN_UMULL_OVERFLOW:
> +  type = type ? type : long_unsigned_type_node;
>  case BUILT_IN_UMULLL_OVERFLOW:
> +  type = type ? type : long_long_unsigned_type_node;
>ifn = IFN_MUL_OVERFLOW;
> +  opcode = MULT_EXPR;
>break;
>  default:
>gcc_unreachable ();
>  }
> +
> +  /* When the last argument is a NULL pointer and the first two arguments
> + are constant, attempt to fold the built-in call into a constant
> + expression indicating whether or not it detected an overflow but
> + without storing the result.  */
> +  if ((!arg2 || zerop (arg2))

As I said in another mail, IMNSHO you don't want to change the clang
compatibility builtins, therefore arg2 should never be NULL.
As it is a pointer, shouldn't the above be integer_zerop instead?

Regarding the varargs vs. named args only different diagnostics, if you
think it should change, then IMHO you should submit as independent patch
a change to the diagnostics wording, so that the wording is the same, rather
than changing functions from fixed arguments to varargs (/ typegeneric).

Jakub


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-05-31 Thread Jakub Jelinek
On Tue, May 31, 2016 at 05:31:48PM -0400, Jason Merrill wrote:
> >I'm not quite sure where to move this hunk so that it could be
> >shared or with what.
> >
> >With the patch, fold_builtin_arith_overflow returns the overflow
> >bit alone when the last argument is null.  Otherwise it returns
> >a complex number with both the overflow bit and the result.  Here
> >we need both parts of the result.  I suppose I could factor out
> >the call to size_binop_loc into its own function, have it return
> >the overflow bit, and set a by-reference argument to the arithmetic
> >result so that it could be built into a complex result here but
> >that doesn't seem like it would gain us much, if anything.
> 
> Yeah, I'm not sure what I was thinking.  The patch is OK.

Sorry for not paying attention, but I think it is wrong to change the clang
compatibility builtins at all.  They are provided for clang compatibility,
nothing else, therefore we shouldn't change anything on them.
It is reasonable to extend the GNU builtins.

Jakub


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-05-31 Thread Jason Merrill

On 05/31/2016 04:48 PM, Martin Sebor wrote:

On 05/17/2016 12:54 PM, Jason Merrill wrote:

On 05/01/2016 12:39 PM, Martin Sebor wrote:

+  if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) ==
INTEGER_CST)
+{
+  if (tree result = size_binop_loc (EXPR_LOC_OR_LOC (t,
input_location),
+opcode, arg0, arg1))
+{
+  if (TREE_OVERFLOW (result))
+{
+  /* Reset TREE_OVERFLOW to avoid warnings for the
overflow.  */
+  TREE_OVERFLOW (result) = 0;
+
+  return build_complex (TREE_TYPE (t), result,
integer_one_node);
+}
+
+  return build_complex (TREE_TYPE (t), result, integer_zero_node);
+}
+}


Should this be in the middle-end somewhere, perhaps shared with
fold_builtin_arith_overflow?  I notice that the comment for that
function says that it folds into normal arithmetic if the operation can
never overflow, but I don't see any code that would accomplish that.


I'm not quite sure where to move this hunk so that it could be
shared or with what.

With the patch, fold_builtin_arith_overflow returns the overflow
bit alone when the last argument is null.  Otherwise it returns
a complex number with both the overflow bit and the result.  Here
we need both parts of the result.  I suppose I could factor out
the call to size_binop_loc into its own function, have it return
the overflow bit, and set a by-reference argument to the arithmetic
result so that it could be built into a complex result here but
that doesn't seem like it would gain us much, if anything.
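The complex return value described above pairs the arithmetic result with the overflow bit; in plain C the same shape is a two-field struct. This is a sketch using the existing __builtin_add_overflow, not GCC's internal COMPLEX_EXPR representation:

```c
struct arith_result
{
  int value;		/* REALPART: the (wrapped) arithmetic result.  */
  int overflow;		/* IMAGPART: the overflow bit.  */
};

struct arith_result
add_with_overflow (int a, int b)
{
  struct arith_result r;
  r.overflow = __builtin_add_overflow (a, b, &r.value);
  return r;
}
```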


Yeah, I'm not sure what I was thinking.  The patch is OK.

Jason



Re: [PATCH] Fix missing/wrong function declaration in s-osinte-rtems.ads (ada/71317)

2016-05-31 Thread Eric Botcazou
> I attached a patch for 5 branch which will only add the
> clock_getres-function.

Thanks, applied too.

-- 
Eric Botcazou


Re: [PATCH] Fix missing/wrong function declaration in s-osinte-rtems.ads (ada/71317)

2016-05-31 Thread Jan Sommer
Am Dienstag, 31. Mai 2016, 21:00:07 CEST schrieb Eric Botcazou:
> > this patch fixes the build failures of recent gnat compiler version for
> > RTEMS targets (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71317).
> > Attached are patches for trunk, gcc-5-branch and gcc-6-branch.
> > I don't have write access to the svn, so if the patches pass the review
> > process please commit them.
> 
> Patches applied on mainline and 6 branch, but not on the 5 branch since the 
> signature of Get_Page_Size is correct there.

Thank you. 
I attached a patch for 5 branch which will only add the clock_getres-function.



Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog	(Revision 236948)
+++ gcc/ada/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2016-05-31  Jan Sommer 
+
+	PR ada/71317
+	* s-osinte-rtems.ads: Fix missing/wrong function declarations:
+	Missing: clock_getres
+
 2016-05-06  Eric Botcazou  
 
 	PR ada/70969
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads	(Revision 236948)
+++ gcc/ada/s-osinte-rtems.ads	(Arbeitskopie)
@@ -188,6 +188,11 @@ package System.OS_Interface is
   tp   : access timespec) return int;
pragma Import (C, clock_gettime, "clock_gettime");
 
+   function clock_getres
+ (clock_id : clockid_t;
+  res  : access timespec) return int;
+   pragma Import (C, clock_getres, "clock_getres");
+
function To_Duration (TS : timespec) return Duration;
pragma Inline (To_Duration);
 


Re: Moving backwards/FSM threader into its own pass

2016-05-31 Thread Jeff Law

On 05/31/2016 11:38 AM, Richard Biener wrote:


Essentially we want to limit the backwards substitution to single step
within a single block for that case (which is trivially easy).  That
would allow us to run a very cheap threader during early optimizations.


Just to double check - the pass does both forward and backward threading, and
DOM/VRP now do neither themselves?

Will do.  It's pretty easy.  I suspect there's a lot of low hanging fruit.

The backward pass is strictly backward substitution & simplification and 
independent of DOM/VRP.  I believe we're ultimately going to want a 
param or something to determine how far back it goes in the IL.


The old threader (still called from VRP/DOM) is primarily forward based 
using equivalence tables and ASSERT_EXPRs to simplify expressions as it 
walks statements in the block.


I'm pretty sure everything done by the old threader can be more easily 
and efficiently modeled in the backwards threader.  Once a couple of 
infrastructure items are addressed, I want to start looking at cutting 
back on the amount of work done in the old threader, with the goal of 
starting to eliminate calls into it completely.


Jeff


Re: Thoughts on memcmp expansion (PR43052)

2016-05-31 Thread Bernd Schmidt

On 05/13/2016 10:40 PM, Joseph Myers wrote:

On Fri, 13 May 2016, Bernd Schmidt wrote:


Thanks. So, this would seem to suggest that BITS_PER_UNIT in memcmp/memcpy
expansion is more accurate than a plain 8, although pedantically we might want
to use CHAR_TYPE_SIZE? Should I adjust my patch to use the latter or leave
these parts as-is?


I'd say use CHAR_TYPE_SIZE.


Here's the current patch, retested as before.  Since the approval I had 
to make some adjustments to cope with changes such as the removal of 
memcmp_libfunc, so I'll leave it for a few days for comments before 
checking it in.
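The RESULT_EQ relaxation introduced in the patch covers the very common calling pattern where only equality matters, so the expander need not materialize a correctly signed three-way result:

```c
#include <string.h>

/* Only the zero/nonzero distinction of memcmp's result is used here,
   which is exactly what the RESULT_EQ parameter lets the expander
   exploit.  */
int
blocks_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}
```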



Bernd

Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 236909)
+++ gcc/builtins.c	(working copy)
@@ -3671,53 +3671,24 @@ expand_cmpstr (insn_code icode, rtx targ
   return NULL_RTX;
 }
 
-/* Try to expand cmpstrn or cmpmem operation ICODE with the given operands.
-   ARG3_TYPE is the type of ARG3_RTX.  Return the result rtx on success,
-   otherwise return null.  */
-
-static rtx
-expand_cmpstrn_or_cmpmem (insn_code icode, rtx target, rtx arg1_rtx,
-			  rtx arg2_rtx, tree arg3_type, rtx arg3_rtx,
-			  HOST_WIDE_INT align)
-{
-  machine_mode insn_mode = insn_data[icode].operand[0].mode;
-
-  if (target && (!REG_P (target) || HARD_REGISTER_P (target)))
-target = NULL_RTX;
-
-  struct expand_operand ops[5];
-  create_output_operand (&ops[0], target, insn_mode);
-  create_fixed_operand (&ops[1], arg1_rtx);
-  create_fixed_operand (&ops[2], arg2_rtx);
-  create_convert_operand_from (&ops[3], arg3_rtx, TYPE_MODE (arg3_type),
-			   TYPE_UNSIGNED (arg3_type));
-  create_integer_operand (&ops[4], align);
-  if (maybe_expand_insn (icode, 5, ops))
-return ops[0].value;
-  return NULL_RTX;
-}
-
 /* Expand expression EXP, which is a call to the memcmp built-in function.
Return NULL_RTX if we failed and the caller should emit a normal call,
-   otherwise try to get the result in TARGET, if convenient.  */
+   otherwise try to get the result in TARGET, if convenient.
+   RESULT_EQ is true if we can relax the returned value to be either zero
+   or nonzero, without caring about the sign.  */
 
 static rtx
-expand_builtin_memcmp (tree exp, rtx target)
+expand_builtin_memcmp (tree exp, rtx target, bool result_eq)
 {
   if (!validate_arglist (exp,
  			 POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
 
-  /* Note: The cmpstrnsi pattern, if it exists, is not suitable for
- implementing memcmp because it will stop if it encounters two
- zero bytes.  */
-  insn_code icode = direct_optab_handler (cmpmem_optab, SImode);
-  if (icode == CODE_FOR_nothing)
-return NULL_RTX;
-
   tree arg1 = CALL_EXPR_ARG (exp, 0);
   tree arg2 = CALL_EXPR_ARG (exp, 1);
   tree len = CALL_EXPR_ARG (exp, 2);
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+  location_t loc = EXPR_LOCATION (exp);
 
   unsigned int arg1_align = get_pointer_alignment (arg1) / BITS_PER_UNIT;
   unsigned int arg2_align = get_pointer_alignment (arg2) / BITS_PER_UNIT;
@@ -3726,22 +3697,38 @@ expand_builtin_memcmp (tree exp, rtx tar
   if (arg1_align == 0 || arg2_align == 0)
 return NULL_RTX;
 
-  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
-  location_t loc = EXPR_LOCATION (exp);
   rtx arg1_rtx = get_memory_rtx (arg1, len);
   rtx arg2_rtx = get_memory_rtx (arg2, len);
-  rtx arg3_rtx = expand_normal (fold_convert_loc (loc, sizetype, len));
+  rtx len_rtx = expand_normal (fold_convert_loc (loc, sizetype, len));
 
   /* Set MEM_SIZE as appropriate.  */
-  if (CONST_INT_P (arg3_rtx))
+  if (CONST_INT_P (len_rtx))
 {
-  set_mem_size (arg1_rtx, INTVAL (arg3_rtx));
-  set_mem_size (arg2_rtx, INTVAL (arg3_rtx));
+  set_mem_size (arg1_rtx, INTVAL (len_rtx));
+  set_mem_size (arg2_rtx, INTVAL (len_rtx));
 }
 
-  rtx result = expand_cmpstrn_or_cmpmem (icode, target, arg1_rtx, arg2_rtx,
-	 TREE_TYPE (len), arg3_rtx,
-	 MIN (arg1_align, arg2_align));
+  by_pieces_constfn constfn = NULL;
+
+  const char *src_str = c_getstr (arg1);
+  if (src_str == NULL)
+src_str = c_getstr (arg2);
+  else
+std::swap (arg1_rtx, arg2_rtx);
+
+  /* If SRC is a string constant and block move would be done
+ by pieces, we can avoid loading the string from memory
+ and only store the computed constants.  */
+  if (src_str
+  && CONST_INT_P (len_rtx)
+  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1)
+constfn = builtin_memcpy_read_str;
+
+  rtx result = emit_block_cmp_hints (arg1_rtx, arg2_rtx, len_rtx,
+ TREE_TYPE (len), target,
+ result_eq, constfn,
+ CONST_CAST (char *, src_str));
+
   if (result)
 {
   /* Return the value in the proper mode for this function.  */
@@ -6073,9 +6060,15 @@ expand_builtin (tree exp, rtx target, rt
 
 case BUILT_IN_BCMP:
 case BUILT_IN_MEMCMP:
-  target = expand_builtin_memcmp (exp, target);
+case BUILT_IN_MEMCMP_EQ:
+  target = 

Re: [PATCH] integer overflow checking builtins in constant expressions

2016-05-31 Thread Martin Sebor

On 05/17/2016 12:54 PM, Jason Merrill wrote:

On 05/01/2016 12:39 PM, Martin Sebor wrote:

+  if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) ==
INTEGER_CST)
+{
+  if (tree result = size_binop_loc (EXPR_LOC_OR_LOC (t,
input_location),
+opcode, arg0, arg1))
+{
+  if (TREE_OVERFLOW (result))
+{
+  /* Reset TREE_OVERFLOW to avoid warnings for the overflow.  */
+  TREE_OVERFLOW (result) = 0;
+
+  return build_complex (TREE_TYPE (t), result,
integer_one_node);
+}
+
+  return build_complex (TREE_TYPE (t), result, integer_zero_node);
+}
+}


Should this be in the middle-end somewhere, perhaps shared with
fold_builtin_arith_overflow?  I notice that the comment for that
function says that it folds into normal arithmetic if the operation can
never overflow, but I don't see any code that would accomplish that.


I'm not quite sure where to move this hunk so that it could be
shared or with what.

With the patch, fold_builtin_arith_overflow returns the overflow
bit alone when the last argument is null.  Otherwise it returns
a complex number with both the overflow bit and the result.  Here
we need both parts of the result.  I suppose I could factor out
the call to size_binop_loc into its own function, have it return
the overflow bit, and set a by-reference argument to the arithmetic
result so that it could be built into a complex result here but
that doesn't seem like it would gain us much, if anything.

Do you have any suggestions?

Martin


Re: [PATCH 4/4] C: add fixit hint to misspelled field names

2016-05-31 Thread David Malcolm
Ping:
  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01834.html


On Thu, 2016-04-28 at 10:28 -0400, David Malcolm wrote:
> Similar to the C++ case, but more involved as the location of the
> pertinent token isn't readily available.  The patch adds it as a
> param
> to build_component_ref.  All callers are updated to provide the info,
> apart from objc_build_component_ref; fixing the latter would lead to
> a cascade of other changes, so it's simplest to provide
> UNKNOWN_LOCATION
> there and have build_component_ref fall back gracefully for this case
> to the old behavior of showing a hint in the message, without a fixit
> replacement in the source view.
> 
> This does slightly change the location of the error; before we had:
> 
> test.c:11:13: error: 'union u' has no member named 'colour'; did you
> mean 'color'?
>return ptr->colour;
>  ^~
> 
> with the patch we have:
> 
> test.c:11:15: error: 'union u' has no member named 'colour'; did you
> mean 'color'?
>return ptr->colour;
>^~
>color
> 
> I think the location change is an improvement.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/c/ChangeLog:
>   * c-parser.c (c_parser_postfix_expression): In
> __builtin_offsetof
>   and structure element reference, capture the location of the
>   element name token and pass it to build_component_ref.
>   (c_parser_postfix_expression_after_primary): Likewise for
>   structure element dereference.
>   (c_parser_omp_variable_list): Likewise for
>   OMP_CLAUSE_{_CACHE, MAP, FROM, TO},
>   * c-tree.h (build_component_ref): Add location_t param.
>   * c-typeck.c (build_component_ref): Add location_t param
>   COMPONENT_LOC.  Use it, if available, when issuing hints about
> misspelled member names to provide a fixit replacement hint.
> 
> gcc/objc/ChangeLog:
>   * objc-act.c (objc_build_component_ref): Update call
>   to build_component_ref for added param, passing
> UNKNOWN_LOCATION.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/spellcheck-fields-2.c: New test case.
> ---
>  gcc/c/c-parser.c   | 34
> +-
>  gcc/c/c-tree.h |  2 +-
>  gcc/c/c-typeck.c   | 26 +++-
> ---
>  gcc/objc/objc-act.c|  3 ++-
>  gcc/testsuite/gcc.dg/spellcheck-fields-2.c | 19 +
>  5 files changed, 68 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields-2.c
> 
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 36c44ab..19e6772 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -7707,8 +7707,9 @@ c_parser_postfix_expression (c_parser *parser)
>  accept sub structure and sub array references.  */
>   if (c_parser_next_token_is (parser, CPP_NAME))
> {
> + c_token *comp_tok = c_parser_peek_token (parser);
>   offsetof_ref = build_component_ref
> -   (loc, offsetof_ref, c_parser_peek_token (parser)
> ->value);
> +   (loc, offsetof_ref, comp_tok->value, comp_tok
> ->location);
>   c_parser_consume_token (parser);
>   while (c_parser_next_token_is (parser, CPP_DOT)
>  || c_parser_next_token_is (parser,
> @@ -7734,9 +7735,10 @@ c_parser_postfix_expression (c_parser *parser)
>   c_parser_error (parser, "expected
> identifier");
>   break;
> }
> + c_token *comp_tok = c_parser_peek_token
> (parser);
>   offsetof_ref = build_component_ref
> -   (loc, offsetof_ref,
> -c_parser_peek_token (parser)->value);
> +   (loc, offsetof_ref, comp_tok->value,
> +comp_tok->location);
>   c_parser_consume_token (parser);
> }
>   else
> @@ -8213,7 +8215,7 @@ c_parser_postfix_expression_after_primary
> (c_parser *parser,
>  {
>struct c_expr orig_expr;
>tree ident, idx;
> -  location_t sizeof_arg_loc[3];
> +  location_t sizeof_arg_loc[3], comp_loc;
>tree sizeof_arg[3];
>unsigned int literal_zero_mask;
>unsigned int i;
> @@ -8327,7 +8329,11 @@ c_parser_postfix_expression_after_primary
> (c_parser *parser,
> c_parser_consume_token (parser);
> expr = default_function_array_conversion (expr_loc, expr);
> if (c_parser_next_token_is (parser, CPP_NAME))
> - ident = c_parser_peek_token (parser)->value;
> + {
> +   c_token *comp_tok = c_parser_peek_token (parser);
> +   ident = comp_tok->value;
> +   comp_loc = comp_tok->location;
> + }
> else
>   {
> c_parser_error (parser, "expected identifier");
> @@ -8339,7 +8345,8 @@ 

Re: Record likely upper bounds for loops

2016-05-31 Thread Bernhard Reutner-Fischer
On May 27, 2016 1:14:09 PM GMT+02:00, Jan Hubicka  wrote:

>+  fprintf (dump_file, "Loop likely %d iterates at most %i
>times.\n", loop->num,
>+ (int)likely_max_loop_iterations_int (loop));

s/Loop likely %d/Loop %d likely/
please.
thanks,



[patch, fortran] PR52393 I/O: "READ format" statement with parenthesized default-char-expr

2016-05-31 Thread Jerry DeLisle
The attached patch fixes this by adding code to match a default character
expression if the left paren is first matched on a READ statement.  If the
expression is matched the dt-format is set and processing continues.  Otherwise
execution drops through to match a control list as usual.

Regression tested on x86-64-linux. Test case attached.

OK for trunk?

Regards,

Jerry

2016-05-30  Jerry DeLisle  

PR fortran/52393
* io.c (match_io): For READ, try to match a default character
expression. If found, set the dt format expression to this,
otherwise go back and try control list.
diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index da0e1c5..204cce2 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -3689,7 +3689,7 @@ match_io (io_kind k)
   gfc_symbol *sym;
   int comma_flag;
   locus where;
-  locus spec_end;
+  locus spec_end, control;
   gfc_dt *dt;
   match m;
 
@@ -3751,21 +3751,56 @@ match_io (io_kind k)
 {
   /* Before issuing an error for a malformed 'print (1,*)' type of
 	 error, check for a default-char-expr of the form ('(I0)').  */
-  if (k == M_PRINT && m == MATCH_YES)
-	{
-	  /* Reset current locus to get the initial '(' in an expression.  */
-	  gfc_current_locus = where;
-	  dt->format_expr = NULL;
-	  m = match_dt_format (dt);
+  if (m == MATCH_YES)
+{
+	  control = gfc_current_locus;
+	  if (k == M_PRINT)
+	{
+	  /* Reset current locus to get the initial '(' in an expression.  */
+	  gfc_current_locus = where;
+	  dt->format_expr = NULL;
+	  m = match_dt_format (dt);
 
-	  if (m == MATCH_ERROR)
-	goto cleanup;
-	  if (m == MATCH_NO || dt->format_expr == NULL)
-	goto syntax;
+	  if (m == MATCH_ERROR)
+		goto cleanup;
+	  if (m == MATCH_NO || dt->format_expr == NULL)
+		goto syntax;
 
-	  comma_flag = 1;
-	  dt->io_unit = default_unit (k);
-	  goto get_io_list;
+	  comma_flag = 1;
+	  dt->io_unit = default_unit (k);
+	  goto get_io_list;
+	}
+	  if (k == M_READ)
+	{
+	  /* Reset current locus to get the initial '(' in an expression.  */
+	  gfc_current_locus = where;
+	  dt->format_expr = NULL;
+	  m = gfc_match_expr (&dt->format_expr);
+	  if (m == MATCH_YES)
+	{
+		  if (dt->format_expr
+		  && dt->format_expr->ts.type == BT_CHARACTER)
+		{
+		  comma_flag = 1;
+		  dt->io_unit = default_unit (k);
+		  goto get_io_list;
+		}
+		  else
+		{
+		  gfc_free_expr (dt->format_expr);
+		  dt->format_expr = NULL;
+		  gfc_current_locus = control;
+		}
+		}
+	  else
+	{
+		  gfc_clear_error ();
+		  gfc_undo_symbols ();
+		  gfc_free_expr (dt->format_expr);
+		  dt->format_expr = NULL;
+		  gfc_current_locus = control;
+		}
+	}
 	}
 }
 
! { dg-do compile }
! PR52393 "READ format" statement with parenthesized default-char-expr
PROGRAM ReadMeTwo
  IMPLICIT NONE
  CHARACTER(10) :: var
  var = "TestStr"
  PRINT ('(') // 'A)', var 
  PRINT ('(') // 'A)', var 
  READ ('(') // 'A)', var  
  PRINT *, var
  READ *, var
END PROGRAM ReadMeTwo



C++ PATCH for c++/60095, partial specialization of variable template

2016-05-31 Thread Jason Merrill
In the implementation of partially specialized variable templates, we 
decided to put the partial specialization template and args in 
DECL_TEMPLATE_INFO rather than the general template and args as we do 
for classes.  This decision came out of the fact that for variables, we 
need to decide what partial specialization we're using in order to get 
the type of the variable, not just for instantiating its value.  But 
that has caused a number of problems with code that expects DECL_TI_ARGS 
to be the general args; I previously fixed 69009 in instantiate_decl by 
adding special handling there.  But the same issue affects many other 
places around the compiler, notably in name mangling; it seems safer to 
go back to the earlier pattern of using the general template/args and 
looking up the partial specialization again at instantiation time.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 51c534b3d0d0c53a1f3b788eb85ff4e70f9ac52a
Author: Jason Merrill 
Date:   Mon May 30 15:38:29 2016 -0400

	PR c++/60095 - partial specialization of variable templates

	PR c++/69515
	PR c++/69009
	* pt.c (instantiate_template_1): Don't put the partial
	specialization in DECL_TI_TEMPLATE.
	(partial_specialization_p, impartial_args): Remove.
	(regenerate_decl_from_template): Add args parm.
	(instantiate_decl): Look up the partial specialization again.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5008b68..70ca207 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -182,7 +182,6 @@ static tree copy_template_args (tree);
 static tree tsubst_template_arg (tree, tree, tsubst_flags_t, tree);
 static tree tsubst_template_args (tree, tree, tsubst_flags_t, tree);
 static tree tsubst_template_parms (tree, tree, tsubst_flags_t);
-static void regenerate_decl_from_template (tree, tree);
 static tree most_specialized_partial_spec (tree, tsubst_flags_t);
 static tree tsubst_aggr_type (tree, tree, tsubst_flags_t, tree, int);
 static tree tsubst_arg_types (tree, tree, tree, tsubst_flags_t, tree);
@@ -17398,6 +17397,7 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
 
   tree pattern = DECL_TEMPLATE_RESULT (gen_tmpl);
 
+  fndecl = NULL_TREE;
   if (VAR_P (pattern))
 {
   /* We need to determine if we're using a partial or explicit
@@ -17409,14 +17409,16 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
 	pattern = error_mark_node;
   else if (elt)
 	{
-	  tmpl = TREE_VALUE (elt);
-	  pattern = DECL_TEMPLATE_RESULT (tmpl);
-	  targ_ptr = TREE_PURPOSE (elt);
+	  tree partial_tmpl = TREE_VALUE (elt);
+	  tree partial_args = TREE_PURPOSE (elt);
+	  tree partial_pat = DECL_TEMPLATE_RESULT (partial_tmpl);
+	  fndecl = tsubst (partial_pat, partial_args, complain, gen_tmpl);
 	}
 }
 
   /* Substitute template parameters to obtain the specialization.  */
-  fndecl = tsubst (pattern, targ_ptr, complain, gen_tmpl);
+  if (fndecl == NULL_TREE)
+fndecl = tsubst (pattern, targ_ptr, complain, gen_tmpl);
   if (DECL_CLASS_SCOPE_P (gen_tmpl))
 pop_nested_class ();
   pop_from_top_level ();
@@ -20888,36 +20890,6 @@ most_general_template (tree decl)
   return decl;
 }
 
-/* True iff the TEMPLATE_DECL tmpl is a partial specialization.  */
-
-static bool
-partial_specialization_p (tree tmpl)
-{
-  /* Any specialization has DECL_TEMPLATE_SPECIALIZATION.  */
-  if (!DECL_TEMPLATE_SPECIALIZATION (tmpl))
-return false;
-  tree t = DECL_TI_TEMPLATE (tmpl);
-  /* A specialization that fully specializes one of the containing classes is
- not a partial specialization.  */
-  return (list_length (DECL_TEMPLATE_PARMS (tmpl))
-	  == list_length (DECL_TEMPLATE_PARMS (t)));
-}
-
-/* If TMPL is a partial specialization, return the arguments for its primary
-   template.  */
-
-static tree
-impartial_args (tree tmpl, tree args)
-{
-  if (!partial_specialization_p (tmpl))
-return args;
-
-  /* If TMPL is a partial specialization, we need to substitute to get
- the args for the primary template.  */
-  return tsubst_template_args (DECL_TI_ARGS (tmpl), args,
-			   tf_warning_or_error, tmpl);
-}
-
 /* Return the most specialized of the template partial specializations
which can produce TARGET, a specialization of some class or variable
template.  The value returned is actually a TREE_LIST; the TREE_VALUE is
@@ -21419,14 +21391,12 @@ do_type_instantiation (tree t, tree storage, tsubst_flags_t complain)
to instantiate the DECL, we regenerate it.  */
 
 static void
-regenerate_decl_from_template (tree decl, tree tmpl)
+regenerate_decl_from_template (tree decl, tree tmpl, tree args)
 {
   /* The arguments used to instantiate DECL, from the most general
  template.  */
-  tree args;
   tree code_pattern;
 
-  args = DECL_TI_ARGS (decl);
   code_pattern = DECL_TEMPLATE_RESULT (tmpl);
 
   /* Make sure that we can see identifiers, and compute access
@@ -21742,7 +21712,7 @@ instantiate_decl (tree d, int defer_ok,
 

[PATCH] Make cp_fold more consistent in its output caching

2016-05-31 Thread Patrick Palka
Some code paths of cp_fold return early instead of falling through to
the end of the function and so in these cases we fail to cache the
return value of the function into the fold_cache.

This patch splits cp_fold into two functions, with the wrapper function
being responsible for caching the output of the worker function.  Does
this look OK to commit after bootstrap and regtest?

gcc/cp/ChangeLog:

* cp-gimplify.c (cp_fold_1): Split out from ...
(cp_fold): ... here.  Cache the output of cp_fold_1 into the
fold_cache.
---
 gcc/cp/cp-gimplify.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 7b63db4..aaf5f48 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 static tree cp_genericize_r (tree *, int *, void *);
 static tree cp_fold_r (tree *, int *, void *);
 static void cp_genericize_tree (tree*);
+static tree cp_fold_1 (tree);
 static tree cp_fold (tree);
 
 /* Local declarations.  */
@@ -1934,11 +1935,7 @@ clear_fold_cache (void)
 static tree
 cp_fold (tree x)
 {
-  tree op0, op1, op2, op3;
-  tree org_x = x, r = NULL_TREE;
-  enum tree_code code;
-  location_t loc;
-  bool rval_ops = true;
+  tree org_x = x;
 
   if (!x || x == error_mark_node)
 return x;
@@ -1957,6 +1954,27 @@ cp_fold (tree x)
   if (tree *cached = fold_cache->get (x))
 return *cached;
 
+  x = cp_fold_1 (x);
+
+  fold_cache->put (org_x, x);
+  /* Prevent that we try to fold an already folded result again.  */
+  if (x != org_x)
+fold_cache->put (x, x);
+
+  return x;
+}
+
+/* Worker function for cp_fold.  */
+
+static tree
+cp_fold_1 (tree x)
+{
+  tree op0, op1, op2, op3;
+  tree org_x = x, r = NULL_TREE;
+  enum tree_code code;
+  location_t loc;
+  bool rval_ops = true;
+
   code = TREE_CODE (x);
   switch (code)
 {
@@ -2319,11 +2337,6 @@ cp_fold (tree x)
   return org_x;
 }
 
-  fold_cache->put (org_x, x);
-  /* Prevent that we try to fold an already folded result again.  */
-  if (x != org_x)
-fold_cache->put (x, x);
-
   return x;
 }
 
-- 
2.9.0.rc0.29.gabd6606



Re: [PATCH] Fix missing/wrong function declaration in s-osinte-rtems.ads (ada/71317)

2016-05-31 Thread Eric Botcazou
> this patch fixes the build failures of recent gnat compiler version for
> RTEMS targets (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71317).
> Attached are patches for trunk, gcc-5-branch and gcc-6-branch.
> I don't have write access to the svn, so if the patches pass the review
> process please commit them.

Patches applied on mainline and 6 branch, but not on the 5 branch since the 
signature of Get_Page_Size is correct there.

-- 
Eric Botcazou


Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-05-31 Thread Evandro Menezes

On 05/31/16 04:27, James Greenhalgh wrote:

On Fri, May 27, 2016 at 05:57:30PM -0500, Evandro Menezes wrote:

On 05/25/16 11:16, James Greenhalgh wrote:

On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:

gcc/
 * config/aarch64/aarch64-protos.h
 (tune_params): Add new member "approx_div_modes".
 (aarch64_emit_approx_div): Declare new function.
 * config/aarch64/aarch64.c
 (generic_tunings): New member "approx_div_modes".
 (cortexa35_tunings): Likewise.
 (cortexa53_tunings): Likewise.
 (cortexa57_tunings): Likewise.
 (cortexa72_tunings): Likewise.
 (exynosm1_tunings): Likewise.
 (thunderx_tunings): Likewise.
 (xgene1_tunings): Likewise.
 (aarch64_emit_approx_div): Define new function.
 * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
 * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
 * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
 * doc/invoke.texi (-mlow-precision-div): Describe new option.

My comments from the other two patches around using a structure to
group up the tuning flags and whether we really want the new option
apply here too.

This code has no consumers by default and is only used for
-mlow-precision-div. Is this option likely to be useful to our users in
practice? It might all be more palatable under something like the rs6000's
-mrecip=opt .

I agree.  OK as a follow up?

Works for me.


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 47ccb18..7e99e16 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
[(set_attr "type" "neon_fp_mul_")]
  )
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+ [(set (match_operand:VDQF 0 "register_operand")
+   (div:VDQF (match_operand:VDQF 1 "general_operand")

What does this relaxation to general_operand give you?

Hold that thought...


+(match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);

...other than the need to do this (sorry if I've missed something obvious).

Hold on...


+  if (num != CONST1_RTX (mode))
+{
+  /* Calculate the approximate division.  */
+  rtx xnum = force_reg (mode, num);
+  emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
+}

About that relaxation, as you can see here, since the series
approximates the reciprocal of the denominator, if the numerator is
1.0, a register can be spared, as the result is ready and the
numerator is not needed.

But, in the case that the multiplication is by 1, can we not rely on the
other optimization machinery eliminating it? I mean, I see the optimization
that this enables for you, but can't you rely on future passes to do the
cleanup, and save yourself the few lines of special casing?


I prefer to not rely on subsequent passes, as, though it may work now, 
that may change in the future.  Methinks that it's a simple enough test 
to achieve better control of the resulting code.



+/* Emit the instruction sequence to compute the approximation for the division
+   of NUM by DEN and return whether the sequence was emitted or not.  */

Needs a brief mention of what we use QUO for :).


OK


+
+bool
+aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
+{
+  machine_mode mode = GET_MODE (quo);
+  bool use_approx_division_p = (flag_mlow_precision_div
+   || (aarch64_tune_params.approx_modes->division
+   & AARCH64_APPROX_MODE (mode)));
+
+  if (!flag_finite_math_only
+  || flag_trapping_math
+  || !flag_unsafe_math_optimizations
+  || optimize_function_for_size_p (cfun)
+  || !use_approx_division_p)
+return false;
+
+  /* Estimate the approximate reciprocal.  */
+  rtx xrcp = gen_reg_rtx (mode);
+  emit_insn ((*get_recpe_type (mode)) (xrcp, den));
+
+  /* Iterate over the series twice for SF and thrice for DF.  */
+  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+
+  /* Optionally iterate over the series once less for faster performance,
+ while sacrificing the accuracy.  */
+  if (flag_mlow_precision_div)
+iterations--;
+
+  /* Iterate over the series to calculate the approximate reciprocal.  */
+  rtx xtmp = gen_reg_rtx (mode);
+  while (iterations--)
+{
+  emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
+
+  if (iterations > 0)
+   emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
+}
+
+  if (num != CONST1_RTX (mode))
+{
+  /* As the approximate reciprocal of the denominator is already 
calculated,
+ only calculate the approximate division when the numerator is not 
1.0.  */

Long lines.


OK


+  rtx xnum = force_reg (mode, num);
+  emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, 

[gomp4.5] Add support for target {enter,exit} data translation, some map clause diagnostics

2016-05-31 Thread Jakub Jelinek
Hi!

No testcases yet, tested on x86_64-linux, committed to gomp-4_5-branch.

2016-05-31  Jakub Jelinek  

* trans-openmp.c (gfc_trans_omp_target_enter_data,
gfc_trans_omp_target_exit_data): New functions.
(gfc_trans_omp_directive): Handle EXEC_OMP_TARGET_ENTER_DATA
and EXEC_OMP_TARGET_EXIT_DATA.
* openmp.c (resolve_omp_clauses): Diagnose map kinds not allowed
for various constructs.  Diagnose target {enter ,exit ,}data without
any map clauses.

--- gcc/fortran/trans-openmp.c.jj   2016-05-31 14:39:50.0 +0200
+++ gcc/fortran/trans-openmp.c  2016-05-31 18:32:37.742582958 +0200
@@ -4775,6 +4775,36 @@ gfc_trans_omp_target_data (gfc_code *cod
 }
 
 static tree
+gfc_trans_omp_target_enter_data (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt, omp_clauses;
+
+  gfc_start_block (&block);
+  omp_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
+  code->loc);
+  stmt = build1_loc (input_location, OMP_TARGET_ENTER_DATA, void_type_node,
+omp_clauses);
+  gfc_add_expr_to_block (&block, stmt);
+  return gfc_finish_block (&block);
+}
+
+static tree
+gfc_trans_omp_target_exit_data (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt, omp_clauses;
+
+  gfc_start_block (&block);
+  omp_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
+  code->loc);
+  stmt = build1_loc (input_location, OMP_TARGET_EXIT_DATA, void_type_node,
+omp_clauses);
+  gfc_add_expr_to_block (&block, stmt);
+  return gfc_finish_block (&block);
+}
+
+static tree
 gfc_trans_omp_target_update (gfc_code *code)
 {
   stmtblock_t block;
@@ -5060,6 +5090,10 @@ gfc_trans_omp_directive (gfc_code *code)
   return gfc_trans_omp_target (code);
 case EXEC_OMP_TARGET_DATA:
   return gfc_trans_omp_target_data (code);
+case EXEC_OMP_TARGET_ENTER_DATA:
+  return gfc_trans_omp_target_enter_data (code);
+case EXEC_OMP_TARGET_EXIT_DATA:
+  return gfc_trans_omp_target_exit_data (code);
 case EXEC_OMP_TARGET_UPDATE:
   return gfc_trans_omp_target_update (code);
 case EXEC_OMP_TASK:
--- gcc/fortran/openmp.c.jj 2016-05-31 16:20:07.0 +0200
+++ gcc/fortran/openmp.c2016-05-31 19:33:55.339516471 +0200
@@ -4218,6 +4218,62 @@ resolve_omp_clauses (gfc_code *code, gfc
else
  resolve_oacc_data_clauses (n->sym, n->where, name);
  }
+   if (list == OMP_LIST_MAP && !openacc)
+ switch (code->op)
+   {
+   case EXEC_OMP_TARGET:
+   case EXEC_OMP_TARGET_DATA:
+ switch (n->u.map_op)
+   {
+   case OMP_MAP_TO:
+   case OMP_MAP_ALWAYS_TO:
+   case OMP_MAP_FROM:
+   case OMP_MAP_ALWAYS_FROM:
+   case OMP_MAP_TOFROM:
+   case OMP_MAP_ALWAYS_TOFROM:
+   case OMP_MAP_ALLOC:
+ break;
+   default:
+ gfc_error ("TARGET%s with map-type other than TO, "
+"FROM, TOFROM, or ALLOC on MAP clause "
+"at %L",
+code->op == EXEC_OMP_TARGET
+? "" : " DATA", &n->where);
+ break;
+   }
+ break;
+   case EXEC_OMP_TARGET_ENTER_DATA:
+ switch (n->u.map_op)
+   {
+   case OMP_MAP_TO:
+   case OMP_MAP_ALWAYS_TO:
+   case OMP_MAP_ALLOC:
+ break;
+   default:
+ gfc_error ("TARGET ENTER DATA with map-type other "
+"than TO, or ALLOC on MAP clause at %L",
+&n->where);
+ break;
+   }
+ break;
+   case EXEC_OMP_TARGET_EXIT_DATA:
+ switch (n->u.map_op)
+   {
+   case OMP_MAP_FROM:
+   case OMP_MAP_ALWAYS_FROM:
+   case OMP_MAP_RELEASE:
+   case OMP_MAP_DELETE:
+ break;
+   default:
+ gfc_error ("TARGET EXIT DATA with map-type other "
+"than FROM, RELEASE, or DELETE on MAP "
+"clause at %L", &n->where);
+ break;
+   }
+ break;
+   default:
+ break;
+   }
  }
 
if (list != OMP_LIST_DEPEND)
@@ 

Re: [gotools, libcc1] Update copyright dates

2016-05-31 Thread Richard Sandiford
Rainer Orth  writes:
> I noticed that libcc1 copyright dates hadn't been updated since it had
> initially been added.  Looking around, the same applies to gotools.
>
> This patch updates update-copyright.py to take care of that.  At the
> same time, I checked that script, added appropriate entries for the
> other missing directories and removed libmudflap support.  I'm also
> including the results of a update-copyright.py run on gotools and
> libcc1, but am uncertain what to include in default_dirs.
>
> Bootstrapped without regressions on i386-pc-solaris2.12, ok for
> mainline?
>
>   Rainer
>
> 2016-05-25  Rainer Orth  
>
>   libcc1:
>   Update copyrights.
>
>   gotools:
>   Update copyrights.
>
>   contrib:
>   * update-copyright.py (LibMudflapFilter): Remove.
>   (GCCCmdLine.__init__): Add gotools, libcc1.
>   Remove libmudflap.
>   List unhandled intl, libcilkrts, libgo, liboffloadmic,
>   maintainer-scripts.

The changes to update-copyright.py LGTM, thanks.

Richard


Re: [patch] Fix segfault in vectorizer

2016-05-31 Thread Eric Botcazou
> This code appears when we try to disable boolean patterns.  Boolean patterns
> replace booleans with integers of proper size which allow us to simply
> determine vectype using get_vectype_for_scalar_type.  With no such
> replacement we can't determine vectype just out of a scalar type (there are
> multiple possible options and get_vectype_for_scalar_type use would result
> in a lot of redundant conversions).  So we delay vectype computation for
> them and compute it basing on vectypes computed for their arguments.

OK, thanks for the explanation.  It would be nice to document that somewhere 
in the code if this isn't already done.

> Surely any missing vectype for a statement due to this delay is a bug.  I
> didn't notice STMT_VINFO_LIVE_P should be checked in addition to
> STMT_VINFO_RELEVANT_P.

That works indeed and generates no regression.  OK for mainline and 6 branch?


2016-05-31  Eric Botcazou  

* tree-vect-loop.c (vect_determine_vectorization_factor): Also take
into account live statements for mask producers.


2016-05-31  Eric Botcazou  

* gnat.dg/opt56.ad[sb]: New test.

-- 
Eric Botcazou

Index: tree-vect-loop.c
===
--- tree-vect-loop.c	(revision 236877)
+++ tree-vect-loop.c	(working copy)
@@ -441,7 +441,8 @@ vect_determine_vectorization_factor (loo
 		  && is_gimple_assign (stmt)
 		  && gimple_assign_rhs_code (stmt) != COND_EXPR)
 		{
-		  if (STMT_VINFO_RELEVANT_P (stmt_info))
+		  if (STMT_VINFO_RELEVANT_P (stmt_info)
+		  || STMT_VINFO_LIVE_P (stmt_info))
 		mask_producers.safe_push (stmt_info);
 		  bool_result = true;
 


Re: [C++ Patch] PR 70972 ("[6/7 Regression] Inheriting constructors taking parameters by value should move them, not copy")

2016-05-31 Thread Jason Merrill
On Tue, May 31, 2016 at 12:29 PM, Paolo Carlini
 wrote:
> Hi,
>
> On 23/05/2016 19:32, Jason Merrill wrote:
>>
>> OK.
>
> What about gcc-6-branch? Assuming no issues show up in mainline I think we
> want it there too in time for 6.2, right?

Definitely.

Jason


Re: [PATCH] c++/67376 Comparison with pointer to past-the-end, of array fails inside constant expression

2016-05-31 Thread Jason Merrill
On Tue, May 31, 2016 at 1:18 PM, Martin Sebor  wrote:
> On 03/31/2016 11:54 AM, Jason Merrill wrote:
>>
>> OK, thanks.
>
> This bug was fixed in 6.1 but it is also a GCC 5 regression and
> the patch hasn't been applied there yet.  Should I go ahead and
> backport it to the 5.x branch?

It seems safe enough to me, but since it touches fold-const let's get
a release manager opinion as well.  Jakub?

Jason


Re: Moving backwards/FSM threader into its own pass

2016-05-31 Thread Richard Biener
On May 31, 2016 4:55:36 PM GMT+02:00, Jeff Law  wrote:
>On 05/30/2016 03:16 AM, Richard Biener wrote:
>>
>> Ok, but the placement (and number of) threading passes then no longer
>depends
>> on DOM/VRP passes - and as you placed the threading passes _before_
>those
>> passes the threading itself does not benefit from DOM/VRP but only
>from
>> previous optimization passes.
>Right.  Note that number of passes now is actually the same as we had 
>before, they're just occurring outside DOM/VRP.
>
>The backwards threader's only dependency on DOM/VRP was to propagate 
>constants into PHI nodes and to propagate away copies.  That dependency
>
>was removed.
>
>
>>
>> I see this as opportunity to remove some of them ;)  I now see in the
>main
>> optimization pipeline
>>
>>   NEXT_PASS (pass_fre);
>>   NEXT_PASS (pass_thread_jumps);
>>   NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
>>
>> position makes sense - FRE removed redundancies and fully
>copy/constant
>> propagated the IL.
>>
>>   NEXT_PASS (pass_sra);
>>   /* The dom pass will also resolve all __builtin_constant_p
>calls
>>  that are still there to 0.  This has to be done after some
>>  propagations have already run, but before some more dead
>code
>>  is removed, and this place fits nicely.  Remember this when
>>  trying to move or duplicate pass_dominator somewhere
>earlier.  */
>>   NEXT_PASS (pass_thread_jumps);
>>   NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
>>
>> this position OTOH doesn't make much sense as IL cleanup is missing
>> after SRA and previous opts.  After loop we now have
>We should look at this one closely.  The backwards threader doesn't 
>depend on IL cleanups.  It should do its job regardless of the state of
>
>the IL.
>
>
>>
>>   NEXT_PASS (pass_tracer);
>>   NEXT_PASS (pass_thread_jumps);
>>   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p
>*/);
>>   NEXT_PASS (pass_strlen);
>>   NEXT_PASS (pass_thread_jumps);
>>   NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>>
>> I don't think we want two threadings so close together.  It makes
>some sense
>> to have a threading _after_ DOM but before VRP (DOM cleaned up the
>IL).
>That one is, IMHO, the least useful.  I haven't done any significant 
>analysis of this specific instance though to be sure.  The step you saw
>
>was meant to largely preserve behavior.  Further cleanups are
>definitely 
>possible.
>
>The most common case I've seen where the DOM/VRP make transformations 
>that then expose something useful to the backward threader come from 
>those pesky context sensitive equivalences.
>
>We (primarily Andrew, but Aldy and myself are also involved) are
>looking 
>at ways to more generally expose range information created for these 
>situations.  Exposing range information and getting it more precise by 
>allowing "unnecessary copies" or some such would eliminate those cases 
>where DOM/VRP expose new opportunities for the backwards jump threader.
>
>
>
>
>>
>> So that would leave two from your four passes and expose the
>opportunity
>> to re-add one during early-opts, also after FRE.  That one should be
>> throttled down to operate in "-Os" mode though.
>I'll take a look at them, but with some personal stuff and PTO it'll 
>likely be a few weeks before I've got anything useful.
>
>>
>> So, can you see what removing the two threading passes that don't
>make
>> sense to me do to your statistics?  And check whether a -Os-like
>threading
>> can be done early?
>Interesting you should mention doing threading early -- that was one of
>
>the secondary motivations behind getting the backwards threading bits 
>out into their own pass, I just failed to mention it.
>
>Essentially we want to limit the backwards substitution to single step 
>within a single block for that case (which is trivially easy).  That 
>would allow us to run a very cheap threader during early optimizations.

Just to double check - the pass does both forward and backward threading and 
DOM/VRP now do neither themselves?

Richard.

>Jeff




Re: [PATCH] c++/67376 Comparison with pointer to past-the-end of array fails inside constant expression

2016-05-31 Thread Martin Sebor

On 03/31/2016 11:54 AM, Jason Merrill wrote:

OK, thanks.


This bug was fixed in 6.1 but it is also a GCC 5 regression and
the patch hasn't been applied there yet.  Should I go ahead and
backport it to the 5.x branch?

Martin



Re: [AArch64][3/4] Don't generate redundant checks when there is no composite arg

2016-05-31 Thread James Greenhalgh
On Fri, May 06, 2016 at 04:00:40PM +0100, Jiong Wang wrote:
> 2016-05-06  Jiong Wang  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_gimplify_va_arg_expr): Avoid
>   duplicated check code.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/va_arg_4.c: New testcase.

I wonder whether this is safe for the int128_t/uint128_t data types and
overaligned data types? My concern is that something with alignment of
16-bytes may still need the check to skip the final hard register slot
on the stack.

I'm not sure here, so I'll need some more time to think this through, or
for Richard or Marcus to review this and help me understand why there
is nothing to worry about.

Thanks,
James

> From b92a4c4b8e52a9a952e91f307836022f667ab403 Mon Sep 17 00:00:00 2001
> From: "Jiong.Wang" 
> Date: Fri, 6 May 2016 14:37:37 +0100
> Subject: [PATCH 3/4] 3
> 
> ---
>  gcc/config/aarch64/aarch64.c| 94 
> -
>  gcc/testsuite/gcc.target/aarch64/va_arg_4.c | 23 +++
>  2 files changed, 87 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/va_arg_4.c
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b1a0287..06904d5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9587,6 +9587,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, 
> gimple_seq *pre_p,
>bool indirect_p;
>bool is_ha;/* is HFA or HVA.  */
>bool dw_align; /* double-word align.  */
> +  bool composite_type_p;
>machine_mode ag_mode = VOIDmode;
>int nregs;
>machine_mode mode;
> @@ -9594,13 +9595,14 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, 
> gimple_seq *pre_p,
>tree f_stack, f_grtop, f_vrtop, f_groff, f_vroff;
>tree stack, f_top, f_off, off, arg, roundup, on_stack;
>HOST_WIDE_INT size, rsize, adjust, align;
> -  tree t, u, cond1, cond2;
> +  tree t, t1, u, cond1, cond2;
>  
>indirect_p = pass_by_reference (NULL, TYPE_MODE (type), type, false);
>if (indirect_p)
>  type = build_pointer_type (type);
>  
>mode = TYPE_MODE (type);
> +  composite_type_p = aarch64_composite_type_p (type, mode);
>  
>f_stack = TYPE_FIELDS (va_list_type_node);
>f_grtop = DECL_CHAIN (f_stack);
> @@ -9671,35 +9673,38 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, 
> gimple_seq *pre_p,
> build_int_cst (TREE_TYPE (off), 0));
>cond1 = build3 (COND_EXPR, ptr_type_node, t, NULL_TREE, NULL_TREE);
>  
> -  if (dw_align)
> +  if (composite_type_p)
>  {
> -  /* Emit: offs = (offs + 15) & -16.  */
> -  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
> -   build_int_cst (TREE_TYPE (off), 15));
> -  t = build2 (BIT_AND_EXPR, TREE_TYPE (off), t,
> -   build_int_cst (TREE_TYPE (off), -16));
> -  roundup = build2 (MODIFY_EXPR, TREE_TYPE (off), off, t);
> -}
> -  else
> -roundup = NULL;
> +  if (dw_align)
> + {
> +   /* Emit: offs = (offs + 15) & -16.  */
> +   t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
> +   build_int_cst (TREE_TYPE (off), 15));
> +   t = build2 (BIT_AND_EXPR, TREE_TYPE (off), t,
> +   build_int_cst (TREE_TYPE (off), -16));
> +   roundup = build2 (MODIFY_EXPR, TREE_TYPE (off), off, t);
> + }
> +  else
> + roundup = NULL;
>  
> -  /* Update ap.__[g|v]r_offs  */
> -  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
> -   build_int_cst (TREE_TYPE (off), rsize));
> -  t = build2 (MODIFY_EXPR, TREE_TYPE (f_off), unshare_expr (f_off), t);
> +  /* Update ap.__[g|v]r_offs  */
> +  t = build2 (PLUS_EXPR, TREE_TYPE (off), off,
> +   build_int_cst (TREE_TYPE (off), rsize));
> +  t = build2 (MODIFY_EXPR, TREE_TYPE (f_off), unshare_expr (f_off), t);
>  
> -  /* String up.  */
> -  if (roundup)
> -t = build2 (COMPOUND_EXPR, TREE_TYPE (t), roundup, t);
> +  /* String up.  */
> +  if (roundup)
> + t = build2 (COMPOUND_EXPR, TREE_TYPE (t), roundup, t);
>  
> -  /* [cond2] if (ap.__[g|v]r_offs > 0)  */
> -  u = build2 (GT_EXPR, boolean_type_node, unshare_expr (f_off),
> -   build_int_cst (TREE_TYPE (f_off), 0));
> -  cond2 = build3 (COND_EXPR, ptr_type_node, u, NULL_TREE, NULL_TREE);
> +  /* [cond2] if (ap.__[g|v]r_offs > 0)  */
> +  u = build2 (GT_EXPR, boolean_type_node, unshare_expr (f_off),
> +   build_int_cst (TREE_TYPE (f_off), 0));
> +  cond2 = build3 (COND_EXPR, ptr_type_node, u, NULL_TREE, NULL_TREE);
>  
> -  /* String up: make sure the assignment happens before the use.  */
> -  t = build2 (COMPOUND_EXPR, TREE_TYPE (cond2), t, cond2);
> -  COND_EXPR_ELSE (cond1) = t;
> +  /* String up: make sure the assignment happens before the use.  */
> +  t = build2 (COMPOUND_EXPR, TREE_TYPE (cond2), t, cond2);
> +  COND_EXPR_ELSE (cond1) = t;
> +}
>  
>/* Prepare the trees handling the argument 

[PATCH] Fix PR ada/71358

2016-05-31 Thread Simon Wright
This fixes a minor problem where GNAT.Command_Line.Getopt raises CE if
there are in fact no program-specified options (only the
internally-supplied -h, --help are meant to be available).

Tested on GCC 6.1.0, x86_64-apple-darwin15.

If OK, can someone commit it (I can't).

gcc/ada/Changelog:

2016-05-31  Simon Wright  

PR ada/71358
* g-comlin.adb: bump copyright year.
(Display_Section_Help): don't dereference Config.Switches if
it's null.
(Getopt): likewise.



g-comlin.adb.diff
Description: Binary data


Re: [C++ Patch] PR 70972 ("[6/7 Regression] Inheriting constructors taking parameters by value should move them, not copy")

2016-05-31 Thread Paolo Carlini

Hi,

On 23/05/2016 19:32, Jason Merrill wrote:

OK.
What about gcc-6-branch? Assuming no issues show up in mainline I think 
we want it there too in time for 6.2, right?


Thanks,
Paolo.


Re: [PATCH][RFC] Initial patch for better performance of 64-bit math instructions in 32-bit mode on x86-64

2016-05-31 Thread Uros Bizjak
On Tue, May 31, 2016 at 5:00 PM, Yuri Rumyantsev  wrote:
> Hi Uros,
>
> Here is initial patch to improve performance of 64-bit integer
> arithmetic in 32-bit mode. We discovered that gcc is significantly
> behind icc and clang on rsa benchmark from eembc2.0 suite.
> The problem function looks like
> typedef unsigned long long ull;
> typedef unsigned long ul;
> ul mul_add(ul *rp, ul *ap, int num, ul w)
>  {
>  ul c1=0;
>  ull t;
>  for (;;)
>   {
>   { t=(ull)w * ap[0] + rp[0] + c1;
>rp[0]= ((ul)t)&0xL; c1= ((ul)((t)>>32))&(0xL); };
>   if (--num == 0) break;
>   { t=(ull)w * ap[1] + rp[1] + c1;
>rp[1]= ((ul)(t))&(0xL); c1= (((ul)((t)>>32))&(0xL)); };
>   if (--num == 0) break;
>   { t=(ull)w * ap[2] + rp[2] + c1;
>rp[2]= (((ul)(t))&(0xL)); c1= (((ul)((t)>>32))&(0xL)); };
>   if (--num == 0) break;
>   { t=(ull)w * ap[3] + rp[3] + c1;
>rp[3]= (((ul)(t))&(0xL)); c1= (((ul)((t)>>32))&(0xL)); };
>   if (--num == 0) break;
>   ap+=4;
>   rp+=4;
>   }
>  return(c1);
>  }
>
> If we apply patch below we will get +6% speed-up for rsa on Silvermont.
>
> The patch looks like (not complete since there are other 64-bit
> instructions e.g. subtraction):
>
> Index: i386.md
> ===================================================================
> --- i386.md (revision 236181)
> +++ i386.md (working copy)
> @@ -5439,7 +5439,7 @@
> (clobber (reg:CC FLAGS_REG))]
>"ix86_binary_operator_ok (PLUS, mode, operands)"
>"#"
> -  "reload_completed"
> +  "1"
>[(parallel [(set (reg:CCC FLAGS_REG)
>(compare:CCC
>  (plus:DWIH (match_dup 1) (match_dup 2))
>
> What is your opinion?

This splitter doesn't depend on hard registers, so there is no
technical obstacle for the proposed patch. OTOH, this is a very old
splitter, it is possible that it was introduced to handle some of
reload deficiencies. Maybe Jeff knows something about this approach.
We have LRA now, so perhaps we have to rethink the purpose of these
DImode splitters.

A pragmatic approach would be - if the patch shows measurable benefit,
and doesn't introduce regressions, then Stage 1 is the time to try it.

BTW: Use "&& 1" in the split condition of the combined insn_and_split
pattern to copy the enable condition from the insn part. If there is
no condition, you should just use "".

Uros.


[PCH] Add GTY marker

2016-05-31 Thread Nathan Sidwell
A bunch of PCH tests fail for a PTX target, due to a difference in debug data. 
It turned out that the object pointed to by cur_line_info_table was poisoned by PCH 
reading, and this led to an is_stmt being emitted erroneously.


My suspicion is that ports that support section switching update 
cur_line_info_table and/or reach that object by another path during PCH writing 
and reading, and so don't see this failure.


PTX doesn't override TARGET_HAVE_NAMED_SECTIONS, which might be an oversight. 
However, setting it to false doesn't fix this problem.  Hence I deduce this is 
simply an oversight.


Tested on x86_64-linux as well as ptx-none.  Applied as obvious.

nathan
2016-05-31  Nathan Sidwell  

	* dwarf2out.c (cur_line_info_table): Add GTY marker.

Index: dwarf2out.c
===================================================================
--- dwarf2out.c	(revision 236774)
+++ dwarf2out.c	(working copy)
@@ -3038,7 +3038,7 @@ static unsigned int line_info_label_num;
 /* The current table to which we should emit line number information
for the current function.  This will be set up at the beginning of
assembly for the function.  */
-static dw_line_info_table *cur_line_info_table;
+static GTY(()) dw_line_info_table *cur_line_info_table;
 
 /* The two default tables of line number info.  */
 static GTY(()) dw_line_info_table *text_section_line_info;


[SH][committed] Use default ASM_OUTPUT_SYMBOL_REF

2016-05-31 Thread Oleg Endo
Hi,

Since the SH5 stuff is gone, there's no need to override the
 ASM_OUTPUT_SYMBOL_REF target macro anymore.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236930.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.h (ASM_OUTPUT_SYMBOL_REF): Remove macro and use the
default implementation.diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
index d724bd2..0403616 100644
--- a/gcc/config/sh/sh.h
+++ b/gcc/config/sh/sh.h
@@ -1718,15 +1718,6 @@ extern bool current_function_interrupt;
? (24) \
: (unsigned) -1)
 
-/* This is how to output a reference to a symbol_ref.  On SH5,
-   references to non-code symbols must be preceded by `datalabel'.  */
-#define ASM_OUTPUT_SYMBOL_REF(FILE,SYM)			\
-  do 			\
-{			\
-  assemble_name ((FILE), XSTR ((SYM), 0));		\
-}			\
-  while (0)
-
 /* This is how to output an assembler line
that says to advance the location counter
to a multiple of 2**LOG bytes.  */


[SH][committed] Remove SH5 target regs leftovers

2016-05-31 Thread Oleg Endo
Hi,

The attached patch removes the SH5 target register leftovers.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236928.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/constraints.md (b): Remove constraint.
* config/sh/predicates.md (arith_reg_operand): Remove TARGET_REGISTER_P.
* config/sh/sh-modes.def (PDI): Remove.
* config/sh/sh.c (sh_target_reg_class,
sh_optimize_target_register_callee_saved): Remove functions.
(sh_option_override): Don't set MASK_SAVE_ALL_TARGET_REGS.
(sh_expand_epilogue): Update comment.
(sh_hard_regno_mode_ok, sh_register_move_cost, calc_live_regs,
sh_secondary_reload): Remove TARGET_REGS related code.
* config/sh/sh.h (FIRST_TARGET_REG, LAST_TARGET_REG,
TARGET_REGISTER_P): Remove macros.
(SH_DBX_REGISTER_NUMBER, REG_ALLOC_ORDER): Remove target regs.
* config/sh/sh.md (PR_MEDIA_REG, T_MEDIA_REG, FR23_REG, TR0_REG,
TR1_REG, TR2_REG): Remove constants.
* config/sh/sh.opt (SAVE_ALL_TARGET_REGS): Remove.diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md
index 644a0f0..c3e9d55 100644
--- a/gcc/config/sh/constraints.md
+++ b/gcc/config/sh/constraints.md
@@ -62,9 +62,6 @@
 (define_register_constraint "a" "ALL_REGS"
   "@internal")
 
-(define_register_constraint "b" "TARGET_REGS"
-  "Branch target registers.")
-
 (define_register_constraint "c" "FPSCR_REGS"
   "Floating-point status register.")
 
diff --git a/gcc/config/sh/predicates.md b/gcc/config/sh/predicates.md
index 4de90af..4b93c6d 100644
--- a/gcc/config/sh/predicates.md
+++ b/gcc/config/sh/predicates.md
@@ -34,7 +34,6 @@
 	return 1;
 
   return (regno != T_REG && regno != PR_REG
-	  && ! TARGET_REGISTER_P (regno)
 	  && regno != FPUL_REG && regno != FPSCR_REG
 	  && regno != MACH_REG && regno != MACL_REG);
 }
diff --git a/gcc/config/sh/sh-modes.def b/gcc/config/sh/sh-modes.def
index cab2011a..6db9943 100644
--- a/gcc/config/sh/sh-modes.def
+++ b/gcc/config/sh/sh-modes.def
@@ -17,9 +17,6 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-/* PDI mode is used to represent a function address in a target register.  */
-PARTIAL_INT_MODE (DI, 64, PDI);
-
 /* Vector modes.  */
 VECTOR_MODE  (INT, QI, 2);/* V2QI */
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index a36b098..2bd917a 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -234,8 +234,6 @@ static int sh_variable_issue (FILE *, int, rtx_insn *, int);
 static bool sh_function_ok_for_sibcall (tree, tree);
 
 static bool sh_can_follow_jump (const rtx_insn *, const rtx_insn *);
-static reg_class_t sh_target_reg_class (void);
-static bool sh_optimize_target_register_callee_saved (bool);
 static bool sh_ms_bitfield_layout_p (const_tree);
 
 static void sh_init_builtins (void);
@@ -465,11 +463,6 @@ static const struct attribute_spec sh_attribute_table[] =
 
 #undef TARGET_CAN_FOLLOW_JUMP
 #define TARGET_CAN_FOLLOW_JUMP sh_can_follow_jump
-#undef TARGET_BRANCH_TARGET_REGISTER_CLASS
-#define TARGET_BRANCH_TARGET_REGISTER_CLASS sh_target_reg_class
-#undef TARGET_BRANCH_TARGET_REGISTER_CALLEE_SAVED
-#define TARGET_BRANCH_TARGET_REGISTER_CALLEE_SAVED \
-  sh_optimize_target_register_callee_saved
 
 #undef TARGET_MS_BITFIELD_LAYOUT_P
 #define TARGET_MS_BITFIELD_LAYOUT_P sh_ms_bitfield_layout_p
@@ -800,8 +793,6 @@ sh_option_override (void)
   int regno;
 
   SUBTARGET_OVERRIDE_OPTIONS;
-  if (optimize > 1 && !optimize_size)
-target_flags |= MASK_SAVE_ALL_TARGET_REGS;
 
   sh_cpu = PROCESSOR_SH1;
   assembler_dialect = 0;
@@ -7037,30 +7028,6 @@ calc_live_regs (HARD_REG_SET *live_regs_mask)
   if (nosave_low_regs && reg == R8_REG)
 	break;
 }
-  /* If we have a target register optimization pass after prologue / epilogue
- threading, we need to assume all target registers will be live even if
- they aren't now.  */
-  if (flag_branch_target_load_optimize2 && TARGET_SAVE_ALL_TARGET_REGS)
-for (reg = LAST_TARGET_REG; reg >= FIRST_TARGET_REG; reg--)
-  if ((! call_really_used_regs[reg] || interrupt_handler)
-	  && ! TEST_HARD_REG_BIT (*live_regs_mask, reg))
-	{
-	  SET_HARD_REG_BIT (*live_regs_mask, reg);
-	  count += GET_MODE_SIZE (REGISTER_NATURAL_MODE (reg));
-	}
-  /* If this is an interrupt handler, we don't have any call-clobbered
- registers we can conveniently use for target register save/restore.
- Make sure we save at least one general purpose register when we need
- to save target registers.  */
-  if (interrupt_handler
-  && hard_reg_set_intersect_p (*live_regs_mask,
-   reg_class_contents[TARGET_REGS])
-  && ! hard_reg_set_intersect_p (*live_regs_mask,
- 

Re: C/C++ OpenACC routine directive, undeclared name error: try to help the user, once

2016-05-31 Thread Thomas Schwinge
Hi!

On Tue, 31 May 2016 10:26:14 -0400, Nathan Sidwell  wrote:
> 'lexically following' is implementor-speak.  [...]

Thanks for the review, and wording suggestion.

OK for trunk, as follows?

commit 3289032bf7fd7e4a0cce37e7acd71e3330729d83
Author: Thomas Schwinge 
Date:   Tue May 31 17:46:26 2016 +0200

C/C++ OpenACC routine directive, undeclared name error: try to help the 
user, once

gcc/c/
* c-parser.c (c_parser_oacc_routine): If running into an
undeclared name error, try to help the user, once.
gcc/cp/
* parser.c (cp_parser_oacc_routine): If running into an undeclared
name error, try to help the user, once.
gcc/testsuite/
* c-c++-common/goacc/routine-5.c: Update.
---
 gcc/c/c-parser.c | 16 ++--
 gcc/cp/parser.c  | 16 ++--
 gcc/testsuite/c-c++-common/goacc/routine-5.c | 15 ++-
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 993c0a0..d3cab69 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -14003,8 +14003,20 @@ c_parser_oacc_routine (c_parser *parser, enum 
pragma_context context)
{
  decl = lookup_name (token->value);
  if (!decl)
-   error_at (token->location, "%qE has not been declared",
- token->value);
+   {
+ error_at (token->location, "%qE has not been declared",
+   token->value);
+ static bool informed_once = false;
+ if (!informed_once)
+   {
+ inform (token->location,
+ "omit the %<(%E)%>, if you want to mark the"
+ " immediately following function, or place this"
+ " pragma after a declaration of the function to be"
+ " marked", token->value);
+ informed_once = true;
+   }
+   }
  c_parser_consume_token (parser);
}
   else
diff --git gcc/cp/parser.c gcc/cp/parser.c
index 8841666..0c67608 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -36528,8 +36528,20 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token 
*pragma_tok,
 /*optional_p=*/false);
   decl = cp_parser_lookup_name_simple (parser, id, token->location);
   if (id != error_mark_node && decl == error_mark_node)
-   cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
-token->location);
+   {
+ cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
+  token->location);
+ static bool informed_once = false;
+ if (!informed_once)
+   {
+ inform (token->location,
+ "omit the %<(%E)%>, if you want to mark the"
+ " immediately following function, or place this"
+ " pragma after a declaration of the function to be"
+ " marked", id);
+ informed_once = true;
+   }
+   }
 
   if (decl == error_mark_node
  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c 
gcc/testsuite/c-c++-common/goacc/routine-5.c
index 1efd154..def78cd 100644
--- gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -71,7 +71,20 @@ void Foo ()
 
 #pragma acc routine (Foo) gang // { dg-error "must be applied before 
definition" }
 
-#pragma acc routine (Baz) // { dg-error "not been declared" }
+#pragma acc routine (Baz) worker
+/* { dg-error ".Baz. has not been declared" "" { target *-*-* } 74 }
+   Try to help the user:
+   { dg-message "note: omit the .\\(Baz\\)., if" "" { target *-*-* } 74 } */
+
+#pragma acc routine (Baz) vector
+/* { dg-error ".Baz. has not been declared" "" { target *-*-* } 79 }
+   Don't try to help the user again:
+   { dg-bogus "note: omit the .\\(Baz\\)., if" "" { target *-*-* } 79 } */
+
+#pragma acc routine (Qux) seq
+/* { dg-error ".Qux. has not been declared" "" { target *-*-* } 84 }
+   Don't try to help the user again:
+   { dg-bogus "note: omit the .\\(Qux\\)., if" "" { target *-*-* } 84 } */
 
 
 int vb1;   /* { dg-error "directive for use" } */


Grüße
 Thomas




Re: Remove the unused OMP_CLAUSE_DEVICE_RESIDENT

2016-05-31 Thread Jakub Jelinek
On Tue, May 31, 2016 at 05:45:25PM +0200, Thomas Schwinge wrote:
> While working on something else, I came across the following.  Cesar, can
> you please verify that this is really dead code in the Fortran front end,
> which currently is the only producer of OMP_CLAUSE_DEVICE_RESIDENT?
> 
> Also, I noticed that the Fortran front end never creates OMP_CLAUSE_MAP
> with GOMP_MAP_LINK or GOMP_MAP_DEVICE_RESIDENT -- how is OpenACC declare
> working at all?  Cesar, if this is not a simple question to answer,
> please file a GCC PR, for somebody to have a look later.
> 
> Jakub, if the following patch is OK, should I also clean it up on release
> branches as applicable, or just on trunk?

It is ok for trunk, not worth backporting to any branches.
But your ChangeLog doesn't cover the changes you've actually done.

> Remove the unused OMP_CLAUSE_DEVICE_RESIDENT
> 
>   gcc/
>   * tree-core.h (enum omp_clause_code): Remove
>   OMP_CLAUSE_DEVICE_RESIDENT.  Adjust all users.

Jakub


Remove the unused OMP_CLAUSE_DEVICE_RESIDENT

2016-05-31 Thread Thomas Schwinge
Hi!

While working on something else, I came across the following.  Cesar, can
you please verify that this is really dead code in the Fortran front end,
which currently is the only producer of OMP_CLAUSE_DEVICE_RESIDENT?

Also, I noticed that the Fortran front end never creates OMP_CLAUSE_MAP
with GOMP_MAP_LINK or GOMP_MAP_DEVICE_RESIDENT -- how is OpenACC declare
working at all?  Cesar, if this is not a simple question to answer,
please file a GCC PR, for somebody to have a look later.

Jakub, if the following patch is OK, should I also clean it up on release
branches as applicable, or just on trunk?

commit 65f613d59aca51bc6460eaae7ea19871577a7b26
Author: Thomas Schwinge 
Date:   Tue May 31 15:59:24 2016 +0200

Remove the unused OMP_CLAUSE_DEVICE_RESIDENT

gcc/
* tree-core.h (enum omp_clause_code): Remove
OMP_CLAUSE_DEVICE_RESIDENT.  Adjust all users.
---
 gcc/fortran/trans-openmp.c | 3 ---
 gcc/gimplify.c | 5 -
 gcc/omp-low.c  | 2 --
 gcc/tree-core.h| 3 ---
 gcc/tree-pretty-print.c| 3 ---
 gcc/tree.c | 3 ---
 6 files changed, 19 deletions(-)

diff --git gcc/fortran/trans-openmp.c gcc/fortran/trans-openmp.c
index c2d89eb..d3276f9 100644
--- gcc/fortran/trans-openmp.c
+++ gcc/fortran/trans-openmp.c
@@ -1773,9 +1773,6 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
case OMP_LIST_USE_DEVICE:
  clause_code = OMP_CLAUSE_USE_DEVICE_PTR;
  goto add_clause;
-   case OMP_LIST_DEVICE_RESIDENT:
- clause_code = OMP_CLAUSE_DEVICE_RESIDENT;
- goto add_clause;
 
add_clause:
  omp_clauses
diff --git gcc/gimplify.c gcc/gimplify.c
index 8316bb8..e53a2d3 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -7531,10 +7531,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
}
  break;
 
-   case OMP_CLAUSE_DEVICE_RESIDENT:
- remove = true;
- break;
-
case OMP_CLAUSE_NOWAIT:
case OMP_CLAUSE_ORDERED:
case OMP_CLAUSE_UNTIED:
@@ -8268,7 +8264,6 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
gimple_seq body, tree *list_p,
case OMP_CLAUSE__CILK_FOR_COUNT_:
case OMP_CLAUSE_ASYNC:
case OMP_CLAUSE_WAIT:
-   case OMP_CLAUSE_DEVICE_RESIDENT:
case OMP_CLAUSE_INDEPENDENT:
case OMP_CLAUSE_NUM_GANGS:
case OMP_CLAUSE_NUM_WORKERS:
diff --git gcc/omp-low.c gcc/omp-low.c
index a11f44b..77bdb18 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -2200,7 +2200,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
install_var_local (decl, ctx);
  break;
 
-   case OMP_CLAUSE_DEVICE_RESIDENT:
case OMP_CLAUSE__CACHE_:
  sorry ("Clause not supported yet");
  break;
@@ -2368,7 +2367,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
case OMP_CLAUSE__GRIDDIM_:
  break;
 
-   case OMP_CLAUSE_DEVICE_RESIDENT:
case OMP_CLAUSE__CACHE_:
  sorry ("Clause not supported yet");
  break;
diff --git gcc/tree-core.h gcc/tree-core.h
index b069928..db5b470 100644
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -316,9 +316,6 @@ enum omp_clause_code {
  #pragma acc cache (variable-list).  */
   OMP_CLAUSE__CACHE_,
 
-  /* OpenACC clause: device_resident (variable_list).  */
-  OMP_CLAUSE_DEVICE_RESIDENT,
-
   /* OpenACC clause: gang [(gang-argument-list)].
  Where
   gang-argument-list: [gang-argument-list, ] gang-argument
diff --git gcc/tree-pretty-print.c gcc/tree-pretty-print.c
index 0e7fdd1..734ecda 100644
--- gcc/tree-pretty-print.c
+++ gcc/tree-pretty-print.c
@@ -407,9 +407,6 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
int flags)
 case OMP_CLAUSE__LOOPTEMP_:
   name = "_looptemp_";
   goto print_remap;
-case OMP_CLAUSE_DEVICE_RESIDENT:
-  name = "device_resident";
-  goto print_remap;
 case OMP_CLAUSE_TO_DECLARE:
   name = "to";
   goto print_remap;
diff --git gcc/tree.c gcc/tree.c
index f4b470b..7511d0a 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -281,7 +281,6 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_IS_DEVICE_PTR  */
   2, /* OMP_CLAUSE__CACHE_  */
-  1, /* OMP_CLAUSE_DEVICE_RESIDENT  */
   2, /* OMP_CLAUSE_GANG  */
   1, /* OMP_CLAUSE_ASYNC  */
   1, /* OMP_CLAUSE_WAIT  */
@@ -353,7 +352,6 @@ const char * const omp_clause_code_name[] =
   "use_device_ptr",
   "is_device_ptr",
   "_cache_",
-  "device_resident",
   "gang",
   "async",
   "wait",
@@ -11764,7 +11762,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
  /* FALLTHRU */
 
-   case OMP_CLAUSE_DEVICE_RESIDENT:
case OMP_CLAUSE_ASYNC:
case OMP_CLAUSE_WAIT:
case OMP_CLAUSE_WORKER:


Grüße
 Thomas




C++ PATCH for c++/71227 (error specializing friend)

2016-05-31 Thread Jason Merrill
My fix for 70522 caused us to reject this testcase as well.  I think 
that this is correct; I don't see any reason why lookup to find a 
template to specialize should be treated differently from other normal 
lookup.  But we could be more helpful in the diagnostic, especially 
since this breaks previously working code.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit fa370e284e8b04f85fb626b20919f52ef542748f
Author: Jason Merrill 
Date:   Wed May 25 16:10:24 2016 -0400

	PR c++/71227

	* pt.c (check_explicit_specialization): Give better diagnostic about
	specializing a hidden friend.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index df3d634..a57da62 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2807,12 +2807,22 @@ check_explicit_specialization (tree declarator,
 	  /* Find the namespace binding, using the declaration
 		 context.  */
 	  fns = lookup_qualified_name (CP_DECL_CONTEXT (decl), dname,
-	   false, true);
+	   /*type*/false, /*complain*/true,
+	   /*hidden*/true);
 	  if (fns == error_mark_node || !is_overloaded_fn (fns))
 		{
 		  error ("%qD is not a template function", dname);
 		  fns = error_mark_node;
 		}
+	  else if (remove_hidden_names (fns) == NULL_TREE)
+		{
+		  tree fn = get_first_fn (fns);
+		  if (pedwarn (DECL_SOURCE_LOCATION (decl), 0,
+			   "friend declaration %qD is not visible to "
+			   "explicit specialization", fn))
+		inform (DECL_SOURCE_LOCATION (fn),
+			"friend declaration here");
+		}
 	}
 
 	  declarator = lookup_template_function (fns, NULL_TREE);
diff --git a/gcc/testsuite/g++.dg/template/friend62.C b/gcc/testsuite/g++.dg/template/friend62.C
new file mode 100644
index 000..c9796c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/friend62.C
@@ -0,0 +1,16 @@
+// PR c++/71227
+// { dg-options "" }
+
+class A {
+  public:
+template
+  friend int f(int x, T v) { // { dg-message "declaration" }
+return x + v;
+  }
+};
+
+
+template<>
+int f(int x, int v) {		// { dg-warning "friend" }
+  return x + v;
+}



[SH][committed] Simplify DImode add, sub, neg patterns

2016-05-31 Thread Oleg Endo
Hi,

The attached patch simplifies some DImode patterns on SH.  The
force_reg in the expand patterns can also be expressed by using the
appropriate predicate, which eliminates the need for the expand
patterns altogether.

Tested on sh-elf with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,
-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r236927.

Cheers,
Oleg

gcc/ChangeLog:
* config/sh/sh.md (adddi3, subdi3, negdi2, abs2): Remove
define_expand patterns.
(adddi3_compact): Rename to adddi3.
(subdi3_compact): Rename to subdi3.
(*negdi2): Rename to negdi2.
(*abs2): Rename to abs2.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 406721d..30948ca 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -1535,18 +1535,7 @@
 ;; Addition instructions
 ;; -
 
-(define_expand "adddi3"
-  [(set (match_operand:DI 0 "arith_reg_operand")
-	(plus:DI (match_operand:DI 1 "arith_reg_operand")
-		 (match_operand:DI 2 "arith_operand")))]
-  ""
-{
-  operands[2] = force_reg (DImode, operands[2]);
-  emit_insn (gen_adddi3_compact (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
-(define_insn_and_split "adddi3_compact"
+(define_insn_and_split "adddi3"
   [(set (match_operand:DI 0 "arith_reg_dest")
 	(plus:DI (match_operand:DI 1 "arith_reg_operand")
 		 (match_operand:DI 2 "arith_reg_operand")))
@@ -1938,21 +1927,10 @@
 ;; Subtraction instructions
 ;; -
 
-(define_expand "subdi3"
-  [(set (match_operand:DI 0 "arith_reg_operand" "")
-	(minus:DI (match_operand:DI 1 "arith_reg_or_0_operand" "")
-		  (match_operand:DI 2 "arith_reg_operand" "")))]
-  ""
-{
-  operands[1] = force_reg (DImode, operands[1]);
-  emit_insn (gen_subdi3_compact (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
-(define_insn_and_split "subdi3_compact"
+(define_insn_and_split "subdi3"
   [(set (match_operand:DI 0 "arith_reg_dest")
 	(minus:DI (match_operand:DI 1 "arith_reg_operand")
-		 (match_operand:DI 2 "arith_reg_operand")))
+		  (match_operand:DI 2 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
@@ -4393,13 +4371,7 @@
 
 ;; Don't split into individual negc insns immediately so that neg:DI (abs:DI)
 ;; can be combined.
-(define_expand "negdi2"
-  [(parallel [(set (match_operand:DI 0 "arith_reg_dest")
-		   (neg:DI (match_operand:DI 1 "arith_reg_operand")))
-	  (clobber (reg:SI T_REG))])]
-  "TARGET_SH1")
-
-(define_insn_and_split "*negdi2"
+(define_insn_and_split "negdi2"
   [(set (match_operand:DI 0 "arith_reg_dest")
 	(neg:DI (match_operand:DI 1 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
@@ -4480,13 +4452,7 @@
 }
   [(set_attr "type" "arith")])
 
-(define_expand "abs2"
-  [(parallel [(set (match_operand:SIDI 0 "arith_reg_dest")
-		   (abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
-	  (clobber (reg:SI T_REG))])]
-  "TARGET_SH1")
-
-(define_insn_and_split "*abs2"
+(define_insn_and_split "abs2"
   [(set (match_operand:SIDI 0 "arith_reg_dest")
   	(abs:SIDI (match_operand:SIDI 1 "arith_reg_operand")))
(clobber (reg:SI T_REG))]


Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.

2016-05-31 Thread James Greenhalgh
On Tue, May 17, 2016 at 09:02:22AM +, Bin Cheng wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in
> AArch64(or AArch32?) backend, and they both had some patches fixing this.
> After investigation, I agree with them that vcond/vcondu in AArch64's backend
> should be re-implemented using vec_cmp/vcond_mask patterns, so here comes
> this patch which is based on Alan's.  This patch supports all vcond/vcondu
> patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to
> the original patch, it doesn't change GCC's expanding process, and it keeps
> vcond patterns.  The patch also introduces vec_cmp*_internal to support
> special case optimization for vcond/vcondu which current implementation does.
> Apart from Alan's patch, I also learned ideas from Renlin's, and it is my
> change that shall be blamed if any potential bug is introduced.
> 
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on
> AArch64 (in a following patch).  Bootstrap and test on AArch64.  Is it OK?
> BTW, this patch is necessary for gcc.dg/vect/PR56541.c (on AArch64) which was
> added before in tree if-conversion patch.

Splitting this patch would have been very helpful. One patch each for the
new standard pattern names, and one patch for the refactor of vcond. As
it is, this patch is rather difficult to read.

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index bd73bce..f51473a 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1053,7 +1053,7 @@
>  }
>  
>cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], 
> operands[2]);
> -  emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1],
> +  emit_insn (gen_vcondv2div2di (operands[0], operands[1],
>operands[2], cmp_fmt, operands[1], operands[2]));
>DONE;
>  })
> @@ -2225,204 +2225,215 @@
>DONE;
>  })
>  
> -(define_expand "aarch64_vcond_internal"
> +(define_expand "vcond_mask_"
> +  [(match_operand:VALLDI 0 "register_operand")
> +   (match_operand:VALLDI 1 "nonmemory_operand")
> +   (match_operand:VALLDI 2 "nonmemory_operand")
> +   (match_operand: 3 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  /* If we have (a = (P) ? -1 : 0);
> + Then we can simply move the generated mask (result must be int).  */
> +  if (operands[1] == CONSTM1_RTX (mode)
> +  && operands[2] == CONST0_RTX (mode))
> +emit_move_insn (operands[0], operands[3]);
> +  /* Similarly, (a = (P) ? 0 : -1) is just inverting the generated mask.  */
> +  else if (operands[1] == CONST0_RTX (mode)
> +&& operands[2] == CONSTM1_RTX (mode))
> +emit_insn (gen_one_cmpl2 (operands[0], operands[3]));
> +  else
> +{
> +  if (!REG_P (operands[1]))
> + operands[1] = force_reg (mode, operands[1]);
> +  if (!REG_P (operands[2]))
> + operands[2] = force_reg (mode, operands[2]);
> +  emit_insn (gen_aarch64_simd_bsl (operands[0], operands[3],
> +  operands[1], operands[2]));
> +}
> +
> +  DONE;
> +})
> +

This pattern is fine.

> +;; Patterns comparing two vectors to produce a mask.

This comment is insufficient. The logic in vec_cmp_internal
does not always return the expected mask (in particular for NE), but this
is not made clear in the comment.

> +
> +(define_expand "vec_cmp_internal"
>[(set (match_operand:VSDQ_I_DI 0 "register_operand")
> - (if_then_else:VSDQ_I_DI
> -   (match_operator 3 "comparison_operator"
> - [(match_operand:VSDQ_I_DI 4 "register_operand")
> -  (match_operand:VSDQ_I_DI 5 "nonmemory_operand")])
> -   (match_operand:VSDQ_I_DI 1 "nonmemory_operand")
> -   (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))]
> +   (match_operator 1 "comparison_operator"
> + [(match_operand:VSDQ_I_DI 2 "register_operand")
> +  (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
>"TARGET_SIMD"



> +(define_expand "vec_cmp"
> +  [(set (match_operand:VSDQ_I_DI 0 "register_operand")
> +   (match_operator 1 "comparison_operator"
> + [(match_operand:VSDQ_I_DI 2 "register_operand")
> +  (match_operand:VSDQ_I_DI 3 "nonmemory_operand")]))]
> +  "TARGET_SIMD"
> +{
> +  enum rtx_code code = GET_CODE (operands[1]);
> +
> +  emit_insn (gen_vec_cmp_internal (operands[0],
> +  operands[1], operands[2],
> +  operands[3]));
> +  /* See comments in vec_cmp_internal, below
> + operators are computed using other operators, and we need to
> + revert the result.  */

s/revert/invert

There are no comments in vec_cmp_internal for this
to refer to. But, there should be - more comments would definitely help
this patch!

These comments apply equally to the other copies of this comment in the
patch.

> +  if (code == NE)
> +emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
>DONE;
>  })

Re: [RX] Add support for atomic operations

2016-05-31 Thread Oleg Endo
On Tue, 2016-05-31 at 14:17 +0100, Nick Clifton wrote:

> > Sorry, but my original patch was buggy.  There are two problems:
> 
> Thanks for your diligence in checking the patch.

Just eating my own dogfood here ... :)

> Approved - please apply.

Committed as r236926.

Cheers,
Oleg


Re: Do not imply -fweb with -fpeel-loops

2016-05-31 Thread Bill Schmidt
If the decision is taken to remove -fweb, please give me a heads-up as I have a 
target-specific pass that shares infrastructure with it.  I can factor that out 
to facilitate the removal; just let me know.

Thanks!

-- Bill

Bill Schmidt, Ph.D.
GCC for Linux on Power
Linux on Power Toolchain
IBM Linux Technology Center
wschm...@linux.vnet.ibm.com




> On May 31, 2016, at 8:50 AM, Richard Biener  
> wrote:
> 
> On Tue, May 31, 2016 at 1:42 PM, Jan Hubicka  wrote:
 I would say we should handle -frename-registers same way as we do 
 -fschedule-insns.
 I.e. enable/disable it on a target basis rather than bundle it with loop 
 unrolling.
 But that is for future patch.  I will commit this and propose patch making 
 -fpeel-loops
 independent of rename-registers next
>>> 
>>> Ok.  Does -fweb still do what its documentation says (enable register
>>> allocation to
>>> work on pseudos directly)?  Given its downside and strong GIMPLE
>>> optimizations maybe
>>> it is time to remove it?
>> 
>> I will wait with comitting the renaming patch and we will see the effect with
>> -fno-unroll-loops on ia64 tester (it probably pays back more on in-order
>> target) and we can test effect with -funroll-loops by patching it tomorrow.
>> 
>> I guess IRA does live range splitting by itself.  The register allocator is not
>> the only pass which does not like reuse of pseudos.  Web originally also made
>> -fschedule-insns (not -fschedule-insns2) and CSE to work harder.
>> 
>> In tree-SSA world most of reuse is avoided by out-of-ssa pass. Random reuse
>> these days is limited to loop unrolling and RTL expansion I think. Without
>> unrolling, the webizer renames 204 pseudos on tramp3d, which seems close to 0. I can
>> try to dig into that. With -funroll-all-loops it renames 17309 registers.
>> Of course it would be possible to write an unroller-specific renamer, but it
>> would
>> hardly be any easier than the webizer.
> 
> And we have things like -fsplit-ivs-in-unroller that does part of it...
> 
> Richard.
> 
>> Honza
> 



[PATCH] Support for SPARC M7 and VIS 4.0

2016-05-31 Thread Jose E. Marchesi

This patch adds support for -mcpu=niagara7, corresponding to the SPARC
M7 CPU as documented in the Oracle SPARC Architecture 2015 and the M7
Processor Supplement.  The patch also includes intrinsics support for
all the VIS 4.0 instructions.

This patch has been tested in sparc64-*-linux-gnu, sparcv9-*-linux-gnu
and sparc-sun-solaris2.11 targets.

The support for the new instructions/registers/isas/etc of the M7 is
already committed upstream in binutils.

Some notes on the patch:

- Pipeline description for the M7

  The pipeline description for the M7 is similar to the niagara4
  description, with one exception: the latencies of many VIS instructions
  are different because of the introduction of the so-called "V3 pipe".

  There is a 3-cycle pipeline through which a number of instructions
  flow.  That pipeline was born to process some of the short crypto
  instructions in T4/T5/M5/M6.  Then in M7/T7 some instructions that
  used to have ~11-cycle latency in the FP pipeline (notably, some
  non-FP VIS instructions) were moved to the V3 pipeline (and we will be
  moving more instructions to the V3 in future M* models).  The result
  is that latency for quite a few previously 11-cycle operations has
  dropped to 3 cycles.

  But then, in order to obtain 3-cycle latency the destination register
  of the instruction must be "consumed" by another instruction that also
  executes in the V3 pipeline (in order for the register-bypass path in
  V3 to keep the latency down to 3 cycles).  If the instruction that
  consumes Rd does not execute in the V3 pipeline, then it has to wait
  longer to see the result, because the 3-cycle bypass path only works
  within the V3 pipeline.  The list of instructions using the V3 pipe
  are marked with a (1) superscript in the latencies table (A-1) in the
  M7 processor supplement.

  The niagara7 pipeline description models the V3 pipe using a bypass
  with latency 3, from-to any instruction executing in the V3 pipe.  The
  instructions are identified by means of a new instruction attribute
  "v3pipe", that has been added to the proper define_insns in sparc.md.

  However, I am not sure how well this will scale in the future, as in
  future cpu models the subset of instruction executing in the V3 pipe
  will be different than in the M7.  So we need a way to define that
  instruction class that is processor-specific: using v3pipe, v3pipe_m8,
  v3pipe_m9, etc seems a bit messy to me.  Any idea?
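
As a sketch, the scheme described above could look roughly like this in the DFA description (the reservation names, latencies, and unit strings here are illustrative assumptions, not the committed ones):

```
;; Mark the instructions that execute in the V3 pipe.
(define_attr "v3pipe" "false,true" (const_string "false"))

;; V3 instructions keep the full latency when their result is
;; consumed outside the V3 pipe ...
(define_insn_reservation "n7_vis_v3" 11
  (and (eq_attr "cpu" "niagara7")
       (eq_attr "v3pipe" "true"))
  "n7_slot1")

;; ... but only 3 cycles when the consumer also runs in the V3 pipe.
(define_bypass 3 "n7_vis_v3" "n7_vis_v3")
```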

  Note that the reason why the empirically observed latencies in the T4
  were different than the documented ones in the T4 supplement (as Dave
  found and fixed in
  https://gcc.gnu.org/ml/gcc-patches/2012-10/msg00934.html) was that the
  HW chaps didn't feel it necessary to document the complexity in the
  PRM, and just assigned a latency of 11 to the VIS instructions.
  Fortunately, the policy changed (as these implementation details are
  obviously very useful to compiler writers) and in the M7 supplement
  these details have been documented, as they will be documented in the
  future models PRMs.

- Changes in the cache parameters for niagara processors

  The Oracle SPARC Architecture (previously the UltraSPARC Architecture)
  specification states that when a PREFETCH[A] instruction is executed
  an implementation-specific amount of data is prefetched, and that it
  is at least 64 bytes long (aligned to at least 64 bytes).

  However, this is not correct.  The M7 (and implementations prior to
  that) does not guarantee a 64B prefetch into a cache if the line size
  is smaller.  A single cache line is all that is ever prefetched.  So
  for the M7, where the L1D$ has 32B lines and the L2D$ and L3 have 64B
  lines, a prefetch will prefetch 64B into the L2 and L3, but only 32B
  are brought into the L1D$. (Assuming it is a read_n prefetch, which is
  the only type which allocates to the L1.)

  We will be fixing the OSA specification to say that PREFETCH
  prefetches 64B or one cache line (whichever is larger) from memory
  into the last-level cache, and that whether and how much data a given
  PREFETCH function code causes to be prefetched from the LLC to higher
  (closer to the processor) levels of caches is
  implementation-dependent.  This accurately describes the behavior for
  every processor at least as far back as T2.

  So, by changing the L1_CACHE_LINE_SIZE parameter to 32B, we get the
  tree SSA loop prefetch pass to unroll loops so in every iteration a
  L1D$ line size is processed, and prefetched ahead.

  Another change is setting PARAM_SIMULTANEOUS_PREFETCHES to 32, as
  opposed to its previous value of 2 for niagara processors.  Unlike in
  the UltraSPARC III, which featured a documented prefetch queue with a
  size of 8, in niagara processors prefetches are handled much like
  regular loads.  The L1 miss buffer is 32 entries, but prefetches start
  getting affected when 30 entries become occupied.  That occupation
  could be a mix of regular loads and prefetches 

Re: [PATCH v4] gcov: Runtime configurable destination output

2016-05-31 Thread Aaron Conole
Nathan Sidwell  writes:

> On 05/26/16 13:08, Aaron Conole wrote:
>> The previous gcov behavior was to always output errors on the stderr channel.
>> This is fine for most uses, but some programs will require stderr to be
>> untouched by libgcov for certain tests. This change allows configuring
>> the gcov output via an environment variable which will be used to open
>> the appropriate file.
>
> this is ok, thanks.
>
> 1) Do you know how to write and format a ChangeLog entry?

I can copy the style used, I hope.

> 2) Are you able to commit the patch yourself (or have someone at RH
> walk you through the process)

I don't know how to do this for gcc, but I will ask.

Thanks for your all your time!

-Aaron


[PATCH][RFC] Initial patch for better performance of 64-bit math instructions in 32-bit mode on x86-64

2016-05-31 Thread Yuri Rumyantsev
Hi Uros,

Here is initial patch to improve performance of 64-bit integer
arithmetic in 32-bit mode. We discovered that gcc is significantly
behind icc and clang on the rsa benchmark from the eembc2.0 suite.
The problem function looks like
typedef unsigned long long ull;
typedef unsigned long ul;
ul mul_add(ul *rp, ul *ap, int num, ul w)
 {
 ul c1=0;
 ull t;
 for (;;)
  {
  { t=(ull)w * ap[0] + rp[0] + c1;
   rp[0]= ((ul)t)&0xffffffffL; c1= ((ul)((t)>>32))&(0xffffffffL); };
  if (--num == 0) break;
  { t=(ull)w * ap[1] + rp[1] + c1;
   rp[1]= ((ul)(t))&(0xffffffffL); c1= (((ul)((t)>>32))&(0xffffffffL)); };
  if (--num == 0) break;
  { t=(ull)w * ap[2] + rp[2] + c1;
   rp[2]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
  if (--num == 0) break;
  { t=(ull)w * ap[3] + rp[3] + c1;
   rp[3]= (((ul)(t))&(0xffffffffL)); c1= (((ul)((t)>>32))&(0xffffffffL)); };
  if (--num == 0) break;
  ap+=4;
  rp+=4;
  }
 return(c1);
 }

If we apply the patch below we will get a +6% speed-up for rsa on Silvermont.

The patch looks like this (not complete, since there are other 64-bit
instructions, e.g. subtraction):

Index: i386.md
===
--- i386.md (revision 236181)
+++ i386.md (working copy)
@@ -5439,7 +5439,7 @@
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "1"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))

What is your opinion?


Re: Moving backwards/FSM threader into its own pass

2016-05-31 Thread Jeff Law

On 05/30/2016 03:16 AM, Richard Biener wrote:


Ok, but the placement (and number of) threading passes then no longer depends
on DOM/VRP passes - and as you placed the threading passes _before_ those
passes the threading itself does not benefit from DOM/VRP but only from
previous optimization passes.
Right.  Note that the number of passes now is actually the same as we had 
before, they're just occurring outside DOM/VRP.


The backwards threader's only dependency on DOM/VRP was to propagate 
constants into PHI nodes and to propagate away copies.  That dependency 
was removed.





I see this as an opportunity to remove some of them ;)  I now see in the main
optimization pipeline

  NEXT_PASS (pass_fre);
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);

position makes sense - FRE removed redundancies and fully copy/constant
propagated the IL.

  NEXT_PASS (pass_sra);
  /* The dom pass will also resolve all __builtin_constant_p calls
 that are still there to 0.  This has to be done after some
 propagations have already run, but before some more dead code
 is removed, and this place fits nicely.  Remember this when
 trying to move or duplicate pass_dominator somewhere earlier.  */
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);

this position OTOH doesn't make much sense as IL cleanup is missing
after SRA and previous opts.  After loop we now have
We should look at this one closely.  The backwards threader doesn't 
depend on IL cleanups.  It should do its job regardless of the state of 
the IL.





  NEXT_PASS (pass_tracer);
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
  NEXT_PASS (pass_strlen);
  NEXT_PASS (pass_thread_jumps);
  NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);

I don't think we want two threadings so close together.  It makes some sense
to have a threading _after_ DOM but before VRP (DOM cleaned up the IL).
That one is, IMHO, the least useful.  I haven't done any significant 
analysis of this specific instance though to be sure.  The step you saw 
was meant to largely preserve behavior.  Further cleanups are definitely 
possible.


The most common cases I've seen where DOM/VRP make transformations 
that then expose something useful to the backwards threader come from 
those pesky context-sensitive equivalences.


We (primarily Andrew, but Aldy and myself are also involved) are looking 
at ways to more generally expose range information created for these 
situations.  Exposing range information and getting it more precise by 
allowing "unnecessary copies" or some such would eliminate those cases 
where DOM/VRP expose new opportunities for the backwards jump threader.







So that would leave two from your four passes and expose the opportunity
to re-add one during early-opts, also after FRE.  That one should be
throttled down to operate in "-Os" mode though.
I'll take a look at them, but with some personal stuff and PTO it'll 
likely be a few weeks before I've got anything useful.




So, can you see what removing the two threading passes that don't make
sense to me do to your statistics?  And check whether a -Os-like threading
can be done early?
Interesting you should mention doing threading early -- that was one of 
the secondary motivations behind getting the backwards threading bits 
out into their own pass, I just failed to mention it.


Essentially we want to limit the backwards substitution to a single step 
within a single block for that case (which is trivially easy).  That 
would allow us to run a very cheap threader during early optimizations.


Jeff



Re: Further refinement to -Wswitch-unreachable

2016-05-31 Thread Jason Merrill
On Tue, May 31, 2016 at 9:40 AM, Marek Polacek  wrote:
> On Fri, May 27, 2016 at 10:41:51AM -0400, Jason Merrill wrote:
>> On 05/26/2016 02:44 PM, Marek Polacek wrote:
>> > + if (gimple_code (stmt) == GIMPLE_TRY)
>> > {
>> > + /* A compiler-generated cleanup or a user-written try block.
>> > +Try to get the first statement in its try-block, for better
>> > +location.  */
>> > + if ((seq = gimple_try_eval (stmt)))
>> > +   stmt = gimple_seq_first_stmt (seq);
>>
>> Should this loop?  If there are two variables declared, do we get two try
>> blocks?
>
> It looks like we get only one try block, so this doesn't have to loop.  But I
> at least added a new test to make sure we warn even with more decls.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

Jason


Re: [PATCH][RFC] Remove ifcvt_repair_bool_pattern, re-do bool patterns

2016-05-31 Thread Yuri Rumyantsev
Richard,

I built the compiler with your patch and did not find any issues with
vectorization of loops marked with pragma simd. I also noticed that
the size of the vectorized loop looks smaller (I can't tell you exact
numbers since the fresh compiler performs full unrolling even if
"-funroll-loops" option was not passed).

2016-05-30 15:55 GMT+03:00 Ilya Enkovich :
> 2016-05-30 14:04 GMT+03:00 Richard Biener :
>>
>> The following patch removes the restriction on seeing a tree of stmts
>> in vectorizer bool pattern detection (aka single-use).  With this
>> it is no longer necessary to unshare DEFs in ifcvt_repair_bool_pattern
>> and that compile-time hog can go (it's now enabled unconditionally for GCC 
>> 7).
>>
>> Instead the pattern detection code will now "unshare" the condition tree
>> for each bool pattern root my means of adding all pattern stmts of the
>> condition tree to its pattern def sequence (so we still get some
>> unnecessary copying, worst-case quadratic rather than exponential).
>>
>> Ilja - I had to disable the
>>
>>   tree mask_type = get_mask_type_for_scalar_type (TREE_TYPE
>> (rhs1));
>>   if (mask_type
>>   && expand_vec_cmp_expr_p (comp_vectype, mask_type))
>> return false;
>>
>> check you added to check_bool_pattern to get any coverage for bool
>> patterns on x86_64.  Doing that regresses
>>
>> FAIL: gcc.target/i386/mask-pack.c scan-tree-dump-times vect "vectorized 1
>> loops" 10
>> FAIL: gcc.target/i386/mask-unpack.c scan-tree-dump-times vect "vectorized
>> 1 loops" 10
>> FAIL: gcc.target/i386/pr70021.c scan-tree-dump-times vect "vectorized 1
>> loops" 2
>>
>> so somehow bool patterns mess up things here (I didn't investigate).
>> The final patch will enable the above path again, avoiding the regression.
>
> Mask conversion patterns handle some cases bool patterns don't.  So it's
> expected we can't vectorize some loops if bool patterns are enforced.
>
> Thanks,
> Ilya
>
>>
>> Yuri - I suppose you have a larger set of testcases using OMP simd
>> or other forced vectorization you added ifcvt_repair_bool_pattern for.
>> I'd appreciate testing (and testcases if anything fails unexpectedly).
>>
>> Testing on other targets is of course appreciated as well.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu (with the #if 0).
>>
>> Comments?
>>
>> I agree with Ilya elsewhere to remove bool patterns completely
>> (as a first step making them apply to individual stmts).  We do
>> currently not support mixed cond/non-cond uses anyway, like
>>
>> _Bool a[64];
>> unsigned char b[64];
>>
>> void foo (void)
>> {
>>   for (int i = 0; i < 64; ++i)
>> {
>>   _Bool x = a[i] && b[i] < 10;
>>   a[i] = x;
>> }
>> }
>>
>> and stmt-local "patterns" can be added when vectorizing the stmt
>> (like in the above case a tem = a[i] != 0 ? -1 : 0).  Doing
>> bool "promotion" optimally requires a better vectorizer IL
>> (similar to placing of shuffles).
>>
>> Thanks,
>> Richard.
>>


[PATCH][RFC][rtlanal] Fix rtl-optimization/71295

2016-05-31 Thread Kyrill Tkachov

Hi all,

The ICE in this PR occurs when initialising the reginfo for the ira_costs for 
an instruction of the form:
(insn 6 15 16 2 (set (subreg:V2DI (reg/v:EI 111 [ b ]) 0)
(const_vector:V2DI [
(const_int 1 [0x1])
(const_int 1 [0x1])
])) mycrash.c:10 861 {*neon_movv2di}
 (nil))

we're talking about (subreg:V2DI (reg/v:EI 111 [ b ]) 0).
EImode is a 24 byte arm-specific integer mode used to represent 3 D-registers.

record_subregs_of_mode in reginfo.c analyses not just the subreg at offset 0 as 
requested
but also the adjacent V2DI subreg i.e. at offset 16. However, there cannot be a 
second V2DI 16-byte
subreg because EImode is only 24 bytes wide, so we get an ICE in 
subreg_get_info to that effect.

I'm not familiar with the processes in that part of the code but my suspicion 
is that we should be
rejecting that invalid subreg somewhere and this patch does that in 
subreg_get_info a few lines before
the triggering assert. It rejects the subreg by setting info->representable_p 
to false.

This fixes the ICE for me on ARM (reproducible at -O3) and doesn't show any 
regressions on aarch64 and x86_64.

Is this the right way to go about fixing this?

Thanks,
Kyrill

2016-05-31  Kyrylo Tkachov  

PR rtl-optimization/71295
* rtlanal.c (subreg_get_info): If taking a subreg at the requested
offset would go over the size of the inner mode reject it.

2016-05-31  Kyrylo Tkachov  

PR rtl-optimization/71295
* gcc.c-torture/compile/pr71295.c: New test.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index bfefd968b47c530c77c50d72ce42f4a6d4955ea4..6ab96889408bc09815f5490532657ffcecf5c52e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3657,6 +3657,16 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
 	  info->offset = offset / regsize_xmode;
 	  return;
 	}
+  /* It's not valid to extract a subreg of mode YMODE at OFFSET that
+	 would go outside of XMODE.  */
+  if (!rknown
+	  && GET_MODE_SIZE (ymode) + offset > GET_MODE_SIZE (xmode))
+	{
+	  info->representable_p = false;
+	  info->nregs = nregs_ymode;
+	  info->offset = offset / regsize_xmode;
+	  return;
+	}
   /* Quick exit for the simple and common case of extracting whole
 	 subregisters from a multiregister value.  */
   /* ??? It would be better to integrate this into the code below,
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr71295.c b/gcc/testsuite/gcc.c-torture/compile/pr71295.c
new file mode 100644
index ..d2ec852fd08b307da50bfd993a6c43d42f303037
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr71295.c
@@ -0,0 +1,12 @@
+extern void fn2 (long long);
+int a;
+
+void
+fn1 ()
+{
+  long long b[3];
+  a = 0;
+  for (; a < 3; a++)
+b[a] = 1;
+  fn2 (b[1]);
+}


Re: C/C++ OpenACC routine directive, undeclared name error: try to help the user, once

2016-05-31 Thread Nathan Sidwell

On 05/24/16 12:28, Thomas Schwinge wrote:


+ static bool informed_once = false;
+ if (!informed_once)
+   {
+ inform (token->location,
+ "if the routine directive is meant to apply to the "
+ "lexically following function declaration or "
+ "definition, either don't specify %<(%E)%> here, or "
+ "place a function declaration before the directive",
+ id);
+ informed_once = true;


'lexically following' is implementor-speak.  This error message is for users who 
don't understand the difference between 'declaration' and 'use', so the wording 
should be simplified.   As the most common error is attempting to name the 
immediately following function, place that clue first.  I suggest

  "omit the %<(%E)%>, if you want to mark the immediately following function,
   or place this pragma after a declaration of the function to be marked"

nathan


Re: [PATCH] Fix PR c++/27100

2016-05-31 Thread Jason Merrill
On Mon, May 30, 2016 at 4:19 PM, Patrick Palka  wrote:
> Here's a more straightforward version of the patch that abuses the fact
> that the fields DECL_PENDING_INLINE_INFO and DECL_SAVED_FUNCTION_DATA
> are a union.

That gets into undefined behavior; let's go with the first patch.

Jason


Re: [PATCH] c++/71306 - bogus -Wplacement-new with an array element

2016-05-31 Thread Jason Merrill
On Mon, May 30, 2016 at 6:59 PM, Martin Sebor  wrote:
> On 05/30/2016 12:42 PM, Jason Merrill wrote:
>>
>> OK.
>
> Sorry, I should have asked to begin with: Is it okay to backport
> this fix to the 6.x branch too?

Yes.

Jason


Re: [C++ Patch] PR 71248

2016-05-31 Thread Jason Merrill
OK.

Jason


On Tue, May 31, 2016 at 8:54 AM, Paolo Carlini  wrote:
> Hi,
>
> the primary issue is already fixed, we don't ICE anymore, but the location
> of the error message is (badly) incorrect, because
> check_static_variable_definition doesn't use DECL_SOURCE_LOCATION. Just
> doing that, consistently, plus a number of additional column checks in
> existing testcases amounts to most of the patch. There is a subtlety though:
> as shown by constexpr-static8.C we have been giving redundant diagnostics
> about the out-of-class definitions of the erroneous in-class declarations.
> This is particularly evident now because we would have to get right the
> location of the latter too and DECL_SOURCE_LOCATION doesn't help much for
> those. However, I think we simply want to suppress those messages, thus the
> early return at the beginning of check_static_variable_definition. The
> latter is called only by cp_finish_decl, thus conceivably the same logic
> could be easily shuffled around
>
> Tested x86_64-linux.
>
> Thanks,
> Paolo.
>
> ///


Re: [Patch, ARM] PR71061, length pop* pattern in epilogue correctly

2016-05-31 Thread Kyrill Tkachov

Hi Jiong,

On 19/05/16 09:19, Jiong Wang wrote:



On 13/05/16 14:54, Jiong Wang wrote:

For thumb mode, this is causing wrong size calculations and may affect
some RTL passes, for example bb-reorder, where copy_bb_p needs accurate insn
length info.

This was eventually part of the reason for
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00639.html  where bb-reorder
failed to do the bb copy.

For the fix, I think we should extend arm_attr_length_push_multi to pop*
pattern.

OK for trunk?

2016-05-13  Jiong. Wang 

gcc/
  PR target/71061
  * config/arm/arm-protos.h (arm_attr_length_push_multi): Rename to
  "arm_attr_length_pp_multi".  Add one parameter "first_index".
  * config/arm/arm.md (*push_multi): Use new function.
  (*load_multiple_with_writeback): Set "length" attribute.
  (*pop_multiple_with_writeback_and_return): Likewise.
  (*pop_multiple_with_return): Likewise.



Wilco pointed out offline that the new length function is wrong: for pop we
should check PC_REG instead of LR_REG, and these pop patterns may also
allow ldm, so the base register needs to be checked as well.

I updated this patch to address these issues.

OK for trunk ?



I agree with the idea of the patch, but I have some comments inline.


gcc/

2016-05-19  Jiong Wang  

PR target/71061
* config/arm/arm-protos.h (arm_attr_length_pop_multi): New declaration.
* config/arm/arm.c (arm_attr_length_pop_multi): New function to return
length for pop patterns.
* config/arm/arm.md (*load_multiple_with_writeback): Set "length" attribute.
(*pop_multiple_with_writeback_and_return): Likewise.
(*pop_multiple_with_return): Likewise.




diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d8179c4..5dd3e0d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -163,6 +163,7 @@ extern const char *arm_output_iwmmxt_shift_immediate (const char *, rtx *, bool)
 extern const char *arm_output_iwmmxt_tinsr (rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern int arm_attr_length_pop_multi(rtx *, bool, bool);
 extern void arm_expand_compare_and_swap (rtx op[]);
 extern void arm_split_compare_and_swap (rtx op[]);
 extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c3f74dc..4d19d3a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27812,6 +27812,65 @@ arm_attr_length_push_multi(rtx parallel_op, rtx first_op)
   return 4;
 }
 
+/* Compute the atrribute "length" of insn.  Currently, this function is used
+   for "*load_multiple_with_writeback", "*pop_multiple_with_return" and
+   "*pop_multiple_with_writeback_and_return".  */
+

s/atrribute/attribute/

Also, please add a description of the function arguments to the function 
comment.

 +int
+arm_attr_length_pop_multi(rtx *operands, bool return_pc, bool write_back_p)
+{

space between function name and '('.

 +  /* ARM mode.  */
+  if (TARGET_ARM)
+return 4;
+  /* Thumb1 mode.  */
+  if (TARGET_THUMB1)
+return 2;
+
+  /* For POP/PUSH/LDM/STM under Thumb2 mode, we can use 16-bit Thumb encodings
+ if the register list is 8-bit.  Normally this means all registers in the
+ list must be LO_REGS, that is (R0 -R7).  If any HI_REGS used, then we must
+ use 32-bit encodings.  But there are two exceptions to this rule where
+ special HI_REGS used in PUSH/POP.
+
+ 1. 16-bit Thumb encoding POP can use an 8-bit register list, and an
+additional bit, P, that corresponds to the PC.  This means it can access
+   any of {PC, R7-R0}.


It took me a bit to figure it out but this bit 'P' you're talking about is
just a bit
in the illustration in the ARM Architecture Reference Manual (section A8.8.131).
I don't think it's useful to refer to it.
This would be better phrased as "The encoding is 16 bits wide if the register
list contains only the registers in {R0 - R7} or the PC".

 + 2. 16-bit Thumb encoding PUSH can use an 8-bit register list, and an
+   additional bit, M , that corresponds to the LR.  This means it can
+   access any of {LR, R7-R0}.  */
+

This function only deals with POP instructions, so talking about PUSH encodings
is confusing. I suggest you drop point 2).

 +  rtx parallel_op = operands[0];
+  /* Initialize to elements number of PARALLEL.  */
+  unsigned indx = XVECLEN (parallel_op, 0) - 1;
+  /* Initialize the value to base register.  */
+  unsigned regno = REGNO (operands[1]);
+  /* Skip return and write back pattern.
+ We only need register pop pattern for later analysis.  */
+  unsigned first_indx = 0;
+  first_indx += return_pc ? 1 : 0;
+  first_indx += write_back_p ? 1 : 0;
+
+  /* A pop operation can be done through LDM or POP.  If the base register is SP
+ and if it's with write back, then a LDM will be alias of POP.  */
+  bool pop_p = (regno == SP_REGNUM && 
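The length rule under review can be sanity-checked outside the compiler. Below is a hypothetical standalone model of it (pop_multi_length here is a stand-in name, not the GCC routine), assuming SP is r13 and PC is r15:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the rule discussed above: ARM-mode pops are 4
   bytes, Thumb1 pops are 2, and Thumb2 can use the 16-bit POP encoding
   only when the base register is SP with write-back and every popped
   register is in {R0-R7} or the PC.  */
static int
pop_multi_length (bool arm_mode, bool thumb1_mode, const int *regs,
                  int nregs, int base_regno, bool write_back)
{
  if (arm_mode)
    return 4;
  if (thumb1_mode)
    return 2;
  if (base_regno != 13 || !write_back)  /* 13 = SP */
    return 4;
  for (int i = 0; i < nregs; i++)
    if (regs[i] > 7 && regs[i] != 15)   /* HI_REGS other than PC (15) */
      return 4;
  return 2;
}
```

This is only a model of the rule as described in the review, not the routine that went into arm.c.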

Re: Thoughts on memcmp expansion (PR43052)

2016-05-31 Thread Bernd Schmidt

On 05/30/2016 11:37 AM, Florian Weimer wrote:


 * Expand __memcmp_eq for small constant sizes with loads and
   comparison, fall back to a memcmp call.


Should we export such a function from glibc?  I expect it's fairly
common.  Computing the tail difference costs a few cycles.


At the moment the patch ensures that no actual memcmp_eq function is
called; it's either expanded inline or we fall back to memcmp.  But
there's no reason why it couldn't be added under some appropriate name.
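For illustration, an equality-only expansion for a small constant size could look like the following sketch (memcmp_eq8 is a hypothetical name, not something the patch defines):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: for a constant size of 8, an equality-only memcmp reduces to
   two loads and one comparison.  memcpy keeps the loads alignment- and
   aliasing-safe; the result carries no ordering information, so the
   tail-difference computation is skipped entirely.  */
static int
memcmp_eq8 (const void *a, const void *b)
{
  uint64_t x, y;
  memcpy (&x, a, 8);
  memcpy (&y, b, 8);
  return x != y;  /* nonzero iff the blocks differ */
}
```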



Bernd


Re: [PATCH] Fix 416.gamess miscompare (PR71311)

2016-05-31 Thread Richard Biener
On Tue, 31 May 2016, Richard Biener wrote:

> 
> The following patch adds missing patterns with swapped comparison
> operands for the @0 < @1 and @0 < @2 to @0 < min (@1, @2) transform.
> This nullifies the difference gimplify-into-SSA made on
> rhfuhf.F:ROFOCK which causes the function no longer to be miscompiled
> (by vectorization eventually, -fno-tree-vectorize also fixes the
> miscompare at r235817).

The following is a variant, avoiding the pattern repetition in match.pd
by (finally) completing a local patch I had to support "commutating"
relational operators.  It also adds sanity-checking for what you apply
:c to, which catches a few cases in match.pd that I introduce :C for in this
patch.

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-05-31  Richard Biener  

PR tree-optimization/71311
* genmatch.c (comparison_code_p): New predicate.
(swap_tree_comparison): New function.
(commutate): Add for_vec parameter to append new for entries.
Support commutating relational operators by swapping it alongside
operands.
(lower_commutative): Adjust.
(dt_simplify::gen): Do not pass artificial operators to gen
functions.
(decision_tree::gen): Do not add artificial operators as parameters.
(parser::parse_expr): Verify operator commutativity when :c is
applied.  Allow :C to override this.
* match.pd: Adjust patterns to use :C instead of :c where required.
Add :c to @0 < @1 && @0 < @2 -> @0 < min(@1,@2) transform.
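The identities involved can be checked numerically; a small illustrative sketch (not part of the patch, helper names invented here):

```c
#include <assert.h>

/* Illustrative check of the two facts the patch relies on: swapping a
   comparison's operands while flipping its code preserves the result
   (a < b is the same as b > a, also for floats), and
   a < b && a < c is a < min (b, c).  */
static int
lt_min (int a, int b, int c)
{
  int m = b < c ? b : c;
  return a < m;
}

static int
lt_both_swapped (int a, int b, int c)
{
  /* :C lets match.pd also match forms with swapped operands,
     e.g. c > a instead of a < c.  */
  return (a < b) && (c > a);
}
```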

Index: gcc/genmatch.c
===
*** gcc/genmatch.c  (revision 236877)
--- gcc/genmatch.c  (working copy)
*** commutative_ternary_tree_code (enum tree
*** 291,296 
--- 291,325 
return false;
  }
  
+ /* Return true if CODE is a comparison.  */
+ 
+ bool
+ comparison_code_p (enum tree_code code)
+ {
+   switch (code)
+ {
+ case EQ_EXPR:
+ case NE_EXPR:
+ case ORDERED_EXPR:
+ case UNORDERED_EXPR:
+ case LTGT_EXPR:
+ case UNEQ_EXPR:
+ case GT_EXPR:
+ case GE_EXPR:
+ case LT_EXPR:
+ case LE_EXPR:
+ case UNGT_EXPR:
+ case UNGE_EXPR:
+ case UNLT_EXPR:
+ case UNLE_EXPR:
+   return true;
+ 
+ default:
+   break;
+ }
+   return false;
+ }
+ 
  
  /* Base class for all identifiers the parser knows.  */
  
*** get_operator (const char *id, bool allow
*** 528,533 
--- 557,598 
return operators->find_with_hash (&tem, tem.hashval);
  }
  
+ /* Return the comparison operators that results if the operands are
+swapped.  This is safe for floating-point.  */
+ 
+ id_base *
+ swap_tree_comparison (operator_id *p)
+ {
+   switch (p->code)
+ {
+ case EQ_EXPR:
+ case NE_EXPR:
+ case ORDERED_EXPR:
+ case UNORDERED_EXPR:
+ case LTGT_EXPR:
+ case UNEQ_EXPR:
+   return p;
+ case GT_EXPR:
+   return get_operator ("LT_EXPR");
+ case GE_EXPR:
+   return get_operator ("LE_EXPR");
+ case LT_EXPR:
+   return get_operator ("GT_EXPR");
+ case LE_EXPR:
+   return get_operator ("GE_EXPR");
+ case UNGT_EXPR:
+   return get_operator ("UNLT_EXPR");
+ case UNGE_EXPR:
+   return get_operator ("UNLE_EXPR");
+ case UNLT_EXPR:
+   return get_operator ("UNGT_EXPR");
+ case UNLE_EXPR:
+   return get_operator ("UNGE_EXPR");
+ default:
+   gcc_unreachable ();
+ }
+ }
+ 
  typedef hash_map cid_map_t;
  
  
*** cartesian_product (const vec< vec
! commutate (operand *op)
  {
vec ret = vNULL;
  
--- 881,887 
  /* Lower OP to two operands in case it is marked as commutative.  */
  
  static vec
! commutate (operand *op, vec _vec)
  {
vec ret = vNULL;
  
*** commutate (operand *op)
*** 827,833 
  ret.safe_push (op);
  return ret;
}
!   vec v = commutate (c->what);
for (unsigned i = 0; i < v.length (); ++i)
{
  capture *nc = new capture (c->location, c->where, v[i]);
--- 892,898 
  ret.safe_push (op);
  return ret;
}
!   vec v = commutate (c->what, for_vec);
for (unsigned i = 0; i < v.length (); ++i)
{
  capture *nc = new capture (c->location, c->where, v[i]);
*** commutate (operand *op)
*** 845,851 
  
vec< vec > ops_vector = vNULL;
for (unsigned i = 0; i < e->ops.length (); ++i)
! ops_vector.safe_push (commutate (e->ops[i]));
  
auto_vec< vec > result;
auto_vec v (e->ops.length ());
--- 910,916 
  
vec< vec > ops_vector = vNULL;
for (unsigned i = 0; i < e->ops.length (); ++i)
! ops_vector.safe_push (commutate (e->ops[i], for_vec));
  
auto_vec< vec > result;
auto_vec v (e->ops.length ());
*** commutate (operand *op)
*** 868,873 
--- 933,982 
for 

Re: [PATCH] Improve *avx_vperm_broadcast_*

2016-05-31 Thread Jakub Jelinek
On Tue, May 31, 2016 at 06:54:14AM -0700, H.J. Lu wrote:
> On Mon, May 23, 2016 at 10:15 AM, Jakub Jelinek  wrote:
> > Hi!
> >
> > The vbroadcastss and vpermilps insns are already in AVX512F & AVX512VL,
> > so can be used with v instead of x, the splitter case where we for AVX
> > emit vpermilps plus vpermf128 is more problematic, because the latter
> > insn isn't available in EVEX.  But, we can get the same effect with
> > vshuff32x4 when both source operands are the same.
> > Alternatively, we could replace the vpermilps and vshuff32x4 insns
> > with the AVX512VL arbitrary permutations I think, the question is
> > what is faster, because we'd need to load the mask from memory.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2016-05-23  Jakub Jelinek  
> >
> > * config/i386/sse.md
> > (avx512vl_shuf_32x4_1): Rename
> > to ...
> > (avx512vl_shuf_32x4_1): ... this.
> > (*avx_vperm_broadcast_v4sf): Use v constraint instead of x.  Use
> > maybe_evex prefix instead of vex.
> > (*avx_vperm_broadcast_): Use v constraint instead of x.  
> > Handle
> > EXT_REX_SSE_REG_P (op0) case in the splitter.
> >
> > * gcc.target/i386/avx512vl-vbroadcast-3.c: New test.
> >
> 
> The new test fails on x32 due to 32-bit register in address.  This
> patch fixes it.  Tested on x86-64.  OK for trunk?

Ok, thanks.
> 2016-05-31  H.J. Lu  
> 
> * gcc.target/i386/avx512vl-vbroadcast-3.c: Scan %\[re\]di
> instead of %rdi.
> * gcc.target/i386/avx512vl-vcvtps2ph-3.c: Likewise.

Jakub


Re: [PATCH] Improve *avx_vperm_broadcast_*

2016-05-31 Thread H.J. Lu
On Mon, May 23, 2016 at 10:15 AM, Jakub Jelinek  wrote:
> Hi!
>
> The vbroadcastss and vpermilps insns are already in AVX512F & AVX512VL,
> so can be used with v instead of x, the splitter case where we for AVX
> emit vpermilps plus vpermf128 is more problematic, because the latter
> insn isn't available in EVEX.  But, we can get the same effect with
> vshuff32x4 when both source operands are the same.
> Alternatively, we could replace the vpermilps and vshuff32x4 insns
> with the AVX512VL arbitrary permutations I think, the question is
> what is faster, because we'd need to load the mask from memory.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-23  Jakub Jelinek  
>
> * config/i386/sse.md
> (avx512vl_shuf_32x4_1): Rename
> to ...
> (avx512vl_shuf_32x4_1): ... this.
> (*avx_vperm_broadcast_v4sf): Use v constraint instead of x.  Use
> maybe_evex prefix instead of vex.
> (*avx_vperm_broadcast_): Use v constraint instead of x.  Handle
> EXT_REX_SSE_REG_P (op0) case in the splitter.
>
> * gcc.target/i386/avx512vl-vbroadcast-3.c: New test.
>

The new test fails on x32 due to 32-bit register in address.  This
patch fixes it.  Tested on x86-64.  OK for trunk?

Thanks.

H.J.

2016-05-31  H.J. Lu  

* gcc.target/i386/avx512vl-vbroadcast-3.c: Scan %\[re\]di
instead of %rdi.
* gcc.target/i386/avx512vl-vcvtps2ph-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
index d981fe4..7233398 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vbroadcast-3.c
@@ -150,9 +150,9 @@ f16 (V2 *x)
   asm volatile ("" : "+v" (a));
 }

-/* { dg-final { scan-assembler-times
"vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%xmm16" 4 } } */
+/* { dg-final { scan-assembler-times
"vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%xmm16" 4 } } */
 /* { dg-final { scan-assembler-times
"vbroadcastss\[^\n\r]*%xmm16\[^\n\r]*%ymm16" 3 } } */
-/* { dg-final { scan-assembler-times
"vbroadcastss\[^\n\r]*%rdi\[^\n\r]*%ymm16" 3 } } */
+/* { dg-final { scan-assembler-times
"vbroadcastss\[^\n\r]*%\[re\]di\[^\n\r]*%ymm16" 3 } } */
 /* { dg-final { scan-assembler-times
"vpermilps\[^\n\r]*\\\$0\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times
"vpermilps\[^\n\r]*\\\$85\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
 /* { dg-final { scan-assembler-times
"vpermilps\[^\n\r]*\\\$170\[^\n\r]*%xmm16\[^\n\r]*%xmm16" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
index 2fd2215..c2e3f01 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vcvtps2ph-3.c
@@ -38,4 +38,4 @@ f3 (__m256 x, __v8hi *y)
   *y = (__v8hi) _mm256_cvtps_ph (a, 1);
 }

-/* { dg-final { scan-assembler
"vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%rdi" } } */
+/* { dg-final { scan-assembler
"vcvtps2ph\[^\n\r]*\\\$1\[^\n\r]*%ymm16\[^\n\r]*%\[re\]di" } } */


Re: Do not imply -fweb with -fpeel-loops

2016-05-31 Thread Richard Biener
On Tue, May 31, 2016 at 1:42 PM, Jan Hubicka  wrote:
>> > I would say we should handle -frename-registers same way as we do 
>> > -fschedule-insns.
>> > I.e. enable/disable it on target basis rather than bundle it to loop 
>> > unrolling.
>> > But that is for future patch.  I will commit this and propose patch making 
>> > -fpeel-loops
>> > independent of rename-registers next
>>
>> Ok.  Does -fweb still do what its documentation says (enable register
>> allocation to
>> work on pseudos directly)?  Given its downside and strong GIMPLE
>> optimizations maybe
>> it is time to remove it?
>
> I will wait with committing the renaming patch and we will see the effect with
> -fno-unroll-loops on ia64 tester (it probably pays back more on in-order
> target) and we can test effect with -funroll-loops by patching it tomorrow.
>
> I guess IRA does live range splitting by itself.  Register allocator is not
> the only pass which does not like reuse of pseudos.  Web originally also made
> -fschedule-insns (not -fschedule-insns2) and CSE work harder.
>
> In the tree-SSA world most reuse is avoided by the out-of-ssa pass.  Random
> reuse these days is limited to loop unrolling and RTL expansion I think.
> Without unrolling the webizer renames 204 pseudos on tramp3d, which seems
> close to 0.  I can try to dig into that.  With -funroll-all-loops it renames
> 17309 registers.  Of course it would be possible to write an unroller-specific
> renamer, but it would hardly be any easier than the webizer.

And we have things like -fsplit-ivs-in-unroller that does part of it...

Richard.

> Honza


[committed] Fix up lastprivate handling in doacross loops

2016-05-31 Thread Jakub Jelinek
Hi!

Here is the promised fix for lastprivate.  We just need to do in the FE
what the gimplifier does when it uses a private counter (because the normal
counter is addressable), since we do a similar thing in the FE.

Tested on x86_64-linux, committed to gomp-4_5-branch.

2016-05-31  Jakub Jelinek  

* gimplify.c (gimplify_scan_omp_clauses) : Add
missing break at the end of OMP_CLAUSE_DEPEND_SINK case.

* trans-openmp.c (gfc_trans_omp_do): For doacross loop, put into
OMP_CLAUSE_LASTPRIVATE_STMT dovar = count * step + from or
dovar = (count + 1) * step + from instead of dovar = dovar + step.

* testsuite/libgomp.fortran/doacross3.f90: New test.
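The closed form the FE now emits can be illustrated with a small sketch (hypothetical helper names, not the trans-openmp.c code), comparing it against the old dovar = dovar + step update:

```c
#include <assert.h>

/* After COUNT iterations the iteration variable has the closed form
   from + count * step, so the lastprivate value can be materialized
   directly, without depending on the (privatized, addressable) dovar
   increments.  */
static long long
final_dovar (long long from, long long step, long long count)
{
  return count * step + from;
}

static long long
loop_dovar (long long from, long long step, long long count)
{
  long long dovar = from;
  for (long long i = 0; i < count; i++)
    dovar = dovar + step;  /* the old dovar = dovar + step form */
  return dovar;
}
```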

--- gcc/gimplify.c.jj   2016-05-27 18:32:16.0 +0200
+++ gcc/gimplify.c  2016-05-31 15:06:06.504711812 +0200
@@ -7188,6 +7188,7 @@ gimplify_scan_omp_clauses (tree *list_p,
   pre_p, NULL, is_gimple_val, fb_rvalue);
  deps = TREE_CHAIN (deps);
}
+ break;
}
  else if (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SOURCE)
break;
--- gcc/fortran/trans-openmp.c.jj   2016-05-30 18:36:42.0 +0200
+++ gcc/fortran/trans-openmp.c  2016-05-31 14:39:50.693158672 +0200
@@ -3637,9 +3637,24 @@ gfc_trans_omp_do (gfc_code *code, gfc_ex
 OMP_CLAUSE_LASTPRIVATE_STMT, otherwise the copied dovar
 will have the value on entry of the last loop, rather
 than value after iterator increment.  */
- tmp = gfc_evaluate_now (step, pblock);
- tmp = fold_build2_loc (input_location, PLUS_EXPR, type, dovar,
-tmp);
+ if (clauses->orderedc)
+   {
+ if (clauses->collapse <= 1 || i >= clauses->collapse)
+   tmp = count;
+ else
+   tmp = fold_build2_loc (input_location, PLUS_EXPR,
+  type, count, build_one_cst (type));
+ tmp = fold_build2_loc (input_location, MULT_EXPR, type,
+tmp, step);
+ tmp = fold_build2_loc (input_location, PLUS_EXPR, type,
+from, tmp);
+   }
+ else
+   {
+ tmp = gfc_evaluate_now (step, pblock);
+ tmp = fold_build2_loc (input_location, PLUS_EXPR, type,
+dovar, tmp);
+   }
  tmp = fold_build2_loc (input_location, MODIFY_EXPR, type,
 dovar, tmp);
  for (c = omp_clauses; c ; c = OMP_CLAUSE_CHAIN (c))
--- libgomp/testsuite/libgomp.fortran/doacross3.f90.jj  2016-05-31 
15:03:08.520012394 +0200
+++ libgomp/testsuite/libgomp.fortran/doacross3.f90 2016-05-31 
15:01:34.0 +0200
@@ -0,0 +1,261 @@
+! { dg-do run }
+
+  integer, parameter :: N = 256
+  integer, save :: a(N), b(N / 16, 8, 4), c(N / 32, 8, 8), g(N/16,8,6)
+  integer, save, volatile :: d, e
+  integer(kind=8), save, volatile :: f
+  integer(kind=8) :: i
+  integer :: j, k, l, m
+  integer :: m1, m2, m3, m4, m5, m6, m7, m8
+  integer :: m9, m10, m11, m12, m13, m14, m15, m16
+  d = 0
+  e = 0
+  f = 0
+  !$omp parallel private (l) shared(k)
+!$omp do schedule(guided, 3) ordered(1)
+do i = 2, N + f, f + 1
+  !$omp atomic write
+  a(i) = 1
+  !$omp ordered depend ( sink : i - 1 )
+  if (i.gt.2) then
+!$omp atomic read
+l = a(i - 1)
+if (l.lt.2) call abort
+  end if
+  !$omp atomic write
+  a(i) = 2
+  if (i.lt.N) then
+!$omp atomic read
+l = a(i + 1)
+if (l.eq.3) call abort
+  end if
+  !$omp ordered depend(source)
+  !$omp atomic write
+  a(i) = 3
+end do
+!$omp end do nowait
+!$omp do schedule(guided) ordered ( 3 )
+do i = 4, N / 16 - 1 + f, 1 + f
+  do j = 1, 8, d + 2
+do k = 2, 4, 1 + d
+  !$omp atomic write
+  b(i, j, k) = 1
+  !$omp ordered depend(sink:i,j-2,k-1) &
+  !$omp& depend(sink: i - 2, j - 2, k + 1)
+  !$omp ordered depend(sink:i-3,j+2,k-2)
+  if (j.gt.2.and.k.gt.2) then
+!$omp atomic read
+l = b(i,j-2,k-1)
+if (l.lt.2) call abort
+  end if
+  !$omp atomic write
+  b(i,j,k) = 2
+  if (i.gt.5.and.j.gt.2.and.k.lt.4) then
+!$omp atomic read
+l = b(i-2,j-2, k+1)
+if (l.lt.2) call abort
+  end if
+  if (i.gt.6.and.j.le.N/16-3.and.k.eq.4) then
+!$omp atomic read
+l = b( i - 3, j+2, k-2)
+if (l.lt.2) call abort
+  end if
+  !$omp ordered depend(source)
+  !$omp atomic write
+  b(i, j, k) = 3
+end do
+  end do
+end do
+!$omp 

Re: [wwwdocs] Remove broken link to old Intel handook in projects/tree-ssa

2016-05-31 Thread David Malcolm
On Sun, 2016-05-29 at 10:11 +0200, Gerald Pfeifer wrote:
> This one tries to redirect on the Intel side, and then even
> leads to a "Server not found" error.
> 
> Committed.
> 
> Gerald
> 
> Index: htdocs/projects/tree-ssa/vectorization.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/projects/tree
> -ssa/vectorization.html,v
> retrieving revision 1.31
> diff -u -r1.31 vectorization.html
> --- htdocs/projects/tree-ssa/vectorization.html   29 Jun 2014
> 11:31:34 -1.31
> +++ htdocs/projects/tree-ssa/vectorization.html   29 May 2016
> 08:08:08 -
> @@ -1548,8 +1548,7 @@
>  
>  "The Software Vectorization Handbook. Applying
> Multimedia
>  Extensions for Maximum Performance.", Aart Bik, Intel Press,
> -June 2004. <a href="http://www.intel.com/intelpress/sum_vmmx.htm">
> -http://www.intel.com/intelpress/sum_vmmx.htm</a>
> +June 2004.

The author's website has a description of the book here:
  http://www.aartbik.com/SSE/index.html
(with links to errata and code samples).

Maybe we should use that link?


Dave


Re: Further refinement to -Wswitch-unreachable

2016-05-31 Thread Marek Polacek
On Fri, May 27, 2016 at 10:41:51AM -0400, Jason Merrill wrote:
> On 05/26/2016 02:44 PM, Marek Polacek wrote:
> > + if (gimple_code (stmt) == GIMPLE_TRY)
> > {
> > + /* A compiler-generated cleanup or a user-written try block.
> > +Try to get the first statement in its try-block, for better
> > +location.  */
> > + if ((seq = gimple_try_eval (stmt)))
> > +   stmt = gimple_seq_first_stmt (seq);
> 
> Should this loop?  If there are two variables declared, do we get two try
> blocks?

It looks like we get only one try block, so this doesn't have to loop.  But I
at least added a new test to make sure we warn even with more decls.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-31  Marek Polacek  

* gimplify.c (gimplify_switch_expr): Also handle GIMPLE_TRY.

* c-c++-common/Wswitch-unreachable-3.c: New test.
* g++.dg/warn/Wswitch-unreachable-1.C: New test.

diff --git gcc/gimplify.c gcc/gimplify.c
index 8316bb8..8b7dddc 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -1609,10 +1609,17 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
  while (gimple_code (seq) == GIMPLE_BIND)
seq = gimple_bind_body (as_a <gbind *> (seq));
  gimple *stmt = gimple_seq_first_stmt (seq);
- enum gimple_code code = gimple_code (stmt);
- if (code != GIMPLE_LABEL && code != GIMPLE_TRY)
+ if (gimple_code (stmt) == GIMPLE_TRY)
{
- if (code == GIMPLE_GOTO
+ /* A compiler-generated cleanup or a user-written try block.
+Try to get the first statement in its try-block, for better
+location.  */
+ if ((seq = gimple_try_eval (stmt)))
+   stmt = gimple_seq_first_stmt (seq);
+   }
+ if (gimple_code (stmt) != GIMPLE_LABEL)
+   {
+ if (gimple_code (stmt) == GIMPLE_GOTO
  && TREE_CODE (gimple_goto_dest (stmt)) == LABEL_DECL
  && DECL_ARTIFICIAL (gimple_goto_dest (stmt)))
/* Don't warn for compiler-generated gotos.  These occur
diff --git gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c 
gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c
index e69de29..c53cb10 100644
--- gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c
+++ gcc/testsuite/c-c++-common/Wswitch-unreachable-3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+
+extern void f (int *);
+
+void
+g (int i)
+{
+  switch (i)
+{
+  int a[3];
+  __builtin_memset (a, 0, sizeof a); /* { dg-warning "statement will never 
be executed" } */
+
+default:
+  f (a);
+}
+
+  switch (i)
+{
+  int a[3];
+  int b[3];
+  int c[3];
+  b[1] = 60; /* { dg-warning "statement will never be executed" } */
+
+default:
+  f (a);
+  f (b);
+  f (c);
+}
+}
diff --git gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C 
gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C
index e69de29..99d9a83 100644
--- gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C
+++ gcc/testsuite/g++.dg/warn/Wswitch-unreachable-1.C
@@ -0,0 +1,34 @@
+// { dg-do compile }
+
+extern int j;
+
+void
+f (int i)
+{
+  switch (i) // { dg-warning "statement will never be executed" }
+{
+  try
+  {
+  }
+  catch (...)
+  {
+  }
+case 1:;
+}
+}
+
+void
+g (int i)
+{
+  switch (i)
+{
+  try
+  {
+   j = 42;  // { dg-warning "statement will never be executed" }
+  }
+  catch (...)
+  {
+  }
+case 1:;
+}
+}

Marek


Re: C PATCH for comptypes handling of TYPE_REF_CAN_ALIAS_ALL

2016-05-31 Thread Marek Polacek
Sorry, gcc-patches fell out of CC.

On Tue, May 31, 2016 at 03:14:43PM +0200, Marek Polacek wrote:
> On Thu, May 26, 2016 at 05:16:39PM +, Joseph Myers wrote:
> > On Thu, 26 May 2016, Marek Polacek wrote:
> > 
> > > The C++ FE has been changed, as a part of c++/50800, in such a way that 
> > > it no
> > > longer considers types differing only in TYPE_REF_CAN_ALIAS_ALL
> > > incompatible.  But the C FE still rejects the following testcase, so this 
> > > patch
> > > makes the C FE follow suit.  After all, the may_alias attribute is not
> > > considered as "affects_type_identity".  This TYPE_REF_CAN_ALIAS_ALL check 
> > > was
> > > introduced back in 2004 (r90078), but since then we've gotten rid of 
> > > them, only
> > > comptypes_internal retained it.  I suspect the TYPE_MODE check might go 
> > > too,
> > > but I don't feel like changing that right now.
> > > 
> > > This arose when discussing the struct sockaddr vs. may_alias issue in glibc.
> > > 
> > > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > I'd expect you to need to do something about composite types, to ensure 
> > that the composite type has TYPE_REF_CAN_ALIAS_ALL set if either of the 
> > two types does - along with tests in such a case, where the two types are 
> > in either order, that the composite type produced really is treated as 
> > may_alias.  (The sort of cases I'm thinking of are
> > 
> > typedef int T __attribute__((may_alias));
> > extern T *p;
> > extern int *p;
> > 
> > with the declarations in either order, and then making sure that 
> > type-based aliasing handles references through this pointer properly.)
> 
> I investigated this and it turned out the problem is not in composite_types,
> but still in comptypes.  For the example above, 'T' and 'int' have different
> TYPE_MAIN_VARIANT so they were deemed incompatible.  This eventually led me
> back to  and to
> .  The conclusion
> there was that we should rely on canonical types to compare INTEGER_TYPE,
> FIXED_POINT_TYPE, and REAL_TYPE nodes -- canonical types of 'T' and 'int' are
> the same.  Since the C++ FE does exactly this, I just used the same code and
> commentary.  Now, this regressed gcc.dg/pr39464.c, but cc1plus doesn't warn
> on that code either, so that might not be a big deal.  With this patch cc1
> behaves more like cc1plus so I put the new tests into c-c++-common/.  I also
> verified that the composite types do have the may_alias property (using 
> typeof).
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
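As background, a short sketch of what the may_alias attribute permits (GNU extension; bits_of and bits_of_one are hypothetical helpers, mirroring the typedef in the example above):

```c
#include <assert.h>

/* An lvalue of a may_alias type may read storage of any other type
   without violating type-based aliasing rules, which is why the
   attribute must not make the types incompatible for declarations.  */
typedef unsigned int T __attribute__ ((may_alias));

static unsigned int
bits_of (const float *f)
{
  const T *p = (const T *) f;  /* OK: T may alias anything */
  return *p;
}

static unsigned int
bits_of_one (void)
{
  float one = 1.0f;  /* IEEE-754 single: 0x3f800000 */
  return bits_of (&one);
}
```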
> 
> 2016-05-31  Marek Polacek  
> 
>   * c-typeck.c (comptypes_internal): Handle comparisons of
>   INTEGER_TYPE, FIXED_POINT_TYPE, and REAL_TYPE nodes.  Don't check
>   TYPE_REF_CAN_ALIAS_ALL.
> 
>   * c-c++-common/attr-may-alias-1.c: New test.
>   * c-c++-common/attr-may-alias-2.c: New test.
>   * gcc.dg/pr39464.c: Turn dg-warning into dg-bogus.
> 
> diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
> index 1520c20..a7f28cc 100644
> --- gcc/c/c-typeck.c
> +++ gcc/c/c-typeck.c
> @@ -1105,10 +1105,28 @@ comptypes_internal (const_tree type1, const_tree 
> type2, bool *enum_and_int_p,
>  
>switch (TREE_CODE (t1))
>  {
> +case INTEGER_TYPE:
> +case FIXED_POINT_TYPE:
> +case REAL_TYPE:
> +  /* With these nodes, we can't determine type equivalence by
> +  looking at what is stored in the nodes themselves, because
> +  two nodes might have different TYPE_MAIN_VARIANTs but still
> +  represent the same type.  For example, wchar_t and int could
> +  have the same properties (TYPE_PRECISION, TYPE_MIN_VALUE,
> +  TYPE_MAX_VALUE, etc.), but have different TYPE_MAIN_VARIANTs
> +  and are distinct types.  On the other hand, int and the
> +  following typedef
> +
> +typedef int INT __attribute((may_alias));
> +
> +  have identical properties, different TYPE_MAIN_VARIANTs, but
> +  represent the same type.  The canonical type system keeps
> +  track of equivalence in this case, so we fall back on it.  */
> +  return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
> +
>  case POINTER_TYPE:
> -  /* Do not remove mode or aliasing information.  */
> -  if (TYPE_MODE (t1) != TYPE_MODE (t2)
> -   || TYPE_REF_CAN_ALIAS_ALL (t1) != TYPE_REF_CAN_ALIAS_ALL (t2))
> +  /* Do not remove mode information.  */
> +  if (TYPE_MODE (t1) != TYPE_MODE (t2))
>   break;
>val = (TREE_TYPE (t1) == TREE_TYPE (t2)
>? 1 : comptypes_internal (TREE_TYPE (t1), TREE_TYPE (t2),
> diff --git gcc/testsuite/c-c++-common/attr-may-alias-1.c 
> gcc/testsuite/c-c++-common/attr-may-alias-1.c
> index e69de29..978b9a5 100644
> --- gcc/testsuite/c-c++-common/attr-may-alias-1.c
> +++ gcc/testsuite/c-c++-common/attr-may-alias-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wall" } */
> +
> 

Re: [RX] Add support for atomic operations

2016-05-31 Thread Nick Clifton
Hi Oleg,

> Sorry, but my original patch was buggy.  There are two problems:

Thanks for your diligence in checking the patch.

> The attached patch fixes those issues.
> OK for trunk?
> 
> Cheers,
> Oleg
> 
> gcc/ChangeLog:
>   * config/rx/rx.md (FETCHOP_NO_MINUS): New code iterator.
>   (atomic__fetchsi): Extract minus operator into ...
>   (atomic_sub_fetchsi): ... this new pattern.
>   (mvtc): Add CC_REG clobber.

Approved - please apply.

Cheers



Re: RFC [1/2] divmod transform

2016-05-31 Thread Prathamesh Kulkarni
On 30 May 2016 at 13:15, Richard Biener  wrote:
> On Mon, 30 May 2016, Prathamesh Kulkarni wrote:
>
>> The attached patch ICE's during bootstrap for x86_64, and is reproducible 
>> with
>> following case with -m32 -O2:
>>
>> typedef long long type;
>>
>> type f(type x, type y)
>> {
>>   type q = x / y;
>>   type r = x % y;
>>   return q + r;
>> }
>>
>> The ICE happens because the test-case hits
>> gcc_assert (unsignedp);
>> in default_expand_divmod_libfunc ().
>
> That's of course your function (and ICE).
>
>> Surprisingly, optab_libfunc (sdivmod_optab, DImode) returns a libfunc
>> with the name "__divmoddi4" although __divmoddi4() is nowhere defined in
>> libgcc for x86.
>> (I verified that by forcing the patch to generate call to __divmoddi4,
>> which results in undefined reference to __divmoddi4).
>>
>> This happens because in optabs.def we have:
>> OPTAB_NL(sdivmod_optab, "divmod$a4", UNKNOWN, "divmod", '4', gen_int_libfunc)
>>
>> and gen_int_libfunc generates "__divmoddi4" on first call to optab_libfunc
>> and sets optab_libfunc (sdivmod_optab, DImode) to "__divmoddi4".
>> I wonder if we should remove gen_int_libfunc entry in optabs.def for
>> sdivmod_optab ?
>
> Hum, not sure - you might want to look at expand_divmod (though that
> always just computes one part of the result in the end).
As a workaround, would it be OK to check whether the libfunc is __udivmoddi4
when expand_divmod_libfunc is the default, as in the attached patch?
This prevents ICE for the above test-case.
Bootstrap+test on x86_64 in progress.
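For reference, the statement pair the transform fuses, and the invariant any expansion must preserve under C's truncating division (divmod_ll is an illustrative stand-in for the fused DIVMOD internal call, not GCC code):

```c
#include <assert.h>

/* One operation yielding both quotient and remainder, as DIVMOD does.
   Under C's truncating division the pair always satisfies
   x == q * y + r, with r having the sign of x.  */
typedef struct { long long quot; long long rem; } divmod_t;

static divmod_t
divmod_ll (long long x, long long y)
{
  divmod_t d = { x / y, x % y };
  return d;
}
```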

Thanks,
Prathamesh
>
> Joseph - do you know sth about why there's not a full set of divmod
> libfuncs in libgcc?
>
> Thanks,
> Richard.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8c7f2a1..111f19f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6963,6 +6963,12 @@ This is firstly introduced on ARM/AArch64 targets, 
please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool 
@var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx 
*@var{quot}, rtx *@var{rem})
+Define this hook if the port does not have hardware div and divmod insn for
+the given mode but has divmod libfunc, which is incompatible
+with libgcc2.c:__udivmoddi4
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f963a58..2c9a800 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c867ddc..0cb59f7 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2276,6 +2276,48 @@ multi_vector_optab_supported_p (convert_optab optab, 
tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
+/* Expand DIVMOD() using:
+ a) the optab handler for udivmod/sdivmod if it is available;
+ b) otherwise, a call to the optab libfunc for udivmod/sdivmod.  */
+
+static void 
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for [u]divmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+{
+  quotient = gen_reg_rtx (mode);
+  remainder = gen_reg_rtx (mode);
+  expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
+}
+  else
+targetm.expand_divmod_libfunc (unsignedp, mode, op0, op1,
+  &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+  make_tree (TREE_TYPE (arg0), quotient),
+  make_tree (TREE_TYPE (arg1), remainder)),
+  target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Return true if FN is supported for the types in TYPES when the
optimization type is OPT_TYPE.  The types are those associated with
the "type0" and "type1" fields of FN's direct_internal_fn_info
diff 
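For reference, the kind of source the DIVMOD internal function is meant to fuse can be sketched as follows (hypothetical example; `divmod_fused` is not a GCC function, just an illustration of the pattern):

```cpp
#include <cassert>

/* A division and a modulus with the same operands: GCC can fuse these
   into one DIVMOD internal-function call, which is then expanded either
   through the [u]divmod optab or, failing that, through the divmod
   libfunc hook added by this patch.  */
static void divmod_fused (int a, int b, int *quot, int *rem)
{
  *quot = a / b;   /* candidate for fusion */
  *rem = a % b;    /* same operands: only one division is really needed */
}
```

Whatever path the expansion takes, it must preserve the invariant `a == quot * b + rem`.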

[C++ Patch] PR 71248

2016-05-31 Thread Paolo Carlini

Hi,

the primary issue is already fixed, we don't ICE anymore, but the 
location of the error message is (badly) incorrect, because 
check_static_variable_definition doesn't use DECL_SOURCE_LOCATION. Just 
doing that, consistently, plus a number of additional column checks in 
existing testcases amounts to most of the patch. There is a subtlety 
though: as shown by constexpr-static8.C, we have been giving redundant 
diagnostics about the out-of-class definitions of the erroneous in-class 
declarations. This is particularly evident now because we would have to 
get right the location of the latter too, and DECL_SOURCE_LOCATION 
doesn't help much for those. However, I think we simply want to suppress 
those messages, thus the early return at the beginning of 
check_static_variable_definition. The latter is called only by 
cp_finish_decl, thus conceivably the same logic could easily be shuffled 
around.
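For context, the class of declarations these diagnostics cover can be sketched as follows (a hypothetical example, not the PR's testcase): in-class initialization of a static data member is allowed only for const integral members and, since C++11, constexpr members of literal type.

```cpp
#include <cassert>

struct S
{
  // static double bad = 1.0;         // error: non-const static member
  // static const double worse = 1.0; // error: constexpr needed (C++11)
  static constexpr double good = 1.5; // OK: constexpr, literal type
  static const int n = 3;             // OK: const integral type
};
```

The patch's point is that the errors for the commented-out forms should be reported at the member declaration's own location.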


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2016-05-31  Paolo Carlini  

PR c++/71248
* decl.c (check_static_variable_definition): Use DECL_SOURCE_LOCATION
to obtain correct locations; avoid redundant diagnostics on
out-of-class definitions.

/testsuite
2016-05-31  Paolo Carlini  

PR c++/71248
* g++.dg/cpp0x/pr71248.C: New.
* g++.dg/cpp0x/auto7.C: Test column numbers too.
* g++.dg/cpp0x/constexpr-static8.C: Likewise.
* g++.dg/init/new37.C: Likewise.
* g++.dg/template/static1.C: Likewise.
* g++.dg/template/static2.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 236910)
+++ cp/decl.c   (working copy)
@@ -8581,6 +8581,9 @@ build_ptrmem_type (tree class_type, tree member_ty
 static int
 check_static_variable_definition (tree decl, tree type)
 {
+  /* Avoid redundant diagnostics on out-of-class definitions.  */
+  if (!current_class_type || !TYPE_BEING_DEFINED (current_class_type))
+return 0;
   /* Can't check yet if we don't know the type.  */
   if (dependent_type_p (type))
 return 0;
@@ -8591,15 +8594,17 @@ check_static_variable_definition (tree decl, tree
   else if (cxx_dialect >= cxx11 && !INTEGRAL_OR_ENUMERATION_TYPE_P (type))
 {
   if (!COMPLETE_TYPE_P (type))
-   error ("in-class initialization of static data member %q#D of "
-  "incomplete type", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "in-class initialization of static data member %q#D of "
+ "incomplete type", decl);
   else if (literal_type_p (type))
-   permerror (input_location,
+   permerror (DECL_SOURCE_LOCATION (decl),
   "%<constexpr%> needed for in-class initialization of "
   "static data member %q#D of non-integral type", decl);
   else
-   error ("in-class initialization of static data member %q#D of "
-  "non-literal type", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "in-class initialization of static data member %q#D of "
+ "non-literal type", decl);
   return 1;
 }
 
@@ -8611,17 +8616,20 @@ check_static_variable_definition (tree decl, tree
  required.  */
   if (!ARITHMETIC_TYPE_P (type) && TREE_CODE (type) != ENUMERAL_TYPE)
 {
-  error ("invalid in-class initialization of static data member "
-"of non-integral type %qT",
-type);
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "invalid in-class initialization of static data member "
+   "of non-integral type %qT",
+   type);
   return 1;
 }
   else if (!CP_TYPE_CONST_P (type))
-error ("ISO C++ forbids in-class initialization of non-const "
-  "static member %qD",
-  decl);
+error_at (DECL_SOURCE_LOCATION (decl),
+ "ISO C++ forbids in-class initialization of non-const "
+ "static member %qD",
+ decl);
   else if (!INTEGRAL_OR_ENUMERATION_TYPE_P (type))
-pedwarn (input_location, OPT_Wpedantic, "ISO C++ forbids initialization of 
member constant "
+pedwarn (DECL_SOURCE_LOCATION (decl), OPT_Wpedantic,
+"ISO C++ forbids initialization of member constant "
 "%qD of non-integral type %qT", decl, type);
 
   return 0;
Index: testsuite/g++.dg/cpp0x/auto7.C
===
--- testsuite/g++.dg/cpp0x/auto7.C  (revision 236910)
+++ testsuite/g++.dg/cpp0x/auto7.C  (working copy)
@@ -7,7 +7,7 @@ auto j; // { dg-error "has no initializer" }
 
 template struct A
 {
-  static auto k = 7;   // { dg-error "non-const" }
+  static auto k = 7;   // { dg-error "15:ISO C\\+\\+ forbids" }
   static auto l;   // { dg-error "has no initializer" }
   auto m;  // { dg-error "non-static data member declared" }
 };
Index: 

[PATCH] Fix 416.gamess miscompare (PR71311)

2016-05-31 Thread Richard Biener

The following patch adds missing patterns with swapped comparison
operands for the @0 < @1 and @0 < @2 to @0 < min (@1, @2) transform.
This nullifies the difference gimplify-into-SSA made on
rhfuhf.F:ROFOCK, which causes the function to no longer be miscompiled
(eventually by vectorization; -fno-tree-vectorize also fixes the
miscompare at r235817).
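The equivalence the new patterns rely on can be spot-checked directly (a sketch of the semantics, not the match.pd syntax; `min_i`/`max_i` are illustrative helpers):

```cpp
#include <cassert>

static int min_i (int x, int y) { return x < y ? x : y; }
static int max_i (int x, int y) { return x > y ? x : y; }

/* Returns true iff both identities hold for the given triple.  */
static bool check_minmax_identities (int a, int b, int c)
{
  /* Existing pattern:  @0 < @1 && @0 < @2  ->  @0 < min (@1, @2).  */
  bool first = (a < b && a < c) == (a < min_i (b, c));
  /* Swapped operands (what this patch adds):
     @1 < @0 && @2 < @0  ->  max (@1, @2) < @0.  */
  bool second = (b < a && c < a) == (max_i (b, c) < a);
  return first && second;
}
```

The same reasoning carries over to the le/gt/ge variants by flipping the comparison and swapping min/max.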

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I didn't attempt to understand the miscompile or create an executable
testcase.

Richard.

2016-05-31  Richard Biener  

PR tree-optimization/71311
* match.pd (@0 < @1 and @0 < @2 to @0 < min (@1, @2)): Add
cases with swapped comparison operands.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 235817)
+++ gcc/match.pd(working copy)
@@ -3179,6 +3179,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_and (op:s @0 @1) (op:s @0 @2))
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
   (op @0 (ext @1 @2)))))
+/* Transform (@1 < @0 and @2 < @0) to use max,
+   (@1 > @0 and @2 > @0) to use min */
+(for op (lt le gt ge)
+ ext (max max min min)
+ (simplify
+  (bit_and (op:s @1 @0) (op:s @2 @0))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (op (ext @1 @2) @0))))
+/* Transform (@0 < @1 and @2 > @0) to use min, 
+   (@0 > @1 and @2 < @0) to use max */
+(for op (lt le gt ge)
+ ops (gt ge lt le)
+ ext (min min max max)
+ (simplify
+  (bit_and:c (op:s @0 @1) (ops:s @2 @0))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (op @0 (ext @1 @2)))))
 
 (simplify
  /* signbit(x) -> 0 if x is nonnegative.  */


Re: [PATCH][ARM] Use proper output modifier for DImode register in store exclusive patterns

2016-05-31 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

On 19/05/16 14:27, Kyrill Tkachov wrote:

Ping.

Thanks,
Kyrill

On 09/05/16 12:13, Kyrill Tkachov wrote:

Hi all,

This seems to have fallen through the cracks.
Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00558.html

Thanks,
Kyrill

On 09/03/16 12:56, Kyrill Tkachov wrote:

Hi all,

I notice that the output code for our store exclusive patterns accesses 
unallocated memory.
It wants to output an strexd instruction with a pair of consecutive registers 
corresponding
to a DImode value. For that it creates the SImode top half of the DImode 
register and puts it
into operands[3]. But the pattern defines entries only up to operands[2], 
with no match_dup 3
or anything like that, so operands[3] is technically out of bounds.

We already have a mechanism for printing the top half of a DImode register: 
the 'H' output modifier.
So this patch changes those patterns to use that, eliminating the out-of-bounds 
access and making
the code a bit simpler as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-03-09  Kyrylo Tkachov  

* config/arm/sync.md (arm_store_exclusive):
Use 'H' output modifier on operands[2] rather than creating a new
entry in out-of-bounds memory of the operands array.
(arm_store_release_exclusivedi): Likewise.








[PATCH] Fix PR71352

2016-05-31 Thread Richard Biener

The following fixes another fallout of the negate multiply reassoc patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-05-31  Richard Biener  

PR tree-optimization/71352
* tree-ssa-reassoc.c (zero_one_operation): Handle op equal to
minus one and a negate.

* gcc.dg/tree-ssa/reassoc-45.c: New testcase.
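The rewrite being exercised here is just distributivity over a factored -1 (illustrative values only): `d * -b + a * -c` equals `-(d * b + a * c)`, including for unsigned types, where negation wraps modulo 2^N.

```cpp
#include <cassert>

/* The testcase's expression, as written in reassoc-45.c.  */
static unsigned reassoc_before (unsigned a, unsigned b, unsigned c, unsigned d)
{
  return d * -b + a * -c;
}

/* The form reassoc should produce after factoring the common -1
   out of both multiplications.  */
static unsigned reassoc_after (unsigned a, unsigned b, unsigned c, unsigned d)
{
  return -(d * b + a * c);
}
```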

Index: gcc/tree-ssa-reassoc.c
===
*** gcc/tree-ssa-reassoc.c  (revision 236877)
--- gcc/tree-ssa-reassoc.c  (working copy)
*** zero_one_operation (tree *def, enum tree
*** 1199,1209 
propagate_op_to_single_use (op, stmt, def);
  return;
}
! else if (gimple_assign_rhs_code (stmt) == NEGATE_EXPR
!  && gimple_assign_rhs1 (stmt) == op)
{
! propagate_op_to_single_use (op, stmt, def);
! return;
}
}
  
--- 1199,1218 
propagate_op_to_single_use (op, stmt, def);
  return;
}
! else if (gimple_assign_rhs_code (stmt) == NEGATE_EXPR)
{
! if (gimple_assign_rhs1 (stmt) == op)
!   {
! propagate_op_to_single_use (op, stmt, def);
! return;
!   }
! else if (integer_minus_onep (op)
!  || real_minus_onep (op))
!   {
! gimple_assign_set_rhs_code
!   (stmt, TREE_CODE (gimple_assign_rhs1 (stmt)));
! return;
!   }
}
}
  
*** zero_one_operation (tree *def, enum tree
*** 1238,1248 
  return;
}
  else if (is_gimple_assign (stmt2)
!  && gimple_assign_rhs_code (stmt2) == NEGATE_EXPR
!  && gimple_assign_rhs1 (stmt2) == op)
{
! propagate_op_to_single_use (op, stmt2, def);
! return;
}
}
  
--- 1247,1266 
  return;
}
  else if (is_gimple_assign (stmt2)
!  && gimple_assign_rhs_code (stmt2) == NEGATE_EXPR)
{
! if (gimple_assign_rhs1 (stmt2) == op)
!   {
! propagate_op_to_single_use (op, stmt2, def);
! return;
!   }
! else if (integer_minus_onep (op)
!  || real_minus_onep (op))
!   {
! gimple_assign_set_rhs_code
!   (stmt2, TREE_CODE (gimple_assign_rhs1 (stmt2)));
! return;
!   }
}
}
  
Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-45.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/reassoc-45.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-45.c  (working copy)
***
*** 0 
--- 1,15 
+ /* PR/71352 */
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-reassoc1" } */
+ 
+ unsigned a, b, c, d, e;
+ 
+ void
+ fn1 ()
+ {
+   unsigned f;
+   e = f = d * -b + a * -c;
+ }
+ 
+ /* Check that we factor -1 and create -(d * b + a * c).  */
+ /* { dg-final { scan-tree-dump-times " = -" 1 "reassoc1" } } */


Re: [PATCH][i386] Add -march=native support for VIA nano CPUs

2016-05-31 Thread J. Mayer
On Tue, 2016-05-31 at 13:38 +0200, Uros Bizjak wrote:
> On Mon, May 30, 2016 at 12:09 AM, J. Mayer  wrote:
> > 
> > Hello,
> > 
> > On Sun, 2016-05-29 at 21:12 +0200, Uros Bizjak wrote:
> > > 
> > > Hello!
> > > 
> > > > 
> > > > 
> > > > When trying to compile using -march=native on a VIA nano CPU,
> > > > gcc
> > > > selects "-march=core2" "-mtune=i386" then is unable to compile,
> > > > as
> > > > this
> > > > creates a conflicts between 32 bits and 64 bits compilation
> > > > modes,
> > > > as
> > > > show by the following test:
> > > [...]
> > > 
> > > > 
> > > > 
> > > > --- gcc/config/i386/driver-i386.c.orig  2015-02-02
> > > > 05:20:49.0
> > > > +0100
> > > > +++ gcc/config/i386/driver-i386.c   2015-08-23
> > > > 01:11:03.0
> > > > +0200
> > > > @@ -601,15 +601,20 @@
> > > >   switch (family)
> > > > {
> > > > case 6:
> > > > -  if (model > 9)
> > > The patch was corrupted by your mailer. But - can you please open
> > > a
> > > bugreport, and refer reposted patch to this bugreport? This way,
> > > the
> > > problem (and the patch) won't get forgotten.
> > > 
> > > Uros.
> > > 
> > Sorry for that, might be because of UTF-8 encoding.
> > I already opened a bug many months ago, ID 67310:
> > 
> > I just updated the patch against current git repository, the only
> > difference between previous versions are diff offsets.
> Can you please test the patch that is attached to the bugreport?
> 
> Thanks,
> Uros.

Yes, I've seen your patch and downloaded it, I'm now recompiling gcc
from patched git repository; I'll be busy in the hours to come, but
I'll make the test as soon as I can.

Jocelyn



Re: Do not imply -fweb with -fpeel-loops

2016-05-31 Thread Jan Hubicka
> > I would say we should handle -frename-registers the same way as we do 
> > -fschedule-insns,
> > i.e. enable/disable it on a per-target basis rather than bundle it with loop 
> > unrolling.
> > But that is for a future patch.  I will commit this and propose a patch making 
> > -fpeel-loops
> > independent of rename-registers next.
> 
> Ok.  Does -fweb still do what its documentation says (enable register
> allocation to
> work on pseudos directly)?  Given its downside and strong GIMPLE
> optimizations maybe
> it is time to remove it?

I will wait with comitting the renaming patch and we will see the effect with
-fno-unroll-loops on ia64 tester (it probably pays back more on in-order
target) and we can test effect with -funroll-loops by patching it tomorrow.

I guess IRA does live range splitting by itself.  The register allocator is not
the only pass which does not like reuse of pseudos.  Web originally also made
-fschedule-insns (not -fschedule-insns2) and CSE work harder.

In the tree-SSA world most reuse is avoided by the out-of-ssa pass. Random reuse
these days is limited to loop unrolling and RTL expansion, I think. Without
unrolling, the webizer renames 204 pseudos on tramp3d, which seems close to 0.
I can try to dig into that. With -funroll-all-loops it renames 17309 registers.
Of course it would be possible to write an unroller-specific renamer, but it
would hardly be any easier than the webizer.

Honza


Re: [PATCH][i386] Add -march=native support for VIA nano CPUs

2016-05-31 Thread Uros Bizjak
On Mon, May 30, 2016 at 12:09 AM, J. Mayer  wrote:
> Hello,
>
> On Sun, 2016-05-29 at 21:12 +0200, Uros Bizjak wrote:
>> Hello!
>>
>> >
>> > When trying to compile using -march=native on a VIA nano CPU, gcc
>> > selects "-march=core2" "-mtune=i386" then is unable to compile, as
>> > this
>> > creates a conflicts between 32 bits and 64 bits compilation modes,
>> > as
>> > show by the following test:
>> [...]
>>
>> >
>> > --- gcc/config/i386/driver-i386.c.orig  2015-02-02
>> > 05:20:49.0
>> > +0100
>> > +++ gcc/config/i386/driver-i386.c   2015-08-23
>> > 01:11:03.0
>> > +0200
>> > @@ -601,15 +601,20 @@
>> >   switch (family)
>> > {
>> > case 6:
>> > -  if (model > 9)
>> The patch was corrupted by your mailer. But - can you please open a
>> bugreport, and refer reposted patch to this bugreport? This way, the
>> problem (and the patch) won't get forgotten.
>>
>> Uros.
>>
>
> Sorry for that, might be because of UTF-8 encoding.
> I already opened a bug many months ago, ID 67310:
> 
> I just updated the patch against current git repository, the only
> difference between previous versions are diff offsets.

Can you please test the patch that is attached to the bugreport?

Thanks,
Uros.


Re: [patch] Fix segfault in vectorizer

2016-05-31 Thread Ilya Enkovich
2016-05-31 12:25 GMT+03:00 Richard Biener :
> On Tue, May 31, 2016 at 10:22 AM, Eric Botcazou  wrote:
>> Hi,
>>
>> it's a regression present on the mainline and 6 branch: for the attached Ada
>> testcase, optab_for_tree_code segfaults because the function is invoked on a
>> NULL_TREE vectype_out from the vectorizer (vectorizable_reduction):
>>
>>   if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == TREE_CODE_REDUCTION
>>   || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>> == INTEGER_INDUC_COND_REDUCTION)
>> {
>>   if (reduction_code_for_scalar_code (orig_code, _reduc_code))
>> {
>>   reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
>>  optab_default);
>>
>> Naive attempts at working around the NULL_TREE result in bogus vectorization
>> and GIMPLE verification failure so it seems clear that vectype_out ought not
>> to be NULL_TREE here.
>>
>> The problems stems from vect_determine_vectorization_factor, namely:
>>
>>   /* Bool ops don't participate in vectorization factor
>>  computation.  For comparison use compared types to
>>  compute a factor.  */
>>   if (TREE_CODE (scalar_type) == BOOLEAN_TYPE
>>   && is_gimple_assign (stmt)
>>   && gimple_assign_rhs_code (stmt) != COND_EXPR)
>> {
>>   if (STMT_VINFO_RELEVANT_P (stmt_info))
>> mask_producers.safe_push (stmt_info);
>>   bool_result = true;
>>
>>   if (gimple_code (stmt) == GIMPLE_ASSIGN
>>   && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
>>  == tcc_comparison
>>   && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt)))
>>  != BOOLEAN_TYPE)
>> scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
>>   else
>> {
>>  if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
>> {
>>   pattern_def_seq = NULL;
>>   gsi_next ();
>> }
>>   continue;
>> }
>> }
>>
>> which leaves STMT_VINFO_VECTYPE set to NULL_TREE.  It would have been nice to
>> say in the comment why boolean operations don't participate in vectorization
>> factor computation; one can only assume that it's because they are somehow
>> treated specially, so the proposed fix does the same in 
>> vectorizable_reduction
>> (and is probably a no-op except for Ada because of the precision test).
>
> Note that vect_determine_vectorization_factor is supposed to set the
> vector type on all
> stmts.  That it doesn't is a bug.  Do you run into the else branch?  I
> think that should
> only trigger with STMT_VINFO_RELEVANT_P and thus the stmt in mask_producers
> which should later get post-processed and have the VECTYPE set.
>
> But maybe Ilya can shed some more light on this (awkward) code.

This code appears when we try to disable boolean patterns.  Boolean patterns
replace booleans with integers of the proper size, which allows us to simply
determine the vectype using get_vectype_for_scalar_type.  With no such
replacement we can't determine the vectype from a scalar type alone (there are
multiple possible options, and using get_vectype_for_scalar_type would result
in a lot of redundant conversions).  So we delay vectype computation for them
and compute it based on the vectypes computed for their arguments.
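A rough C++ analogue of the situation (a sketch only — the actual reproducer is Ada): a boolean AND-reduction over element-wise comparisons, where the comparison produces a mask whose vectype cannot be derived from the scalar bool type alone, and the reduction result is live after the loop.

```cpp
#include <cassert>

/* Boolean AND-reduction: each comparison is a mask producer, and the
   accumulated result is used after the loop (STMT_VINFO_LIVE_P), which
   is the case the discussed check was missing.  */
static bool all_less (const int *x, const int *y, int n)
{
  bool result = true;
  for (int i = 0; i < n; i++)
    result = (x[i] < y[i]) & result;  /* BIT_AND_EXPR on booleans */
  return result;
}
```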

Surely any missing vectype for a statement due to this delay is a bug.  I didn't
notice STMT_VINFO_LIVE_P should be checked in addition to STMT_VINFO_RELEVANT_P.

Thanks,
Ilya

>
> Richard.
>
>> Tested on x86_64-suse-linux, OK for mainline and 6 branch?
>>
>>
>> 2016-05-31  Eric Botcazou  
>>
>> * tree-vect-loop.c (vectorizable_reduction): Also return false if the
>> scalar type is a boolean type.
>>
>>
>> 2016-05-31  Eric Botcazou  
>>
>> * gnat.dg/opt56.ad[sb]: New test.
>>
>> --
>> Eric Botcazou


Re: Do not imply -fweb with -fpeel-loops

2016-05-31 Thread Richard Biener
On Tue, May 31, 2016 at 12:49 PM, Jan Hubicka  wrote:
> Hi,
> enabling -fpeel-loops with -O3 and -Ofast had the unexpected effect of enabling
> -fweb and -frename-registers.  This patch drops -fweb from the list of flags that
> are implied by -fpeel-loops because it is no longer needed since we do peeling
> at the tree level instead of RTL and the out-of-ssa pass takes care of this.
>
> I will look into rename registers separately. My understanding is that register
> renaming is not really that specific to -funroll-loops or -fpeel-loops.
> Loop unrolling may make it a bit more useful because it increases expected basic
> block size and thus enables the scheduler to do more.
>
> Rather, it depends on the target whether register renaming is a win. On 
> out-of-order targets
> where scheduling is not that important it does not seem to be worth the 
> compile time
> and code size effects, while on targets that depend on scheduling it does.
> Tonight's tests on x86 show improvements in botan which I think are related to 
> renaming
> (because they also improved when Bernd temporarily enabled it a few days back),
> a measurable code size regression (which is probably fixable if we make 
> rename-registers
> not increase instruction encoding size) and a noticeable compile time 
> slowdown
> (not sure if it is from regrename itself, which is quite a simple pass, or from 
> the fact
> that the scheduler has more freedom and thus works harder).
>
> On the other hand, both SPECint and SPECfp improved by about 2% on Itanium and
> code size reduced.  I think a similar effect should happen on superscalar 
> in-order
> targets with constant-size instruction encoding.
>
> I would say we should handle -frename-registers the same way as we do 
> -fschedule-insns,
> i.e. enable/disable it on a per-target basis rather than bundle it with loop 
> unrolling.
> But that is for a future patch.  I will commit this and propose a patch making 
> -fpeel-loops
> independent of rename-registers next.

Ok.  Does -fweb still do what its documentation says (enable register
allocation to
work on pseudos directly)?  Given its downside and strong GIMPLE
optimizations maybe
it is time to remove it?

Thanks,
Richard.

> Honza
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
> * loop-init.c (gate): Do not enable the RTL loop unroller with 
> -fpeel-loops.
> It no longer does that.
> * toplev.c (process_options): Do not enable flag_web with 
> -fpeel-loops.
> Index: loop-init.c
> ===
> --- loop-init.c (revision 236894)
> +++ loop-init.c (working copy)
> @@ -560,7 +560,7 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
> +  return (flag_unroll_loops || flag_unroll_all_loops);
>  }
>
>virtual unsigned int execute (function *);
> Index: toplev.c
> ===
> --- toplev.c(revision 236894)
> +++ toplev.c(working copy)
> @@ -1296,7 +1296,7 @@ process_options (void)
>
>/* web and rename-registers help when run after loop unrolling.  */
>if (flag_web == AUTODETECT_VALUE)
> -flag_web = flag_unroll_loops || flag_peel_loops;
> +flag_web = flag_unroll_loops;
>
>if (flag_rename_registers == AUTODETECT_VALUE)
>  flag_rename_registers = flag_unroll_loops || flag_peel_loops;


Re: Do not imply -frename-registers by -fpeel-loops

2016-05-31 Thread Richard Biener
On Tue, 31 May 2016, Jan Hubicka wrote:

> Hi,
> this patch makes -frename-registers independent of -fpeel-loops.
> I think the original idea for building this in was the fact that loop peeling 
> creates
> large basic blocks where scheduling matters. This is no longer true of 
> -fpeel-loops because
> this happens only for complete peeling, which has been moved to a separate pass 
> a long time ago.
> 
> Bootstrapping/regtesting x86_64-linux, OK?

Ok.

Thanks,
Richard.

>   * doc/invoke.texi (-frename-registers): Drop -fpeel-loops from the list
>   of flags implying the register renaming.
>   * toplev.c (process_options): Do not imply flag_rename_registers with
>   loop peeling.
> 
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi   (revision 236914)
> +++ doc/invoke.texi   (working copy)
> @@ -8623,7 +8623,7 @@ debug information format adopted by the
>  make debugging impossible, since variables no longer stay in
>  a ``home register''.
>  
> -Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}.
> +Enabled by default with @option{-funroll-loops}.
>  
>  @item -fschedule-fusion
>  @opindex fschedule-fusion
> Index: toplev.c
> ===
> --- toplev.c  (revision 236915)
> +++ toplev.c  (working copy)
> @@ -1299,7 +1299,7 @@ process_options (void)
>  flag_web = flag_unroll_loops;
>  
>if (flag_rename_registers == AUTODETECT_VALUE)
> -flag_rename_registers = flag_unroll_loops || flag_peel_loops;
> +flag_rename_registers = flag_unroll_loops;
>  
>if (flag_non_call_exceptions)
>  flag_asynchronous_unwind_tables = 1;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Do not imply -frename-registers by -fpeel-loops

2016-05-31 Thread Jan Hubicka
Hi,
this patch makes -frename-registers independent of -fpeel-loops.
I think the original idea for building this in was the fact that loop peeling 
creates
large basic blocks where scheduling matters. This is no longer true of 
-fpeel-loops because
this happens only for complete peeling, which has been moved to a separate pass 
a long time ago.

Bootstrapping/regtesting x86_64-linux, OK?

* doc/invoke.texi (-frename-registers): Drop -fpeel-loops from the list
of flags implying the register renaming.
* toplev.c (process_options): Do not imply flag_rename_registers with
loop peeling.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236914)
+++ doc/invoke.texi (working copy)
@@ -8623,7 +8623,7 @@ debug information format adopted by the
 make debugging impossible, since variables no longer stay in
 a ``home register''.
 
-Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}.
+Enabled by default with @option{-funroll-loops}.
 
 @item -fschedule-fusion
 @opindex fschedule-fusion
Index: toplev.c
===
--- toplev.c(revision 236915)
+++ toplev.c(working copy)
@@ -1299,7 +1299,7 @@ process_options (void)
 flag_web = flag_unroll_loops;
 
   if (flag_rename_registers == AUTODETECT_VALUE)
-flag_rename_registers = flag_unroll_loops || flag_peel_loops;
+flag_rename_registers = flag_unroll_loops;
 
   if (flag_non_call_exceptions)
 flag_asynchronous_unwind_tables = 1;


Do not imply -fweb with -fpeel-loops

2016-05-31 Thread Jan Hubicka
Hi,
enabling -fpeel-loops with -O3 and -Ofast had the unexpected effect of enabling
-fweb and -frename-registers.  This patch drops -fweb from the list of flags that
are implied by -fpeel-loops because it is no longer needed since we do peeling
at the tree level instead of RTL and the out-of-ssa pass takes care of this.

I will look into rename registers separately. My understanding is that register
renaming is not really that specific to -funroll-loops or -fpeel-loops.
Loop unrolling may make it a bit more useful because it increases expected basic
block size and thus enables the scheduler to do more.
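As a rough illustration of why unrolling makes renaming more useful (hypothetical code, not GCC internals): once a loop body is duplicated, giving each copy its own accumulator breaks the single dependency chain, which is roughly what the renaming passes achieve after the unroller has reused one accumulator across copies.

```cpp
#include <cassert>

/* Rolled reduction: one accumulator, a single dependency chain.  */
static int sum_rolled (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Manually unrolled by two with independent accumulators: the two
   chains can execute in parallel, which is what renaming buys when the
   unroller duplicates a body that reused a single accumulator.  */
static int sum_unrolled (const int *a, int n)
{
  int s0 = 0, s1 = 0;
  int i;
  for (i = 0; i + 1 < n; i += 2)
    {
      s0 += a[i];
      s1 += a[i + 1];
    }
  for (; i < n; i++)  /* remainder iteration for odd n */
    s0 += a[i];
  return s0 + s1;
}
```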

Rather, it depends on the target whether register renaming is a win. On 
out-of-order targets
where scheduling is not that important it does not seem to be worth the compile 
time
and code size effects, while on targets that depend on scheduling it does.
Tonight's tests on x86 show improvements in botan which I think are related to 
renaming
(because they also improved when Bernd temporarily enabled it a few days back),
a measurable code size regression (which is probably fixable if we make 
rename-registers
not increase instruction encoding size) and a noticeable compile time slowdown
(not sure if it is from regrename itself, which is quite a simple pass, or from 
the fact
that the scheduler has more freedom and thus works harder).

On the other hand, both SPECint and SPECfp improved by about 2% on Itanium and
code size reduced.  I think a similar effect should happen on superscalar in-order
targets with constant-size instruction encoding.

I would say we should handle -frename-registers the same way as we do 
-fschedule-insns,
i.e. enable/disable it on a per-target basis rather than bundle it with loop 
unrolling.
But that is for a future patch.  I will commit this and propose a patch making 
-fpeel-loops
independent of rename-registers next.

Honza

Bootstrapped/regtested x86_64-linux, will commit it shortly.

* loop-init.c (gate): Do not enable the RTL loop unroller with -fpeel-loops.
It no longer does that.
* toplev.c (process_options): Do not enable flag_web with -fpeel-loops.
Index: loop-init.c
===
--- loop-init.c (revision 236894)
+++ loop-init.c (working copy)
@@ -560,7 +560,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
 {
-  return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
+  return (flag_unroll_loops || flag_unroll_all_loops);
 }
 
   virtual unsigned int execute (function *);
Index: toplev.c
===
--- toplev.c(revision 236894)
+++ toplev.c(working copy)
@@ -1296,7 +1296,7 @@ process_options (void)
 
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
-flag_web = flag_unroll_loops || flag_peel_loops;
+flag_web = flag_unroll_loops;
 
   if (flag_rename_registers == AUTODETECT_VALUE)
 flag_rename_registers = flag_unroll_loops || flag_peel_loops;


Re: [PATCH] AARCH64: Remove spurious attribute __unused__ from NEON intrinsic

2016-05-31 Thread James Greenhalgh
On Mon, Apr 25, 2016 at 05:47:57PM +0100, James Greenhalgh wrote:
> On Mon, Apr 25, 2016 at 05:39:45PM +0200, Wladimir J. van der Laan wrote:
> > 
> > Thanks for the info with regard to contributing,
> > 
> > On Fri, Apr 22, 2016 at 09:40:11AM +0100, James Greenhalgh wrote:
> > > This patch will need a ChangeLog entry [1], please draft one that I can
> > > use when I apply the patch.
> > 
> > * gcc/config/aarch64/arm_neon.h: Remove spurious attribute __unused__ from 
> > parameter of vdupb_laneq_s intrinsic
> 
> Close... This should look like:
> 
> 2016-04-25  Wladimir J. van der Laan  
> 
>   * config/aarch64/arm_neon.h (vdupb_laneq_s): Remove spurious
>   attribute __unused__
> 
> Can you confirm that this is how you want your name and email address
> to appear in the ChangeLog. If so, I'll commit the patch for you.

I've been out-of-office for a while. While I was away, Wladimir contacted
me off-list to confirm that this was correct.

I've committed the patch as revision r236914.

Thanks for the patch Wladimir!

James



Re: Remove word_mode hack for split bitfields

2016-05-31 Thread Bernd Schmidt

On 05/26/2016 04:36 PM, Richard Sandiford wrote:

This patch is effectively reverting a change from 1994.  The reason
I think it's a hack is that store_bit_field_1 is creating a subreg
reference to one word of a field even though it has already proven that
the field spills into the following word.  We then rely on the special
SUBREG handling in store_split_bit_field to ignore the extent of op0 and
look inside the SUBREG_REG regardless.  I don't see any reason why we can't
pass the original op0 to store_split_bit_field instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


Any observable effects on code generation?


Bernd



Re: [patch] Fix segfault in vectorizer

2016-05-31 Thread Eric Botcazou
> I recall that some STMT_VINFO_RELEVANT_P checks have a ||
> STMT_VINFO_DEF_TYPE () == vect_reduction_def
> or VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE ()).  

It's rather || STMT_VINFO_LIVE_P in most cases and this works here, the 
vectorization is properly blocked:

opt56.adb:9:29: note: not vectorized: different sized masks types in 
statement, vector(16) unsigned char and vector(4) 
opt56.adb:9:29: note: can't determine vectorization factor.
opt56.adb:6:4: note: vectorized 0 loops in function.

-- 
Eric Botcazou


Re: [PATCH, ARM] Do not set ARM_ARCH_ISA_THUMB for armv5

2016-05-31 Thread Thomas Preudhomme
On Friday 27 May 2016 14:01:22 Kyrill Tkachov wrote:
> On 27/05/16 13:51, Thomas Preudhomme wrote:
> > On Tuesday 24 May 2016 18:00:27 Kyrill Tkachov wrote:
> >> Hi Thomas,
> > 
> > Hi Kyrill,
> > 
> >>> +/* Nonzero if chip supports Thumb.  */
> >>> +extern int arm_arch_thumb;
> >>> +
> >> 
> >> Bit of bikeshedding really, but I think a better name would be
> >> arm_arch_thumb1.
> >> This is because we also have the macros TARGET_THUMB and TARGET_THUMB2
> >> where TARGET_THUMB2 means either Thumb-1 or Thumb-2 and a casual reader
> >> might think that arm_arch_thumb means that there is support for either.
> > 
> > Fixed.
> > 
> >> Also, please add a simple test that compiles something with -march=armv5
> >> (plus -marm) and checks that __ARM_ARCH_ISA_THUMB is not defined.
> > 
> > Fixed too.
> > 
> > Please find the updated in attachment. ChangeLog entries are now:
> > 
> > *** gcc/ChangeLog ***
> > 
> > 2016-05-26  Thomas Preud'homme  
> > 
> >  * config/arm/arm-protos.h (arm_arch_thumb1): Declare.
> >  * config/arm/arm.c (arm_arch_thumb1): Define.
> >  (arm_option_override): Initialize arm_arch_thumb1.
> >  * config/arm/arm.h (arm_arch_thumb1): Declare.
> >  (TARGET_ARM_ARCH_ISA_THUMB): Use arm_arch_thumb to determine if
> >  target supports Thumb-1 ISA.
> > 
> > *** gcc/testsuite/ChangeLog ***
> > 
> > 2016-05-26  Thomas Preud'homme  
> > 
> >  * gcc.target/arm/armv5_thumb_isa.c: New test.
> 
> Ok.
> Thanks,
> Kyrill

Committed.

Best regards,

Thomas


Re: [patch] Fix segfault in vectorizer

2016-05-31 Thread Richard Biener
On Tue, May 31, 2016 at 11:46 AM, Eric Botcazou  wrote:
>> Note that vect_determine_vectorization_factor is supposed to set the
>> vector type on all
>> stmts.  That it doesn't is a bug.  Do you run into the else branch?
>
> Yes, for
>
>   result_15 = _6 & result_3;
>
> which is a BIT_AND_EXPR, hence accepted by vectorizable_reduction.
>
>> I think that should only trigger with STMT_VINFO_RELEVANT_P and thus the
>> stmt in mask_producers which should later get post-processed and have the
>> VECTYPE set.
>
> OK, STMT_VINFO_RELEVANT_P is indeed not set:
>
> (gdb) p *stmt_info
> $4 = {type = undef_vec_info_type, live = true, in_pattern_p = false,
>   stmt = 0x768f5738, vinfo = 0x2e14820, vectype = 0x0,
>   vectorized_stmt = 0x0, data_ref_info = 0x0, dr_base_address = 0x0,
>   dr_init = 0x0, dr_offset = 0x0, dr_step = 0x0, dr_aligned_to = 0x0,
>   loop_phi_evolution_base_unchanged = 0x0, loop_phi_evolution_part = 0x0,
>   related_stmt = 0x0, pattern_def_seq = 0x0, same_align_refs = {m_vec = 0x0},
>   simd_clone_info = {m_vec = 0x0}, def_type = vect_reduction_def,
>   slp_type = loop_vect, first_element = 0x0, next_element = 0x0,
>   same_dr_stmt = 0x0, size = 0, store_count = 0, gap = 0, min_neg_dist = 0,
>   relevant = vect_unused_in_scope, vectorizable = true,
>   gather_scatter_p = false, strided_p = false, simd_lane_access_p = false,
>   v_reduc_type = TREE_CODE_REDUCTION, num_slp_uses = 0}

I recall that some STMT_VINFO_RELEVANT_P checks have a ||
STMT_VINFO_DEF_TYPE () == vect_reduction_def
or VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE ()).  Maybe that is missing here.

Richard.

> --
> Eric Botcazou


Re: [PATCH][AArch64] Use aarch64_fusion_enabled_p to check for insn fusion capabilities

2016-05-31 Thread James Greenhalgh
On Fri, May 27, 2016 at 06:10:42PM -0500, Evandro Menezes wrote:
> On 05/27/16 11:59, Kyrill Tkachov wrote:
> >Hi all,
> >
> >This patch is a small cleanup that uses the newly introduced
> >aarch64_fusion_enabled_p predicate
> >to check for what fusion opportunities are enabled for the current
> >target.
> >
> >Tested on aarch64-none-elf.
> >
> >Ok for trunk?
> >
> >Thanks,
> >Kyrill
> >
> >2016-05-27  Kyrylo Tkachov  
> >
> >* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Use
> >aarch64_fusion_enabled_p to check for fusion capabilities.
> 
> LGTM

And me.

OK.

Thanks,
James



Re: [PATCH][AArch64] Remove aarch64_simd_attr_length_move

2016-05-31 Thread James Greenhalgh
On Fri, May 27, 2016 at 05:57:17PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> I notice that we can do without aarch64_simd_attr_length_move. The move 
> alternatives for
> the OI,CI,XImode modes that involve memory operands all use a single 
> load/store so are always
> length 4, whereas the register-to-register moves have a statically-known 
> length of
> (GET_MODE_BITSIZE (mode) / 128) * 4, i.e. 4 bytes for every 128-bit SIMD move 
> instruction.
> This is already encoded in the insn_count mode attribute so use that when 
> needed.
> 
> That way we avoid a call to recog and a switch statement just to get the 
> length of an insn.
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

OK.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2016-05-27  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (aarch64_simd_attr_length_move): Delete.
> * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_move):
> Delete prototype.
> * config/aarch64/iterators.md (insn_count): Add descriptive comment.
> * config/aarch64/aarch64-simd.md (*aarch64_mov, VSTRUCT modes):
> Remove use of aarch64_simd_attr_length_move, set length attribute
> directly.
> (*aarch64_be_movoi): Likewise.
> (*aarch64_be_movci): Likewise.
> (*aarch64_be_movxi): Likewise.



Re: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-05-31 Thread James Greenhalgh
On Fri, May 27, 2016 at 02:50:15PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> As mentioned in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00297.html,
> -frename-registers can be beneficial for aarch64 and the patch at
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01618.html resolves the
> AESE/AESMC fusion issue that it exposed in the aarch64 backend. So this patch
> enables the pass for aarch64 at -O2 and above.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

Not without some more detail on the situations in which this is a likely
gain for AArch64 and whether this expected improvement is generic enough
to enable it for all AArch64 targets and worth the compile time trade-off.

As you're proposing to have this on by default, I'd like to give a chance
to hear whether there is consensus as to this being the right choice for
the thunderx, xgene1, exynos-m1 and qdf24xx subtargets.

My expectation is that the slight code size decrease is likely to be a good
thing everywhere, and the breaking of register dependency chains may give
better scheduling for our cores with detailed scheduling models. So we're
unlikely to cause any performance worries, and may get a slight uplift in
a few places. Any data confirming that would be useful. I'd also like to see
some estimates for the impact on compilation time (we are already running
one regrename pass when targeting Cortex-A57).

If we can't reach consensus, then the next best approach is to add this as
a new option to AARCH64_EXTRA_TUNING_OPTION and enable it as appropriate
for subtargets - you might find that this is not worthwhile for the small
uplift Ramana and Wilco measured.

I've widened the CC list for you, let's wait a week for comments and make
a decision then.

Thanks,
James

> P.S. Why is the table holding this information called 
> aarch_option_optimization_table rather than
> aarch64_option_optimization_table ?
> 
> 2016-05-27  Kyrylo Tkachov  
> 
> * common/config/aarch64/aarch64-common.c
> (aarch64_option_optimization_table): Enable -frename-registers at
> -O2 and higher.

> diff --git a/gcc/common/config/aarch64/aarch64-common.c 
> b/gcc/common/config/aarch64/aarch64-common.c
> index 
> 08e795934207d015d9fa22c3822930af4a21c93a..91801df731471f1842802370497e498fda62098a
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.c
> +++ b/gcc/common/config/aarch64/aarch64-common.c
> @@ -50,6 +50,8 @@ static const struct default_options 
> aarch_option_optimization_table[] =
>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>  /* Enable redundant extension instructions removal at -O2 and higher.  */
>  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> +/* Enable the register renaming pass at -O2 and higher.  */
> +{ OPT_LEVELS_2_PLUS, OPT_frename_registers, NULL, 1 },
>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>};
>  



Re: [PATCH, OpenACC] Make reduction arguments addressable

2016-05-31 Thread Chung-Lin Tang
On 2016/5/31 3:28 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 30 May 2016 18:53:41 +0200, Jakub Jelinek  wrote:
>> On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote:
>>> Hi, a previous patch of Cesar's has made the middle-end omp-lowering
>>> automatically create and insert a tofrom (i.e. present_or_copy) map for
>>> parallel reductions.  This allowed the user to not need explicit
>>> clauses to copy out the reduction result, but because reduction arguments
>>> are not marked addressable, async does not work as expected,
>>> i.e. the asynchronous copy-out results are not used in the compiler 
>>> generated code.
>>
>> If you need it only for async parallel/kernels? regions, can't you do that
>> only for those and not for others?

That is achievable, but not in line with how we currently treat all other
data clause OMP_CLAUSE_MAPs, which are all marked addressable. Is this special
case handling really better here?

> Also, please add comments to the source code to document the need for
> such special handling.
> 
>>> This patch fixes this in the front-ends, I've tested this patch without
>>> new regressions, and fixes some C++ OpenACC tests that regressed after
>>> my last OpenACC async patch.  Is this okay for trunk?
>>
>> Testcases in the testsuite or others?  If the latter, we should add them.
> 
> The r236772 commit "[PATCH, libgomp] Rewire OpenACC async",
> 
> regressed (or, triggered/exposed the existing wrong behavior?)
> libgomp.oacc-c++/template-reduction.C execution testing for nvptx
> offloading.  (Please always send email about such known regressions, and
> XFAIL them with your commit -- that would have saved me an hour
> yesterday, when I bisected recent changes to figure out why that test
> suddenly fails.)

Sorry, Thomas. I was going to quickly send this follow-up patch, so glossed 
over XFAILing.

> For reference, here is a test case, a reduced C version of
> libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.  This test case
> works (without Chung-Lin's "[PATCH, OpenACC] Make reduction arguments
> addressable" patch) if I enable "POCs", which surprises me a bit, because
> I thought after Cesar's recent changes, the gimplifier is doing the same
> thing of adding a data clause next to the reduction clause.  Probably
> it's not doing the exactly same thing, though.  Should it?  Cesar, do you
> have any comments on this?  For example (just guessing), should
> TREE_ADDRESSABLE be set where the gimplifier does its work, instead of in
> the three front ends?

There's really nothing wrong about Cesar's patch. The marking addressable needs
to be done earlier, or it may be too late during gimplification. I already
tried to fix this in gimplify.c before, but it didn't completely work.

I'll add more testcases for this before I commit any final patches.

Thanks,
Chung-Lin

> 
> // Reduced C version of 
> libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.
> 
> const int n = 100;
> 
> // Check present and async and an explicit firstprivate
> 
> int
> async_sum (int c)
> {
>   int s = 0;
> 
> #define POCs //present_or_copy(s)
> #pragma acc parallel loop num_gangs (10) gang reduction (+:s) POCs 
> firstprivate (c) async
>   for (int i = 0; i < n; i++)
> s += i+c;
> 
> #pragma acc wait
> 
>   return s;
> }
> 
> int
> main()
> {
>   int result = 0;
> 
>   for (int i = 0; i < n; i++)
> {
>   result += i+1;
> }
> 
>   if (async_sum (1) != result)
> __builtin_abort ();
> 
>   return 0;
> }
> 
> 
> Grüße
>  Thomas
> 



Re: [patch] Fix segfault in vectorizer

2016-05-31 Thread Eric Botcazou
> Note that vect_determine_vectorization_factor is supposed to set the
> vector type on all
> stmts.  That it doesn't is a bug.  Do you run into the else branch?

Yes, for

  result_15 = _6 & result_3;

which is a BIT_AND_EXPR, hence accepted by vectorizable_reduction.

> I think that should only trigger with STMT_VINFO_RELEVANT_P and thus the
> stmt in mask_producers which should later get post-processed and have the
> VECTYPE set.

OK, STMT_VINFO_RELEVANT_P is indeed not set:

(gdb) p *stmt_info
$4 = {type = undef_vec_info_type, live = true, in_pattern_p = false, 
  stmt = 0x768f5738, vinfo = 0x2e14820, vectype = 0x0, 
  vectorized_stmt = 0x0, data_ref_info = 0x0, dr_base_address = 0x0, 
  dr_init = 0x0, dr_offset = 0x0, dr_step = 0x0, dr_aligned_to = 0x0, 
  loop_phi_evolution_base_unchanged = 0x0, loop_phi_evolution_part = 0x0, 
  related_stmt = 0x0, pattern_def_seq = 0x0, same_align_refs = {m_vec = 0x0}, 
  simd_clone_info = {m_vec = 0x0}, def_type = vect_reduction_def, 
  slp_type = loop_vect, first_element = 0x0, next_element = 0x0, 
  same_dr_stmt = 0x0, size = 0, store_count = 0, gap = 0, min_neg_dist = 0, 
  relevant = vect_unused_in_scope, vectorizable = true, 
  gather_scatter_p = false, strided_p = false, simd_lane_access_p = false, 
  v_reduc_type = TREE_CODE_REDUCTION, num_slp_uses = 0}

-- 
Eric Botcazou


Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-05-31 Thread James Greenhalgh
On Fri, May 27, 2016 at 05:57:30PM -0500, Evandro Menezes wrote:
> On 05/25/16 11:16, James Greenhalgh wrote:
> >On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
> >>gcc/
> >> * config/aarch64/aarch64-protos.h
> >> (tune_params): Add new member "approx_div_modes".
> >> (aarch64_emit_approx_div): Declare new function.
> >> * config/aarch64/aarch64.c
> >> (generic_tunings): New member "approx_div_modes".
> >> (cortexa35_tunings): Likewise.
> >> (cortexa53_tunings): Likewise.
> >> (cortexa57_tunings): Likewise.
> >> (cortexa72_tunings): Likewise.
> >> (exynosm1_tunings): Likewise.
> >> (thunderx_tunings): Likewise.
> >> (xgene1_tunings): Likewise.
> >> (aarch64_emit_approx_div): Define new function.
> >> * config/aarch64/aarch64.md ("div3"): New expansion.
> >> * config/aarch64/aarch64-simd.md ("div3"): Likewise.
> >> * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
> >> * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >My comments from the other two patches around using a structure to
> >group up the tuning flags and whether we really want the new option
> >apply here too.
> >
> >This code has no consumers by default and is only used for
> >-mlow-precision-div. Is this option likely to be useful to our users in
> >practice? It might all be more palatable under something like the rs6000's
> >-mrecip=opt .
> 
> I agree.  OK as a follow up?

Works for me.

> >>diff --git a/gcc/config/aarch64/aarch64-simd.md 
> >>b/gcc/config/aarch64/aarch64-simd.md
> >>index 47ccb18..7e99e16 100644
> >>--- a/gcc/config/aarch64/aarch64-simd.md
> >>+++ b/gcc/config/aarch64/aarch64-simd.md
> >>@@ -1509,7 +1509,19 @@
> >>[(set_attr "type" "neon_fp_mul_")]
> >>  )
> >>-(define_insn "div3"
> >>+(define_expand "div3"
> >>+ [(set (match_operand:VDQF 0 "register_operand")
> >>+   (div:VDQF (match_operand:VDQF 1 "general_operand")
> >What does this relaxation to general_operand give you?
> 
> Hold that thought...
> 
> >>+(match_operand:VDQF 2 "register_operand")))]
> >>+ "TARGET_SIMD"
> >>+{
> >>+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> >>+DONE;
> >>+
> >>+  operands[1] = force_reg (mode, operands[1]);
> >...other than the need to do this (sorry if I've missed something obvious).
> 
> Hold on...
> 
> >>+  if (num != CONST1_RTX (mode))
> >>+{
> >>+  /* Calculate the approximate division.  */
> >>+  rtx xnum = force_reg (mode, num);
> >>+  emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> >>+}
> 
> About that relaxation, as you can see here, since the series
> approximates the reciprocal of the denominator, if the numerator is
> 1.0, a register can be spared, as the result is ready and the
> numerator is not needed.

But, in the case that the multiplication is by 1, can we not rely on the
other optimization machinery eliminating it? I mean, I see the optimization
that this enables for you, but can't you rely on future passes to do the
cleanup, and save yourself the few lines of special casing?

> +/* Emit the instruction sequence to compute the approximation for the 
> division
> +   of NUM by DEN and return whether the sequence was emitted or not.  */

Needs a brief mention of what we use QUO for :).

> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
> +{
> +  machine_mode mode = GET_MODE (quo);
> +  bool use_approx_division_p = (flag_mlow_precision_div
> + || (aarch64_tune_params.approx_modes->division
> + & AARCH64_APPROX_MODE (mode)));
> +
> +  if (!flag_finite_math_only
> +  || flag_trapping_math
> +  || !flag_unsafe_math_optimizations
> +  || optimize_function_for_size_p (cfun)
> +  || !use_approx_division_p)
> +return false;
> +
> +  /* Estimate the approximate reciprocal.  */
> +  rtx xrcp = gen_reg_rtx (mode);
> +  emit_insn ((*get_recpe_type (mode)) (xrcp, den));
> +
> +  /* Iterate over the series twice for SF and thrice for DF.  */
> +  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> +  /* Optionally iterate over the series once less for faster performance,
> + while sacrificing the accuracy.  */
> +  if (flag_mlow_precision_div)
> +iterations--;
> +
> +  /* Iterate over the series to calculate the approximate reciprocal.  */
> +  rtx xtmp = gen_reg_rtx (mode);
> +  while (iterations--)
> +{
> +  emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
> +
> +  if (iterations > 0)
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> +}
> +
> +  if (num != CONST1_RTX (mode))
> +{
> +  /* As the approximate reciprocal of the denominator is already 
> calculated,
> + only calculate the approximate division when the numerator is not 
> 1.0.  */

Long lines.

> +  rtx xnum = force_reg (mode, num);

Re: [patch] Fix segfault in vectorizer

2016-05-31 Thread Richard Biener
On Tue, May 31, 2016 at 10:22 AM, Eric Botcazou  wrote:
> Hi,
>
> it's a regression present on the mainline and 6 branch: for the attached Ada
> testcase, optab_for_tree_code segfaults because the function is invoked on a
> NULL_TREE vectype_out from the vectorizer (vectorizable_reduction):
>
>   if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == TREE_CODE_REDUCTION
>   || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> == INTEGER_INDUC_COND_REDUCTION)
> {
>   if (reduction_code_for_scalar_code (orig_code, _reduc_code))
> {
>   reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
>  optab_default);
>
> Naive attempts at working around the NULL_TREE result in bogus vectorization
> and GIMPLE verification failure so it seems clear that vectype_out ought not
> to be NULL_TREE here.
>
> The problems stems from vect_determine_vectorization_factor, namely:
>
>   /* Bool ops don't participate in vectorization factor
>  computation.  For comparison use compared types to
>  compute a factor.  */
>   if (TREE_CODE (scalar_type) == BOOLEAN_TYPE
>   && is_gimple_assign (stmt)
>   && gimple_assign_rhs_code (stmt) != COND_EXPR)
> {
>   if (STMT_VINFO_RELEVANT_P (stmt_info))
> mask_producers.safe_push (stmt_info);
>   bool_result = true;
>
>   if (gimple_code (stmt) == GIMPLE_ASSIGN
>   && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
>  == tcc_comparison
>   && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt)))
>  != BOOLEAN_TYPE)
> scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
>   else
> {
>  if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> {
>   pattern_def_seq = NULL;
>   gsi_next ();
> }
>   continue;
> }
> }
>
> which leaves STMT_VINFO_VECTYPE set to NULL_TREE.  It would have been nice to
> say in the comment why boolean operations don't participate in vectorization
> factor computation; one can only assume that it's because they are somehow
> treated specially, so the proposed fix does the same in vectorizable_reduction
> (and is probably a no-op except for Ada because of the precision test).

Note that vect_determine_vectorization_factor is supposed to set the
vector type on all
stmts.  That it doesn't is a bug.  Do you run into the else branch?  I
think that should
only trigger with STMT_VINFO_RELEVANT_P and thus the stmt in mask_producers
which should later get post-processed and have the VECTYPE set.

But maybe Ilya can shed some more light on this (awkward) code.

Richard.

> Tested on x86_64-suse-linux, OK for mainline and 6 branch?
>
>
> 2016-05-31  Eric Botcazou  
>
> * tree-vect-loop.c (vectorizable_reduction): Also return false if the
> scalar type is a boolean type.
>
>
> 2016-05-31  Eric Botcazou  
>
> * gnat.dg/opt56.ad[sb]: New test.
>
> --
> Eric Botcazou


Re: [PATCH] Fix PR tree-optimization/71314

2016-05-31 Thread Richard Biener
On Mon, May 30, 2016 at 9:50 PM, Patrick Palka  wrote:
> The test case ssa-thread-14.c expects that non-short-circuit logical ops
> are used so it fails on targets that don't use such ops by default.  To
> fix, I copied the dg directives from tree-ssa/reassoc-35.c which look
> reasonable.  Does this look OK to commit?

Yes.

Richard.

> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/71314
> * gcc.dg/tree-ssa/ssa-thread-14.c: Adjust target selector.  Pass
> -mbranch-cost=2.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> index e2ac2f7..c754b5b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> @@ -1,5 +1,6 @@
> -/* { dg-do compile }  */
> +/* { dg-do compile { target { ! { m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* 
> v850*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
> xtensa*-*-* hppa*-*-* nios2*-*-* } } } }  */
>  /* { dg-additional-options "-O2 -fdump-tree-vrp-details" }  */
> +/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* 
> s390*-*-* i?86-*-* x86_64-*-* } }  */
>  /* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
>
>  void foo (void);
> --
> 2.9.0.rc0.29.gabd6606
>


Re: [PATCH] Fix PR tree-optimization/71077

2016-05-31 Thread Richard Biener
On Mon, May 30, 2016 at 9:50 PM, Patrick Palka  wrote:
> In this PR the function simplify_control_stmt_condition_1(), which is
> responsible for recursing into the operands of a GIMPLE_COND during jump
> threading to check for dominating ASSERT_EXPRs, was erroneously
> returning a VECTOR_CST when it should instead be returning an
> INTEGER_CST.  The problem is that the zero/one constants that this
> function uses share the type of the GIMPLE_COND's operands, operands
> which may be vectors.  This makes no sense and there is no reason to
> build and use such constants instead of using boolean_[true|false]_node
> since all that matters is whether a constant returned by
> simplify_control_stmt_condition() correctly satisfies
> integer_[zero|one]p.  So this patch makes the combining part of
> simplify_control_stmt_condition_1() just use boolean_false_node or
> boolean_true_node as the designated false/true values.
>
> Does this look OK to commit after bootstrap+regtest on
> x86_64-pc-linux-gnu?  (I'm not sure if I put the test in the proper
> directory since no other test in tree-ssa uses -flto.  Also not sure if
> an i686 target accepts the -march=core-avx2 flag.)

Yes.

Ok.
Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/71077
> * tree-ssa-threadedge.c (simplify_control_stmt_condition_1): In
> the combining step, use boolean_false_node and boolean_true_node
> as the designated false/true return values.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/71077
> * gcc.dg/tree-ssa/pr71077.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr71077.c | 18 ++
>  gcc/tree-ssa-threadedge.c   | 26 --
>  2 files changed, 30 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71077.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c
> new file mode 100644
> index 000..4753740
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71077.c
> @@ -0,0 +1,18 @@
> +/* PR c++/71077  */
> +/* { dg-do link { target { i?86-*-* x86_64-*-* } } }  */
> +/* { dg-options "-O3 -flto -march=core-avx2" }  */
> +
> +int *a;
> +int b, c, d, e;
> +int sched_analyze(void) {
> + for (; b; b++) {
> +   c = 0;
> +   for (; c < 32; c++)
> + if (b & 1 << c)
> +   a[b + c] = d;
> + }
> + return 0;
> +}
> +
> +void schedule_insns(void) { e = sched_analyze(); }
> +int main(void) { schedule_insns(); }
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index eca3812..826d620 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -573,8 +573,6 @@ simplify_control_stmt_condition_1 (edge e,
>   enum tree_code rhs_code = gimple_assign_rhs_code (def_stmt);
>   const tree rhs1 = gimple_assign_rhs1 (def_stmt);
>   const tree rhs2 = gimple_assign_rhs2 (def_stmt);
> - const tree zero_cst = build_zero_cst (TREE_TYPE (op0));
> - const tree one_cst = build_one_cst (TREE_TYPE (op0));
>
>   /* Is A != 0 ?  */
>   const tree res1
> @@ -589,19 +587,19 @@ simplify_control_stmt_condition_1 (edge e,
> {
>   /* If A == 0 then (A & B) != 0 is always false.  */
>   if (cond_code == NE_EXPR)
> -   return zero_cst;
> +   return boolean_false_node;
>   /* If A == 0 then (A & B) == 0 is always true.  */
>   if (cond_code == EQ_EXPR)
> -   return one_cst;
> +   return boolean_true_node;
> }
>   else if (rhs_code == BIT_IOR_EXPR && integer_nonzerop (res1))
> {
>   /* If A != 0 then (A | B) != 0 is always true.  */
>   if (cond_code == NE_EXPR)
> -   return one_cst;
> +   return boolean_true_node;
>   /* If A != 0 then (A | B) == 0 is always false.  */
>   if (cond_code == EQ_EXPR)
> -   return zero_cst;
> +   return boolean_false_node;
> }
>
>   /* Is B != 0 ?  */
> @@ -617,19 +615,19 @@ simplify_control_stmt_condition_1 (edge e,
> {
>   /* If B == 0 then (A & B) != 0 is always false.  */
>   if (cond_code == NE_EXPR)
> -   return zero_cst;
> +   return boolean_false_node;
>   /* If B == 0 then (A & B) == 0 is always true.  */
>   if (cond_code == EQ_EXPR)
> -   return one_cst;
> +   return boolean_true_node;
> }
>   else if (rhs_code == BIT_IOR_EXPR && integer_nonzerop (res2))
> {
>   /* If B != 0 then (A | B) != 0 is always true.  */
>   if (cond_code == NE_EXPR)
> -   return one_cst;
> +   return boolean_true_node;
>   /* If B != 0 then (A | B) == 0 is always false.  */
>   if 

[patch] Fix segfault in vectorizer

2016-05-31 Thread Eric Botcazou
Hi,

it's a regression present on the mainline and 6 branch: for the attached Ada 
testcase, optab_for_tree_code segfaults because the function is invoked on a 
NULL_TREE vectype_out from the vectorizer (vectorizable_reduction):

  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == TREE_CODE_REDUCTION
  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
== INTEGER_INDUC_COND_REDUCTION)
{
  if (reduction_code_for_scalar_code (orig_code, _reduc_code))
{
  reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
 optab_default);

Naive attempts at working around the NULL_TREE result in bogus vectorization 
and GIMPLE verification failure so it seems clear that vectype_out ought not 
to be NULL_TREE here.

The problems stems from vect_determine_vectorization_factor, namely:

  /* Bool ops don't participate in vectorization factor
 computation.  For comparison use compared types to
 compute a factor.  */
  if (TREE_CODE (scalar_type) == BOOLEAN_TYPE
  && is_gimple_assign (stmt)
  && gimple_assign_rhs_code (stmt) != COND_EXPR)
{
  if (STMT_VINFO_RELEVANT_P (stmt_info))
mask_producers.safe_push (stmt_info);
  bool_result = true;

  if (gimple_code (stmt) == GIMPLE_ASSIGN
  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
 == tcc_comparison
  && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt)))
 != BOOLEAN_TYPE)
scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
  else
{
 if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
{
  pattern_def_seq = NULL;
  gsi_next ();
}
  continue;
}
}

which leaves STMT_VINFO_VECTYPE set to NULL_TREE.  It would have been nice to 
say in the comment why boolean operations don't participate in vectorization 
factor computation; one can only assume that it's because they are somehow 
treated specially, so the proposed fix does the same in vectorizable_reduction 
(and is probably a no-op except for Ada because of the precision test).

Tested on x86_64-suse-linux, OK for mainline and 6 branch?


2016-05-31  Eric Botcazou  

* tree-vect-loop.c (vectorizable_reduction): Also return false if the
scalar type is a boolean type.


2016-05-31  Eric Botcazou  

* gnat.dg/opt56.ad[sb]: New test.

-- 
Eric Botcazou

Index: tree-vect-loop.c
===
--- tree-vect-loop.c	(revision 236877)
+++ tree-vect-loop.c	(working copy)
@@ -5496,13 +5496,15 @@ vectorizable_reduction (gimple *stmt, gi
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
-  if (!POINTER_TYPE_P (scalar_type) && !INTEGRAL_TYPE_P (scalar_type)
+  if (!POINTER_TYPE_P (scalar_type)
+  && !INTEGRAL_TYPE_P (scalar_type)
   && !SCALAR_FLOAT_TYPE_P (scalar_type))
 return false;
 
-  /* Do not try to vectorize bit-precision reductions.  */
-  if ((TYPE_PRECISION (scalar_type)
-   != GET_MODE_PRECISION (TYPE_MODE (scalar_type))))
+  /* Do not try to vectorize boolean or bit-precision reductions.  */
+  if (TREE_CODE (scalar_type) == BOOLEAN_TYPE
+  || TYPE_PRECISION (scalar_type)
+	 != GET_MODE_PRECISION (TYPE_MODE (scalar_type)))
 return false;
 
   /* All uses but the last are expected to be defined in the loop.
-- { dg-do compile }
-- { dg-options "-O3" }

package body Opt56 is

   function F (Values : Vector) return Boolean is
  Result : Boolean := True;
   begin
  for I in Values'Range loop
 Result := Result and Values (I) >= 0.0;
 end loop;
 return Result;
   end;

end Opt56;
package Opt56 is

   type Vector is array (Positive range <>) of Float;

   function F (Values : Vector) return Boolean;

end Opt56;


Re: [PATCH][wwwdocs] Improve arm and aarch64-related info in readings.html

2016-05-31 Thread Kyrill Tkachov

Hi Gerald,

On 28/05/16 21:04, Gerald Pfeifer wrote:

Hi Kyrill,

On Thu, 19 May 2016, Kyrill Tkachov wrote:

I noticed that we have a readings.html page that has pointers to
documentation of various backends that GCC supports. The info on arm
seems a bit out of date and somewhat confusing, and there is no entry
for aarch64. This patch tries to address that.

I see you have not applied this yet.  In case you were waiting
for any further approvals beyond James': (a) His is sufficient
for doc and web pages around AArch64, and (b) I agree, too. :-)


I was going to apply it (with James's suggestions) but I got
sidetracked by other things. I'll do it today.

Thanks,
Kyrill


Gerald




Re: [wwwdocs] Document GCC 6 Solaris changes

2016-05-31 Thread Rainer Orth
Gerald Pfeifer  writes:

> On Tue, 19 Apr 2016, Rainer Orth wrote:
 [gcc-6/changes.html]
>> Btw., I noticed that the subsections of `Operating Systems' are in
>> random order.  Shouldn't they be sorted alphabetically?
>
> Yes.  Want to give it a try?  It's surely pre-approved.

Sure: done like this.  Fortunately, it was mostly the AIX entry that was out of place.

Rainer


2016-05-31  Rainer Orth  

* htdocs/gcc-6/changes.html (Operating Systems): Sort entries.

? update.log
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.81
diff -u -p -r1.81 changes.html
--- htdocs/gcc-6/changes.html	4 May 2016 10:07:41 -	1.81
+++ htdocs/gcc-6/changes.html	31 May 2016 08:00:59 -
@@ -765,6 +765,14 @@ within strings:
 
 Operating Systems
 
+AIX
+  
+DWARF debugging support for AIX 7.1 has been enabled as an optional
+debugging format.  A more recent Technology Level (TL) and GCC built
+with that level are required for full exploitation of DWARF debugging
+capabilities.
+  
+
 
 
 
@@ -796,14 +804,6 @@ within strings:
 variable GOMP_RTEMS_THREAD_POOLS.
   
 
-AIX
-  
-DWARF debugging support for AIX 7.1 has been enabled as an optional
-debugging format.  A more recent Technology Level (TL) and GCC built
-with that level are required for full exploitation of DWARF debugging
-capabilities.
-  
-
 Solaris
   
 Solaris 12 is now fully supported.  Minimal support had already
@@ -818,13 +818,13 @@ within strings:
 libvtv has been ported to Solaris 11 and up.
   
 
+
+
 Windows
   
The option -mstackrealign is now automatically activated
in 32-bit mode whenever the use of SSE instructions is requested.
   
-
-
 
 
 

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, OpenACC] Make reduction arguments addressable

2016-05-31 Thread Thomas Schwinge
Hi!

On Mon, 30 May 2016 18:53:41 +0200, Jakub Jelinek  wrote:
> On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote:
> > Hi, a previous patch of Cesar's has made the middle-end omp-lowering
> > automatically create and insert a tofrom (i.e. present_or_copy) map for
> > parallel reductions.  This allowed the user to not need explicit
> > clauses to copy out the reduction result, but because reduction arguments
> > are not marked addressable, async does not work as expected,
> > i.e. the asynchronous copy-out results are not used in the compiler 
> > generated code.
> 
> If you need it only for async parallel/kernels? regions, can't you do that
> only for those and not for others?

Also, please add comments to the source code to document the need for
such special handling.

> > This patch fixes this in the front-ends, I've tested this patch without
> > new regressions, and fixes some C++ OpenACC tests that regressed after
> > my last OpenACC async patch.  Is this okay for trunk?
> 
> Testcases in the testsuite or others?  If the latter, we should add them.

The r236772 commit "[PATCH, libgomp] Rewire OpenACC async"
regressed (or, triggered/exposed the existing wrong behavior?)
libgomp.oacc-c++/template-reduction.C execution testing for nvptx
offloading.  (Please always send email about such known regressions, and
XFAIL them with your commit -- that would have saved me an hour
yesterday, when I bisected recent changes to figure out why that test
suddenly fails.)

For reference, here is a test case, a reduced C version of
libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.  This test case
works (without Chung-Lin's "[PATCH, OpenACC] Make reduction arguments
addressable" patch) if I enable "POCs", which surprises me a bit, because
I thought after Cesar's recent changes, the gimplifier is doing the same
thing of adding a data clause next to the reduction clause.  Probably
it's not doing the exactly same thing, though.  Should it?  Cesar, do you
have any comments on this?  For example (just guessing), should
TREE_ADDRESSABLE be set where the gimplifier does its work, instead of in
the three front ends?

// Reduced C version of libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.

const int n = 100;

// Check present and async and an explicit firstprivate

int
async_sum (int c)
{
  int s = 0;

#define POCs //present_or_copy(s)
#pragma acc parallel loop num_gangs (10) gang reduction (+:s) POCs firstprivate (c) async
  for (int i = 0; i < n; i++)
s += i+c;

#pragma acc wait

  return s;
}

int
main()
{
  int result = 0;

  for (int i = 0; i < n; i++)
{
  result += i+1;
}

  if (async_sum (1) != result)
__builtin_abort ();

  return 0;
}
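For comparison, here is a sketch of the variant Thomas describes as working: the same reduced test, but with the explicit data clause enabled rather than relying on the implicitly generated map. The function name poc_sum is invented for illustration; the clause layout otherwise follows the reduced test above. With a compiler that does not handle OpenACC, the pragmas are ignored and the loop simply runs sequentially, producing the same sum.

```c
/* Variant of the reduced test with the explicit present_or_copy clause
   spelled out -- the form reported to work even without the
   TREE_ADDRESSABLE change.  The name poc_sum is hypothetical. */

static const int n = 100;

int
poc_sum (int c)
{
  int s = 0;

  /* Explicit data clause instead of the compiler-inserted map.  */
#pragma acc parallel loop num_gangs (10) gang reduction (+:s) present_or_copy (s) firstprivate (c) async

  for (int i = 0; i < n; i++)
    s += i + c;

  /* The wait must observe the asynchronous copy-out of s.  */
#pragma acc wait

  return s;
}
```
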


Regards
 Thomas

