Re: [C PATCH] Warn for optimize attribute on decl after definition (PR c/70255)

2016-05-09 Thread Martin Sebor

On 05/09/2016 08:45 AM, Marek Polacek wrote:

In this PR, Richi pointed out that we don't warn for the case when a
declaration with attribute optimize follows the definition which is lacking
that attribute.  This patch adds such a warning.  Though the question is
whether this shouldn't apply to more attributes than just "optimize".  And,
as can be seen in the testcase, we'll warn even for the case when the
definition has
   optimize ("no-associative-math,O2")
and the declaration
   optimize ("O2,no-associative-math")
Not sure if we have something better than attribute_value_equal, though.


There is attribute_list_equal which seems more appropriate given
that there could be more than one attribute optimize associated
with a function, and the order of the attributes shouldn't matter.
attribute_value_equal only returns true if all attributes are
the same and in the same order.  I would not expect GCC to warn
on the following, for example:

  int __attribute__ ((optimize ("no-reciprocal-math"),
                      optimize ("no-associative-math")))
  f () { return 0; }

  int __attribute__ ((optimize ("no-associative-math"),
                      optimize ("no-reciprocal-math")))
  f ();
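Marek's testcase also warns when two optimize strings differ only in the order of their comma-separated options. As a rough illustration of an order-insensitive comparison, the sketch below splits each string on commas, sorts the pieces, and compares them. The helper names and size limits are hypothetical; this is not GCC's attribute_value_equal or attribute_list_equal, which work on tree attribute lists. It also treats the options as a set, so it ignores cases where a later option overrides an earlier one.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limits for this sketch only.  */
#define MAX_OPTS 16
#define MAX_LEN 64

/* Split the comma-separated option string S into OUT.  Return the number
   of options, or -1 if S has too many options or one is too long.  */
static int
split_opts (const char *s, char out[MAX_OPTS][MAX_LEN])
{
  int n = 0;
  while (*s)
    {
      size_t len = strcspn (s, ",");
      if (n == MAX_OPTS || len >= MAX_LEN)
        return -1;
      memcpy (out[n], s, len);
      out[n][len] = '\0';
      n++;
      s += len;
      if (*s == ',')
        s++;
    }
  return n;
}

/* qsort comparator: each element is a char[MAX_LEN] holding a string.  */
static int
cmp_opt (const void *a, const void *b)
{
  return strcmp ((const char *) a, (const char *) b);
}

/* Return true if A and B list the same options, ignoring order.  */
static bool
optimize_strings_equal (const char *a, const char *b)
{
  char ta[MAX_OPTS][MAX_LEN], tb[MAX_OPTS][MAX_LEN];
  int na = split_opts (a, ta);
  int nb = split_opts (b, tb);
  if (na < 0 || nb < 0 || na != nb)
    return false;
  qsort (ta, (size_t) na, sizeof ta[0], cmp_opt);
  qsort (tb, (size_t) nb, sizeof tb[0], cmp_opt);
  for (int i = 0; i < na; i++)
    if (strcmp (ta[i], tb[i]) != 0)
      return false;
  return true;
}
```

With this helper, "no-associative-math,O2" and "O2,no-associative-math" compare equal, while "O2" and "O3" do not.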

Martin



[PATCH, rs6000] Fix PR70963: Wrong code for V2DF/V2DI vec_cts with zero scale factor

2016-05-09 Thread Bill Schmidt
Hi,

PR70963 reports a problem with vec_cts when used to convert vector double
to vector long long.  This is due to a register that is used with an
undefined value when the scale factor is zero.  This patch makes the
expanders use the input operand directly when the scale factor is zero.

The problem from the PR is in the define_expand for vsx_xvcvdpsxds_scale.
The define_expand for vsx_xvcvdpuxds_scale clearly has the same problem,
although it cannot be reached via a call to vec_cts.  It can, however, be
reached with the raw builtin __builtin_vsx_xvcvdpuxds_scale, and I’ve
shown this in the test case.
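Before the fix, tmp was allocated unconditionally but only written by rs6000_scale_v2df when the scale was nonzero, so the zero-scale path converted an uninitialized register. The sketch below is a portable scalar model of the corrected control flow, for illustration only. It is not the rs6000 expander; the function name is hypothetical, and it assumes vec_cts multiplies each lane by 2^scale and truncates toward zero, without modeling saturation.

```c
#include <stdint.h>

/* Hypothetical scalar model of vec_cts on a vector double: each lane is
   multiplied by 2^scale and then truncated toward zero to int64_t.
   Assumes 0 <= scale <= 31 and results that fit in int64_t.  */
static void
model_cts_v2df (int64_t out[2], const double in[2], int scale)
{
  double tmp[2];
  const double *src;

  if (scale == 0)
    src = in;                   /* Like the fix: use the input directly.  */
  else
    {
      double factor = 1.0;
      for (int k = 0; k < scale; k++)
        factor *= 2.0;
      for (int i = 0; i < 2; i++)
        tmp[i] = in[i] * factor;
      src = tmp;                /* tmp is only read after being written.  */
    }

  for (int i = 0; i < 2; i++)
    out[i] = (int64_t) src[i];  /* C conversion truncates toward zero.  */
}
```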

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk, and for eventual backport to 6 and 5?

Thanks,
Bill


[gcc]

2016-05-09  Bill Schmidt  

* config/rs6000/vsx.md (vsx_xvcvdpsxds_scale): Generate correct
code for a zero scale factor.
(vsx_xvcvdpuxds_scale): Likewise.

[gcc/testsuite]

2016-05-09  Bill Schmidt  

* gcc.target/powerpc/pr70963.c: New.


Index: gcc/config/rs6000/vsx.md 
===
--- gcc/config/rs6000/vsx.md(revision 236051)   
+++ gcc/config/rs6000/vsx.md(working copy)  
@@ -1717,10 +1717,15 @@
 {  
   rtx op0 = operands[0];   
   rtx op1 = operands[1];   
-  rtx tmp = gen_reg_rtx (V2DFmode);
+  rtx tmp; 
   int scale = INTVAL(operands[2]); 
-  if (scale != 0)
-    rs6000_scale_v2df (tmp, op1, scale);
+  if (scale == 0)
+    tmp = op1;
+  else
+    {
+      tmp = gen_reg_rtx (V2DFmode);
+      rs6000_scale_v2df (tmp, op1, scale);
+    }
   emit_insn (gen_vsx_xvcvdpsxds (op0, tmp));   
   DONE;
 }) 
@@ -1741,10 +1746,15 @@
 {  
   rtx op0 = operands[0];   
   rtx op1 = operands[1];   
-  rtx tmp = gen_reg_rtx (V2DFmode);
+  rtx tmp; 
   int scale = INTVAL(operands[2]); 
-  if (scale != 0)
-    rs6000_scale_v2df (tmp, op1, scale);
+  if (scale == 0)
+    tmp = op1;
+  else
+    {
+      tmp = gen_reg_rtx (V2DFmode);
+      rs6000_scale_v2df (tmp, op1, scale);
+    }
   emit_insn (gen_vsx_xvcvdpuxds (op0, tmp));   
   DONE;
 }) 
Index: gcc/testsuite/gcc.target/powerpc/pr70963.c   
===
--- gcc/testsuite/gcc.target/powerpc/pr70963.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr70963.c  (working copy)  
@@ -0,0 +1,39 @@
+/* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */   
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */  
+/* { dg-options "-maltivec" } */  

[PATCH] PR driver/69265: add hint for options with misspelled arguments

2016-05-09 Thread David Malcolm
opts-common.c's cmdline_handle_error handles invalid arguments
for options with CL_ERR_ENUM_ARG by building a string listing the
valid arguments.  By also building a vec of valid arguments, we
can use find_closest_string and provide a hint if we see a close
misspelling.
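find_closest_string itself lives in GCC's spellcheck.c. The following is an independent sketch of the same idea: pick the candidate with the smallest Levenshtein edit distance and reject it when too many characters differ. The function names are hypothetical, and the cutoff of half the longer string's length is an assumption made for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Two-row Levenshtein edit distance; returns (size_t) -1 on OOM.  */
static size_t
edit_distance (const char *s, const char *t)
{
  size_t m = strlen (s), n = strlen (t);
  size_t *prev = malloc ((n + 1) * sizeof *prev);
  size_t *cur = malloc ((n + 1) * sizeof *cur);
  if (prev == NULL || cur == NULL)
    {
      free (prev);
      free (cur);
      return (size_t) -1;
    }
  for (size_t j = 0; j <= n; j++)
    prev[j] = j;
  for (size_t i = 1; i <= m; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= n; j++)
        {
          size_t cost = s[i - 1] == t[j - 1] ? 0 : 1;
          size_t best = prev[j] + 1;              /* Deletion.  */
          if (cur[j - 1] + 1 < best)
            best = cur[j - 1] + 1;                /* Insertion.  */
          if (prev[j - 1] + cost < best)
            best = prev[j - 1] + cost;            /* Substitution.  */
          cur[j] = best;
        }
      size_t *swap = prev;
      prev = cur;
      cur = swap;
    }
  size_t d = prev[n];   /* After the last swap, PREV holds the final row.  */
  free (prev);
  free (cur);
  return d;
}

/* Return the closest of COUNT CANDIDATES to TARGET, or NULL if even the
   best match differs in more than half of its characters.  */
static const char *
closest_string (const char *target, const char *const *candidates,
                size_t count)
{
  const char *best = NULL;
  size_t best_dist = (size_t) -1;
  for (size_t i = 0; i < count; i++)
    {
      size_t d = edit_distance (target, candidates[i]);
      if (d < best_dist)
        {
          best_dist = d;
          best = candidates[i];
        }
    }
  if (best == NULL)
    return NULL;
  size_t lt = strlen (target), lb = strlen (best);
  size_t cutoff = (lt > lb ? lt : lb) / 2;
  return best_dist <= cutoff ? best : NULL;
}
```

For the TLS models in the new test, "global-dinamic" is one edit away from "global-dynamic", so that candidate is suggested, while an unrelated string such as "xyz" yields no hint.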

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
PR driver/69265
* Makefile.in (GCC_OBJS): Move spellcheck.o to...
(OBJS-libcommon-target): ...here.
* opts-common.c: Include spellcheck.h.
(cmdline_handle_error): Build a vec of valid arguments and use it
to provide hints for misspelled arguments.

gcc/testsuite/ChangeLog:
PR driver/69265
* gcc.dg/spellcheck-options-11.c: New test case.
---
 gcc/Makefile.in  |  4 ++--
 gcc/opts-common.c| 11 ++-
 gcc/testsuite/gcc.dg/spellcheck-options-11.c |  7 +++
 3 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-options-11.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6c5adc0..525482f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1159,7 +1159,7 @@ CXX_TARGET_OBJS=@cxx_target_objs@
 FORTRAN_TARGET_OBJS=@fortran_target_objs@
 
 # Object files for gcc many-languages driver.
-GCC_OBJS = gcc.o gcc-main.o ggc-none.o spellcheck.o
+GCC_OBJS = gcc.o gcc-main.o ggc-none.o
 
 c-family-warn = $(STRICT_WARN)
 
@@ -1548,7 +1548,7 @@ OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
 # compiler and containing target-dependent code.
 OBJS-libcommon-target = $(common_out_object_file) prefix.o params.o \
opts.o opts-common.o options.o vec.o hooks.o common/common-targhooks.o \
-   hash-table.o file-find.o
+   hash-table.o file-find.o spellcheck.o
 
 # This lists all host objects for the front ends.
 ALL_HOST_FRONTEND_OBJS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_OBJS))
diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index bb68982..4e1ef49 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "options.h"
 #include "diagnostic.h"
+#include "spellcheck.h"
 
 static void prune_options (struct cl_decoded_option **, unsigned int *);
 
@@ -1113,6 +1114,7 @@ cmdline_handle_error (location_t loc, const struct cl_option *option,
   for (i = 0; e->values[i].arg != NULL; i++)
len += strlen (e->values[i].arg) + 1;
 
+  auto_vec <const char *> candidates;
   s = XALLOCAVEC (char, len);
   p = s;
   for (i = 0; e->values[i].arg != NULL; i++)
@@ -1123,9 +1125,16 @@ cmdline_handle_error (location_t loc, const struct cl_option *option,
  memcpy (p, e->values[i].arg, arglen);
  p[arglen] = ' ';
  p += arglen + 1;
+ candidates.safe_push (e->values[i].arg);
}
   p[-1] = 0;
-  inform (loc, "valid arguments to %qs are: %s", option->opt_text, s);
+  const char *hint = find_closest_string (arg, &candidates);
+  if (hint)
+   inform (loc, "valid arguments to %qs are: %s; did you mean %qs?",
+   option->opt_text, s, hint);
+  else
+   inform (loc, "valid arguments to %qs are: %s", option->opt_text, s);
+
   return true;
 }
 
diff --git a/gcc/testsuite/gcc.dg/spellcheck-options-11.c b/gcc/testsuite/gcc.dg/spellcheck-options-11.c
new file mode 100644
index 000..8e27141
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-options-11.c
@@ -0,0 +1,7 @@
+/* Verify that we provide a hint if the user misspells an option argument
+   (PR driver/69265).  */
+
+/* { dg-do compile } */
+/* { dg-options "-ftls-model=global-dinamic" } */
+/* { dg-error "unknown TLS model 'global-dinamic'"  "" { target *-*-* } 0 } */
+/* { dg-message "valid arguments to '-ftls-model=' are: global-dynamic initial-exec local-dynamic local-exec; did you mean 'global-dynamic'?"  "" { target *-*-* } 0 } */
-- 
1.8.5.3



PING^4 [PATCH, GCC 5] PR 70613, -fabi-version docs don't match implementation

2016-05-09 Thread Jim Wilson
On Mon, May 2, 2016 at 12:13 PM, Jim Wilson  wrote:
> Here is a patch to correct the -fabi-version docs on the GCC 5 branch.
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00480.html

Maybe I didn't put enough info in the email the first 3 times?

You can see the default -fabi-version in gcc/c-family/c-opts.c on the
gcc-5 branch which has

  /* Change flag_abi_version to be the actual current ABI level for the
     benefit of c_cpp_builtins.  */
  if (flag_abi_version == 0)
    flag_abi_version = 9;

You can see in the docs that -fabi-version only goes up to 8.

https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/C_002b_002b-Dialect-Options.html#C_002b_002b-Dialect-Options

As for how we got here...
I see that the patch for bug 65945 was back ported to the gcc-5
branch, which required a partial backport of the patch for bug 44282,
which added abi version 9.  The original patch for 44282 is missing
the doc change.

The missing doc change was then added here
https://gcc.gnu.org/viewcvs/gcc?view=revision=228017
which has the invoke.texi hunk we need, but is missing a ChangeLog
entry for it.  So it appears all we need is a partial backport of this
invoke.texi hunk.  This is mostly documenting a change to -Wabi, so we
only need parts of two hunks that document -fabi-version=9 and mention
gcc-5.2.

The patch is attached again.

Jim
Index: ChangeLog
===
--- ChangeLog	(revision 234867)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2016-04-11  Jim Wilson  
+
+	Partial backport from trunk r228017.
+	2015-09-22  Jason Merrill  
+
+	PR c++/70613
+	* doc/invoke.texi (-fabi-version): Document version 9.
+	(-Wabi): Document version 9.  Mention version 8 is default for GCC 5.1.
+
 2016-04-09  Oleg Endo  
 
 	Backport from mainline
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 234867)
+++ doc/invoke.texi	(working copy)
@@ -2118,6 +2118,9 @@ scope.
 Version 8, which first appeared in G++ 4.9, corrects the substitution
 behavior of function types with function-cv-qualifiers.
 
+Version 9, which first appeared in G++ 5.2, corrects the alignment of
+@code{nullptr_t}.
+
 See also @option{-Wabi}.
 
 @item -fabi-compat-version=@var{n}
@@ -2619,7 +2622,15 @@ When mangling a function type with function-cv-qua
 un-qualified function type was incorrectly treated as a substitution
 candidate.
 
-This was fixed in @option{-fabi-version=8}.
+This was fixed in @option{-fabi-version=8}, the default for GCC 5.1.
+
+@item
+@code{decltype(nullptr)} incorrectly had an alignment of 1, leading to
+unaligned accesses.  Note that this did not affect the ABI of a
+function with a @code{nullptr_t} parameter, as parameters have a
+minimum alignment.
+
+This was fixed in @option{-fabi-version=9}, the default for GCC 5.2.
 @end itemize
 
 It also warns about psABI-related changes.  The known psABI changes at this


Re: [C PATCH] Warn for optimize attribute on decl after definition (PR c/70255)

2016-05-09 Thread Joseph Myers
On Mon, 9 May 2016, Marek Polacek wrote:

> In this PR, Richi pointed out that we don't warn for the case when a
> declaration with attribute optimize follows the definition which is lacking
> that attribute.  This patch adds such a warning.  Though the question is
> whether this shouldn't apply to more attributes than just "optimize".  And,
> as can be seen in the testcase, we'll warn even for the case when the
> definition has
>   optimize ("no-associative-math,O2")
> and the declaration
>   optimize ("O2,no-associative-math")
> Not sure if we have something better than attribute_value_equal, though.
> 
> (The C++ FE lacks these kind of warnings; I opened PR71024 for that.)
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix PR c++/70822 (bogus error with parenthesized SCOPE_REF)

2016-05-09 Thread Patrick Palka
On Fri, Apr 29, 2016 at 11:55 AM, Patrick Palka  wrote:
> The problem here is that some code paths are not prepared to handle a
> non-dependent PAREN_EXPR, which my fix for PR c++/70106 introduced.  In
> particular lvalue_kind() returns clk_none for a PAREN_EXPR which makes
> build_x_unary_op() emit a bogus error for an expression like &(A::b).
> (If the PAREN_EXPR were dependent, lvalue_kind() wouldn't get called
> in the first place, because build_x_unary_op() would exit early.)
>
> This patch replaces the 70106 fix.  Instead of wrapping a SCOPE_REF in a
> PAREN_EXPR, this patch overloads the REF_PARENTHESIZED_P flag to apply to
> SCOPE_REFs too.  This makes sense to me because the two tree codes are
> closely related (e.g. a SCOPE_REF before instantiation may become a
> COMPONENT_REF after instantiation) so they should be treated similarly
> by force_paren_expr().
>
> There are two rather simpler ways to fix this PR.  One is to make
> lvalue_kind() recurse into PAREN_EXPRs (although other parts of the FE
> may be mishandling non-dependent PAREN_EXPRs as well), and the other is
> to make force_paren_expr() never return a non-dependent PAREN_EXPR,
> which can be achieved by building the PAREN_EXPR with build_nt().  I am
> not sure which approach is best for GCC 7 and for GCC 6.
>
> Somewhat unrelated to the fix: I couldn't find an existing test that
> checked that force_paren_expr handles SCOPE_REFs properly wrt auto
> deduction so I added one.
>
> Bootstrap and regtesting in progress on x86_64-pc-linux-gnu.
>
> gcc/cp/ChangeLog:
>
> PR c++/70822
> PR c++/70106
> * cp-tree.h (REF_PARENTHESIZED_P): Make this flag apply to
> SCOPE_REFs too.
> * pt.c (tsubst_qualified_id): If REF_PARENTHESIZED_P is set
> on the qualified_id then propagate it to the resulting
> expression.
> (do_auto_deduction): Check REF_PARENTHESIZED_P on SCOPE_REFs
> too.
> * semantics.c (force_paren_expr): If given a SCOPE_REF, just set
> its REF_PARENTHESIZED_P flag.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/70822
> PR c++/70106
> * g++.dg/cpp1y/auto-fn31.C: New test.
> * g++.dg/cpp1y/paren4.C: New test.
> ---
>  gcc/cp/cp-tree.h   |  4 ++--
>  gcc/cp/pt.c| 15 +++
>  gcc/cp/semantics.c | 13 +++--
>  gcc/testsuite/g++.dg/cpp1y/auto-fn31.C | 33 +
>  gcc/testsuite/g++.dg/cpp1y/paren4.C| 14 ++
>  5 files changed, 63 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn31.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren4.C
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 2caf7ce..0df5953 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -170,7 +170,7 @@ operator == (const cp_expr , tree rhs)
>TARGET_EXPR_DIRECT_INIT_P (in TARGET_EXPR)
>FNDECL_USED_AUTO (in FUNCTION_DECL)
>DECLTYPE_FOR_LAMBDA_PROXY (in DECLTYPE_TYPE)
> -  REF_PARENTHESIZED_P (in COMPONENT_REF, INDIRECT_REF)
> +  REF_PARENTHESIZED_P (in COMPONENT_REF, INDIRECT_REF, SCOPE_REF)
>AGGR_INIT_ZERO_FIRST (in AGGR_INIT_EXPR)
>CONSTRUCTOR_MUTABLE_POISON (in CONSTRUCTOR)
> 3: (TREE_REFERENCE_EXPR) (in NON_LVALUE_EXPR) (commented-out).
> @@ -3403,7 +3403,7 @@ extern void decl_shadowed_for_var_insert (tree, tree);
> some of the time in C++14 mode.  */
>
>  #define REF_PARENTHESIZED_P(NODE) \
> -  TREE_LANG_FLAG_2 (TREE_CHECK2 ((NODE), COMPONENT_REF, INDIRECT_REF))
> +  TREE_LANG_FLAG_2 (TREE_CHECK3 ((NODE), COMPONENT_REF, INDIRECT_REF, SCOPE_REF))
>
>  /* Nonzero if this AGGR_INIT_EXPR provides for initialization via a
> constructor call, rather than an ordinary function call.  */
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index e7ec629..7adf308 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -13741,8 +13741,10 @@ tsubst_qualified_id (tree qualified_id, tree args,
>  {
>if (is_template)
> expr = build_min_nt_loc (loc, TEMPLATE_ID_EXPR, expr, template_args);
> -  return build_qualified_name (NULL_TREE, scope, expr,
> -  QUALIFIED_NAME_IS_TEMPLATE (qualified_id));
> +  tree r = build_qualified_name (NULL_TREE, scope, expr,
> +                                 QUALIFIED_NAME_IS_TEMPLATE (qualified_id));
> +  REF_PARENTHESIZED_P (r) = REF_PARENTHESIZED_P (qualified_id);
> +  return r;
>  }
>
>if (!BASELINK_P (name) && !DECL_P (expr))
> @@ -13822,6 +13824,9 @@ tsubst_qualified_id (tree qualified_id, tree args,
>&& TREE_CODE (expr) != OFFSET_REF)
>  expr = convert_from_reference (expr);
>
> +  if (REF_PARENTHESIZED_P (qualified_id))
> +expr = force_paren_expr (expr);
> +
>return expr;
>  }
>
> @@ -23966,8 +23971,10 @@ do_auto_deduction (tree type, tree init, tree auto_node,
>
>if (AUTO_IS_DECLTYPE 

Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-09 Thread Bernd Edlinger
On 05/09/16 15:46, Bernd Schmidt wrote:
> On 05/09/2016 03:37 PM, Bernd Edlinger wrote:
>> On 05/09/16 09:56, Richard Biener wrote:
>>>
>>> At least it sounds to me that its semantics can be fully expressed
>>> with generic asms?  (Maybe apart from the only-if-ASM_STRING-is-empty
>>> part)
>>>
>>
>> That was my first idea too.
>>
>> In simple cases an asm ("whatever"); should do the same as
>> asm ("whatever" ::: );
>>
>> Adding "memory" to the clobber list would be simple, that's true.
>>
>> But in general it can be pretty complicated, especially if the
>> string contains the special characters % { | }.
>
> Is the only difference in how the string is output? Maybe we can have a
> slightly different form of ASM_OPERANDS (with a bit set, or with the
> string wrapped in something else) to indicate that it's old-style.

Most of the difference is what happens in final.c, and adding a new
attribute to the ASM_OPERANDS tree node is definitely one option.
I tried to implement it in a way that causes the least confusion.

There are lots of different tree representations for an extended asm
statement in general, but only one for a basic asm.

An extended asm that has no outputs and no clobbers is an ASM_OPERAND
node with an optional vector of ASM_INPUTs containing the input constraints:

ASM_OPERAND { "asm", "", 0, VEC { inputs...}, VEC { ASM_INPUT ("x")...} }

but if it has any CLOBBERS, it will look like this:

PARALLEL { ASM_OPERAND, CLOBBER... }

if it has one output and zero clobbers, we have:

SET { x, ASM_OPERAND }

and in case we have more than one output we have:

PARALLEL { SET { x, ASM_OPERAND }... , CLOBBER... }


A basic asm is just an ASM_INPUT that is not underneath an ASM_OPERAND.

But to add any CLOBBERs to this ASM_INPUT it has to be in PARALLEL
with the CLOBBERs, so that would look like this:

PARALLEL { ASM_INPUT{ "asm" }, CLOBBER... }


There are lots of places where we need to know if a statement is an
assembler statement; in most places this is done in this way:

GET_CODE (PATTERN (insn)) == ASM_INPUT
|| asm_noperands (PATTERN (insn)) >= 0

There are a handful of places where it is done in this way:

GET_CODE (PATTERN (insn)) == ASM_INPUT
|| extract_asm_operands (PATTERN (insn)) != NULL_RTX

extract_asm_operands locates the ASM_OPERAND node from an extended
asm that can have any of the several forms above, but in most
cases the result is not looked at.  Making extract_asm_operands
return anything but an ASM_OPERANDS is impossible, but making
asm_noperands return 0 for a PARALLEL { ASM_INPUT, CLOBBER... }
is not too complicated.

Fortunately, all the remaining uses of extract_asm_operands really
mean an extended asm.
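The shapes listed above can be captured in a small classifier. This is a toy model for illustration only: the types and names are hypothetical stand-ins, not GCC's rtx, asm_noperands, or extract_asm_operands. It treats PARALLEL { ASM_INPUT, CLOBBER... } as a basic asm, which is the behaviour the proposal wants asm_noperands to adopt.

```c
#include <stddef.h>

/* Toy stand-ins for the RTL codes named above; these are NOT GCC's
   rtx types, just enough structure to show the shapes.  */
enum toy_code
{
  TOY_ASM_INPUT,
  TOY_ASM_OPERANDS,
  TOY_SET,
  TOY_PARALLEL,
  TOY_CLOBBER,
  TOY_OTHER
};

struct toy_rtx
{
  enum toy_code code;
  size_t n;                      /* Number of sub-expressions.  */
  const struct toy_rtx **sub;    /* SET: {dest, src}; PARALLEL: elements.  */
};

enum asm_kind { NOT_ASM, BASIC_ASM, EXTENDED_ASM };

/* Classify PAT using the shapes described in the mail.  */
static enum asm_kind
classify_asm (const struct toy_rtx *pat)
{
  switch (pat->code)
    {
    case TOY_ASM_INPUT:          /* ASM_INPUT { "asm" }  */
      return BASIC_ASM;
    case TOY_ASM_OPERANDS:       /* No outputs, no clobbers.  */
      return EXTENDED_ASM;
    case TOY_SET:                /* SET { x, ASM_OPERANDS }  */
      return (pat->n == 2 && pat->sub[1]->code == TOY_ASM_OPERANDS
              ? EXTENDED_ASM : NOT_ASM);
    case TOY_PARALLEL:
      if (pat->n == 0)
        return NOT_ASM;
      /* Proposed form: PARALLEL { ASM_INPUT, CLOBBER... } is a basic asm.  */
      if (pat->sub[0]->code == TOY_ASM_INPUT)
        return BASIC_ASM;
      /* PARALLEL { ASM_OPERANDS, CLOBBER... } or
         PARALLEL { SET { x, ASM_OPERANDS }..., CLOBBER... }.  */
      if (pat->sub[0]->code == TOY_ASM_OPERANDS
          || pat->sub[0]->code == TOY_SET)
        return classify_asm (pat->sub[0]);
      return NOT_ASM;
    default:
      return NOT_ASM;
    }
}
```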

Hope that explains my idea.


Thanks
Bernd.


[gomp4] backport fix for PR70626

2016-05-09 Thread Cesar Philippidis
This patch backports the change in the way that 'acc parallel loop'
reductions are handled in trunk. Previously, the reduction clause was
only associated with the split acc loop. Now the reduction clause is
associated with both the loop and the parallel region. That's beneficial
because the gimplifier adds implicit copy clauses if necessary for any
reduction variable attached to a parallel construct.
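The clause split described above can be sketched with a toy clause list. The types and names here are hypothetical; the real c_oacc_split_loop_clauses works on OMP_CLAUSE trees and copies the decl and reduction code with build_omp_clause. The sketch only shows the idea: loop clauses stay on the loop, everything else goes to the enclosing construct, and for a combined parallel loop each reduction is additionally copied onto the parallel construct.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy clause codes; a real implementation uses OMP_CLAUSE_* trees.  */
enum toy_clause_code { TOY_GANG, TOY_PRIVATE, TOY_REDUCTION, TOY_NUM_GANGS };

struct toy_clause
{
  enum toy_clause_code code;
  const char *decl;              /* Variable name, or NULL.  */
  struct toy_clause *chain;      /* Next clause in the list.  */
};

/* Clauses that belong on the loop construct in this sketch.  */
static bool
is_loop_clause (enum toy_clause_code code)
{
  return code == TOY_GANG || code == TOY_PRIVATE || code == TOY_REDUCTION;
}

/* Split CLAUSES into loop clauses (returned) and the rest (*NOT_LOOP).
   For a combined parallel loop, each reduction is also copied onto the
   parallel construct.  Both lists are built by prepending, so they end
   up in reverse order.  */
static struct toy_clause *
split_loop_clauses (struct toy_clause *clauses,
                    struct toy_clause **not_loop, bool is_parallel)
{
  struct toy_clause *loop = NULL, *next;

  *not_loop = NULL;
  for (; clauses; clauses = next)
    {
      next = clauses->chain;
      if (clauses->code == TOY_REDUCTION && is_parallel)
        {
          struct toy_clause *nc = malloc (sizeof *nc);
          if (nc == NULL)
            abort ();            /* Keep the sketch simple on OOM.  */
          *nc = *clauses;
          nc->chain = *not_loop;
          *not_loop = nc;
        }
      if (is_loop_clause (clauses->code))
        {
          clauses->chain = loop;
          loop = clauses;
        }
      else
        {
          clauses->chain = *not_loop;
          *not_loop = clauses;
        }
    }
  return loop;
}
```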

I had to update reduction-2.f95 because of the way that gomp4 implements
device_type, which tends to rearrange the ordering of the clauses. Also,
libgomp.oacc-c++/template-reduction.C is broken in gomp4, so I had to
xfail it. Apparently, it exposed an async bug. My forthcoming patch
which uses firstprivate pointers for subarrays should fix it.

This patch has been committed to gomp-4_0-branch.

Cesar
2016-05-09  Cesar Philippidis  

	Backport trunk r235651:
	2016-04-29  Cesar Philippidis  

	gcc/c-family/
	PR middle-end/70626
	* c-common.h (c_oacc_split_loop_clauses): Add boolean argument.
	* c-omp.c (c_oacc_split_loop_clauses): Use it to duplicate
	reduction clauses in acc parallel loops.

	gcc/c/
	PR middle-end/70626
	* c-parser.c (c_parser_oacc_loop): Don't augment mask with
	OACC_LOOP_CLAUSE_MASK.
	(c_parser_oacc_kernels_parallel): Update call to
	c_oacc_split_loop_clauses.

	gcc/cp/
	PR middle-end/70626
	* parser.c (cp_parser_oacc_loop): Don't augment mask with
	OACC_LOOP_CLAUSE_MASK.
	(cp_parser_oacc_kernels_parallel): Update call to
	c_oacc_split_loop_clauses.

	gcc/fortran/
	PR middle-end/70626
	* trans-openmp.c (gfc_trans_oacc_combined_directive): Duplicate
	the reduction clause in both parallel and loop directives.

	gcc/testsuite/
	PR middle-end/70626
	* c-c++-common/goacc/combined-reduction.c: New test.
	* gfortran.dg/goacc/reduction-2.f95: Add check for kernels reductions.

	libgomp/
	PR middle-end/70626
	* testsuite/libgomp.oacc-c++/template-reduction.C: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/combined-reduction.c: New test.
	* testsuite/libgomp.oacc-fortran/combined-reduction.f90: New test.

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ef3493e..daa77f9 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1285,7 +1285,7 @@ extern bool c_omp_check_loop_iv (tree, tree, walk_tree_lh);
 extern bool c_omp_check_loop_iv_exprs (location_t, tree, tree, tree, tree,
    walk_tree_lh);
 extern tree c_finish_oacc_wait (location_t, tree, tree);
-extern tree c_oacc_split_loop_clauses (tree, tree *);
+extern tree c_oacc_split_loop_clauses (tree, tree *, bool);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
  tree, tree *);
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 4d3f7dc..614ee1f 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -861,9 +861,10 @@ c_omp_check_loop_iv_exprs (location_t stmt_loc, tree declv, tree decl,
#pragma acc parallel loop  */
 
 tree
-c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
+c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses,
+			   bool is_parallel)
 {
-  tree next, loop_clauses;
+  tree next, loop_clauses, nc;
 
   loop_clauses = *not_loop_clauses = NULL_TREE;
   for (; clauses ; clauses = next)
@@ -882,7 +883,23 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_PRIVATE:
+	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
+	  loop_clauses = clauses;
+	  break;
+
+	  /* Reductions must be duplicated on both constructs.  */
 	case OMP_CLAUSE_REDUCTION:
+	  if (is_parallel)
+	{
+	  nc = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
+ OMP_CLAUSE_REDUCTION);
+	  OMP_CLAUSE_DECL (nc) = OMP_CLAUSE_DECL (clauses);
+	  OMP_CLAUSE_REDUCTION_CODE (nc)
+		= OMP_CLAUSE_REDUCTION_CODE (clauses);
+	  OMP_CLAUSE_CHAIN (nc) = *not_loop_clauses;
+	  *not_loop_clauses = nc;
+	}
+
 	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
 	  loop_clauses = clauses;
 	  break;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 48fa26a..0f2d871 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -14012,6 +14012,8 @@ static tree
 c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 		omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
+  bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+
   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
 
@@ -14020,7 +14022,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 	cclauses == NULL);
   if (cclauses)
 {
-  clauses = c_oacc_split_loop_clauses (clauses, cclauses);
+  clauses = c_oacc_split_loop_clauses (clauses, cclauses, is_parallel);
   if (*cclauses)
 	*cclauses = c_finish_omp_clauses (*cclauses, C_ORT_ACC);
   if (clauses)
@@ -14128,8 +14130,6 @@ 

[gomp4] backport the *finish_omp_clauses changes

2016-05-09 Thread Cesar Philippidis
This patch backports the *finish_omp_clauses changes I made to the C and
C++ front ends in trunk revision 235780. As with the cilk patch, enough
had changed on gomp-4_0-branch that this patch did not apply cleanly
there.

I've applied this patch to gomp-4_0-branch.

Cesar
2016-05-09  Cesar Philippidis  

	Backport trunk r235780:
	2016-05-02  Cesar Philippidis  

	gcc/c-family/
	* c-common.h (enum c_omp_region_type): Define.

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_all_clauses): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Replace bool arguments
	is_omp, declare_simd, and is_cilk with enum c_omp_region_type ort.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_simd_all_clauses): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_omp_clauses): Replace allow_fields and declare_simd
	arguments with enum c_omp_region_type ort.
	(tsubst_omp_clauses): Update calls to finish_omp_clauses.
	(tsubst_omp_attribute): Update calls to tsubst_omp_clauses.
	(tsubst_omp_for_iterator): Update calls to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Replace bool arguments
	allow_fields, declare_simd, and is_cilk with bitmask ort.
	(finish_omp_for): Update call to finish_omp_clauses.
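In the C front end, the new enum c_omp_region_type replaces the separate bool arguments (is_oacc, is_omp, declare_simd, is_cilk) with one bitmask. The sketch below copies the enum from the c-common.h hunk and adds two hypothetical helpers, not part of the patch, showing how the old bool combinations map onto it and how a callee can test individual bits.

```c
#include <stdbool.h>

/* Copied from the c-common.h hunk of this patch.  */
enum c_omp_region_type
{
  C_ORT_OMP                     = 1 << 0,
  C_ORT_CILK                    = 1 << 1,
  C_ORT_ACC                     = 1 << 2,
  C_ORT_DECLARE_SIMD            = 1 << 3,
  C_ORT_OMP_DECLARE_SIMD        = C_ORT_OMP | C_ORT_DECLARE_SIMD,
};

/* Hypothetical helper: map the old bool arguments of the C front end's
   c_finish_omp_clauses (is_oacc, is_omp, declare_simd, is_cilk) onto
   the new bitmask.  */
static enum c_omp_region_type
ort_from_bools (bool is_oacc, bool is_omp, bool declare_simd, bool is_cilk)
{
  int ort = 0;
  if (is_oacc)
    ort |= C_ORT_ACC;
  if (is_omp)
    ort |= C_ORT_OMP;
  if (declare_simd)
    ort |= C_ORT_DECLARE_SIMD;
  if (is_cilk)
    ort |= C_ORT_CILK;
  return (enum c_omp_region_type) ort;
}

/* Hypothetical predicate: callees test individual bits instead of
   separate bool parameters.  */
static bool
ort_has (enum c_omp_region_type ort, enum c_omp_region_type bit)
{
  return (ort & bit) != 0;
}
```

For example, the old call c_finish_omp_clauses (clauses, false, true, true) corresponds to C_ORT_OMP_DECLARE_SIMD, and (clauses, true, false) to C_ORT_ACC, matching the c-parser.c hunks.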


diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 59da6c8..ef3493e 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1259,6 +1259,15 @@ enum c_omp_clause_split
   C_OMP_CLAUSE_SPLIT_TASKLOOP = C_OMP_CLAUSE_SPLIT_FOR
 };
 
+enum c_omp_region_type
+{
+  C_ORT_OMP			= 1 << 0,
+  C_ORT_CILK			= 1 << 1,
+  C_ORT_ACC			= 1 << 2,
+  C_ORT_DECLARE_SIMD		= 1 << 3,
+  C_ORT_OMP_DECLARE_SIMD	= C_ORT_OMP | C_ORT_DECLARE_SIMD,
+};
+
 extern tree c_finish_omp_master (location_t, tree);
 extern tree c_finish_omp_taskgroup (location_t, tree);
 extern tree c_finish_omp_critical (location_t, tree, tree, tree);
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7667715..48fa26a 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13367,7 +13367,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
   c_parser_skip_to_pragma_eol (parser);
 
   if (finish_p)
-return c_finish_omp_clauses (clauses, true, false);
+return c_finish_omp_clauses (clauses, C_ORT_ACC);
 
   return clauses;
 }
@@ -13652,8 +13652,8 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
   if (finish_p)
 {
   if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)) != 0)
-	return c_finish_omp_clauses (clauses, false, true, true);
-  return c_finish_omp_clauses (clauses, false, true);
+	return c_finish_omp_clauses (clauses, C_ORT_OMP_DECLARE_SIMD);
+  return c_finish_omp_clauses (clauses, C_ORT_OMP);
 }
 
   return clauses;
@@ -13685,7 +13685,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
   tree stmt, clauses;
 
   clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE__CACHE_, NULL);
-  clauses = c_finish_omp_clauses (clauses, true, false);
+  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
 
   c_parser_skip_to_pragma_eol (parser);
 
@@ -14022,9 +14022,9 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 {
   clauses = c_oacc_split_loop_clauses (clauses, cclauses);
   if (*cclauses)
-	*cclauses = c_finish_omp_clauses (*cclauses, true, false);
+	*cclauses = c_finish_omp_clauses (*cclauses, C_ORT_ACC);
   if (clauses)
-	clauses = c_finish_omp_clauses (clauses, true, false);
+	clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
 }
 
   tree block = c_begin_compound_stmt (true);
@@ -15228,7 +15228,7 @@ omp_split_clauses (location_t loc, enum tree_code code,
   c_omp_split_clauses (loc, code, mask, clauses, cclauses);
   for (i = 0; i < C_OMP_CLAUSE_SPLIT_COUNT; i++)
 if (cclauses[i])
-  cclauses[i] = c_finish_omp_clauses (cclauses[i], false, true);
+  cclauses[i] = c_finish_omp_clauses (cclauses[i], C_ORT_OMP);
 }
 
 /* OpenMP 4.0:
@@ -16759,7 +16759,7 @@ c_parser_omp_declare_target (c_parser *parser)
 {
   clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_TO_DECLARE,
 	  clauses);
-  clauses = c_finish_omp_clauses (clauses, false, true);
+  clauses = c_finish_omp_clauses (clauses, C_ORT_OMP);
   c_parser_skip_to_pragma_eol (parser);
 }
   

[gomp4] backport fix for PR69363

2016-05-09 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch; it backports some Cilk Plus
changes in the C and C++ front ends. These changes were necessary for
my recent finish_omp_clauses patch, which I'll be committing next.

Cesar
2016-05-09  Cesar Philippidis  

	Backport trunk r235290:
	2016-04-20  Ilya Verbin  

	gcc/c-family/
	PR c++/69363
	* c-cilkplus.c (c_finish_cilk_clauses): Remove function.
	* c-common.h (c_finish_cilk_clauses): Remove declaration.

	gcc/c/
	PR c++/69363
	* c-parser.c (c_parser_cilk_all_clauses): Use c_finish_omp_clauses
	instead of c_finish_cilk_clauses.
	* c-tree.h (c_finish_omp_clauses): Add new default argument.
	* c-typeck.c (c_finish_omp_clauses): Add new argument.  Allow
	floating-point variables in the linear clause for Cilk Plus.

	gcc/cp/
	PR c++/69363
	* cp-tree.h (finish_omp_clauses): Add new default argument.
	* parser.c (cp_parser_cilk_simd_all_clauses): Use finish_omp_clauses
	instead of c_finish_cilk_clauses.
	* semantics.c (finish_omp_clauses): Add new argument.  Allow
	floating-point variables in the linear clause for Cilk Plus.

	gcc/testsuite/
	PR c++/69363
	* c-c++-common/cilk-plus/PS/clauses3.c: Adjust dg-error string.
	* c-c++-common/cilk-plus/PS/clauses4.c: New test.
	* c-c++-common/cilk-plus/PS/pr69363.c: New test.
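The removed c_finish_cilk_clauses enforced one rule: a variable that appears in a linear clause may not appear in any other clause. Cilk Plus clauses now go through c_finish_omp_clauses instead. The rule itself can be sketched over a flat array of toy clauses. The types and names are hypothetical; the real code walks OMP_CLAUSE_CHAIN lists, reports an error with a note pointing at the other clause, and drops the offending clause.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy clause codes standing in for OMP_CLAUSE_*; hypothetical.  */
enum toy_code { TOY_LINEAR, TOY_PRIVATE, TOY_REDUCTION, TOY_SAFELEN };

struct toy_clause
{
  enum toy_code code;
  const char *decl;             /* NULL for clauses without a variable.  */
};

/* Return true if a variable named in a linear clause also appears in
   any other clause that names a variable.  Assumes every linear clause
   has a non-NULL DECL.  */
static bool
linear_conflict (const struct toy_clause *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (c[i].code != TOY_LINEAR)
        continue;
      for (size_t j = 0; j < n; j++)
        if (j != i && c[j].decl != NULL
            && strcmp (c[i].decl, c[j].decl) == 0)
          return true;
    }
  return false;
}
```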


diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c
index 3e7902fd..9f1f364 100644
--- a/gcc/c-family/c-cilkplus.c
+++ b/gcc/c-family/c-cilkplus.c
@@ -41,56 +41,6 @@ c_check_cilk_loop (location_t loc, tree decl)
   return true;
 }
 
-/* Validate and emit code for <#pragma simd> clauses.  */
-
-tree
-c_finish_cilk_clauses (tree clauses)
-{
-  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  tree prev = clauses;
-
-  /* If a variable appears in a linear clause it cannot appear in
-	 any other OMP clause.  */
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR)
-	for (tree c2 = clauses; c2; c2 = OMP_CLAUSE_CHAIN (c2))
-	  {
-	if (c == c2)
-	  continue;
-	enum omp_clause_code code = OMP_CLAUSE_CODE (c2);
-
-	switch (code)
-	  {
-	  case OMP_CLAUSE_LINEAR:
-	  case OMP_CLAUSE_PRIVATE:
-	  case OMP_CLAUSE_FIRSTPRIVATE:
-	  case OMP_CLAUSE_LASTPRIVATE:
-	  case OMP_CLAUSE_REDUCTION:
-		break;
-
-	  case OMP_CLAUSE_SAFELEN:
-		goto next;
-
-	  default:
-		gcc_unreachable ();
-	  }
-
-	if (OMP_CLAUSE_DECL (c) == OMP_CLAUSE_DECL (c2))
-	  {
-		error_at (OMP_CLAUSE_LOCATION (c2),
-			  "variable appears in more than one clause");
-		inform (OMP_CLAUSE_LOCATION (c),
-			"other clause defined here");
-		// Remove problematic clauses.
-		OMP_CLAUSE_CHAIN (prev) = OMP_CLAUSE_CHAIN (c2);
-	  }
-	  next:
-	prev = c2;
-	  }
-}
-  return clauses;
-}
-
 /* Calculate number of iterations of CILK_FOR.  */
 
 tree
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ddd5c07..59da6c8 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1370,7 +1370,6 @@ extern enum stv_conv scalar_to_vector (location_t loc, enum tree_code code,
    tree op0, tree op1, bool);
 
 /* In c-cilkplus.c  */
-extern tree c_finish_cilk_clauses (tree);
 extern tree c_validate_cilk_plus_loop (tree *, int *, void *);
 extern bool c_check_cilk_loop (location_t, tree);
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1a4356f..7667715 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17728,7 +17728,7 @@ c_parser_cilk_all_clauses (c_parser *parser)
 
  saw_error:
   c_parser_skip_to_pragma_eol (parser);
-  return c_finish_cilk_clauses (clauses);
+  return c_finish_omp_clauses (clauses, false, false, false, true);
 }
 
 /* This function helps parse the grainsize pragma for a _Cilk_for statement.
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 70b7bd9..1703162 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -661,7 +661,7 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
-extern tree c_finish_omp_clauses (tree, bool, bool, bool = false);
+extern tree c_finish_omp_clauses (tree, bool, bool, bool = false, bool = false);
 extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4813d4b..067ce82 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12496,7 +12496,8 @@ c_find_omp_placeholder_r (tree *tp, int *, void *data)
Remove any elements from the list that are invalid.  */
 
 tree
-c_finish_omp_clauses (tree clauses, bool is_oacc, bool is_omp, bool declare_simd)
+c_finish_omp_clauses (tree clauses, bool is_oacc, bool is_omp,
+		  bool declare_simd, bool is_cilk)
 {
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
   bitmap_head 

Re: [PATCH] Remove constraints from further i386 define_expand patterns

2016-05-09 Thread Uros Bizjak
On Mon, May 9, 2016 at 6:49 PM, Jakub Jelinek  wrote:
> Hi!
>
> I believe this cleans up all remaining define_expands (have looked
> at tmp-mddump.md with sed picking up only define_expand patterns in there
> and have been looking for any constraints and none were left).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

reload_noff_store and reload_noff_load are part of the secondary_reload
infrastructure, and these expanders must have constraints, as stated in
the documentation for TARGET_SECONDARY_RELOAD:

 You do this by setting 'sri->icode' to the instruction code of a
 pattern in the md file which performs the move.  Operands 0 and 1
 are the output and input of this copy, respectively.  Operands from
 operand 2 onward are for scratch operands.  These scratch operands
 must have a mode, and a single-register-class output constraint.

It is true that the documentation mentions only scratch operands, so it is
probably OK to also remove the constraints from the non-scratch operands of
these two patterns. Please confirm this with a reload expert.

Others are OK.

Uros.

> 2016-05-09  Jakub Jelinek  
>
> * config/i386/i386.md (reload_noff_store, reload_noff_load, set_got,
> set_got_labelled, lwp_llwpcb, lwp_lwpval3, lwp_lwpins3):
> Remove constraints from expanders.
> * config/i386/sse.md (vec_interleave_high,
> vec_interleave_low, _vpermi2var3_maskz,
> _vpermt2var3_maskz): Likewise.
>
> --- gcc/config/i386/i386.md.jj  2016-05-09 11:38:36.0 +0200
> +++ gcc/config/i386/i386.md 2016-05-09 13:33:12.883238591 +0200
> @@ -1891,9 +1891,9 @@ (define_insn "*popfl1"
>  ;; Reload patterns to support multi-word load/store
>  ;; with non-offsetable address.
>  (define_expand "reload_noff_store"
> -  [(parallel [(match_operand 0 "memory_operand" "=m")
> -  (match_operand 1 "register_operand" "r")
> -  (match_operand:DI 2 "register_operand" "=")])]
> +  [(parallel [(match_operand 0 "memory_operand")
> + (match_operand 1 "register_operand")
> + (match_operand:DI 2 "register_operand")])]
>"TARGET_64BIT"
>  {
>rtx mem = operands[0];
> @@ -1907,9 +1907,9 @@ (define_expand "reload_noff_store"
>  })
>
>  (define_expand "reload_noff_load"
> -  [(parallel [(match_operand 0 "register_operand" "=r")
> -  (match_operand 1 "memory_operand" "m")
> -  (match_operand:DI 2 "register_operand" "=r")])]
> +  [(parallel [(match_operand 0 "register_operand")
> + (match_operand 1 "memory_operand")
> + (match_operand:DI 2 "register_operand")])]
>"TARGET_64BIT"
>  {
>rtx mem = operands[1];
> @@ -12522,7 +12522,7 @@ (define_expand "prologue"
>
>  (define_expand "set_got"
>[(parallel
> - [(set (match_operand:SI 0 "register_operand" "=r")
> + [(set (match_operand:SI 0 "register_operand")
>(unspec:SI [(const_int 0)] UNSPEC_SET_GOT))
>(clobber (reg:CC FLAGS_REG))])]
>"!TARGET_64BIT"
> @@ -12542,7 +12542,7 @@ (define_insn "*set_got"
>
>  (define_expand "set_got_labelled"
>[(parallel
> - [(set (match_operand:SI 0 "register_operand" "=r")
> + [(set (match_operand:SI 0 "register_operand")
>(unspec:SI [(label_ref (match_operand 1))]
>   UNSPEC_SET_GOT))
>(clobber (reg:CC FLAGS_REG))])]
> @@ -19041,7 +19041,7 @@ (define_insn "fnclex"
>  ;
>
>  (define_expand "lwp_llwpcb"
> -  [(unspec_volatile [(match_operand 0 "register_operand" "r")]
> +  [(unspec_volatile [(match_operand 0 "register_operand")]
> UNSPECV_LLWP_INTRINSIC)]
>"TARGET_LWP")
>
> @@ -19055,7 +19055,7 @@ (define_insn "*lwp_llwpcb1"
> (set_attr "length" "5")])
>
>  (define_expand "lwp_slwpcb"
> -  [(set (match_operand 0 "register_operand" "=r")
> +  [(set (match_operand 0 "register_operand")
> (unspec_volatile [(const_int 0)] UNSPECV_SLWP_INTRINSIC))]
>"TARGET_LWP"
>  {
> @@ -19079,9 +19079,9 @@ (define_insn "lwp_slwpcb"
> (set_attr "length" "5")])
>
>  (define_expand "lwp_lwpval3"
> -  [(unspec_volatile [(match_operand:SWI48 1 "register_operand" "r")
> -(match_operand:SI 2 "nonimmediate_operand" "rm")
> -(match_operand:SI 3 "const_int_operand" "i")]
> +  [(unspec_volatile [(match_operand:SWI48 1 "register_operand")
> +(match_operand:SI 2 "nonimmediate_operand")
> +(match_operand:SI 3 "const_int_operand")]
> UNSPECV_LWPVAL_INTRINSIC)]
>"TARGET_LWP"
>;; Avoid unused variable warning.
> @@ -19101,11 +19101,11 @@ (define_insn "*lwp_lwpval3_1"
>
>  (define_expand "lwp_lwpins3"
>[(set (reg:CCC FLAGS_REG)
> -   (unspec_volatile:CCC [(match_operand:SWI48 1 "register_operand" "r")
> - (match_operand:SI 2 "nonimmediate_operand" "rm")
> - 

Re: [RS6000] complex long double ABI_V4 fix

2016-05-09 Thread Michael Meissner
On Fri, May 06, 2016 at 03:54:43PM +0930, Alan Modra wrote:
> Revision 235792 regressed compat/scalar-by-value-6 for powerpc-linux
> -m32 due to accidentally changing the ABI.  By another historical
> accident, complex long double is stupidly passed in gprs for -m32.

Sorry about the breakage.  Thanks for digging into it.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH,rs6000] Add built-in support for new Power9 darn (deliver a random number) instruction

2016-05-09 Thread Peter Bergner
On Mon, 2016-05-09 at 12:35 -0500, Bill Schmidt wrote:
> On Mon, 2016-05-09 at 08:58 -0500, Segher Boessenkool wrote:
> > On Thu, May 05, 2016 at 10:26:01AM -0600, Kelvin Nilsen wrote:


> > Do we really want to #define short words like "darn"?  If this is already
> > set in stone, so be it.
> 
> I don't think we do, and in any case altivec.h would not be the place to
> do it.  darn is not a vector instruction.
> 
> For these, just having __builtin_darn* be the available interfaces will
> be fine.
> 

Agreed, I don't think we need fancy short names for this builtin,
which will be used infrequently.  The __builtin_darn name is enough.


Peter




Re: [PATCH], Add PowerPC ISA 3.0 min/max support

2016-05-09 Thread Michael Meissner
On Mon, May 09, 2016 at 09:31:43AM -0500, Segher Boessenkool wrote:
> On Thu, May 05, 2016 at 03:18:39PM -0400, Michael Meissner wrote:
> > At the present time, the code does not support comparisons involving >= and <=
> > unless the -ffast-math option is used. I hope eventually to support generating
> > these instructions without having -ffast-math used.
> > 
> > The underlying reason is when fast math is not used, we change the condition
> > from:
> > 
> > (ge:SI (reg:CCFP ) (const_int 0))
> > 
> > to:
> > 
> > (ior:SI (gt:SI (reg:CCFP ) (const_int 0))
> > (eq:SI (reg:CCFP ) (const_int 0)))
> > 
> > The machine independent portion of the compiler does not recognize this when
> > trying to generate conditional moves.
> > 
> > I would imagine the 'fix' is to generate GE/LE all of the time, and then have a
> > splitter that converts it to IOR of GT/EQ if it is not a conditional move with
> > ISA 3.0 instructions.
> 
> That sounds like a plan :-)

Sure, but right now it is a lower priority than all of the other things I'm
working on.  Patches by other people are welcome.
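The condition rewrite exists because of IEEE NaN semantics: without -ffast-math, GE must be false when either operand is a NaN, so it is not simply the inverse of LT, whereas GT-or-EQ preserves it exactly. A small C illustration of the three forms (the function names are illustrative; this is not rs6000 code):

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* a >= b exactly as written in the source.  */
static bool
ge_direct (double a, double b)
{
  return a >= b;
}

/* The (ior (gt ...) (eq ...)) form used when -ffast-math is off.  */
static bool
ge_as_gt_or_eq (double a, double b)
{
  return a > b || a == b;
}

/* !(a < b): the same as a >= b only when neither operand is a NaN.  */
static bool
ge_as_not_lt (double a, double b)
{
  return !(a < b);
}
```

Compile this without -ffast-math; with it, the compiler may assume NaNs never occur and treat all three forms as equivalent.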

> > -;; Return true if operand is MIN or MAX operator.
> > +;; Return true if operand is MIN or MAX operator.  Since this is only used to
> > +;; convert floating point MIN/MAX operations into FSEL on pre-vsx systems,
> > +;; don't include UMIN or UMAX.
> >  (define_predicate "min_max_operator"
> > -  (match_code "smin,smax,umin,umax"))
> > +  (match_code "smin,smax"))
> 
> Please name it signed_min_max_operator instead?
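Only SMIN/SMAX matter for this predicate because floating-point MIN/MAX are always signed in RTL (UMIN/UMAX are unsigned-integer operations). As background on the FSEL route, a C model: fsel picks one value when its selector is >= 0.0 and the other otherwise (a NaN selector included), so max (a, b) can be written as fsel (a - b, a, b). The helper names are hypothetical and the model ignores signed-zero subtleties:

```c
#include <assert.h>
#include <math.h>

/* Model of fsel: IF_GE when SEL >= 0.0, otherwise OTHERWISE.
   A NaN selector fails the comparison and yields OTHERWISE.  */
static double
fsel_model (double sel, double if_ge, double otherwise)
{
  return sel >= 0.0 ? if_ge : otherwise;
}

/* max (a, b) expressed through fsel.  */
static double
fsel_max (double a, double b)
{
  return fsel_model (a - b, a, b);
}
```

fsel_max (1.0, NAN) returns NaN, whereas C's fmax (1.0, NAN) returns 1.0, which is one reason such conversions are tied to -ffast-math.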
> 
> > --- gcc/config/rs6000/rs6000.c  (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)  (revision 235831)
> > +++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
> > @@ -20534,6 +20534,12 @@ print_operand (FILE *file, rtx x, int co
> > "local dynamic TLS references");
> >return;
> >  
> > +case '@':
> > +  /* If -mpower9-minmax, use xsmaxcpdp instead of xsmaxdp.  */
> > +  if (TARGET_P9_MINMAX)
> > +   putc ('c', file);
> > +  return;
> 
> I don't think @ is very mnemonic, nor is this special enough for such
> a nice letter.

I don't care what punctuation letter is used, but it needs to be one. What do
you prefer?

> 
> From looking at how it is used, it seems you can make it part of code_attr
> minmax (and give that a better name, minmax_fp or such)?

It is used to distinguish between generating

XSMAXDP on power7 with -ffast-math
and XSMAXCDP on power9 with/without -ffast-math

I would prefer not to have to change this to C code, hence the use of a
punctuation print operand code. But if you insist, I can just do it with if's.
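A punctuation code lets a single output template produce either mnemonic. A self-contained sketch of the mechanism; expand_template and target_p9_minmax are hypothetical stand-ins, not GCC's print_operand interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TARGET_P9_MINMAX.  */
static bool target_p9_minmax;

/* Copy output template TMPL into OUT, expanding the punctuation code
   "%@" to 'c' when the ISA 3.0 min/max instructions are enabled and to
   nothing otherwise, so "xsmax%@dp" yields xsmaxcdp or xsmaxdp.  */
static void
expand_template (const char *tmpl, char *out, size_t size)
{
  size_t n = 0;

  assert (size > 0);
  for (const char *p = tmpl; *p != '\0' && n + 1 < size; p++)
    {
      if (p[0] == '%' && p[1] == '@')
        {
          if (target_p9_minmax)
            out[n++] = 'c';
          p++;  /* Also consume the '@'.  */
          continue;
        }
      out[n++] = *p;
    }
  out[n] = '\0';
}
```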

> > +  rs6000_emit_minmax (dest, (max_p) ? SMAX : SMIN, op0, op1);
> 
> Superfluous parentheses.
> 
> > +rs6000_emit_power9_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
> 
> Maybe put some "fp" in the name?  For "minmax" as well.

Sigh, ok.

> > +  if (swap_p)
> > +compare_rtx = gen_rtx_fmt_ee (code, CCFPmode, op1, op0);
> > +  else
> > +compare_rtx = gen_rtx_fmt_ee (code, CCFPmode, op0, op1);
> 
> if (swap_p)
>   std::swap (op0, op1);

I'll look into it.
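Segher's suggestion builds the comparison in one place after optionally swapping the operands. std::swap is C++; the same shape in plain C, with hypothetical integer operands standing in for rtx values:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RTL comparison such as (ge op0 op1).  */
struct compare
{
  int code;
  int op0, op1;
};

static struct compare
make_compare (int code, int op0, int op1)
{
  struct compare c = { code, op0, op1 };
  return c;
}

/* Swap the operands once, then build the comparison in a single place
   instead of duplicating the constructor call in both arms of an if.  */
static struct compare
build_compare (int code, int op0, int op1, bool swap_p)
{
  if (swap_p)
    {
      int tmp = op0;
      op0 = op1;
      op1 = tmp;
    }
  return make_compare (code, op0, op1);
}
```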

-- 
Michael Meissner, IBM



Re: [PATCH], Add PowerPC ISA 3.0 vector d-form addressing

2016-05-09 Thread Michael Meissner
On Mon, May 09, 2016 at 08:11:54AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Thu, May 05, 2016 at 02:05:19PM -0400, Michael Meissner wrote:
> > > > With this patch, I enable -mlra if the user did not specify either -mlra or
> > > > -mno-lra on the command line, and -mcpu=power9 or -mpower9-dform-vector were
> > > > used. I also enabled -mvsx-timode if LRA was used, which also is a RELOAD
> > > > issue, that works with LRA.
> > > 
> > > I don't like enabling LRA if the user didn't ask for it; it is a bit too
> > > surprising.  What do you do if there is -mno-lra explicitly?  You can just
> > > do the same if no-lra is implicit?
> > 
> > Ok.
> 
> You didn't however change this afaics?

The patch kept reload as the default. You have to explicitly enable LRA to get
vector d-forms by default with -mcpu=power9.

> 
> > > > --- gcc/config/rs6000/rs6000.opt  (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)  (revision 235831)
> > > > +++ gcc/config/rs6000/rs6000.opt  (.../gcc/config/rs6000)  (working copy)
> > > > @@ -470,8 +470,8 @@ Target RejectNegative Joined UInteger Va
> > > >  -mlong-double-  Specify size of long double (64 or 128 bits).
> > > >  
> > > >  mlra
> > > > -Target Report Var(rs6000_lra_flag) Init(0) Save
> > > > -Use LRA instead of reload.
> > > > +Target Undocumented Mask(LRA) Var(rs6000_isa_flags)
> > > > +Use the LRA register allocator instead of the reload register allocator.
> > > 
> > > It wasn't "undocumented" before?  Why the change to a mask bit btw?
> > 
> > It was always meant to be undocumented, but I changed it to be similar to
> > before. I am trying to change all of the random switches that set a word to be
> > an option mask, so I made that part of the change in these next patches.
> 
> I agree it should be undocumented because hopefully one day reload
> will not exist at all anymore.  OTOH, all other archs with an -mlra
> switch do not have it hidden, so we might as well follow suit there.

I will write up some documentation.

> > I did remove setting it for -mcpu=power9.
> 
> It doesn't look like it?  Please check.
> 
> > @@ -94,6 +95,7 @@
> >  | OPTION_MASK_FPRND\
> >  | OPTION_MASK_HTM  \
> >  | OPTION_MASK_ISEL \
> > +| OPTION_MASK_LRA  \
> >  | OPTION_MASK_MFCRF\
> >  | OPTION_MASK_MFPGPR   \
> >  | OPTION_MASK_MODULO   \

That is POWERPC_MASKS, which is the mask of all option bits that COULD be set
by -mcpu= options. But none of the -mcpu= options set it any more (the
previous patch did set it in ISA_3_0_MASKS_SERVER, but this one no longer
does). In retrospect, when I created the option masks, I should have used two
separate words: one for things like -m32 that can never be changed, and the
other for all of the normal bits, so that POWERPC_MASKS would not be needed.
But in general, I always add things to POWERPC_MASKS unless they explicitly
should not be there.
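A simplified model of how a "bits any -mcpu= may set" mask interacts with explicit options: CPU defaults may only use bits from that mask, and they never override a bit the user set or cleared on the command line. The bit names, values and apply_cpu_defaults are hypothetical; the real rs6000 option handling does more than this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical option bits; the real ones are OPTION_MASK_* in rs6000.  */
#define MASK_ISEL            (UINT64_C (1) << 0)
#define MASK_LRA             (UINT64_C (1) << 1)
#define MASK_P9_DFORM_SCALAR (UINT64_C (1) << 2)

/* Every bit some -mcpu= value is allowed to set.  */
#define ALL_CPU_MASKS (MASK_ISEL | MASK_LRA | MASK_P9_DFORM_SCALAR)

/* Merge a CPU's default bits into FLAGS, leaving alone any bit in
   EXPLICIT_MASK (set or cleared by the user on the command line).  */
static uint64_t
apply_cpu_defaults (uint64_t flags, uint64_t explicit_mask,
                    uint64_t cpu_defaults)
{
  /* A -mcpu= entry must only use bits listed in ALL_CPU_MASKS.  */
  assert ((cpu_defaults & ~ALL_CPU_MASKS) == 0);
  return flags | (cpu_defaults & ~explicit_mask);
}
```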

> > > > +mpower9-dform-scalar
> > > > +Target Report Mask(P9_DFORM_SCALAR) Var(rs6000_isa_flags)
> > > > +Use/do not use scalar register+offset memory instructions added in ISA 3.0.
> > > > +
> > > > +mpower9-dform-vector
> > > > +Target Report Mask(P9_DFORM_VECTOR) Var(rs6000_isa_flags)
> > > > +Use/do not use vector register+offset memory instructions added in ISA 3.0.
> > > > +
> > > >  mpower9-dform
> > > > -Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
> > > > -Use/do not use vector and scalar instructions added in ISA 3.0.
> > > > +Target Report Var(TARGET_P9_DFORM_BOTH) Init(-1) Save
> > > > +Use/do not use register+offset memory instructions added in ISA 3.0.
> > > 
> > > These should probably all be undocumented, though (they're not something
> > > users should use).
> > 
> > I will make -mpower9-dform public (which I thought it was, but evidently I
> > missed adding the documentation for GCC 6), but I will make the -scalar and
> > -vector forms private.
> 
> You think this is something users are expected to twiddle?  Okay then.

I don't expect users to normally twiddle it, but there are times, in bug fixing
or extreme benchmarking, when they do.

> > [gcc]
> > 2016-05-05  Michael Meissner  
> > 
> > * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Use
> > -mpower9-dform-scalar instead of -mpower9-dform. Add note not to
> > include -mpower9-dform-vector until we switch over to LRA.
> 
> Thanks for the better changelog, much appreciated.  Two spaces after
> a full stop though.

-- 
Michael Meissner, IBM

Re: [PATCH] Load external function address via GOT slot

2016-05-09 Thread H.J. Lu
On Fri, Apr 22, 2016 at 6:03 AM, Uros Bizjak  wrote:
> On Fri, Apr 22, 2016 at 2:54 PM, H.J. Lu  wrote:
>> For -fno-plt, we load the external function address via the GOT slot
> >> so that the linker won't create a PLT entry for the external function address.
> >>
> >> Tested on x86-64. I also built GCC with -fno-plt.  It removes 99% of PLT
> >> entries.  OK for trunk?
>>
>> H.J.
>> --
>> gcc/
>>
>> PR target/pr67400
>> * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
>> * config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
>> (ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL for
>> ix86_force_load_from_GOT_p returns true.
>> (ix86_print_operand_address): Support UNSPEC_GOTPCREL if
>> ix86_force_load_from_GOT_p returns true.
>> (ix86_expand_move): Load the external function address via the
>> GOT slot if ix86_force_load_from_GOT_p returns true.
>> * config/i386/predicates.md (x86_64_immediate_operand): Return
>> false if ix86_force_load_from_GOT_p returns true.
>>
>> gcc/testsuite/
>>
>> PR target/pr67400
>> * gcc.target/i386/pr67400-1.c: New test.
>> * gcc.target/i386/pr67400-2.c: Likewise.
>> * gcc.target/i386/pr67400-3.c: Likewise.
>> * gcc.target/i386/pr67400-4.c: Likewise.
>
> Please get someone that knows this linker magic to review the
> functionality first. Maybe Jakub can help?
>

Hi Jakub,

Can you review this patch?

Thanks.

-- 
H.J.
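At the source level, the case this patch targets is simply taking the address of a function defined in another module. A minimal example; per the predicate in the patch, it applies to x86-64 with -fno-plt, non-PIC code (for example with -fno-pie) and a code model other than large. Inspect the generated assembly to see which load your compiler actually emits:

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*int_fn) (int);

/* Return the address of abs, a function defined outside this module
   (in libc).  This is the kind of reference ix86_force_load_from_GOT_p
   is meant to route through the GOT slot under -fno-plt.  */
int_fn
get_abs_address (void)
{
  return abs;
}
```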
From 3c81d37bb422f9856b373c63dfc6e19e035a7714 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 28 Aug 2015 19:14:49 -0700
Subject: [PATCH] Load external function address via GOT slot

For -fno-plt, we load the external function address via the GOT slot
so that the linker won't create a PLT entry for the external function address.

gcc/

	PR target/67400
	* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
	* config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
	(ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL if
	ix86_force_load_from_GOT_p returns true.
	(ix86_print_operand_address): Support UNSPEC_GOTPCREL if
	ix86_force_load_from_GOT_p returns true.
	(ix86_expand_move): Load the external function address via the
	GOT slot if ix86_force_load_from_GOT_p returns true.
	* config/i386/predicates.md (x86_64_immediate_operand): Return
	false if ix86_force_load_from_GOT_p returns true.

gcc/testsuite/

	PR target/67400
	* gcc.target/i386/pr67400-1.c: New test.
	* gcc.target/i386/pr67400-2.c: Likewise.
	* gcc.target/i386/pr67400-3.c: Likewise.
	* gcc.target/i386/pr67400-4.c: Likewise.
---
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.c| 42 +++
 gcc/config/i386/predicates.md |  4 +++
 gcc/testsuite/gcc.target/i386/pr67400-1.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr67400-2.c | 14 +++
 gcc/testsuite/gcc.target/i386/pr67400-3.c | 16 
 gcc/testsuite/gcc.target/i386/pr67400-4.c | 13 ++
 7 files changed, 103 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-4.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 447f67e..99775cb 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -70,6 +70,7 @@ extern bool ix86_expand_set_or_movmem (rtx, rtx, rtx, rtx, rtx, rtx,
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
+extern bool ix86_force_load_from_GOT_p (rtx);
 extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 05476f3..6d73651 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14833,6 +14833,24 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
   return true;
 }
 
+/* True if operand X should be loaded from GOT.  */
+
+bool
+ix86_force_load_from_GOT_p (rtx x)
+{
+  /* External function symbol should be loaded via the GOT slot for
+ -fno-plt.  */
+  return (!flag_plt
+	  && !flag_pic
+	  && ix86_cmodel != CM_LARGE
+	  && TARGET_64BIT
+	  && !TARGET_PECOFF
+	  && !TARGET_MACHO
+	  && GET_CODE (x) == SYMBOL_REF
+	  && SYMBOL_REF_FUNCTION_P (x)
+	  && !SYMBOL_REF_LOCAL_P (x));
+}
+
 /* Determine if it's legal to put X into the constant pool.  This
is not possible for the address of thread-local symbols, which
is checked above.  */
@@ -15213,6 +15231,10 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
 	return false;
 
 	  case UNSPEC_GOTPCREL:
+	gcc_assert (flag_pic
+			|| ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)));
+	  

Re: [PATCH,rs6000] Add built-in support for new Power9 darn (deliver a random number) instruction

2016-05-09 Thread Bill Schmidt
On Mon, 2016-05-09 at 08:58 -0500, Segher Boessenkool wrote:
> Hi Kelvin,
> 
> On Thu, May 05, 2016 at 10:26:01AM -0600, Kelvin Nilsen wrote:
> > (UNSPEC_DARN_32): New usnpec constant.
> 
> Typo.
> 
> > ("darn_32"): New instruction.
> 
> We don't normally use quotes for insn names.
> 
> > (rs6000_builtin_mask_calculate): Add in the RS6000_BTM_MODULO and
> > RS6000_BTM_64BIT flags to the returned mask, depending on
> > configuration. 
> 
> Trailing space (many, in this changelog).
> 
> > --- gcc/config/rs6000/altivec.h (revision 235884)
> > +++ gcc/config/rs6000/altivec.h (working copy)
> > @@ -382,6 +382,11 @@
> >  #define vec_vsubuqm __builtin_vec_vsubuqm
> >  #define vec_vupkhsw __builtin_vec_vupkhsw
> >  #define vec_vupklsw __builtin_vec_vupklsw
> > +
> > +/* Non-Vector additions added in ISA 3.0. */
> > +#define darn __builtin_darn
> > +#define darn_32 __builtin_darn_32
> > +#define darn_raw __builtin_darn_raw
> >  #endif
> 
> Do we really want to #define short words like "darn"?  If this is already
> set in stone, so be it.

I don't think we do, and in any case altivec.h would not be the place to
do it.  darn is not a vector instruction.

For these, just having __builtin_darn* be the available interfaces will
be fine.

My two cents,
Bill

> 
> > +(define_insn "darn_32"
> > +  [(set (match_operand:SI 0 "register_operand" "")
> 
> The constraint should be "r" I suppose?
> 
> > +(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
> > +  "TARGET_MODULO"
> > +  {
> > + return "darn %0,0";
> > +  }
> > +  [(set_attr "type" "add")  
> 
> Trailing spaces.  "add" isn't the correct type; use "integer" if there
> is no better type.
> 
> > +   (set_attr "length" "4")])
> 
> That is the default, no need to mention it.  Most insns are implicitly
> length 4.
> 
> > +/* Miscellaneous builtins for instructions added in ISA 3.0.  These
> > +   instructions don't require either the DFP or VSX options, just the 
> > basic 
> 
> Trailing space.
> 
> > @@ -3634,6 +3639,8 @@ rs6000_builtin_mask_calculate (void)
> >   | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL  : 0)
> >   | ((TARGET_P8_VECTOR) ? RS6000_BTM_P8_VECTOR : 0)
> >   | ((TARGET_P9_VECTOR) ? RS6000_BTM_P9_VECTOR : 0)
> > + | ((TARGET_MODULO)? RS6000_BTM_MODULO: 0)
> > + | ((TARGET_64BIT) ? RS6000_BTM_64BIT: 0)
> 
> Missing space?
> 
> > +  /* RS6000_BTC_SPECIAL represents no-operand operators.  */
> >gcc_assert (attr == RS6000_BTC_UNARY
> >   || attr == RS6000_BTC_BINARY
> > - || attr == RS6000_BTC_TERNARY);
> > -
> > + || attr == RS6000_BTC_TERNARY
> > + || attr == RS6000_BTC_SPECIAL);
> > +  
> 
> Why SPECIAL and not NULLARY or such?
> 
> > +  if (rs6000_overloaded_builtin_p (d->code))
> > +   {
> > + if (! (type = opaque_ftype_opaque))
> > +   type = opaque_ftype_opaque
> > + = build_function_type_list (opaque_V4SI_type_node,
> > + NULL_TREE);
> > +   }
> 
> Eww.
> 
>   if (!opaque_ftype_opaque)
>     opaque_ftype_opaque = build_function_type_list (...);
>   type = opaque_ftype_opaque;
> 
> > + enum insn_code icode = d->icode;
> > + if (d->name == 0)
> > +   {
> > + if (TARGET_DEBUG_BUILTIN)
> > +   fprintf (stderr, "rs6000_builtin, bdesc_0arg[%ld] no name\n",
> > +(long unsigned)i);
> 
> unsigned is %u, not %d.  Space after cast.
> 
> Cheers,
> 
> 
> Segher
> 
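Segher's version separates building the cached type from using it, rather than assigning inside the if condition. The same lazy-initialization pattern in standalone C, with hypothetical names in place of the GCC tree nodes and build_function_type_list:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a tree node and its constructor.  */
struct type_node
{
  int id;
};

static int build_calls;

static struct type_node *
build_opaque_type (void)
{
  static struct type_node node;
  node.id = ++build_calls;
  return &node;
}

/* Cached node, built on first use (like opaque_ftype_opaque).  */
static struct type_node *opaque_ftype_cache;

static struct type_node *
get_opaque_ftype (void)
{
  if (!opaque_ftype_cache)
    opaque_ftype_cache = build_opaque_type ();
  return opaque_ftype_cache;
}
```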




Re: [ARM] mno-pic-data-is-text-relative & msingle-pic-base

2016-05-09 Thread Nathan Sidwell

Joey,

> This patch will do what you intend it to do.  However, I am not sure about
> the part related to VxWorks.  The logic behind this patch is that
> -mno-pic-data-is-text-relative should enable -msingle-pic-base, because
> otherwise it is useless.  That logic is orthogonal to the OS, so I am not
> convinced the 'else if' shouldn't just be 'if'.  It should not change
> VxWorks behaviour if VxWorks enables -msingle-pic-base explicitly.
> Otherwise, there is at least one use case in which
> -mno-pic-data-is-text-relative is used without -msingle-pic-base, which
> breaks the logic this whole patch stands on.


VxWorks has two modes of code generation -- kernel and RTP.  RTPs don't have a
fixed mapping between code and data (and use a special sequence to initialize
the PIC register, using VxWorks-specific relocs).  Kernel mode doesn't support
PIC code generation -- see config/vxworks.c.


So I don't think there's a problem.

nathan


RE: [ARM] mno-pic-data-is-text-relative & msingle-pic-base

2016-05-09 Thread Joey Ye
Nathan,

This patch will do what you intend it to do. However, I am not sure about
the part related to VxWorks. The logic behind this patch is that
-mno-pic-data-is-text-relative should enable -msingle-pic-base, because
otherwise it is useless. That logic is orthogonal to the OS, so I am not
convinced the 'else if' shouldn't just be 'if'. It should not change
VxWorks behaviour if VxWorks enables -msingle-pic-base explicitly.
Otherwise, there is at least one use case in which
-mno-pic-data-is-text-relative is used without -msingle-pic-base, which
breaks the logic this whole patch stands on.

Thanks,
Joey

> -Original Message-
> From: Nathan Sidwell [mailto:nathanmsidw...@gmail.com] On Behalf Of
> Nathan Sidwell
> Sent: 09 May 2016 15:07
> To: Richard Earnshaw; GCC Patches
> Cc: Joey Ye
> Subject: [ARM] mno-pic-data-is-text-relative & msingle-pic-base
> 
> This patch comes from an off-list conversation between Joey & me.  The
> context is RTOSes, not all-singing, all-dancing dynamic objects and OSes.
> 
> Currently, the documentation for -mno-pic-data-is-text-relative (-mno-PDITR)
> says 'Assume that each data segments are relative to text segment at load
> time.
>   Therefore, it permits addressing data using PC-relative operations.
>   This option is on by default for targets other than VxWorks RTP.'
> 
> However, if you use just this option, you still end up with a pic-register init
> sequence that presumes a fixed mapping.  That's a surprise.  Joey tells me
> its expected use is with -msingle-pic-base (-mSPB), which reserves a global
> register to point at the (single) GOT.  That's what I had expected the
> -mno-PDITR option to have implied.
> 
> Apparently there are legitimate reasons one might want the -mno-PDITR
> behaviour without -mSPB.  I don't know what those are, perhaps Joey could
> clarify?
> 
> Anyway, IMHO that is the rare case, and the more common case is that one
> would want to have -mno-PDITR imply -mSPB.  (The reverse probably doesn't
> apply.)
> 
> This patch does 3 things:
> 1) Makes -mno-PDITR imply -mSPB, unless -m[no-]SPB is given explicitly.
> 2) Clarifies the -m[no-]PDITR documentation.
> 3) Adds some testcases -- there didn't appear to be any.
> 
> ok?
> 
> nathan
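Point 1) is the usual "implied unless given explicitly" rule. A sketch of just that rule, using a hypothetical struct in which -1 means "not on the command line"; it deliberately leaves out the VxWorks RTP default discussed in this thread and is not the arm.c code:

```c
#include <assert.h>

/* Hypothetical option state; -1 means "not given on the command line".  */
struct pic_opts
{
  int pic_data_is_text_relative;  /* -m[no-]pic-data-is-text-relative */
  int single_pic_base;            /* -m[no-]single-pic-base */
};

/* Fill in defaults: data is text-relative unless told otherwise, and
   -mno-pic-data-is-text-relative implies -msingle-pic-base only when
   the user did not give -m[no-]single-pic-base.  */
static void
resolve_pic_opts (struct pic_opts *o)
{
  assert (o != 0);
  if (o->pic_data_is_text_relative < 0)
    o->pic_data_is_text_relative = 1;
  if (o->single_pic_base < 0)
    o->single_pic_base = !o->pic_data_is_text_relative;
}
```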



[PATCH] Improve sse2_loadld

2016-05-09 Thread Jakub Jelinek
Hi!

I hope this pattern actually shouldn't be used for AVX512*, because
vpinsr should match instead, but just in case it doesn't, all the insns
involved are in AVX512F.
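For reference, (vec_merge a b mask) takes element i from a where bit i of the constant mask is set and from b elsewhere. With the vec_duplicate of operand 2 as the first input and a mask of 1, sse2_loadld puts operand 2 in element 0 and keeps elements 1-3 of operand 1 (the zero vector in the "C" alternatives). A portable model using GCC vector extensions; the function names are illustrative:

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Model of (vec_merge:V4SI a b mask): element i comes from A when bit i
   of MASK is set and from B otherwise.  */
static v4si
vec_merge_v4si (v4si a, v4si b, unsigned int mask)
{
  v4si r = { 0, 0, 0, 0 };

  assert (mask < 16);  /* Only 4 elements.  */
  for (int i = 0; i < 4; i++)
    r[i] = ((mask >> i) & 1) ? a[i] : b[i];
  return r;
}

/* sse2_loadld: (vec_merge (vec_duplicate x) op1 (const_int 1)).  */
static v4si
loadld_model (int x, v4si op1)
{
  v4si dup = { x, x, x, x };
  return vec_merge_v4si (dup, op1, 1);
}
```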

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-09  Jakub Jelinek  

* config/i386/sse.md (sse2_loadld): Use v instead of x
constraint in alternatives 0,1,4.

--- gcc/config/i386/sse.md.jj   2016-05-09 14:15:50.0 +0200
+++ gcc/config/i386/sse.md  2016-05-09 15:08:36.034622080 +0200
@@ -13013,11 +13013,11 @@ (define_expand "sse2_loadd"
   "operands[2] = CONST0_RTX (V4SImode);")
 
 (define_insn "sse2_loadld"
-  [(set (match_operand:V4SI 0 "register_operand"   "=x,Yi,x,x,x")
+  [(set (match_operand:V4SI 0 "register_operand"   "=v,Yi,x,x,v")
(vec_merge:V4SI
  (vec_duplicate:V4SI
-   (match_operand:SI 2 "nonimmediate_operand" "m ,r ,m,x,x"))
- (match_operand:V4SI 1 "reg_or_0_operand" "C ,C ,C,0,x")
+   (match_operand:SI 2 "nonimmediate_operand" "m ,r ,m,x,v"))
+ (match_operand:V4SI 1 "reg_or_0_operand" "C ,C ,C,0,v")
  (const_int 1)))]
   "TARGET_SSE"
   "@
@@ -13028,7 +13028,7 @@ (define_insn "sse2_loadld"
vmovss\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "sse2,sse2,noavx,noavx,avx")
(set_attr "type" "ssemov")
-   (set_attr "prefix" "maybe_vex,maybe_vex,orig,orig,vex")
+   (set_attr "prefix" "maybe_vex,maybe_vex,orig,orig,maybe_evex")
(set_attr "mode" "TI,TI,V4SF,SF,SF")])
 
 ;; QI and HI modes handled by pextr patterns.

Jakub


[PATCH] Improve XMM16-XMM31 handling in vpinsr*

2016-05-09 Thread Jakub Jelinek
Hi!

vpinsr{b,w} are AVX512BW, vpinsr{d,q} are AVX512DQ.
This patch makes us use v constraint instead of x in those
cases.
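In the _pinsr pattern the target element is encoded as a single-bit vec_merge mask (operand 3), and the insn condition requires the exact_log2 of that mask to be a valid element index. A C model of both steps, with hypothetical helper names; the new tests do the same thing by storing into one element through a pointer:

```c
#include <assert.h>

typedef short v8hi __attribute__ ((vector_size (16)));

/* Like GCC's exact_log2: the index of the only set bit, or -1 if MASK
   does not have exactly one bit set.  */
static int
single_bit_index (unsigned int mask)
{
  int i = 0;

  if (mask == 0 || (mask & (mask - 1)) != 0)
    return -1;
  while ((mask & 1u) == 0)
    {
      mask >>= 1;
      i++;
    }
  return i;
}

/* What vpinsrw computes: A with element LANE replaced by X.  */
static v8hi
insert_v8hi (v8hi a, short x, int lane)
{
  assert (lane >= 0 && lane < 8);
  a[lane] = x;
  return a;
}
```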

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-09  Jakub Jelinek  

* config/i386/sse.md (pinsr_evex_isa): New mode attr.
(_pinsr): Add 2 alternatives with
v constraints instead of x and  isa attribute.

* gcc.target/i386/avx512bw-vpinsr-1.c: New test.
* gcc.target/i386/avx512dq-vpinsr-1.c: New test.
* gcc.target/i386/avx512vl-vpinsr-1.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-09 13:31:21.0 +0200
+++ gcc/config/i386/sse.md  2016-05-09 14:15:50.241028739 +0200
@@ -12036,13 +12036,17 @@ (define_mode_attr sse2p4_1
   [(V16QI "sse4_1") (V8HI "sse2")
(V4SI "sse4_1") (V2DI "sse4_1")])
 
+(define_mode_attr pinsr_evex_isa
+  [(V16QI "avx512bw") (V8HI "avx512bw")
+   (V4SI "avx512dq") (V2DI "avx512dq")])
+
 ;; sse4_1_pinsrd must come before sse2_loadld since it is preferred.
 (define_insn "_pinsr"
-  [(set (match_operand:PINSR_MODE 0 "register_operand" "=x,x,x,x")
+  [(set (match_operand:PINSR_MODE 0 "register_operand" "=x,x,x,x,v,v")
(vec_merge:PINSR_MODE
  (vec_duplicate:PINSR_MODE
-   (match_operand: 2 "nonimmediate_operand" "r,m,r,m"))
- (match_operand:PINSR_MODE 1 "register_operand" "0,0,x,x")
+   (match_operand: 2 "nonimmediate_operand" "r,m,r,m,r,m"))
+ (match_operand:PINSR_MODE 1 "register_operand" "0,0,x,x,v,v")
  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE2
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
@@ -12059,16 +12063,18 @@ (define_insn "_pinsr\t{%3, %2, %0|%0, %2, %3}";
 case 2:
+case 4:
   if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
return "vpinsr\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
   /* FALLTHRU */
 case 3:
+case 5:
   return "vpinsr\t{%3, %2, %1, %0|%0, %1, %2, %3}";
 default:
   gcc_unreachable ();
 }
 }
-  [(set_attr "isa" "noavx,noavx,avx,avx")
+  [(set_attr "isa" "noavx,noavx,avx,avx,,")
(set_attr "type" "sselog")
(set (attr "prefix_rex")
  (if_then_else
@@ -12089,7 +12095,7 @@ (define_insn "_pinsr_vinsert_mask"
--- gcc/testsuite/gcc.target/i386/avx512bw-vpinsr-1.c.jj   2016-05-09 14:36:49.618145755 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-vpinsr-1.c   2016-05-09 14:49:57.830574216 +0200
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512bw" } */
+
+typedef char v16qi __attribute__((vector_size (16)));
+typedef short v8hi __attribute__((vector_size (16)));
+
+v16qi
+f1 (v16qi a, char b)
+{
+  register v16qi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v16qi d = c;
+  ((char *) &d)[3] = b;
+  c = d;
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler "vpinsrb\[^\n\r]*xmm16" } } */
+
+v8hi
+f2 (v8hi a, short b)
+{
+  register v8hi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v8hi d = c;
+  ((short *) &d)[3] = b;
+  c = d;
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler "vpinsrw\[^\n\r]*xmm16" } } */
--- gcc/testsuite/gcc.target/i386/avx512dq-vpinsr-1.c.jj   2016-05-09 14:39:15.588184128 +0200
+++ gcc/testsuite/gcc.target/i386/avx512dq-vpinsr-1.c   2016-05-09 14:48:38.0 +0200
@@ -0,0 +1,33 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mavx512dq" } */
+
+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+v4si
+f1 (v4si a, int b)
+{
+  register v4si c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v4si d = c;
+  ((int *) )[3] = b;
+  c = d;
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler "vpinsrd\[^\n\r]*xmm16" } } */
+
+v2di
+f2 (v2di a, long long b)
+{
+  register v2di c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v2di d = c;
+  ((long long *) )[1] = b;
+  c = d;
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler "vpinsrq\[^\n\r]*xmm16" } } */
--- gcc/testsuite/gcc.target/i386/avx512vl-vpinsr-1.c.jj	2016-05-09 14:41:21.195496147 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-vpinsr-1.c	2016-05-09 14:50:32.188114909 +0200
@@ -0,0 +1,63 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl -mno-avx512bw -mno-avx512dq" } */
+
+typedef char v16qi __attribute__((vector_size (16)));
+typedef short v8hi __attribute__((vector_size (16)));
+typedef int v4si __attribute__((vector_size (16)));
+typedef long long v2di __attribute__((vector_size (16)));
+
+v16qi
+f1 (v16qi a, char b)
+{
+  register v16qi c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  v16qi d = c;
+  ((char *) )[3] = b;
+  c = d;
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler-not "vpinsrb\[^\n\r]*xmm16" } } 

[PATCH] vec_extract XMM16-XMM17 improvements

2016-05-09 Thread Jakub Jelinek
Hi!

vpextr{b,w} are in AVX512BW, so is vpsrldq, and vpextr{d,q} are in
AVX512DQ.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
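At the C level, these patterns are what element extraction from a 128-bit generic vector expands to. The sketch below is illustrative only (it is not the avx512bw-vpextr-1.c or avx512dq-vpextr-1.c test from the patch, and the PIN macro is a hypothetical helper). Like the vpinsr tests, it pins the vector to xmm16 with a local register variable, but only when GCC targets AVX-512 on x86-64, since xmm16-xmm31 are reachable only through EVEX encodings; elsewhere it is plain GNU C. With -O2 -mavx512vl -mavx512bw -mavx512dq, the expectation is that vpextr{b,w,d,q} can read xmm16 directly instead of first copying it to a low register.

```c
/* Element extraction from 128-bit GNU C vectors.  Under GCC with
   AVX-512 on x86-64, the vector is first pinned to xmm16 (an EVEX-only
   register); otherwise PIN is an ordinary copy.  */
typedef char v16qi __attribute__ ((vector_size (16)));
typedef short v8hi __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

#if defined (__AVX512F__) && defined (__x86_64__) && !defined (__clang__)
#define PIN(type, var, val)                   \
  register type var __asm ("xmm16") = (val);  \
  __asm__ volatile ("" : "+v" (var))
#else
#define PIN(type, var, val) type var = (val)
#endif

/* Each function reads one constant-index element (vpextrb/w/d/q).  */
char extract_b (v16qi a) { PIN (v16qi, c, a); return c[3]; }
short extract_w (v8hi a) { PIN (v8hi, c, a); return c[5]; }
int extract_d (v4si a) { PIN (v4si, c, a); return c[2]; }
long long extract_q (v2di a) { PIN (v2di, c, a); return c[1]; }
```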

2016-05-09  Jakub Jelinek  

* config/i386/i386.md (isa): Add x64_avx512dq, enable if
TARGET_64BIT && TARGET_AVX512DQ.
* config/i386/sse.md (*vec_extract): Add avx512bw alternatives.
(*vec_extract_zext): Add avx512bw alternative.
(*vec_extract_0, *vec_extractv4si_0_zext,
*vec_extractv2di_0_sse): Use v constraint instead of x constraint.
(*vec_extractv4si): Add avx512dq and avx512bw alternatives.
(*vec_extractv4si_zext): Add avx512dq alternative.
(*vec_extractv2di_1): Add x64_avx512dq and avx512bw alternatives,
use v instead of x constraint in other alternatives where possible.

* gcc.target/i386/avx512bw-vpextr-1.c: New test.
* gcc.target/i386/avx512dq-vpextr-1.c: New test.

--- gcc/config/i386/i386.md.jj  2016-05-09 13:33:12.0 +0200
+++ gcc/config/i386/i386.md 2016-05-09 16:32:32.219961730 +0200
@@ -796,7 +796,7 @@ (define_attr "isa" "base,x64,x64_sse4,x6
sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
-   avx512vl,noavx512vl"
+   avx512vl,noavx512vl,x64_avx512dq"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -807,6 +807,8 @@ (define_attr "enabled" ""
   (symbol_ref "TARGET_64BIT && TARGET_SSE4_1 && !TARGET_AVX")
 (eq_attr "isa" "x64_avx")
   (symbol_ref "TARGET_64BIT && TARGET_AVX")
+(eq_attr "isa" "x64_avx512dq")
+  (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
 (eq_attr "isa" "nox64") (symbol_ref "!TARGET_64BIT")
 (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
 (eq_attr "isa" "sse2_noavx")
--- gcc/config/i386/sse.md.jj   2016-05-09 15:08:36.0 +0200
+++ gcc/config/i386/sse.md  2016-05-09 16:43:54.213638239 +0200
@@ -13036,39 +13036,44 @@ (define_mode_iterator PEXTR_MODE12
   [(V16QI "TARGET_SSE4_1") V8HI])
 
 (define_insn "*vec_extract"
-  [(set (match_operand: 0 "register_sse4nonimm_operand" "=r,m")
+  [(set (match_operand: 0 "register_sse4nonimm_operand" "=r,m,r,m")
(vec_select:
- (match_operand:PEXTR_MODE12 1 "register_operand" "x,x")
+ (match_operand:PEXTR_MODE12 1 "register_operand" "x,x,v,v")
  (parallel
[(match_operand:SI 2 "const_0_to__operand")])))]
   "TARGET_SSE2"
   "@
%vpextr\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,sse4")
+   %vpextr\t{%2, %1, %0|%0, %1, %2}
+   vpextr\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,sse4,avx512bw,avx512bw")
(set_attr "type" "sselog1")
(set_attr "prefix_data16" "1")
(set (attr "prefix_extra")
  (if_then_else
-   (and (eq_attr "alternative" "0")
+   (and (eq_attr "alternative" "0,2")
(eq (const_string "mode") (const_string "V8HImode")))
(const_string "*")
(const_string "1")))
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "maybe_vex,maybe_vex,evex,evex")
(set_attr "mode" "TI")])
 
 (define_insn "*vec_extract_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
(zero_extend:SWI48
  (vec_select:
-   (match_operand:PEXTR_MODE12 1 "register_operand" "x")
+   (match_operand:PEXTR_MODE12 1 "register_operand" "x,v")
(parallel
  [(match_operand:SI 2 "const_0_to__operand")]]
   "TARGET_SSE2"
-  "%vpextr\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
+  "@
+   %vpextr\t{%2, %1, %k0|%k0, %1, %2}
+   vpextr\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,avx512bw")
+   (set_attr "type" "sselog1")
(set_attr "prefix_data16" "1")
(set (attr "prefix_extra")
  (if_then_else
@@ -13089,9 +13094,9 @@ (define_insn "*vec_extract_mem"
   "#")
 
 (define_insn "*vec_extract_0"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r ,r,x ,m")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r ,r,v ,m")
(vec_select:SWI48
- (match_operand: 1 "nonimmediate_operand" "mYj,x,xm,x")
+ (match_operand: 1 "nonimmediate_operand" "mYj,v,vm,v")
  (parallel [(const_int 0)])))]
   "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
@@ -13101,7 +13106,7 @@ (define_insn_and_split "*vec_extractv4si
   [(set (match_operand:DI 0 "register_operand" "=r")
(zero_extend:DI
  (vec_select:SI
-   (match_operand:V4SI 1 "register_operand" "x")
+   (match_operand:V4SI 1 "register_operand" "v")
(parallel [(const_int 0)]]
   

[PATCH] Remove constraints from further i386 define_expand patterns

2016-05-09 Thread Jakub Jelinek
Hi!

I believe this cleans up all the remaining define_expands (I looked at
tmp-mddump.md, using sed to pick out only the define_expand patterns,
and searched them for constraints; none were left).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-09  Jakub Jelinek  

* config/i386/i386.md (reload_noff_store, reload_noff_load, set_got,
set_got_labelled, lwp_llwpcb, lwp_lwpval3, lwp_lwpins3):
Remove constraints from expanders.
* config/i386/sse.md (vec_interleave_high,
vec_interleave_low, _vpermi2var3_maskz,
_vpermt2var3_maskz): Likewise.

--- gcc/config/i386/i386.md.jj  2016-05-09 11:38:36.0 +0200
+++ gcc/config/i386/i386.md 2016-05-09 13:33:12.883238591 +0200
@@ -1891,9 +1891,9 @@ (define_insn "*popfl1"
 ;; Reload patterns to support multi-word load/store
 ;; with non-offsetable address.
 (define_expand "reload_noff_store"
-  [(parallel [(match_operand 0 "memory_operand" "=m")
-  (match_operand 1 "register_operand" "r")
-  (match_operand:DI 2 "register_operand" "=")])]
+  [(parallel [(match_operand 0 "memory_operand")
+ (match_operand 1 "register_operand")
+ (match_operand:DI 2 "register_operand")])]
   "TARGET_64BIT"
 {
   rtx mem = operands[0];
@@ -1907,9 +1907,9 @@ (define_expand "reload_noff_store"
 })
 
 (define_expand "reload_noff_load"
-  [(parallel [(match_operand 0 "register_operand" "=r")
-  (match_operand 1 "memory_operand" "m")
-  (match_operand:DI 2 "register_operand" "=r")])]
+  [(parallel [(match_operand 0 "register_operand")
+ (match_operand 1 "memory_operand")
+ (match_operand:DI 2 "register_operand")])]
   "TARGET_64BIT"
 {
   rtx mem = operands[1];
@@ -12522,7 +12522,7 @@ (define_expand "prologue"
 
 (define_expand "set_got"
   [(parallel
- [(set (match_operand:SI 0 "register_operand" "=r")
+ [(set (match_operand:SI 0 "register_operand")
   (unspec:SI [(const_int 0)] UNSPEC_SET_GOT))
   (clobber (reg:CC FLAGS_REG))])]
   "!TARGET_64BIT"
@@ -12542,7 +12542,7 @@ (define_insn "*set_got"
 
 (define_expand "set_got_labelled"
   [(parallel
- [(set (match_operand:SI 0 "register_operand" "=r")
+ [(set (match_operand:SI 0 "register_operand")
   (unspec:SI [(label_ref (match_operand 1))]
  UNSPEC_SET_GOT))
   (clobber (reg:CC FLAGS_REG))])]
@@ -19041,7 +19041,7 @@ (define_insn "fnclex"
 ;
 
 (define_expand "lwp_llwpcb"
-  [(unspec_volatile [(match_operand 0 "register_operand" "r")]
+  [(unspec_volatile [(match_operand 0 "register_operand")]
UNSPECV_LLWP_INTRINSIC)]
   "TARGET_LWP")
 
@@ -19055,7 +19055,7 @@ (define_insn "*lwp_llwpcb1"
(set_attr "length" "5")])
 
 (define_expand "lwp_slwpcb"
-  [(set (match_operand 0 "register_operand" "=r")
+  [(set (match_operand 0 "register_operand")
(unspec_volatile [(const_int 0)] UNSPECV_SLWP_INTRINSIC))]
   "TARGET_LWP"
 {
@@ -19079,9 +19079,9 @@ (define_insn "lwp_slwpcb"
(set_attr "length" "5")])
 
 (define_expand "lwp_lwpval3"
-  [(unspec_volatile [(match_operand:SWI48 1 "register_operand" "r")
-(match_operand:SI 2 "nonimmediate_operand" "rm")
-(match_operand:SI 3 "const_int_operand" "i")]
+  [(unspec_volatile [(match_operand:SWI48 1 "register_operand")
+(match_operand:SI 2 "nonimmediate_operand")
+(match_operand:SI 3 "const_int_operand")]
UNSPECV_LWPVAL_INTRINSIC)]
   "TARGET_LWP"
   ;; Avoid unused variable warning.
@@ -19101,11 +19101,11 @@ (define_insn "*lwp_lwpval3_1"
 
 (define_expand "lwp_lwpins3"
   [(set (reg:CCC FLAGS_REG)
-   (unspec_volatile:CCC [(match_operand:SWI48 1 "register_operand" "r")
- (match_operand:SI 2 "nonimmediate_operand" "rm")
- (match_operand:SI 3 "const_int_operand" "i")]
+   (unspec_volatile:CCC [(match_operand:SWI48 1 "register_operand")
+ (match_operand:SI 2 "nonimmediate_operand")
+ (match_operand:SI 3 "const_int_operand")]
 UNSPECV_LWPINS_INTRINSIC))
-   (set (match_operand:QI 0 "nonimmediate_operand" "=qm")
+   (set (match_operand:QI 0 "nonimmediate_operand")
(eq:QI (reg:CCC FLAGS_REG) (const_int 0)))]
   "TARGET_LWP")
 
--- gcc/config/i386/sse.md.jj   2016-05-09 13:15:55.0 +0200
+++ gcc/config/i386/sse.md  2016-05-09 13:31:21.086735811 +0200
@@ -11991,9 +11991,9 @@ (define_insn "vec_interleave_lowv4si"
-  [(match_operand:VI_256 0 "register_operand" "=x")
-   (match_operand:VI_256 1 "register_operand" "x")
-   (match_operand:VI_256 2 "nonimmediate_operand" "xm")]
+  [(match_operand:VI_256 0 "register_operand")
+   (match_operand:VI_256 1 "register_operand")
+   (match_operand:VI_256 2 

[PATCH] vinsertps XMM16-XMM31 fixes

2016-05-09 Thread Jakub Jelinek
Hi!

The testcases show that we emit AVX512BW instructions even when
AVX512BW is disabled.  Additionally, two of the four patterns were using
an odd constraint for the output (x instead of v, even though they used v
for the inputs).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?
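For reference, what the 128-bit packsswb/packuswb forms compute can be modelled in scalar C; the 256-bit and 512-bit forms apply the same rule within each 128-bit lane. This is a sketch of the instruction semantics only (the helper names are hypothetical); the fix itself is purely about which encoding GCC may pick, not about what the instruction computes.

```c
#include <stdint.h>

/* Saturate a signed 16-bit value to the signed 8-bit range.  */
static int8_t
sat_s8 (int16_t v)
{
  if (v > INT8_MAX)
    return INT8_MAX;
  if (v < INT8_MIN)
    return INT8_MIN;
  return (int8_t) v;
}

/* Saturate a signed 16-bit value to the unsigned 8-bit range.  */
static uint8_t
sat_u8 (int16_t v)
{
  if (v > UINT8_MAX)
    return UINT8_MAX;
  if (v < 0)
    return 0;
  return (uint8_t) v;
}

/* 128-bit packsswb: the words of A fill bytes 0-7 of the result,
   the words of B fill bytes 8-15, each with signed saturation.  */
static void
packsswb_model (const int16_t a[8], const int16_t b[8], int8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[i] = sat_s8 (a[i]);
      out[i + 8] = sat_s8 (b[i]);
    }
}

/* 128-bit packuswb: same layout, unsigned saturation.  */
static void
packuswb_model (const int16_t a[8], const int16_t b[8], uint8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[i] = sat_u8 (a[i]);
      out[i + 8] = sat_u8 (b[i]);
    }
}
```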

2016-05-09  Jakub Jelinek  

PR target/71019
* config/i386/sse.md (_packssdw,
_packusdw): Make sure EVEX encoded insn
is not emitted unless TARGET_AVX512BW.
(_packuswb, _packsswb):
Likewise.  For TARGET_AVX512BW, use "=v" constraint instead of "=x"
for the result operand.

* gcc.target/i386/avx512vl-pack-1.c: New test.
* gcc.target/i386/avx512vl-pack-2.c: New test.
* gcc.target/i386/avx512bw-pack-2.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-09 11:38:36.0 +0200
+++ gcc/config/i386/sse.md  2016-05-09 12:34:58.839865460 +0200
@@ -11500,54 +11500,57 @@ (define_expand "vec_pack_trunc_"
 })
 
 (define_insn "_packsswb"
-  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
(vec_concat:VI1_AVX512
  (ss_truncate:
-   (match_operand: 1 "register_operand" "0,v"))
+   (match_operand: 1 "register_operand" "0,x,v"))
  (ss_truncate:
-   (match_operand: 2 "vector_operand" "xBm,vm"]
+   (match_operand: 2 "vector_operand" "xBm,xm,vm"]
   "TARGET_SSE2 &&  && "
   "@
packsswb\t{%2, %0|%0, %2}
+   vpacksswb\t{%2, %1, %0|%0, %1, %2}
vpacksswb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix" "orig,,evex")
(set_attr "mode" "")])
 
 (define_insn "_packssdw"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x,v")
(vec_concat:VI2_AVX2
  (ss_truncate:
-   (match_operand: 1 "register_operand" "0,v"))
+   (match_operand: 1 "register_operand" "0,x,v"))
  (ss_truncate:
-   (match_operand: 2 "vector_operand" "xBm,vm"]
+   (match_operand: 2 "vector_operand" "xBm,xm,vm"]
   "TARGET_SSE2 &&  && "
   "@
packssdw\t{%2, %0|%0, %2}
+   vpackssdw\t{%2, %1, %0|%0, %1, %2}
vpackssdw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix" "orig,,evex")
(set_attr "mode" "")])
 
 (define_insn "_packuswb"
-  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x")
+  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
(vec_concat:VI1_AVX512
  (us_truncate:
-   (match_operand: 1 "register_operand" "0,v"))
+   (match_operand: 1 "register_operand" "0,x,v"))
  (us_truncate:
-   (match_operand: 2 "vector_operand" "xBm,vm"]
+   (match_operand: 2 "vector_operand" "xBm,xm,vm"]
   "TARGET_SSE2 &&  && "
   "@
packuswb\t{%2, %0|%0, %2}
+   vpackuswb\t{%2, %1, %0|%0, %1, %2}
vpackuswb\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sselog")
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix" "orig,,evex")
(set_attr "mode" "")])
 
 (define_insn "avx512bw_interleave_highv64qi"
@@ -14572,21 +14575,22 @@ (define_insn "_mpsadbw"
(set_attr "mode" "")])
 
 (define_insn "_packusdw"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=Yr,*x,v")
+  [(set (match_operand:VI2_AVX2 0 "register_operand" "=Yr,*x,x,v")
(vec_concat:VI2_AVX2
  (us_truncate:
-   (match_operand: 1 "register_operand" "0,0,v"))
+   (match_operand: 1 "register_operand" "0,0,x,v"))
  (us_truncate:
-   (match_operand: 2 "vector_operand" "YrBm,*xBm,vm"]
+   (match_operand: 2 "vector_operand" "YrBm,*xBm,xm,vm"]
   "TARGET_SSE4_1 &&  && "
   "@
packusdw\t{%2, %0|%0, %2}
packusdw\t{%2, %0|%0, %2}
+   vpackusdw\t{%2, %1, %0|%0, %1, %2}
vpackusdw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,noavx,avx")
+  [(set_attr "isa" "noavx,noavx,avx,avx512bw")
(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "prefix" "orig,orig,,evex")
(set_attr "mode" "")])
 
 (define_insn "_pblendvb"
--- gcc/testsuite/gcc.target/i386/avx512vl-pack-1.c.jj	2016-05-09 12:16:52.062562903 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-pack-1.c	2016-05-09 12:21:42.786628535 +0200
@@ 

[PATCH] vinsertps XMM16-XMM31 fixes

2016-05-09 Thread Jakub Jelinek
Hi!

vinsertps is already in AVX512F, so we can use v constraints freely.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
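For readers less familiar with the instruction: the register form of (v)insertps copies one element of the second source into one element of the destination and then zeroes any elements selected by a mask, all controlled by imm8. A scalar sketch of those semantics (the function name is hypothetical; the memory form, which always takes a single float, is not modelled):

```c
#include <stdint.h>

/* Register form of insertps: imm8[7:6] selects the source element,
   imm8[5:4] the destination element, and imm8[3:0] is a mask of
   result elements to zero afterwards.  */
static void
insertps_model (const float dst[4], const float src[4], uint8_t imm,
                float out[4])
{
  unsigned count_s = (imm >> 6) & 3;
  unsigned count_d = (imm >> 4) & 3;
  unsigned zmask = imm & 0xf;

  for (int i = 0; i < 4; i++)
    out[i] = dst[i];
  out[count_d] = src[count_s];
  for (int i = 0; i < 4; i++)
    if (zmask & (1u << i))
      out[i] = 0.0f;
}
```

With imm8 = 1, as in f1 of avx512vl-vinsertps-1.c, element 0 is written and then immediately zeroed; that test only checks which register vinsertps uses, so the resulting values don't matter there.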

2016-05-09  Jakub Jelinek  

* config/i386/sse.md (*vec_setv4sf_sse4_1, sse4_1_insertps): Use v
constraint instead of x in avx alternatives.  Use maybe_evex instead
of vex prefix.

* gcc.target/i386/avx512vl-vinsertps-1.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-09 12:34:58.839865460 +0200
+++ gcc/config/i386/sse.md  2016-05-09 13:15:55.400130875 +0200
@@ -6657,11 +6657,11 @@ (define_insn "vec_set_0"
 
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
-  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,x")
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
(vec_merge:V4SF
  (vec_duplicate:V4SF
-   (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,xm"))
- (match_operand:V4SF 1 "register_operand" "0,0,x")
+   (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,vm"))
+ (match_operand:V4SF 1 "register_operand" "0,0,v")
  (match_operand:SI 3 "const_int_operand")))]
   "TARGET_SSE4_1
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
@@ -6684,13 +6684,13 @@ (define_insn "*vec_setv4sf_sse4_1"
(set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "V4SF")])
 
 (define_insn "sse4_1_insertps"
-  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,x")
-   (unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "Yrm,*xm,xm")
- (match_operand:V4SF 1 "register_operand" "0,0,x")
+  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
+   (unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "Yrm,*xm,vm")
+ (match_operand:V4SF 1 "register_operand" "0,0,v")
  (match_operand:SI 3 "const_0_to_255_operand" "n,n,n")]
 UNSPEC_INSERTPS))]
   "TARGET_SSE4_1"
@@ -6718,7 +6718,7 @@ (define_insn "sse4_1_insertps"
(set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "V4SF")])
 
 (define_split
--- gcc/testsuite/gcc.target/i386/avx512vl-vinsertps-1.c.jj	2016-05-09 13:10:08.277794535 +0200
+++ gcc/testsuite/gcc.target/i386/avx512vl-vinsertps-1.c	2016-05-09 13:13:51.788792211 +0200
@@ -0,0 +1,39 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+#include 
+
+__m128
+f1 (__m128 a, __m128 b)
+{
+  register __m128 c __asm ("xmm16") = a;
+  asm volatile ("" : "+v" (c));
+  c = _mm_insert_ps (c, b, 1);
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler "vinsertps\[^\n\r\]*xmm16" } } */
+
+__v4sf
+f2 (__v4sf a, float b)
+{
+  register __v4sf c __asm ("xmm17") = a;
+  asm volatile ("" : "+v" (c));
+  c[1] = b;
+  asm volatile ("" : "+v" (c));
+  return c;
+}
+
+/* { dg-final { scan-assembler "vinsertps\[^\n\r\]*xmm17" } } */
+
+__v4sf
+f3 (__v4sf a, float b)
+{
+  register float c __asm ("xmm18") = b;
+  asm volatile ("" : "+v" (c));
+  a[1] = c;
+  return a;
+}
+
+/* { dg-final { scan-assembler "vinsertps\[^\n\r\]*xmm18" } } */

Jakub


[PATCH] Don't emit AVX512DQ insns for -mavx512vl -mno-avx512dq (PR target/70927, take 2)

2016-05-09 Thread Jakub Jelinek
On Tue, May 03, 2016 at 08:23:28PM +0200, Jakub Jelinek wrote:
> While working on a patch I'm going to post momentarily, I've noticed that
> we sometimes emit AVX512DQ specific instructions even when avx512dq is not
> enabled (in particular, EVEX andnps and andnpd are AVX512DQ plus if
> they have 128-bit or 256-bit arguments, also AVX512VL).
> 
> I'm not 100% happy about the patch, because (pre-existing issue)
> get_attr_mode doesn't reflect that the insn is in that case not a vector float
> insn, but perhaps we'd need to use another alternative and some ugly
> conditionals in the mode attribute for that case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> after a while 6.2, or do you prefer some other fix?

Here is perhaps a better variant, which handles this in the mode attribute.
Now also with testcases.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
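Why the integer forms are a safe substitute: andnps/andnpd and vpandnd/vpandnq all compute (~a) & b bit for bit, so for these logic operations the float versus integer distinction only affects the encoding and which ISA extension is required. A scalar sketch of one 64-bit lane (the helper name is hypothetical), using the familiar trick of clearing the sign bit with a -0.0 mask:

```c
#include <stdint.h>
#include <string.h>

/* One lane of andnpd: (~A) & B on the raw IEEE-754 bits.  vpandnq
   performs exactly the same 64-bit operation on an integer lane.  */
static double
andn_bits (double a, double b)
{
  uint64_t ua, ub, ur;
  double r;

  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  ur = ~ua & ub;
  memcpy (&r, &ur, sizeof r);
  return r;
}
```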

2016-05-09  Jakub Jelinek  

PR target/70927
* config/i386/sse.md (_andnot3),
*3): For !TARGET_AVX512DQ and EVEX encoding,
use vp*[dq] instead of v*p[sd] instructions and adjust mode attribute
accordingly.

* gcc.target/i386/avx512vl-logic-1.c: New test.
* gcc.target/i386/avx512vl-logic-2.c: New test.
* gcc.target/i386/avx512dq-logic-2.c: New test.

--- gcc/config/i386/sse.md.jj   2016-05-09 10:20:27.280249673 +0200
+++ gcc/config/i386/sse.md  2016-05-09 10:52:44.391756028 +0200
@@ -2783,54 +2783,61 @@ (define_expand "vcond_mask__andnot3"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,v")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
(and:VF_128_256
  (not:VF_128_256
-   (match_operand:VF_128_256 1 "register_operand" "0,v"))
- (match_operand:VF_128_256 2 "vector_operand" "xBm,vm")))]
+   (match_operand:VF_128_256 1 "register_operand" "0,x,v,v"))
+ (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
   "TARGET_SSE && "
 {
   static char buf[128];
   const char *ops;
   const char *suffix;
 
-  switch (get_attr_mode (insn))
-{
-case MODE_V8SF:
-case MODE_V4SF:
-  suffix = "ps";
-  break;
-default:
-  suffix = "";
-}
-
   switch (which_alternative)
 {
 case 0:
   ops = "andn%s\t{%%2, %%0|%%0, %%2}";
   break;
 case 1:
+case 2:
+case 3:
   ops = "vandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
   break;
 default:
   gcc_unreachable ();
 }
 
-  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
-  if ( && !TARGET_AVX512DQ)
+  switch (get_attr_mode (insn))
 {
+case MODE_V8SF:
+case MODE_V4SF:
+  suffix = "ps";
+  break;
+case MODE_OI:
+case MODE_TI:
+  /* There is no vandnp[sd] in avx512f.  Use vpandn[qd].  */
   suffix = GET_MODE_INNER (mode) == DFmode ? "q" : "d";
   ops = "vpandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
+  break;
+default:
+  suffix = "";
 }
 
   snprintf (buf, sizeof (buf), ops, suffix);
   return buf;
 }
-  [(set_attr "isa" "noavx,avx")
+  [(set_attr "isa" "noavx,avx,avx512dq,avx512f")
(set_attr "type" "sselog")
-   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "prefix" "orig,maybe_vex,evex,evex")
(set (attr "mode")
-   (cond [(and (match_test " == 16")
+   (cond [(and (match_test "")
+   (and (eq_attr "alternative" "1")
+(match_test "!TARGET_AVX512DQ")))
+(const_string "")
+  (eq_attr "alternative" "3")
+(const_string "")
+  (and (match_test " == 16")
(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
 (const_string "")
   (match_test "TARGET_AVX")
@@ -2870,7 +2877,10 @@ (define_insn "_andnot3")])
+   (set (attr "mode")
+(if_then_else (match_test "TARGET_AVX512DQ")
+ (const_string "")
+ (const_string "XI")))])
 
 (define_expand "3"
   [(set (match_operand:VF_128_256 0 "register_operand")
@@ -2889,10 +2899,10 @@ (define_expand "3
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*3"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=x,v")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=x,x,v,v")
(any_logic:VF_128_256
- (match_operand:VF_128_256 1 "vector_operand" "%0,v")
- (match_operand:VF_128_256 2 "vector_operand" "xBm,vm")))]
+ (match_operand:VF_128_256 1 "vector_operand" "%0,x,v,v")
+ (match_operand:VF_128_256 2 "vector_operand" "xBm,xm,vm,vm")))]
   "TARGET_SSE && 
&& ix86_binary_operator_ok (, mode, operands)"
 {
@@ -2900,43 +2910,50 @@ (define_insn "*3"
   const char *ops;
   const char *suffix;
 
-  switch (get_attr_mode (insn))
-{
-case MODE_V8SF:
-case MODE_V4SF:
-  suffix = "ps";
-  break;
-default:
-  suffix = "";
-}
-
   switch 

PING [PATCH] integer overflow checking builtins in constant expressions

2016-05-09 Thread Martin Sebor

Pinging the following patch:
  https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00013.html

On 05/01/2016 10:39 AM, Martin Sebor wrote:

c/68120 - can't easily deal with integer overflow at compile time,
is an enhancement request to make the integer overflow intrinsics
usable in constant expressions in C (in addition to letting them
be invoked with just two arguments).

The inability to use the built-ins in constant expressions also
meant that the patch for c++/69517 (SEGV on a VLA with excess
initializer elements) could only prevent the SEGV in non-constexpr
contexts.  This limitation is noted in c++/70507 (integer overflow
builtins not constant expressions).

The attached patch implements the request in c/68120 for both
the C and C++ front-ends.  It stops short of providing the new
__builtin_add_wrapv function requested in c/68120.  It doesn't
seem necessary since the same functionality is available with
the patch via the existing built-ins.
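For context, the type-generic built-ins already work at run time; what c/68120 asks for is being able to use them where a constant expression is required. A minimal sketch of today's run-time use (the wrapper names are illustrative); the constant-expression and null-pointer forms depend on the patch and are not shown:

```c
#include <limits.h>
#include <stdbool.h>

/* True if A + B does not fit in int.  */
static bool
add_overflows (int a, int b)
{
  int res;
  return __builtin_add_overflow (a, b, &res);
}

/* The value stored through the third argument is the infinitely
   precise result converted to int, i.e. the wrapped result, with no
   undefined behaviour.  */
static int
add_wrapped (int a, int b)
{
  int res;
  __builtin_add_overflow (a, b, &res);
  return res;
}

/* Typical allocation-size check: N * SIZE in unsigned long.  */
static bool
mul_overflows (unsigned long n, unsigned long size)
{
  unsigned long res;
  return __builtin_mul_overflow (n, size, &res);
}
```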

With this enhancement in place it will be possible to add the
C++ VLA checking to constexpr functions and fully resolve c++/69517
(which I plan to do next).

While testing the patch, I also noticed a minor inconsistency
in the text of the diagnostic GCC issues for invalid calls to
the built-ins with too few arguments: for one set of built-ins
the error says "not enough arguments," while for another it says
"too few arguments."  I raised PR c/70883 (inconsistent error
message for calls to __builtin_add_overflow with too few arguments)
for this and included a fix in this patch as well.

Martin

PS The enhancement to call the type-generic built-ins with a null
pointer is not available in C++ 98 mode because GCC doesn't allow
null pointers in constant expressions.  Since C and later versions
of C++ do, it seems that it might be worthwhile to relax the rules
and accept them in C++ 98 as well so that the built-ins can be used
portably across all versions of C++.





Re: [PATCH, rs6000] Add support for int versions of vec_addec

2016-05-09 Thread Segher Boessenkool
Hi Bill,

On Fri, May 06, 2016 at 09:29:11AM -0500, Bill Seurer wrote:
> 2016-05-06  Bill Seurer  
> 
>   * config/rs6000/rs6000-builtin.def (vec_addec): Change vec_addec to a
>   special case builtin.
>   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
>   support for ALTIVEC_BUILTIN_VEC_ADDEC.
>   * config/rs6000/rs6000.c (altivec_init_builtins): Add definition
>   for __builtin_vec_addec.
> 
> [gcc/testsuite]
> 
> 2016-05-06  Bill Seurer  
> 
>   * gcc.target/powerpc/vec-addec.c: New test.
>   * gcc.target/powerpc/vec-addec-int128.c: New test.


> +  /* All 3 arguments must be vectors of (signed or unsigned) (int or
> + __int128) and the types must match.  */
> +  if ((arg0_type != arg1_type) || (arg1_type != arg2_type))
> + goto bad; 

Superfluous parens; trailing space.  Please fix (throughout).

> +   /* For {un}signed ints, 
> +   vec_addec (va, vb, carryv) == vec_or (vec_addc (va, vb),
> + vec_addc(vec_add(va, vb),
> +  vec_and (carryv, 0x1))).  */

"0x1" looks really silly btw ;-)

> + /* Use save_expr to ensure that operands used more than once
> + that may have side effects (like calls) are only evaluated
> + once.  */
> + arg0 = save_expr(arg0);
> + arg1 = save_expr(arg1);
> + vec *params = make_tree_vector();

Space before function arg opening parenthesis.

> + vec_safe_push (params, arg0);
> + vec_safe_push (params, arg1);
> + tree call1 = altivec_resolve_overloaded_builtin
> + (loc, rs6000_builtin_decls[ALTIVEC_BUILTIN_VEC_ADDC], params);

That's not how you're supposed to break long lines.  Probably easiest if you
first assign the decl to a new temporary?

> + tree const1 = build_vector_from_val (arg0_type, 
> + build_int_cstu(TREE_TYPE (arg0_type), 1));

That's not right either.

> + tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR,
> + arg0_type, arg2, const1);

Nor this.  A continuation line should start at the same indent as the
thing it continues (so "arg0_type" should line up with "loc" in this case).


Segher
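The identity in the quoted comment (vec_addec as the OR of two vec_addc results) can be checked per element with a scalar model. A sketch for the unsigned int case; the helper names are made up here and are not the rs6000 builtins:

```c
#include <stdint.h>

/* Carry out of the 32-bit addition A + B (what vec_addc yields per
   element).  */
static uint32_t
addc (uint32_t a, uint32_t b)
{
  return (uint32_t) (a + b) < a;
}

/* Reference: carry out of the 33-bit sum A + B + (C & 1).  */
static uint32_t
addec_reference (uint32_t a, uint32_t b, uint32_t c)
{
  uint64_t sum = (uint64_t) a + b + (c & 1);
  return (uint32_t) (sum >> 32);
}

/* The expansion: addc (a, b) | addc (a + b, c & 1).  At most one of
   the two carries can be set: if a + b wraps, its low 32 bits are at
   most 0xfffffffe, so adding 0 or 1 cannot wrap again.  */
static uint32_t
addec_expanded (uint32_t a, uint32_t b, uint32_t c)
{
  return addc (a, b) | addc (a + b, c & 1);
}
```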


Re: [PATCH] add reassociation width target function for power8

2016-05-09 Thread Segher Boessenkool
Hi Aaron,

On Wed, May 04, 2016 at 11:20:12AM -0500, Aaron Sawdey wrote:
> This patch enables TARGET_SCHED_REASSOCIATION_WIDTH for power8 and up.
> The widths returned are derived from testing with SPEC 2006 and some
> simple tests on power8.
> 
> Bootstrapped and regtested on powerpc64le-unknown-linux-gnu, ok for
> trunk?
> 
> 2016-05-04  Aaron Sawdey 

Two spaces before the email address.

> * config/rs6000/rs6000.c (rs6000_reassociation_width): Add
> function for TARGET_SCHED_REASSOCIATION_WIDTH to enable
> parallel reassociation for power8 and forward.

This is okay for trunk.  Thanks,


Segher
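TARGET_SCHED_REASSOCIATION_WIDTH tells the tree reassociation pass how many independent operations of a given kind the target can usefully execute in parallel. As a rough, hand-written illustration (not GCC output): with a width of 1 a chain of additions stays linear, as in chain_serial below, while a larger width allows it to be rebuilt into a shallower tree like chain_balanced. For integer types both orders give the same result; for floating point the pass only reassociates when -fassociative-math is in effect.

```c
/* Left-deep chain: seven additions, each depending on the previous.  */
unsigned long
chain_serial (unsigned long a, unsigned long b, unsigned long c,
              unsigned long d, unsigned long e, unsigned long f,
              unsigned long g, unsigned long h)
{
  return ((((((a + b) + c) + d) + e) + f) + g) + h;
}

/* Same sum as a tree of depth 3: the four innermost additions are
   independent and can issue together on a wide enough core.  */
unsigned long
chain_balanced (unsigned long a, unsigned long b, unsigned long c,
                unsigned long d, unsigned long e, unsigned long f,
                unsigned long g, unsigned long h)
{
  return ((a + b) + (c + d)) + ((e + f) + (g + h));
}
```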


[PATCH, i386]: Fix btrq peephole2

2016-05-09 Thread Uros Bizjak
Hello!

As shown by the attached testcase, the btrq peephole2 generates the wrong
immediate value for the AND mask.

The attached patch fixes this problem and also improves the bt{s,r,c}q
peephole2 patterns a bit.
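The arithmetic behind the fix can be sketched in C. Two points matter: the AND that replaces btr must use the complemented mask, and a 64-bit and/or/xor can only take an immediate that is a sign-extended 32-bit value. The model below covers only the CONST_INT case of x86_64_immediate_operand (the real predicate also handles symbolic operands), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CONST_INT case of x86_64_immediate_operand, modelled: a 64-bit ALU
   insn encodes a sign-extended 32-bit immediate, so V qualifies only
   if bits 63..31 are all zeros or all ones.  */
static bool
fits_simm32 (uint64_t v)
{
  uint64_t top = v >> 31;
  return top == 0 || top == (UINT64_MAX >> 31);
}

/* Masks for bit I: bts -> ior, btc -> xor, btr -> and with ~bit.  */
static uint64_t bts_mask (unsigned i) { return UINT64_C (1) << i; }
static uint64_t btc_mask (unsigned i) { return UINT64_C (1) << i; }
static uint64_t btr_mask (unsigned i) { return ~(UINT64_C (1) << i); }
```

Under this model the complemented btr mask is encodable only for i < 31, the same bound as the bts/btc masks, which is why a single x86_64_immediate_operand check can replace the hand-written i >= 31 / i >= 32 tests. AND-ing with the uncomplemented mask, as the old btrq code did, would keep only bit i instead of clearing it.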

2016-05-09  Uros Bizjak  

* config/i386/i386.md (absneg splitters with general regs): Use
general_reg_operand predicate.
(btsq peephole2): Use x86_64_immediate_operand to check if new
value is suitable for immediate operand.  Generate emitted insn
using RTL expressions.
(btcq peephole2): Ditto.
(btrq peephole2): Ditto.  Generate correct immediate operand
for AND masking.

testsuite/ChangeLog:

2016-05-09  Uros Bizjak  

* gcc.target/i386/fabsneg-1.c: New test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN, will be backported to gcc-6 and gcc-5 branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 236034)
+++ config/i386/i386.md (working copy)
@@ -9306,7 +9306,7 @@
 })
 
 (define_split
-  [(set (match_operand:SF 0 "register_operand")
+  [(set (match_operand:SF 0 "general_reg_operand")
(match_operator:SF 1 "absneg_operator" [(match_dup 0)]))
(use (match_operand:V4SF 2))
(clobber (reg:CC FLAGS_REG))]
@@ -9330,7 +9330,7 @@
 })
 
 (define_split
-  [(set (match_operand:DF 0 "register_operand")
+  [(set (match_operand:DF 0 "general_reg_operand")
(match_operator:DF 1 "absneg_operator" [(match_dup 0)]))
(use (match_operand 2))
(clobber (reg:CC FLAGS_REG))]
@@ -9368,7 +9368,7 @@
 })
 
 (define_split
-  [(set (match_operand:XF 0 "register_operand")
+  [(set (match_operand:XF 0 "general_reg_operand")
(match_operator:XF 1 "absneg_operator" [(match_dup 0)]))
(use (match_operand 2))
(clobber (reg:CC FLAGS_REG))]
@@ -11049,20 +11049,19 @@
   (const_int 1))
  (clobber (reg:CC FLAGS_REG))])]
   "TARGET_64BIT && !TARGET_USE_BT"
-  [(const_int 0)]
+  [(parallel [(set (match_dup 0)
+  (ior:DI (match_dup 0) (match_dup 3)))
+ (clobber (reg:CC FLAGS_REG))])]
 {
   int i = INTVAL (operands[1]);
 
-  rtx op1 = gen_int_mode (HOST_WIDE_INT_1U << i, DImode);
+  operands[3] = gen_int_mode (HOST_WIDE_INT_1U << i, DImode);
 
-  if (i >= 31)
+  if (!x86_64_immediate_operand (operands[3], DImode))
 {
-  emit_move_insn (operands[2], op1);
-  op1 = operands[2];
+  emit_move_insn (operands[2], operands[3]);
+  operands[3] = operands[2];
 }
-
-  emit_insn (gen_iordi3 (operands[0], operands[0], op1));
-  DONE;
 })
 
 (define_peephole2
@@ -11074,20 +11073,19 @@
   (const_int 0))
  (clobber (reg:CC FLAGS_REG))])]
   "TARGET_64BIT && !TARGET_USE_BT"
-  [(const_int 0)]
+  [(parallel [(set (match_dup 0)
+  (and:DI (match_dup 0) (match_dup 3)))
+ (clobber (reg:CC FLAGS_REG))])]
 {
   int i = INTVAL (operands[1]);
 
-  rtx op1 = gen_int_mode (HOST_WIDE_INT_1U << i, DImode);
+  operands[3] = gen_int_mode (~(HOST_WIDE_INT_1U << i), DImode);
  
-  if (i >= 32)
+  if (!x86_64_immediate_operand (operands[3], DImode))
 {
-  emit_move_insn (operands[2], op1);
-  op1 = operands[2];
+  emit_move_insn (operands[2], operands[3]);
+  operands[3] = operands[2];
 }
-
-  emit_insn (gen_anddi3 (operands[0], operands[0], op1));
-  DONE;
 })
 
 (define_peephole2
@@ -11100,20 +11098,19 @@
(match_dup 0) (const_int 1) (match_dup 1
  (clobber (reg:CC FLAGS_REG))])]
   "TARGET_64BIT && !TARGET_USE_BT"
-  [(const_int 0)]
+  [(parallel [(set (match_dup 0)
+  (xor:DI (match_dup 0) (match_dup 3)))
+ (clobber (reg:CC FLAGS_REG))])]
 {
   int i = INTVAL (operands[1]);
 
-  rtx op1 = gen_int_mode (HOST_WIDE_INT_1U << i, DImode);
+  operands[3] = gen_int_mode (HOST_WIDE_INT_1U << i, DImode);
 
-  if (i >= 31)
+  if (!x86_64_immediate_operand (operands[3], DImode))
 {
-  emit_move_insn (operands[2], op1);
-  op1 = operands[2];
+  emit_move_insn (operands[2], operands[3]);
+  operands[3] = operands[2];
 }
-
-  emit_insn (gen_xordi3 (operands[0], operands[0], op1));
-  DONE;
 })
 
 (define_insn "*bt"
Index: testsuite/gcc.target/i386/fabsneg-1.c
===
--- testsuite/gcc.target/i386/fabsneg-1.c   (nonexistent)
+++ testsuite/gcc.target/i386/fabsneg-1.c   (working copy)
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mtune=nocona" } */
+
+double x;
+
+void
+__attribute__ ((noinline, noclone))
+test_fabs (double a)
+{
+  asm volatile ("" : "+r" (a));
+  x = __builtin_fabs (a);
+}
+
+void
+__attribute__ ((noinline, noclone))
+test_neg (double a)
+{
+  asm volatile ("" : "+r" (a));
+  x = -a;
+}
+
+int main ()
+{
+  test_fabs (-1.0);
+
+  if (x != 1.0)
+

[Patch, Fortran] Update documentation of UNION to be endian agnostic

2016-05-09 Thread Fritz Reese
In light of https://gcc.gnu.org/ml/fortran/2016-05/msg00018.html I
should also update the documentation...

---
Fritz Reese


0001-Update-documentation-of-UNION-to-be-endian-agnostic.patch
Description: Binary data


Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations

2016-05-09 Thread Segher Boessenkool
Hi Christophe,

On Mon, May 09, 2016 at 03:54:26PM +0200, Christophe Lyon wrote:
> After this patch, I've noticed that
> gcc.target/arm/pr43920-2.c
> now fails at:
> /* { dg-final { scan-assembler-times "pop" 2 } } */
> 
> Before the patch, the generated code was:
> [...]
> pop {r3, r4, r5, r6, r7, pc}
> .L4:
> mov r0, #-1
> .L1:
> pop {r3, r4, r5, r6, r7, pc}
> 
> it is now:
> [...]
> .L1:
> pop {r3, r4, r5, r6, r7, pc}
> .L4:
> mov r0, #-1
> b   .L1
> 
> The new version does not seem better, as it adds a branch on the path
> and it is not smaller.

That looks like bb-reorder isn't doing its job?  Maybe it thinks that
pop is too expensive to copy?


Segher


[C PATCH] Warn for optimize attribute on decl after definition (PR c/70255)

2016-05-09 Thread Marek Polacek
In this PR, Richi pointed out that we don't warn for the case when a
declaration with attribute optimize follows the definition which is lacking
that attribute.  This patch adds such a warning.  Though the question is
whether this shouldn't apply to more attributes than just "optimize".  And,
as can be seen in the testcase, we'll warn even for the case when the
definition has
  optimize ("no-associative-math,O2")
and the declaration
  optimize ("O2,no-associative-math")
Not sure if we have something better than attribute_value_equal, though.
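As an illustration only, an order-insensitive comparison of the option strings would accept fn4 in the testcase below while still warning for fn3. The sketch uses a hypothetical helper, `optimize_lists_equal_unordered`, which is not part of GCC. It treats each comma-separated list as a multiset, which is only valid when no option overrides another: it would wrongly call "O0,O2" and "O2,O0" equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OPTS 32

/* Split S in place on ',' into at most MAX tokens.
   Returns the number of tokens, or -1 if there are too many.  */
static int
split_options (char *s, char **tok, int max)
{
  int n = 0;
  for (char *p = strtok (s, ","); p != NULL; p = strtok (NULL, ","))
    {
      if (n == max)
        return -1;
      tok[n++] = p;
    }
  return n;
}

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Return a malloc'ed copy of S, or NULL on allocation failure.  */
static char *
copy_string (const char *s)
{
  size_t len = strlen (s) + 1;
  char *c = malloc (len);
  if (c != NULL)
    memcpy (c, s, len);
  return c;
}

/* Hypothetical helper: true if A and B name the same options,
   ignoring their order (multiset comparison).  */
bool
optimize_lists_equal_unordered (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  char *ca = copy_string (a), *cb = copy_string (b);
  char *ta[MAX_OPTS], *tb[MAX_OPTS];
  bool equal = false;

  if (ca != NULL && cb != NULL)
    {
      int na = split_options (ca, ta, MAX_OPTS);
      int nb = split_options (cb, tb, MAX_OPTS);
      if (na >= 0 && na == nb)
        {
          /* Sort both token lists so equal multisets compare equal.  */
          qsort (ta, (size_t) na, sizeof ta[0], cmp_str);
          qsort (tb, (size_t) nb, sizeof tb[0], cmp_str);
          equal = true;
          for (int i = 0; i < na; i++)
            if (strcmp (ta[i], tb[i]) != 0)
              {
                equal = false;
                break;
              }
        }
    }
  free (ca);
  free (cb);
  return equal;
}
```

Whether ignoring order is acceptable depends on which options can appear together; a real fix would more likely compare the resulting option state than the strings.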

(The C++ FE lacks these kind of warnings; I opened PR71024 for that.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-09  Marek Polacek  

PR c/70255
* c-decl.c (diagnose_mismatched_decls): Warn for optimize attribute
on a declaration following the definition.

* gcc.dg/attr-opt-1.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 7094efc..6f97ed9 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2228,6 +2228,18 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 
   if (TREE_CODE (newdecl) == FUNCTION_DECL)
 {
+  tree a1 = lookup_attribute ("optimize", DECL_ATTRIBUTES (olddecl));
+  tree a2 = lookup_attribute ("optimize", DECL_ATTRIBUTES (newdecl));
+  /* An optimization attribute applied on a declaration after the
+definition is likely not what the user wanted.  */
+  if (a2 != NULL_TREE
+ && DECL_SAVED_TREE (olddecl) != NULL_TREE
+ && (a1 == NULL_TREE || !attribute_value_equal (a1, a2)))
+   warned |= warning (OPT_Wattributes,
+  "optimization attribute on %qD follows "
+  "definition but the attribute doesn%'t match",
+  newdecl);
+
   /* Diagnose inline __attribute__ ((noinline)) which is silly.  */
   if (DECL_DECLARED_INLINE_P (newdecl)
  && lookup_attribute ("noinline", DECL_ATTRIBUTES (olddecl)))
diff --git gcc/testsuite/gcc.dg/attr-opt-1.c gcc/testsuite/gcc.dg/attr-opt-1.c
index e69de29..07fa4db 100644
--- gcc/testsuite/gcc.dg/attr-opt-1.c
+++ gcc/testsuite/gcc.dg/attr-opt-1.c
@@ -0,0 +1,30 @@
+/* PR c/70255 */
+/* { dg-do compile } */
+
+double
+fn1 (double h, double l) /* { dg-message "previous definition" } */
+{
+  return h + l;
+}
+double fn1 (double, double) __attribute__ ((optimize ("no-associative-math"))); /* { dg-warning "optimization attribute on .fn1. follows definition" } */
+
+__attribute__ ((optimize ("no-associative-math"))) double
+fn2 (double h, double l)
+{
+  return h + l;
+}
+double fn2 (double, double) __attribute__ ((optimize ("no-associative-math")));
+
+__attribute__ ((optimize ("no-associative-math"))) double
+fn3 (double h, double l) /* { dg-message "previous definition" } */
+{
+  return h + l;
+}
+double fn3 (double, double) __attribute__ ((optimize ("O2,no-associative-math"))); /* { dg-warning "optimization attribute on .fn3. follows definition" } */
+
+__attribute__ ((optimize ("no-associative-math,O2"))) double
+fn4 (double h, double l) /* { dg-message "previous definition" } */
+{
+  return h + l;
+}
+double fn4 (double, double) __attribute__ ((optimize ("O2,no-associative-math"))); /* { dg-warning "optimization attribute on .fn4. follows definition" } */

Marek


[Patch, Fortran] Fix for test case dec_union_4.f90

2016-05-09 Thread Fritz Reese
I was silly to assume little-endian integer representations for this
test case - of course it fails on big-endian machines. Fixed to use
endian-agnostic character strings.

ref
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56226
https://gcc.gnu.org/ml/gcc-testresults/2016-05/msg00779.html

---
Fritz Reese


dec_union_4.patch
Description: Binary data


Re: [RS6000] complex long double ABI_V4 fix

2016-05-09 Thread Segher Boessenkool
On Fri, May 06, 2016 at 03:54:43PM +0930, Alan Modra wrote:
> Revision 235792 regressed compat/scalar-by-value-6 for powerpc-linux
> -m32 due to accidentally changing the ABI.  By another historical
> accident, complex long double is stupidly passed in gprs for -m32.
> 
> Bootstrapped and regression tested powerpc64-linux.  Also fixes
> gfortran.dg/{large_real_kind_2.F90,large_real_kind_form_io_1.f90}.
> OK to apply?

>   * config/rs6000/rs6000.c (rs6000_function_arg): Exclude IBM
>   complex long double from args passed in fprs for ABI_V4.
>   (rs6000_function_arg_boundary, rs6000_function_arg_advance_1,
>   rs6000_gimplify_va_arg): Likewise.
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 1215925..9c7a37b 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -10142,6 +10142,7 @@ rs6000_function_arg_boundary (machine_mode mode, 
> const_tree type)
>&& (GET_MODE_SIZE (mode) == 8
> || (TARGET_HARD_FLOAT
> && TARGET_FPRS
> +   && !(mode == ICmode || (!TARGET_IEEEQUAD && mode == TCmode))
> && FLOAT128_2REG_P (mode

Since this monstrous, unreadable condition is used a lot, use a nicely named
helper function instead?


Segher


Re: [PATCH], Add PowerPC ISA 3.0 min/max support

2016-05-09 Thread Segher Boessenkool
On Thu, May 05, 2016 at 03:18:39PM -0400, Michael Meissner wrote:
> At the present time, the code does not support comparisons involving >= and <=
> unless the -ffast-math option is used. I hope eventually to support generating
> these instructions without having -ffast-math used.
> 
> The underlying reason is when fast math is not used, we change the condition
> from:
> 
>   (ge:SI (reg:CCFP ) (const_int 0))
> 
> to:
> 
>   (ior:SI (gt:SI (reg:CCFP ) (const_int 0))
>   (eq:SI (reg:CCFP ) (const_int 0)))
> 
> The machine independent portion of the compiler does not recognize this when
> trying to generate conditional moves.
> 
> I would imagine the 'fix' is to generate GE/LE all of the time, and then have a
> splitter that converts it to IOR of GT/EQ if it is not a conditional move with
> ISA 3.0 instructions.

That sounds like a plan :-)
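For readers wondering why the IOR form is needed at all without -ffast-math (this is my reading, not something stated in the thread): under IEEE semantics a comparison with a NaN operand is unordered, so a >= b is false for a NaN and has to be computed as GT or EQ. The cheaper "not less than" test only agrees when NaNs can be ignored, which -ffast-math permits. The function names below are illustrative.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* GE written as the IOR of GT and EQ, the form needed when NaNs
   must be honoured.  Agrees with a >= b for all inputs.  */
bool
ge_as_gt_or_eq (double a, double b)
{
  return (a > b) || (a == b);
}

/* GE written as "not LT".  Agrees with a >= b only when neither
   operand is a NaN.  */
bool
ge_as_not_lt (double a, double b)
{
  return !(a < b);
}
```

With a NaN operand the two functions disagree, which is the case the splitter has to preserve.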

> -;; Return true if operand is MIN or MAX operator.
> +;; Return true if operand is MIN or MAX operator.  Since this is only used to
> +;; convert floating point MIN/MAX operations into FSEL on pre-vsx systems,
> +;; don't include UMIN or UMAX.
>  (define_predicate "min_max_operator"
> -  (match_code "smin,smax,umin,umax"))
> +  (match_code "smin,smax"))

Please name it signed_min_max_operator instead?

> --- gcc/config/rs6000/rs6000.c
> (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
> (revision 235831)
> +++ gcc/config/rs6000/rs6000.c(.../gcc/config/rs6000) (working copy)
> @@ -20534,6 +20534,12 @@ print_operand (FILE *file, rtx x, int co
>   "local dynamic TLS references");
>return;
>  
> +case '@':
> +  /* If -mpower9-minmax, use xsmaxcpdp instead of xsmaxdp.  */
> +  if (TARGET_P9_MINMAX)
> + putc ('c', file);
> +  return;

I don't think @ is very mnemonic, nor is this special enough for such
a nice letter.

From looking at how it is used, it seems you can make it part of code_attr
minmax (and give that a better name, minmax_fp or such)?

> +  rs6000_emit_minmax (dest, (max_p) ? SMAX : SMIN, op0, op1);

Superfluous parentheses.

> +rs6000_emit_power9_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)

Maybe put some "fp" in the name?  For "minmax" as well.

> +  if (swap_p)
> +compare_rtx = gen_rtx_fmt_ee (code, CCFPmode, op1, op0);
> +  else
> +compare_rtx = gen_rtx_fmt_ee (code, CCFPmode, op0, op1);

if (swap_p)
  std::swap (op0, op1);

and then just generate the one form?


Segher


[PATCH, Fortran, OpenACC] Fix PR70598, Fortran host_data ICE

2016-05-09 Thread Chung-Lin Tang
Hi, this patch resolves an ICE for Fortran when using the OpenACC
host_data directive.  Actually, rather than say resolve, it's more like
adjusting the front-end to the same middle-end restrictions as C/C++,
namely that we only support pointers or arrays for host_data right now.

This patch makes a few adjustments in fortran/openmp.c:resolve_omp_clauses(),
plus some testcase adjustments.  It has been tested without regressions for
Fortran.

Is this okay for trunk?

Thanks,
Chung-Lin

2016-05-09  Chung-Lin Tang  

gcc/
* fortran/openmp.c (resolve_omp_clauses): Adjust use_device clause
handling to only allow pointers and arrays.

gcc/testsuite/
* gfortran.dg/goacc/host_data-tree.f95: Adjust to use pointers in use_device clause.
* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
* gfortran.dg/goacc/list.f95: Adjust to catch "neither a pointer nor an array" error messages.

libgomp/testsuite/
* libgomp.oacc-fortran/host_data-1.f90: New testcase.
Index: gcc/fortran/openmp.c
===
--- gcc/fortran/openmp.c	(revision 236020)
+++ gcc/fortran/openmp.c	(working copy)
@@ -3743,11 +3743,18 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_claus
 			  && CLASS_DATA (n->sym)->attr.allocatable))
 			gfc_error ("ALLOCATABLE object %qs in %s clause at %L",
    n->sym->name, name, &n->where);
-		  if (n->sym->attr.pointer
-			  || (n->sym->ts.type == BT_CLASS && CLASS_DATA (n->sym)
-			  && CLASS_DATA (n->sym)->attr.class_pointer))
-			gfc_error ("POINTER object %qs in %s clause at %L",
-   n->sym->name, name, &n->where);
+		  if (n->sym->attr.flavor == FL_VARIABLE
+			  && !n->sym->as && !n->sym->attr.pointer
+			  && !n->sym->attr.cray_pointer
+			  && !n->sym->attr.cray_pointee)
+			gfc_error ("%s clause variable %qs at %L is neither "
+   "a pointer nor an array", name,
+   n->sym->name, &n->where);
+		  if (n->sym->ts.type == BT_CLASS && CLASS_DATA (n->sym)
+			  && CLASS_DATA (n->sym)->attr.class_pointer)
+			gfc_error ("POINTER object %qs of polymorphic type in "
+   "%s clause at %L", n->sym->name, name,
+   &n->where);
 		  if (n->sym->attr.cray_pointer)
 			gfc_error ("Cray pointer object %qs in %s clause at %L",
    n->sym->name, name, &n->where);
Index: gcc/testsuite/gfortran.dg/goacc/uninit-use-device-clause.f95
===
--- gcc/testsuite/gfortran.dg/goacc/uninit-use-device-clause.f95	(revision 236020)
+++ gcc/testsuite/gfortran.dg/goacc/uninit-use-device-clause.f95	(working copy)
@@ -2,9 +2,9 @@
 ! { dg-additional-options "-Wuninitialized" }
 
 subroutine test
-  integer :: i
+  integer, pointer :: p
 
-  !$acc host_data use_device(i) ! { dg-warning "is used uninitialized in this function" }
+  !$acc host_data use_device(p) ! { dg-warning "is used uninitialized in this function" }
   !$acc end host_data
 end subroutine test
 
Index: gcc/testsuite/gfortran.dg/goacc/list.f95
===
--- gcc/testsuite/gfortran.dg/goacc/list.f95	(revision 236020)
+++ gcc/testsuite/gfortran.dg/goacc/list.f95	(working copy)
@@ -76,19 +76,19 @@ program test
   !$acc parallel private (i) firstprivate (i) ! { dg-error "present on multiple clauses" }
   !$acc end parallel
 
-  !$acc host_data use_device(i)
+  !$acc host_data use_device(i) ! { dg-error "neither a pointer nor an array" }
   !$acc end host_data
 
-  !$acc host_data use_device(c, d)
+  !$acc host_data use_device(c, d) ! { dg-error "neither a pointer nor an array" }
   !$acc end host_data
 
   !$acc host_data use_device(a)
   !$acc end host_data
 
-  !$acc host_data use_device(i, j, k, l, a)
+  !$acc host_data use_device(i, j, k, l, a) ! { dg-error "neither a pointer nor an array" }
   !$acc end host_data  
 
-  !$acc host_data use_device (i) use_device (j)
+  !$acc host_data use_device (i) use_device (j) ! { dg-error "neither a pointer nor an array" }
   !$acc end host_data
 
   !$acc host_data use_device ! { dg-error "Unclassifiable OpenACC directive" }
@@ -99,13 +99,17 @@ program test
 
   !$acc host_data use_device(10) ! { dg-error "Syntax error" }
 
-  !$acc host_data use_device(/b/, /b/) ! { dg-error "present on multiple clauses" }
+  !$acc host_data use_device(/b/, /b/)
   !$acc end host_data
+  ! { dg-error "neither a pointer nor an array" "" { target *-*-* } 102 }
+  ! { dg-error "present on multiple clauses" "" { target *-*-* } 102 }
 
-  !$acc host_data use_device(i, j, i) ! { dg-error "present on multiple clauses" }
+  !$acc host_data use_device(i, j, i)
   !$acc end host_data
+  ! { dg-error "neither a pointer nor an array" "" { target *-*-* } 107 }
+  ! { dg-error "present on multiple clauses" "" { target *-*-* } 107 }
 
-  !$acc host_data use_device(p1) ! { dg-error "POINTER" }
+  !$acc host_data use_device(p1)
   !$acc end 

Re: [RX] Add support for atomic operations

2016-05-09 Thread Nick Clifton
Hi Oleg,

> gcc/ChangeLog:
>   * config/rx/rx-protos.h (is_interrupt_func, is_fast_interrupt_func):
>   Forward declare.
>   (rx_atomic_sequence): New class.
>   * config/rx/rx.c (rx_print_operand): Use symbolic names for PSW bits.
>   (is_interrupt_func, is_fast_interrupt_func): Make non-static and
>   non-inline.
>   (rx_atomic_sequence::rx_atomic_sequence,
>   rx_atomic_sequence::~rx_atomic_sequence): New functions.
>   * config/rx/rx.md (CTRLREG_PSW, CTRLREG_USP, CTRLREG_FPSW, CTRLREG_CPEN,
>   CTRLREG_BPSW, CTRLREG_BPC, CTRLREG_ISP, CTRLREG_FINTV,
>   CTRLREG_INTB): New constants.
>   (FETCHOP): New code iterator.
>   (fethcop_name, fetchop_name2): New iterator code attributes.
>   (QIHI): New mode iterator.
>   (atomic_exchange, atomic_exchangesi, xchg_mem,
>   atomic_fetch_si, atomic_fetch_nandsi,
>   atomic__fetchsi, atomic_nand_fetchsi): New patterns.

Approved - please apply.

Cheers
  Nick



Re: [PATCH] Fix memory leak in tree-inliner

2016-05-09 Thread Martin Liška
On 05/09/2016 12:57 PM, Richard Biener wrote:
> does that resolve the issue?  If so, this is ok for trunk and branches.

It does. I'll commit the patch to trunk after it finishes regression tests.
Related patches for branches will be prepared after that.

Martin
>From 53c0a7fe057fd2ddd1ab4cea1d5ce47bba6dfa0b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 9 May 2016 16:02:15 +0200
Subject: [PATCH] Handle memory leak in tree-inline.c.

gcc/ChangeLog:

2016-01-06  Martin Liska  

	* tree-inline.c (remap_dependence_clique): Do not remap
	debugging statements.
---
 gcc/tree-inline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 2ee3f63..e571140 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -840,7 +840,7 @@ is_parm (tree decl)
 static unsigned short
 remap_dependence_clique (copy_body_data *id, unsigned short clique)
 {
-  if (clique == 0)
+  if (clique == 0 || processing_debug_stmt)
 return 0;
   if (!id->dependence_map)
 id->dependence_map = new hash_map;
-- 
2.8.1



[ARM] mno-pic-data-is-text-relative & msingle-pic-base

2016-05-09 Thread Nathan Sidwell
This patch comes from an off-list conversation between Joey & me.  The context
is RTOSs, not all-singing & dancing dynamic objects and OSes.


Currently, the documentation for -mno-pic-data-is-text-relative (-mno-PDITR)
says:
'Assume that each data segments are relative to text segment at load time.
 Therefore, it permits addressing data using PC-relative operations.
 This option is on by default for targets other than VxWorks RTP.'

However, if you use just this option, you still end up with a pic-register init
sequence that presumes a fixed mapping.  That's a surprise.  Joey tells me its
expected use is with -msingle-pic-base (-mSPB), which reserves a global register
to point at the (single) GOT.  That's what I had expected the -mno-PDITR option
to imply.


Apparently there are legitimate reasons one might want the -mno-PDITR behaviour 
without -mSPB.  I don't know what those are, perhaps Joey could clarify?


Anyway, IMHO that is the rare case, and the more common case is that one would
want -mno-PDITR to imply -mSPB.  (The reverse probably doesn't apply.)


This patch does 3 things.
1) Have -mno-PDITR imply -mSPB, unless one has explicitly provided -m[no-]SPB.
2) Clarify the -m[no-]PDITR documentation.
3) Add some testcases -- there didn't appear to be any.

ok?

nathan
2016-05-09  Nathan Sidwell  

	gcc/
	* config/arm/arm.c (arm_option_override): Set MASK_SINGLE_PIC_BASE
	when -mno-pic-data-is-text-relative is in effect, by default.
	* doc/invoke.texi (mpic-data-is-text-relative): Document new
	behavior and clarify.

	gcc/testsuite/
	* gcc.target/arm/data-rel-1.c: New.
	* gcc.target/arm/data-rel-2.c: New.
	* gcc.target/arm/data-rel-3.c: New.

Index: config/arm/arm.c
===
--- config/arm/arm.c	(revision 235980)
+++ config/arm/arm.c	(working copy)
@@ -3298,6 +3298,20 @@ arm_option_override (void)
 	}
 }
 
+  if (TARGET_VXWORKS_RTP)
+{
+  if (!global_options_set.x_arm_pic_data_is_text_relative)
+	arm_pic_data_is_text_relative = 0;
+}
+  else if (flag_pic
+	   && !arm_pic_data_is_text_relative
+	   && !(global_options_set.x_target_flags & MASK_SINGLE_PIC_BASE))
+/* When text & data segments don't have a fixed displacement, the
+   intended use is with a single, read only, pic base register.
+   Unless the user explicitly requested not to do that, set
+   it.  */
+target_flags |= MASK_SINGLE_PIC_BASE;
+
   /* If stack checking is disabled, we can use r10 as the PIC register,
  which keeps r9 available.  The EABI specifies r9 as the PIC register.  */
   if (flag_pic && TARGET_SINGLE_PIC_BASE)
@@ -3329,10 +3343,6 @@ arm_option_override (void)
 	arm_pic_register = pic_register;
 }
 
-  if (TARGET_VXWORKS_RTP
-  && !global_options_set.x_arm_pic_data_is_text_relative)
-arm_pic_data_is_text_relative = 0;
-
   /* Enable -mfix-cortex-m3-ldrd by default for Cortex-M3 cores.  */
   if (fix_cm3_ldrd == 2)
 {
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 235980)
+++ doc/invoke.texi	(working copy)
@@ -14197,9 +14197,12 @@ otherwise the default is @samp{R10}.
 
 @item -mpic-data-is-text-relative
 @opindex mpic-data-is-text-relative
-Assume that each data segments are relative to text segment at load time.
-Therefore, it permits addressing data using PC-relative operations.
-This option is on by default for targets other than VxWorks RTP.
+Assume that the displacement between the text and data segments is fixed
+at static link time.  This permits using PC-relative addressing
+operations to access data known to be in the data segment.  For
+non-VxWorks RTP targets, this option is enabled by default.  When
+disabled on such targets, it will enable @option{-msingle-pic-base} by
+default.
 
 @item -mpoke-function-name
 @opindex mpoke-function-name
Index: testsuite/gcc.target/arm/data-rel-1.c
===
--- testsuite/gcc.target/arm/data-rel-1.c	(nonexistent)
+++ testsuite/gcc.target/arm/data-rel-1.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-options "-fPIC -mno-pic-data-is-text-relative" } */
+/* { dg-final { scan-assembler-not "j-\\(.LPIC"  } } */
+/* { dg-final { scan-assembler-not "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
+/* { dg-final { scan-assembler "j\\(GOT\\)" } } */
+/* { dg-final { scan-assembler "(ldr|mov)\tr\[0-9\]+, \\\[?r9" } } */
+
+static int j;
+
+int *Foo ()
+{
+  return &j;
+}
Index: testsuite/gcc.target/arm/data-rel-2.c
===
--- testsuite/gcc.target/arm/data-rel-2.c	(nonexistent)
+++ testsuite/gcc.target/arm/data-rel-2.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-options "-fPIC -mno-pic-data-is-text-relative -mno-single-pic-base" } */
+/* { dg-final { scan-assembler-not "j-\\(.LPIC"  } } */
+/* { dg-final { scan-assembler "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
+/* { 

Re: [PATCH,rs6000] Add built-in support for new Power9 darn (deliver a random number) instruction

2016-05-09 Thread Segher Boessenkool
Hi Kelvin,

On Thu, May 05, 2016 at 10:26:01AM -0600, Kelvin Nilsen wrote:
>   (UNSPEC_DARN_32): New usnpec constant.

Typo.

>   ("darn_32"): New instruction.

We don't normally use quotes for insn names.

>   (rs6000_builtin_mask_calculate): Add in the RS6000_BTM_MODULO and
>   RS6000_BTM_64BIT flags to the returned mask, depending on
>   configuration. 

Trailing space (many, in this changelog).

> --- gcc/config/rs6000/altivec.h   (revision 235884)
> +++ gcc/config/rs6000/altivec.h   (working copy)
> @@ -382,6 +382,11 @@
>  #define vec_vsubuqm __builtin_vec_vsubuqm
>  #define vec_vupkhsw __builtin_vec_vupkhsw
>  #define vec_vupklsw __builtin_vec_vupklsw
> +
> +/* Non-Vector additions added in ISA 3.0. */
> +#define darn __builtin_darn
> +#define darn_32 __builtin_darn_32
> +#define darn_raw __builtin_darn_raw
>  #endif

Do we really want to #define short words like "darn"?  If this is already
set in stone, so be it.

> +(define_insn "darn_32"
> +  [(set (match_operand:SI 0 "register_operand" "")

The constraint should be "r" I suppose?

> +(unspec:SI [(const_int 0)] UNSPEC_DARN_32))]
> +  "TARGET_MODULO"
> +  {
> + return "darn %0,0";
> +  }
> +  [(set_attr "type" "add")  

Trailing spaces.  "add" isn't the correct type; use "integer" if there
is no better type.

> +   (set_attr "length" "4")])

That is the default, no need to mention it.  Most insns are implicitly
length 4.

> +/* Miscellaneous builtins for instructions added in ISA 3.0.  These
> +   instructions don't require either the DFP or VSX options, just the basic 

Trailing space.

> @@ -3634,6 +3639,8 @@ rs6000_builtin_mask_calculate (void)
> | ((rs6000_cpu == PROCESSOR_CELL) ? RS6000_BTM_CELL  : 0)
> | ((TARGET_P8_VECTOR) ? RS6000_BTM_P8_VECTOR : 0)
> | ((TARGET_P9_VECTOR) ? RS6000_BTM_P9_VECTOR : 0)
> +   | ((TARGET_MODULO)? RS6000_BTM_MODULO: 0)
> +   | ((TARGET_64BIT) ? RS6000_BTM_64BIT: 0)

Missing space?

> +  /* RS6000_BTC_SPECIAL represents no-operand operators.  */
>gcc_assert (attr == RS6000_BTC_UNARY
> || attr == RS6000_BTC_BINARY
> -   || attr == RS6000_BTC_TERNARY);
> -
> +   || attr == RS6000_BTC_TERNARY
> +   || attr == RS6000_BTC_SPECIAL);
> +  

Why SPECIAL and not NULLARY or such?

> +  if (rs6000_overloaded_builtin_p (d->code))
> + {
> +   if (! (type = opaque_ftype_opaque))
> + type = opaque_ftype_opaque
> +   = build_function_type_list (opaque_V4SI_type_node,
> +   NULL_TREE);
> + }

Eww.

  if (!opaque_ftype_opaque)
    opaque_ftype_opaque = build_function_type_list (...);
  type = opaque_ftype_opaque;

> +   enum insn_code icode = d->icode;
> +   if (d->name == 0)
> + {
> +   if (TARGET_DEBUG_BUILTIN)
> + fprintf (stderr, "rs6000_builtin, bdesc_0arg[%ld] no name\n",
> +  (long unsigned)i);

unsigned is %u, not %d.  Space after cast.
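For illustration, here is the debug message formatted with a conversion that matches the (unsigned long) cast, with a space after the cast. The helper name is hypothetical; snprintf is used only so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the "no name" debug message for builtin
   index I into BUF.  %lu matches the (unsigned long) argument, whereas
   %ld would expect a signed long.  */
int
format_no_name_msg (char *buf, size_t len, size_t i)
{
  assert (buf != NULL || len == 0);
  return snprintf (buf, len, "rs6000_builtin, bdesc_0arg[%lu] no name\n",
                   (unsigned long) i);
}
```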

Cheers,


Segher


Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations

2016-05-09 Thread Christophe Lyon
On 3 May 2016 at 08:59, Segher Boessenkool  wrote:
> Now that cfgcleanup knows how to optimize with return statements, the
> epilogue insertion code doesn't have to deal with it itself anymore.
>
>
> 2016-05-03  Segher Boessenkool  
>
> * function.c (emit_use_return_register_into_block): Delete.
> (gen_return_pattern): Delete.
> (emit_return_into_block): Delete.
> (active_insn_between): Delete.
> (convert_jumps_to_returns): Delete.
> (emit_return_for_exit): Delete.
> (thread_prologue_and_epilogue_insns): Delete all code dealing with
> simple_return for shrink-wrapped blocks.
> * shrink-wrap.c (try_shrink_wrapping): Insert simple_return at the
> end of blocks that need one.
> (get_unconverted_simple_return): Delete.
> (convert_to_simple_return): Delete.
> * shrink-wrap.c (get_unconverted_simple_return): Delete declaration.
> (convert_to_simple_return): Ditto.
>

Hi,

After this patch, I've noticed that
gcc.target/arm/pr43920-2.c
now fails at:
/* { dg-final { scan-assembler-times "pop" 2 } } */

Before the patch, the generated code was:
[...]
pop {r3, r4, r5, r6, r7, pc}
.L4:
mov r0, #-1
.L1:
pop {r3, r4, r5, r6, r7, pc}

it is now:
[...]
.L1:
pop {r3, r4, r5, r6, r7, pc}
.L4:
mov r0, #-1
b   .L1

The new version does not seem better, as it adds a branch on the path
and it is not smaller.

Christophe.


> ---
>  gcc/function.c| 213 
> +-
>  gcc/shrink-wrap.c | 171 ---
>  gcc/shrink-wrap.h |   6 --
>  3 files changed, 16 insertions(+), 374 deletions(-)
>
> diff --git a/gcc/function.c b/gcc/function.c
> index f6eb56c..b9a6338 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -5753,49 +5753,6 @@ prologue_epilogue_contains (const_rtx insn)
>return 0;
>  }
>
> -/* Insert use of return register before the end of BB.  */
> -
> -static void
> -emit_use_return_register_into_block (basic_block bb)
> -{
> -  start_sequence ();
> -  use_return_register ();
> -  rtx_insn *seq = get_insns ();
> -  end_sequence ();
> -  rtx_insn *insn = BB_END (bb);
> -  if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, PATTERN (insn)))
> -insn = prev_cc0_setter (insn);
> -
> -  emit_insn_before (seq, insn);
> -}
> -
> -
> -/* Create a return pattern, either simple_return or return, depending on
> -   simple_p.  */
> -
> -static rtx_insn *
> -gen_return_pattern (bool simple_p)
> -{
> -  return (simple_p
> - ? targetm.gen_simple_return ()
> - : targetm.gen_return ());
> -}
> -
> -/* Insert an appropriate return pattern at the end of block BB.  This
> -   also means updating block_for_insn appropriately.  SIMPLE_P is
> -   the same as in gen_return_pattern and passed to it.  */
> -
> -void
> -emit_return_into_block (bool simple_p, basic_block bb)
> -{
> -  rtx_jump_insn *jump = emit_jump_insn_after (gen_return_pattern (simple_p),
> - BB_END (bb));
> -  rtx pat = PATTERN (jump);
> -  if (GET_CODE (pat) == PARALLEL)
> -pat = XVECEXP (pat, 0, 0);
> -  gcc_assert (ANY_RETURN_P (pat));
> -  JUMP_LABEL (jump) = pat;
> -}
>
>  /* Set JUMP_LABEL for a return insn.  */
>
> @@ -5811,135 +5768,6 @@ set_return_jump_label (rtx_insn *returnjump)
>  JUMP_LABEL (returnjump) = ret_rtx;
>  }
>
> -/* Return true if there are any active insns between HEAD and TAIL.  */
> -bool
> -active_insn_between (rtx_insn *head, rtx_insn *tail)
> -{
> -  while (tail)
> -{
> -  if (active_insn_p (tail))
> -   return true;
> -  if (tail == head)
> -   return false;
> -  tail = PREV_INSN (tail);
> -}
> -  return false;
> -}
> -
> -/* LAST_BB is a block that exits, and empty of active instructions.
> -   Examine its predecessors for jumps that can be converted to
> -   (conditional) returns.  */
> -vec
> -convert_jumps_to_returns (basic_block last_bb, bool simple_p,
> - vec unconverted ATTRIBUTE_UNUSED)
> -{
> -  int i;
> -  basic_block bb;
> -  edge_iterator ei;
> -  edge e;
> -  auto_vec src_bbs (EDGE_COUNT (last_bb->preds));
> -
> -  FOR_EACH_EDGE (e, ei, last_bb->preds)
> -if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
> -  src_bbs.quick_push (e->src);
> -
> -  rtx_insn *label = BB_HEAD (last_bb);
> -
> -  FOR_EACH_VEC_ELT (src_bbs, i, bb)
> -{
> -  rtx_insn *jump = BB_END (bb);
> -
> -  if (!JUMP_P (jump) || JUMP_LABEL (jump) != label)
> -   continue;
> -
> -  e = find_edge (bb, last_bb);
> -
> -  /* If we have an unconditional jump, we can replace that
> -with a simple return instruction.  */
> -  if (simplejump_p (jump))
> -   {
> - /* The use of the return register might be present in the exit
> -fallthru block.  Either:
> -- removing 

Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-09 Thread Bernd Schmidt

On 05/09/2016 03:37 PM, Bernd Edlinger wrote:

> On 05/09/16 09:56, Richard Biener wrote:
>
>> At least it sounds to me that its semantics can be fully expressed
>> with generic asms?  (Maybe apart from the only-if-ASM_STRING-is-empty
>> part)
>
> That was also my first idea too.
>
> In simple cases an asm ("whatever"); should do the same as
> asm ("whatever" ::: );
>
> Adding a "memory" to the clobber list would be simple that's true.
>
> But in general it can be pretty complicated, especially if the
> string contains the special characters % { | }.


Is the only difference in how the string is output? Maybe we can have a 
slightly different form of ASM_OPERANDS (with a bit set, or with the 
string wrapped in something else) to indicate that it's old-style.



Bernd


Re: [PATCH] Fix PR70497, missed "subreg" CSE on GIMPLE

2016-05-09 Thread Richard Biener
On Mon, 9 May 2016, Marc Glisse wrote:

> On Mon, 9 May 2016, Richard Biener wrote:
> 
> > The following patch implements CSEing of "subreg" reads from memory
> > like (from the testcase in the PR)
> > 
> > union U { int i[16]; char c; };
> > 
> > char foo(int i)
> > {
> >  union U u;
> >  u.i[0] = i;
> >  return u.c;
> > }
> > 
> > CSEing u.c as (char)i and thus removing u during GIMPLE optimizations.
> > 
> > The patch always goes via generating BIT_FIELD_REFs and letting them
> > be simplified via the match-and-simplify machinery.  This means it
> > replaces handling of complex component and vector extracts we've been
> > able to do before.
> > 
> > I didn't restrict the kind of BIT_FIELD_REFs much apart from requiring
> > byte-size accesses.  I did inspect code generated on powerpc (big-endian)
> > for the testcase though (also to verify any endianess issues) and didn't
> > spot anything wrong (even for non-lowpart "subregs").
> 
> I expect this will also fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28367
> (reading an element from a vector through a union)?

Yes.  Will add the testcase and adjust the ChangeLog entry.

Thanks for the heads-up.

Richard.
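A small run-time check of the equivalence being exploited, written as a sketch: `foo` is the function from the PR, while `expected_first_byte` is a hypothetical reference that extracts the byte stored at the lowest address of `i` using shifts. On a little-endian target that byte is the low-order one, i.e. `(char)i`; on a big-endian target such as powerpc it is the most significant byte, which is why the non-lowpart case matters. The sketch assumes a plain big- or little-endian integer layout.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

union U { int i[16]; char c; };

/* The function from the PR: store an int, read back the first char.  */
char
foo (int i)
{
  union U u;
  u.i[0] = i;
  return u.c;
}

/* Hypothetical reference: the byte at the lowest address of I,
   computed with shifts instead of type punning.  */
char
expected_first_byte (int i)
{
  unsigned int one = 1;
  unsigned char first;
  memcpy (&first, &one, 1);           /* Probe the host byte order.  */
  int little_endian = (first == 1);
  unsigned int shift = little_endian ? 0 : (sizeof (int) - 1) * CHAR_BIT;
  return (char) (unsigned char) ((unsigned int) i >> shift);
}
```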


Re: [PATCH v2] add support for placing variables in shared memory

2016-05-09 Thread Nathan Sidwell

On 05/06/16 13:11, Alexander Monakov wrote:

> Allow using __attribute__((shared)) to place static variables in '.shared'
> memory space.
>
> Changes in v2:
> - reword diagnostic message in nvptx_handle_shared_attribute to follow other
>   backends ("... attribute not allowed with auto storage class");
> - reject explicit initialization of ".shared" memory variables;
> - add testcases.


thanks, this is ok to commit now -- I see no reason why it needs to wait.

nathan


Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-09 Thread Bernd Edlinger
On 05/09/16 09:56, Richard Biener wrote:
> On Thu, 5 May 2016, Bernd Edlinger wrote:
>
>> Hi!
>>
>> this patch is inspired by recent discussion about basic asm:
>>
>> Currently a basic asm is an instruction scheduling barrier,
>> but not a memory barrier, and most surprising, basic asm
>> does _not_ implicitly clobber CC on targets where
>> extended asm always implicitly clobbers CC, even if
>> nothing is in the clobber section.
>>
>> This patch makes basic asm implicitly clobber CC on certain
>> targets, and makes the basic asm implicitly clobber memory,
>> but no general registers, which is what could be expected.
>>
>> This is however only done for basic asm with non-empty
>> assembler string, which is in sync with David's proposed
>> basic asm warnings patch.
>>
>> Due to the change in the tree representation, where
>> ASM_INPUT can now be the first element of a
>> PARALLEL block with the implicit clobber elements,
>> there are some changes necessary.
>>
>> Most of the changes in the middle end, were necessary
>> because extract_asm_operands can not be used to find out
>> if a PARALLEL block is an asm statement, but in most cases
>> asm_noperands can be used instead.
>>
>> There are also changes necessary in two targets: pa, and ia64.
>> I have successfully built cross-compilers for these targets.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
>> OK for trunk?
>
> I'm generally sympathetic with the change but I wonder if it would
> make sense to re-write "basic asm" into general asms to not
> need to special case them.  I'd do that during gimplification
> for example.
>
> At least it sounds to me that its semantics can be fully expressed
> with generic asms?  (Maybe apart from the only-if-ASM_STRING-is-empty
> part)
>

That was my first idea too.

In simple cases an asm ("whatever"); should do the same as
asm ("whatever" ::: );

Adding "memory" to the clobber list would be simple, that's true.

But in general it can be pretty complicated, especially if the
string contains the special characters % { | }.

For example asm ("#%") is OK, but asm ("#%" :) is an ERROR.
So a single % must be doubled; that may be a general rule (hopefully).

On some targets { | } mean different things, and must be escaped,
but on other targets these must not be escaped.  So that is target
dependent.

Some targets replace whatever they want with the ASM_OUTPUT_OPCODE hook.
Example: i386 replaces "%v" with "v" or "", tilegx replaces "pseudo"
with "".  This hook is only called with extended asm, and we would have
to know what it does and reverse its effect.  Here it starts to become
difficult.

And the ia64 target has different semantics for basic asm than for
extended asm, i.e. it always emits stop bits for traditional asms.  So it
makes a difference between extended and basic asm.

I really have no idea how to do this when extended and basic asm have
exactly the same tree structure.

That's why I thought I can as well add the clobbers to the basic asm's
tree representation.



Thanks
Bernd.

> Thanks,
> Richard.
>


Re: [PATCH] Take known zero bits into account when checking extraction.

2016-05-09 Thread Marc Glisse

On Mon, 9 May 2016, Dominik Vogt wrote:


This turns out to be quite difficult.  A small test function
effectively just returns the argument:

 unsigned long bar (unsigned long in)
 {
   if ((in & 1) == 0)
 in = (in & ~(unsigned long)1);

   return in;
 }

However, GCC does not notice that the AND is a no-op.  As far as I
understand, zero bit tracking is only done in "combine", so when
folding the assignment statement the information that the lowest
bit is zero is not available and therefore the no-op is not
detected?


VRP is also supposed to track bits that may/must be non-zero. It may be 
possible to enhance it to handle this case.


--
Marc Glisse


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-09 Thread Christophe Lyon
On 9 May 2016 at 15:29, Jeff Law  wrote:
> On 05/09/2016 01:37 AM, Christophe Lyon wrote:
>
>>
>> Hi,
>> After this merge, I'm seeing lots of timeouts on arm (using QEMU).
>> Is this "expected"? (as in: should I increase my timeout value)
>
> I wouldn't say it's expected; this is the first time Cilk+ has been
> supported on ARM.  It could be a bug in the ARM support in the runtime, an
> ARM compiler bug or even a bug in the ARM QEMU support.
>
> Probably the first step is to see if it's working properly on real hardware.
> That would at least allow us to eliminate QEMU from the equation if it's
> failing in the same manner on a real machine.
>
OK, I'll check that.
I wanted to know if I was missing something obvious.

> Jeff
>


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-09 Thread Jeff Law

On 05/09/2016 01:37 AM, Christophe Lyon wrote:



Hi,
After this merge, I'm seeing lots of timeouts on arm (using QEMU).
Is this "expected"? (as in: should I increase my timeout value)

I wouldn't say it's expected; this is the first time Cilk+ has been 
supported on ARM.  It could be a bug in the ARM support in the runtime, 
an ARM compiler bug or even a bug in the ARM QEMU support.


Probably the first step is to see if it's working properly on real 
hardware.  That would at least allow us to eliminate QEMU from the 
equation if it's failing in the same manner on a real machine.


Jeff



Re: [RS6000] Stop regrename twiddling with split-stack prologue

2016-05-09 Thread Segher Boessenkool
On Wed, May 04, 2016 at 10:41:07PM +0930, Alan Modra wrote:
> Bootstrap and regression tested powerpc64le-linux.  Fixes 771 Go
> testsuite regressions.  OK to apply everywhere?
> 
> The alternative of adding all parameter regs used by cfun to the
> __morestack CALL_INSN_FUNCTION_USAGE and uses for cfun return value
> regs seems overkill when all we need to do is protect a very small
> sequence of insns.
> 
>   PR target/70947
>   * config/rs6000/rs6000.c (rs6000_expand_split_stack_prologue): Stop
>   regrename modifying insns saving lr before __morestack call.
>   * config/rs6000/rs6000.md (split_stack_return): Similarly for
>   insns restoring lr after __morestack call.

Okay for trunk.  Okay for the other branches after some suitable burn-in
period.  Thanks,


Segher


Re: [PATCH] Fix PR70497, missed "subreg" CSE on GIMPLE

2016-05-09 Thread Marc Glisse

On Mon, 9 May 2016, Richard Biener wrote:


The following patch implements CSEing of "subreg" reads from memory
like (from the testcase in the PR)

union U { int i[16]; char c; };

char foo(int i)
{
 union U u;
 u.i[0] = i;
 return u.c;
}

CSEing u.c as (char)i and thus removing u during GIMPLE optimizations.

The patch always goes via generating BIT_FIELD_REFs and letting them
be simplified via the match-and-simplify machinery.  This means it
replaces handling of complex component and vector extracts we've been
able to do before.

I didn't restrict the kind of BIT_FIELD_REFs much apart from requiring
byte-size accesses.  I did inspect code generated on powerpc (big-endian)
for the testcase though (also to verify any endianness issues) and didn't
spot anything wrong (even for non-lowpart "subregs").


I expect this will also fix 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28367 (reading an element 
from a vector through a union)?


--
Marc Glisse


Re: [PATCH], Add PowerPC ISA 3.0 vector d-form addressing

2016-05-09 Thread Segher Boessenkool
Hi Mike,

On Thu, May 05, 2016 at 02:05:19PM -0400, Michael Meissner wrote:
> > > With this patch, I enable -mlra if the user did not specify either -mlra or
> > > -mno-lra on the command line, and -mcpu=power9 or -mpower9-dform-vector were
> > > used. I also enabled -mvsx-timode if LRA was used, which also is a RELOAD
> > > issue, that works with LRA.
> > 
> > I don't like enabling LRA if the user didn't ask for it; it is a bit too
> > surprising.  What do you do if there is -mno-lra explicitly?  You can just
> > do the same if no-lra is implicit?
> 
> Ok.

You didn't however change this afaics?

> > > --- gcc/config/rs6000/rs6000.opt  
> > > (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
> > > (revision 235831)
> > > +++ gcc/config/rs6000/rs6000.opt  (.../gcc/config/rs6000) (working copy)
> > > @@ -470,8 +470,8 @@ Target RejectNegative Joined UInteger Va
> > >  -mlong-double-Specify size of long double (64 or 128 bits).
> > >  
> > >  mlra
> > > -Target Report Var(rs6000_lra_flag) Init(0) Save
> > > -Use LRA instead of reload.
> > > +Target Undocumented Mask(LRA) Var(rs6000_isa_flags)
> > > +Use the LRA register allocator instead of the reload register allocator.
> > 
> > It wasn't "undocumented" before?  Why the change to a mask bit btw?
> 
> It was always meant to be undocumented, but I changed to be similar to
> before. I am trying to change all of the random switches that set a word to be
> an option mask, so I made that part of the change in these next patches.

I agree it should be undocumented, because hopefully one day reload
will not exist at all anymore.  OTOH, all other archs with an -mlra
switch do not have it hidden, so we might as well follow suit there.

> I did remove setting it for -mcpu=power9.

It doesn't look like it?  Please check.

> @@ -94,6 +95,7 @@
>| OPTION_MASK_FPRND\
>| OPTION_MASK_HTM  \
>| OPTION_MASK_ISEL \
> +  | OPTION_MASK_LRA  \
>| OPTION_MASK_MFCRF\
>| OPTION_MASK_MFPGPR   \
>| OPTION_MASK_MODULO   \

> > > +mpower9-dform-scalar
> > > +Target Report Mask(P9_DFORM_SCALAR) Var(rs6000_isa_flags)
> > > +Use/do not use scalar register+offset memory instructions added in ISA 3.0.
> > > +
> > > +mpower9-dform-vector
> > > +Target Report Mask(P9_DFORM_VECTOR) Var(rs6000_isa_flags)
> > > +Use/do not use vector register+offset memory instructions added in ISA 3.0.
> > > +
> > >  mpower9-dform
> > > -Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
> > > -Use/do not use vector and scalar instructions added in ISA 3.0.
> > > +Target Report Var(TARGET_P9_DFORM_BOTH) Init(-1) Save
> > > +Use/do not use register+offset memory instructions added in ISA 3.0.
> > 
> > These should probably all be undocumented, though (they're not something
> > users should use).
> 
> I will make -mpower9-dform public (which I thought it was, but evidently I
> missed adding the documentation for GCC 6), but I will make the -scalar and
> -vector forms private.

You think this is something users are expected to twiddle?  Okay then.

> [gcc]
> 2016-05-05  Michael Meissner  
> 
>   * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Use
>   -mpower9-dform-scalar instead of -mpower9-dform. Add note not to
>   include -mpower9-dform-vector until we switch over to LRA.

Thanks for the better changelog, much appreciated.  Two spaces after
a full stop though.


Segher


[PATCH] Fix PR70497, missed "subreg" CSE on GIMPLE

2016-05-09 Thread Richard Biener

The following patch implements CSEing of "subreg" reads from memory
like (from the testcase in the PR)

union U { int i[16]; char c; };

char foo(int i)
{
  union U u;
  u.i[0] = i;
  return u.c;
}

CSEing u.c as (char)i and thus removing u during GIMPLE optimizations.

The patch always goes via generating BIT_FIELD_REFs and letting them
be simplified via the match-and-simplify machinery.  This means it
replaces handling of complex component and vector extracts we've been
able to do before.

I didn't restrict the kind of BIT_FIELD_REFs much apart from requiring
byte-size accesses.  I did inspect code generated on powerpc (big-endian)
for the testcase though (also to verify any endianness issues) and didn't
spot anything wrong (even for non-lowpart "subregs").

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-05-09  Richard Biener  

PR tree-optimization/70497
* tree-ssa-sccvn.c (vn_nary_build_or_lookup): New function
split out from ...
(visit_reference_op_load): ... here.
(vn_reference_lookup_3): Use it to handle subreg-like accesses
with simplified BIT_FIELD_REFs.
* tree-ssa-pre.c (eliminate_insert): Handle inserting BIT_FIELD_REFs.
* tree-complex.c (extract_component): Handle BIT_FIELD_REFs
correctly.

* gcc.dg/torture/20160404-1.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-54.c: Likewise.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c.orig   2016-05-03 15:15:58.002994741 +0200
--- gcc/tree-ssa-sccvn.c2016-05-03 15:35:41.976538344 +0200
*** vn_reference_lookup_or_insert_for_pieces
*** 1610,1615 
--- 1610,1714 
 operands.copy (), value, value_id);
  }
  
+ static vn_nary_op_t vn_nary_op_insert_stmt (gimple *stmt, tree result);
+ 
+ /* Hook for maybe_push_res_to_seq, lookup the expression in the VN tables.  */
+ 
+ static tree
+ vn_lookup_simplify_result (code_helper rcode, tree type, tree *ops)
+ {
+   if (!rcode.is_tree_code ())
+ return NULL_TREE;
+   vn_nary_op_t vnresult = NULL;
+   return vn_nary_op_lookup_pieces (TREE_CODE_LENGTH ((tree_code) rcode),
+  (tree_code) rcode, type, ops, &vnresult);
+ }
+ 
+ /* Return a value-number for RCODE OPS... either by looking up an existing
+value-number for the simplified result or by inserting the operation.  */
+ 
+ static tree
+ vn_nary_build_or_lookup (code_helper rcode, tree type, tree *ops)
+ {
+   tree result = NULL_TREE;
+   /* We will be creating a value number for
+RCODE (OPS...).
+  So first simplify and lookup this expression to see if it
+  is already available.  */
+   mprts_hook = vn_lookup_simplify_result;
+   bool res = false;
+   switch (TREE_CODE_LENGTH ((tree_code) rcode))
+ {
+ case 1:
+   res = gimple_resimplify1 (NULL, &rcode, type, ops, vn_valueize);
+   break;
+ case 2:
+   res = gimple_resimplify2 (NULL, &rcode, type, ops, vn_valueize);
+   break;
+ case 3:
+   res = gimple_resimplify3 (NULL, &rcode, type, ops, vn_valueize);
+   break;
+ }
+   mprts_hook = NULL;
+   gimple *new_stmt = NULL;
+   if (res
+   && gimple_simplified_result_is_gimple_val (rcode, ops))
+ /* The expression is already available.  */
+ result = ops[0];
+   else
+ {
+   tree val = vn_lookup_simplify_result (rcode, type, ops);
+   if (!val)
+   {
+ gimple_seq stmts = NULL;
+ result = maybe_push_res_to_seq (rcode, type, ops, &stmts);
+ if (result)
+   {
+ gcc_assert (gimple_seq_singleton_p (stmts));
+ new_stmt = gimple_seq_first_stmt (stmts);
+   }
+   }
+   else
+   /* The expression is already available.  */
+   result = val;
+ }
+   if (new_stmt)
+ {
+   /* The expression is not yet available, value-number lhs to
+the new SSA_NAME we created.  */
+   /* Initialize value-number information properly.  */
+   VN_INFO_GET (result)->valnum = result;
+   VN_INFO (result)->value_id = get_next_value_id ();
+   gimple_seq_add_stmt_without_update (&VN_INFO (result)->expr,
+ new_stmt);
+   VN_INFO (result)->needs_insertion = true;
+   /* As all "inserted" statements are singleton SCCs, insert
+to the valid table.  This is strictly needed to
+avoid re-generating new value SSA_NAMEs for the same
+expression during SCC iteration over and over (the
+optimistic table gets cleared after each iteration).
+We do not need to insert into the optimistic table, as
+lookups there will fall back to the valid table.  */
+   if (current_info == optimistic_info)
+   {
+ current_info = valid_info;
+ vn_nary_op_insert_stmt (new_stmt, result);
+ current_info = optimistic_info;
+   }
+   

PING^2: [PATCH] PR target/70454: Build x86 libgomp with -march=i486 or better

2016-05-09 Thread H.J. Lu
On Mon, May 2, 2016 at 6:46 AM, H.J. Lu  wrote:
> On Mon, Apr 25, 2016 at 1:36 PM, H.J. Lu  wrote:
>> If x86 libgomp isn't compiled with -march=i486 or better, append
>> -march=i486 to XCFLAGS for the x86 libgomp build.
>>
>> Tested on i686 with and without --with-arch=i386.  Tested on
>> x86-64 with and without --with-arch_32=i386.  OK for trunk?
>>
>>
>> H.J.
>> ---
>> PR target/70454
>> * configure.tgt (XCFLAGS): Append -march=i486 to compile x86
>> libgomp if needed.
>> ---
>>  libgomp/configure.tgt | 36 
>>  1 file changed, 16 insertions(+), 20 deletions(-)
>>
>> diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
>> index 77e73f0..c876e80 100644
>> --- a/libgomp/configure.tgt
>> +++ b/libgomp/configure.tgt
>> @@ -67,28 +67,24 @@ if test x$enable_linux_futex = xyes; then
>> ;;
>>
>>  # Note that bare i386 is not included here.  We need cmpxchg.
>> -i[456]86-*-linux*)
>> +i[456]86-*-linux* | x86_64-*-linux*)
>> config_path="linux/x86 linux posix"
>> -   case " ${CC} ${CFLAGS} " in
>> - *" -m64 "*|*" -mx32 "*)
>> -   ;;
>> - *)
>> -   if test -z "$with_arch"; then
>> - XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
>> +   # Need i486 or better.
>> +   cat > conftestx.c <<EOF
>> +#if defined __x86_64__ || defined __i486__ || defined __pentium__ \
>> +  || defined __pentiumpro__ || defined __pentium4__ \
>> +  || defined __geode__ || defined __SSE__
>> +# error Need i486 or better
>> +#endif
>> +EOF
>> +   if ${CC} ${CFLAGS} -c -o conftestx.o conftestx.c > /dev/null 2>&1; then
>> +   if test "${target_cpu}" = x86_64; then
>> +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>> +   else
>> +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
>> fi
>> -   esac
>> -   ;;
>> -
>> -# Similar jiggery-pokery for x86_64 multilibs, except here we
>> -# can't rely on the --with-arch configure option, since that
>> -# applies to the 64-bit side.
>> -x86_64-*-linux*)
>> -   config_path="linux/x86 linux posix"
>> -   case " ${CC} ${CFLAGS} " in
>> - *" -m32 "*)
>> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>> -   ;;
>> -   esac
>> +   fi
>> +   rm -f conftestx.c conftestx.o
>> ;;
>>
>>  # Note that sparcv7 and sparcv8 is not included here.  We need cas.
>> --
>> 2.5.5
>>
>
> PING.
>

PING.

-- 
H.J.


Re: [RS6000] Fragile testcase breaks with -frename-registers

2016-05-09 Thread Segher Boessenkool
On Wed, May 04, 2016 at 12:05:45AM +0930, Alan Modra wrote:
> See comments added below.  Bootstrapped and regression tested
> powerpc64le-linux.  OK to apply?
> 
>   PR testsuite/70826
>   * gcc.target/powerpc/savres.c: Compile with -fno-rename-registers.

Okay for trunk.  Thanks!


Segher


RE: [PATCH 2/4] [MIPS] Add pipeline description for MSA

2016-05-09 Thread Robert Suchanek
Hi Matthew,

> > gcc/ChangeLog:
> >
> > * config/mips/i6400.md (i6400_fpu_intadd, i6400_fpu_logic)
> > (i6400_fpu_div, i6400_fpu_cmp, i6400_fpu_float, i6400_fpu_store)
> > (i6400_fpu_long_pipe, i6400_fpu_logic_l, i6400_fpu_float_l)
> > (i6400_fpu_mult): New cpu units.
> > (i6400_msa_add_d, i6400_msa_int_add, i6400_msa_short_logic3)
> > (i6400_msa_short_logic2, i6400_msa_short_logic, i6400_msa_move)
> > (i6400_msa_cmp, i6400_msa_short_float2, i6400_msa_div_d)
> > (i6400_msa_div_w, i6400_msa_div_h, i6400_msa_div_b, i6400_msa_copy)
> > (i6400_msa_branch, i6400_fpu_msa_store, i6400_fpu_msa_load)
> > (i6400_fpu_msa_move, i6400_msa_long_logic1, i6400_msa_long_logic2)
> > (i6400_msa_mult, i6400_msa_long_float2, i6400_msa_long_float4)
> > (i6400_msa_long_float5, i6400_msa_long_float8, i6400_msa_fdiv_df)
> > (i6400_msa_fdiv_sf): New reservations.
> > * config/mips/p5600.md (p5600_fpu_intadd, p5600_fpu_cmp)
> > (p5600_fpu_float, p5600_fpu_logic_a, p5600_fpu_logic_b,
> > p5600_fpu_div)
> > (p5600_fpu_logic, p5600_fpu_float_a, p5600_fpu_float_b,)
> 
> Typo with "," at the end of the list
> 
> > (p5600_fpu_float_c, p5600_fpu_float_d, p5600_fpu_mult,
> > p5600_fpu_fdiv)
> > (p5600_fpu_load): New cpu units.
> > (msa_short_int_add, msa_short_logic, msa_short_logic_move_v)
> > (msa_short_cmp, msa_short_float2, msa_short_logic3,
> > msa_short_store4)
> > (msa_long_load, msa_short_store, msa_long_logic, msa_long_float2)
> > (msa_long_float4, msa_long_float5, msa_long_float8, msa_long_mult)
> > (msa_long_fdiv, msa_long_div): New reservations.
> 
> I assume this patch has not changed since it was posted.

That's correct.

> 
> OK to commit.

Committed as r236031.

Regards,
Robert


RE: [PATCH 1/4] [MIPS] Add support for MIPS SIMD Architecture (MSA)

2016-05-09 Thread Robert Suchanek
Hi Matthew,

> One small tweak, ChangeLog should wrap at 74 columns.

Done.

> Please consider the
> full list of authors for this patch as it has had many major contributors
> now. I believe this includes at least the following for the implementation
> but fewer for the testsuite updates:
> 
> Robert Suchanek
> Sameera Deshpande
> Matthew Fortune
> Graham Stott
> Chao-ying Fu

Of course.  All patches have or will have the correct contributors in ChangeLog.

> 
> Otherwise, OK to commit!

I removed __builtin_msa_[d]lsa from extend.texi as part of the pre-commit checks.
These two were listed in the docs but never defined.  Removed as obvious.

Committed as r236030.

Regards,
Robert


RE: [MIPS r5900] libgcc floating point fixes

2016-05-09 Thread Matthew Fortune
Hi Woon Yung,

Woon yung Liu  writes:
> Thank you for your feedback.
> 
> New changes in this updated version:
> 
> gcc/config/mips/mips.c:added a check against the use of both -mips16 and
> -march=r5900.
> libgcc/configure.ac:added a check for the support of -mips16 by the
> target.
> libgcc/config.host: replaced all explicit checks for the r5900 arch with
> a test for libgcc_cv_mips16, when checking for whether the MIPS16 ASE is
> supported.
> 
> 
> 
> I have tested the mechanism by replacing the test for -mips16 with
> -mno-mips16, and I can see the test result for libgcc_cv_mips16 within
> config.host getting changed.
> 
> 
> Please feel free to give further comments.

Sorry for the delay, I was waiting for trunk to open and then got sidetracked.
With a tweak to the new check in mips.c I think this is looking OK.

Do you have GCC FSF copyright assignment in place?  Your change might be small
enough not to need it, but having an assignment in place would make it easier.

I can find copyright assignment from Joergen who was original author of this
which is good news.

I assume you are planning on submitting your work on r5900 vector support, which
will certainly need copyright assignment, so this could be a good time to sort
it out.  Sorry for not checking this earlier; had I done so, this could already
be in progress.

Let me know and I'll send the forms.

Thanks,
Matthew

> 
> Thanks and regards,
> -W Y
> 
> 
> On Tuesday, February 23, 2016 5:09 PM, Matthew Fortune
>  wrote:
> Woon yung Liu  writes
> > Bump! Sorry, but could I please get an answer? I'm willing to update
> > the patch without credit, if necessary.
> 
> Hi WY,
> 
> Apologies for exceptionally slow response.
> 
> The patch you referenced is mostly OK but I would like to get the MIPS16
> check changed to a configure time check for MIPS16 support rather than
> checking for r5900. I.e. I think we should have GCC raise an error for
> -march=r5900 -mips16 and then a configure time check using just -mips16
> would fail. That can then be used to choose whether to build the mips16
> code instead of this:
> 
> +if test x$with_arch != xr5900; then
> +tmake_file="$tmake_file mips/t-mips16"
> +fi
> 
> This change should also make it possible to have mips.exp simply skip
> the mips16 tests for r5900 without having to tell it explicitly about
> r5900.
> 
> Thanks,
> Matthew
> 
> 
> > The patch is working for the R5900 hard-fp mode. I've also used the
> > same, patched copy of GCC, to build the toolchain for the IOP (MIPS
> > R3000A, 32-bit MIPS I with no FPU) and it also builds correctly.
> >
> > If I should be writing to someone else specifically, could someone
> > please tell me who I should be writing to instead?
> >
> >
> > Thanks and regards,
> > -W Y
> >
> >
> >
> > On Tuesday, January 26, 2016 5:41 PM, Woon yung Liu
>  wrote:
> > Hi,
> >
> > I refer to the previous message by Juergen, regarding his patch to
> > libgcc.
> > https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01725.html
> >
> > As of now, libgcc (of GCC v5.3.0) still has the problem of building
> > support for both soft and hard floats, when there is no support for
> > hard floats by the R5900 (and hence resulting in the generation of
> > recursive functions like extendsfdf2).
> >
> > That patch doesn't seem to have been committed. I would very much like
> > to help to see it get committed because GCC's support for the R5900 is
> > currently not suitable for PlayStation 2 development;
> > software-floating point emulation is severely detrimental to
> > performance.
> > What else needs to be done first, before it can be accepted?
> >
> > Thanks and regards,
> > -W Y


Re: [RFC][PR70841] Reassoc fails to handle FP division

2016-05-09 Thread Richard Biener
On Thu, May 5, 2016 at 5:33 AM, kugan  wrote:
> Hi,
>
> I tried to fix reassoc's failure to handle FP division.  I think the best way
> to do this is to do what is done for MINUS_EXPR, i.e. convert the
> RDIV_EXPR to MULT_EXPR by (1/x) early and later optimize it in
> opt_rdiv_with_multiply.
>
> Here is a patch that passes bootstrap and regression testing on
> x86-64-linux-gnu.
>
> Does this look Ok for trunk?

Implementation-wise I'd rather do opt_rdiv_with_multiply from
optimize_ops_list like eliminate_plus_minus_pair is implemented.

But then in general I am concerned about not doing sth like repropagate_negates.

That is, this patch will turn a / b into a * (1/b) and I think with just
-freciprocal-math nothing will turn it back to a / b.  We also may do sth
like a * b / c -> a * (1/c) * b -> (a / c) * b, which is generally
undesirable if we are not able to cancel out sth, due to the change of
the FP outcome this generally causes.

So in the end while my suggestion in the PR "works" it may not be something
we want to implement.  Eventually we want to treat divisions as association
barriers unless we can make them cancel out.

I also don't particularly like the way reassoc handles minus/negates by
re-writing the GIMPLE IL.  I'd very much rather have a op->negate flag
(or for reciprocals op->recip).

With -freciprocal-math we made sure (back in time) that SPEC 2006 doesn't
miscompare when using it, so I'd like to verify that is still the case
when doing sth like with this patch.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-05  Kugan Vivekanandarajah  
>
> PR middle-end/70841
> * gcc.dg/tree-ssa/pr70841.c: New test.
>
> gcc/ChangeLog:
>
> 2016-05-05  Kugan Vivekanandarajah  
>
> PR middle-end/70841
> * tree-ssa-reassoc.c (should_break_up_rdiv): New.
> (break_up_rdiv): New.
> (break_up_subtract_bb): Call should_break_up_rdiv and break_up_rdiv.
> (do_reassoc): Rename break_up_subtract_bb to
> break_up_subtract_and_div_bb.
> (sort_cmp_int): New.
> (opt_rdiv_with_multiply): New.
> (reassociate_bb): Call opt_rdiv_with_multiply.
> (do_reassoc): Renamed called function break_up_subtract_bb to
> break_up_subtract_and_div_bb.


Re: [patch] libstdc++/71004 fix recursive_directory_iterator default constructor

2016-05-09 Thread Jonathan Wakely

On 09/05/16 11:09 +0100, Jonathan Wakely wrote:

This fixes some uninitialized members in the
filesystem::recursive_directory_iterator default constructor.

Tested x86_64-linux, committed to trunk. Backports to follow soon.


Oops, I think I need more caffeine. The new function I added to the
test wasn't called, but would have crashed if it was.

Tested again x86_64-linux, committed to trunk.


commit d0bf5c26f645baf1f568466d5bd48021293a806a
Author: Jonathan Wakely 
Date:   Mon May 9 12:30:51 2016 +0100

libstdc++/71004 fix recent additions to testcase

	PR libstdc++/71004
	* testsuite/experimental/filesystem/iterators/
	recursive_directory_iterator.cc: Fix test02 to not call member
	functions on invalid iterator, and use VERIFY not assert.

diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
index a263602..b5f71be 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
@@ -100,15 +100,16 @@ test01()
 void
 test02()
 {
+  bool test __attribute__((unused)) = false;
+
   // libstdc++71004
   const fs::recursive_directory_iterator it;
-  assert( it.options() == fs::directory_options{} );
-  assert( it.depth() == 0 );
-  assert(it.recursion_pending() == false);
+  VERIFY( it == fs::recursive_directory_iterator() );
 }
 
 int
 main()
 {
   test01();
+  test02();
 }


Re: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2016-05-09 Thread Nick Clifton
Hi Kaushik,

> gcc/ChangeLog
> 2016-05-04  Kaushik Phatak  
> 
>   * config/rl78/rl78.c (rl78_expand_prologue): Save the MDUC related
>   registers in all interrupt handlers if necessary.
>   (rl78_option_override): Add warning.
>   (MUST_SAVE_MDUC_REGISTERS): New macro.
>   (rl78_expand_epilogue): Restore the MDUC registers if necessary.
>   * config/rl78/rl78.c (check_mduc_usage): New function.
>   * config/rl78/rl78.c (mduc_regs): New structure to hold MDUC register
>   data.
>   * config/rl78/rl78.md (is_g13_muldiv_insn): New attribute.
>   * config/rl78/rl78.md (mulsi3_g13): Add is_g13_muldiv_insn attribute.
>   * config/rl78/rl78.md (udivmodsi4_g13): Add is_g13_muldiv_insn
>   attribute.
>   * config/rl78/rl78.md (mulhi3_g13): Add is_g13_muldiv_insn attribute.
>   * config/rl78/rl78.opt (msave-mduc-in-interrupts): New option.
>   * doc/invoke.texi (RL78 Options): Add -msave-mduc-in-interrupts.
 
Approved and applied - thanks for persevering with this.

Cheers
  Nick



Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-05-09 Thread Richard Biener
On Thu, May 5, 2016 at 3:57 AM, kugan  wrote:
> Hi Richard,
>
>>
>> maybe insert_stmt_after will help here, I don't think you got the
>> insertion logic correct, thus insert_stmt_after (mul_stmt, def_stmt)
>> which I think misses GIMPLE_NOP handling.  At least
>>
>> +  if (SSA_NAME_VAR (op) != NULL
>>
>> huh?  I suppose you could have tested SSA_NAME_IS_DEFAULT_DEF
>> but just the GIMPLE_NOP def-stmt test should be enough.
>>
>> + && gimple_code (def_stmt) == GIMPLE_NOP)
>> +   {
>> + gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>> + stmt = gsi_stmt (gsi);
>> + gsi_insert_before (&gsi, mul_stmt, GSI_NEW_STMT);
>>
>> not sure if that is the best insertion point choice, it un-does all
>> code-sinking done (and no further sinking is run after the last reassoc
>> pass).  We do know we are handling all uses of op in our chain so
>> inserting before the plus-expr chain root should work here (thus 'stmt'
>> in the caller context).  I'd use that here instead.
>> I think I'd use that unconditionally even if it works and not bother
>> finding something more optimal.
>>
>
> I now tried using insert_stmt_after with special handling for GIMPLE_PHI as
> you described.

Thanks - I'd still say doing my last suggestion is likely better, at least if
'def_stmt' is not in the same basic-block as 'stmt'.  So can you do

  if (gimple_code (def_stmt) == GIMPLE_NOP
  || gimple_bb (stmt) != gimple_bb (def_stmt))
   {
...

instead?

>
>> Apart from this this now looks ok to me.
>>
>> But the testcases need some work
>>
>>
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do compile } */
>> ...
>> +
>> +/* { dg-final { scan-tree-dump-times "\\\*" 4 "reassoc1" } } */
>>
>> I would have expected 3.
>
>
> We now have an additional _15 = x_1(D) * 2

Ok.

>> Also please check for \\\* 5 for example
>>
>> to be more specific (and change the cases so you get different constants
>> for the different functions).
>
>
>>
>> That said, please make the scans more specific.
>
>
> I have now changed the test-cases to scan for more specific multiplications
> as you wanted.
>
>
> Does this now look better?

Yes.  Ok with the above suggested change.

Thanks,
Richard.

>
> Thanks,
> Kugan


Re: Error out on -fvtable-verify without --enable-vtable-verify

2016-05-09 Thread Bernd Schmidt

On 05/09/2016 01:28 PM, Rainer Orth wrote:

Ok now?


Yes, thanks.


Bernd



Re: Error out on -fvtable-verify without --enable-vtable-verify

2016-05-09 Thread Rainer Orth
Hi Bernd,

> On 05/08/2016 12:44 PM, Rainer Orth wrote:
>> With the recent change not to install libvtv without
>> --enable-vtable-verify, I noticed that gcc/g++ would still accept
>> -fvtable-verify without errors, only to emit obscure link-time errors
>> about missing vtv_*.o (which hadn't been installed in that situation
>> before) and libvtv.
>>
>> It seems to me a much better user experience to emit a clear error
>> message in this case, which is what this patch does.
>
> Generally ok, but...
>
>> +AC_ARG_ENABLE(vtable-verify,
>> +[AS_HELP_STRING([--enable-vtable-verify],
>> +[enable vtable verification feature])],
>> +[case "$enableval" in
>> + yes) enable_vtable_verify=yes ;;
>> + no)  enable_vtable_verify=no ;;
>> + *)   enable_vtable_verify=no;;
>> + esac],
>> +[enable_vtable_verify=no])
>> +vtable_verify=`if test $enable_vtable_verify != no; then echo 1; else echo 0; fi`
>> +AC_DEFINE_UNQUOTED(ENABLE_VTABLE_VERIFY, $vtable_verify,
>> +[Define 0/1 if vtable verification feature is enabled.])
>
> That looks a little overly complicated. Don't you get the enable_ variables
> set by autoconf? And if you do need the case statement, you might as well
> set things to 0/1 directly and skip the enable_vtable_verify variable
> entirely.

that's what you get for blindly copying configure.ac fragments.  The
following works instead:

diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -865,6 +865,14 @@ Valid choices are 'yes' and 'no'.]) ;;
   esac
 ], [enable_tls=''])
 
+AC_ARG_ENABLE(vtable-verify,
+[AS_HELP_STRING([--enable-vtable-verify],
+		[enable vtable verification feature])],,
+[enable_vtable_verify=no])
+vtable_verify=`if test x$enable_vtable_verify = xyes; then echo 1; else echo 0; fi`
+AC_DEFINE_UNQUOTED(ENABLE_VTABLE_VERIFY, $vtable_verify,
+[Define 0/1 if vtable verification feature is enabled.])
+
 AC_ARG_ENABLE(objc-gc,
 [AS_HELP_STRING([--enable-objc-gc],
 		[enable the use of Boehm's garbage collector with

I had to tighten the $enable_vtable_verify test to guard against a
non-no argument to --enable-vtable-verify being interpreted as yes.

Tested by just running gcc/configure without and with
--enable-vtable-verify, and with --enable-vtable-verify=nonstd (handled
as no).
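The effect of the tightened test can be modelled in C. This is only a sketch: `vtable_verify_define` is a hypothetical name, and a NULL argument stands for the option not being given at all, in which case the fragment defaults enable_vtable_verify to no. Plain --enable-vtable-verify makes autoconf set the variable to yes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the configure fragment: returns the 0/1 value
   that ends up in ENABLE_VTABLE_VERIFY.  Only the exact value "yes"
   enables the feature; "no", an unexpected value such as "nonstd", or
   an absent option (NULL here) all yield 0.  */
int vtable_verify_define (const char *enableval)
{
  if (enableval == NULL)
    return 0;
  return strcmp (enableval, "yes") == 0 ? 1 : 0;
}
```

With the earlier `!= no` test, "nonstd" would have produced 1; comparing against "yes" is what makes it fall back to 0.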

Ok now?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][ARM] Use proper output modifier for DImode register in store exclusive patterns

2016-05-09 Thread Kyrill Tkachov

Hi all,

This seems to have fallen through the cracks.
Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00558.html

Thanks,
Kyrill

On 09/03/16 12:56, Kyrill Tkachov wrote:

Hi all,

I notice that the output code for our store exclusive patterns accesses
unallocated memory.
It wants to output an strexd instruction with a pair of consecutive registers
corresponding to a DImode value. For that it creates the SImode top half of the
DImode register and puts it into operands[3]. But the pattern only defines
entries up to operands[2], with no match_dup 3 or similar, so operands[3]
should technically be out of bounds.

We already have a mechanism for printing the top half of a DImode register:
the 'H' output modifier.
So this patch changes those patterns to use that, eliminating the out-of-bounds
access and making the code a bit simpler as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-03-09  Kyrylo Tkachov  

* config/arm/sync.md (arm_store_exclusive):
Use 'H' output modifier on operands[2] rather than creating a new
entry in out-of-bounds memory of the operands array.
(arm_store_release_exclusivedi): Likewise.
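A small C model of the fix, as a sketch only: the operand layout (0 = status register, 1 = address, 2 = DImode value) and all names are assumptions for illustration, not the sync.md code. The high half is derived from operands[2] on the fly, the way the 'H' modifier does, instead of being stored into a fourth slot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: the pattern has exactly three operands, reduced
   here to hard register numbers: [0] status, [1] address, [2] value.  */
enum { N_OPERANDS = 3 };

/* What the 'H' modifier provides: the high SImode half of a DImode
   register pair lives in the next consecutive register.  */
static int high_half_regno (int regno)
{
  return regno + 1;
}

/* Build the strexd text using only operands[0..2]; nothing is written
   past the end of the array.  Returns a static buffer (not thread-safe).  */
const char *format_strexd (const int ops[N_OPERANDS])
{
  static char buf[64];
  snprintf (buf, sizeof buf, "strexd\tr%d, r%d, r%d, [r%d]",
            ops[0], ops[2], high_half_regno (ops[2]), ops[1]);
  return buf;
}
```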




Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-05-09 Thread Bin.Cheng
On Mon, May 9, 2016 at 11:54 AM, Richard Biener
 wrote:
> On Fri, May 6, 2016 at 11:42 AM, Bin.Cheng  wrote:
>> On Fri, May 6, 2016 at 10:40 AM, Bin.Cheng  wrote:
>>> On Tue, May 3, 2016 at 11:08 AM, Richard Biener
>>>  wrote:
 On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng  wrote:
> On Mon, May 2, 2016 at 10:00 AM, Richard Biener
>  wrote:
>> On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng  wrote:
>>> On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
>>>  wrote:
 On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng  wrote:
> Hi,
> Tree if-conversion sometimes cannot convert conditional array 
> reference into unconditional one.  Root cause is GCC conservatively 
> assumes newly introduced array reference could be out of array bound 
> and thus trapping.  This patch improves the situation by proving the 
> converted unconditional array reference is within array bound using 
> loop niter information.  To be specific, it checks every index of 
> array reference to see if it's within bound in 
> ifcvt_memrefs_wont_trap.  This patch also factors out 
> base_object_writable checking if the base object is writable or not.
> Bootstrap and test on x86_64 and aarch64, is it OK?
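As a rough source-level illustration (hypothetical functions, not taken from the patch), the two loops below compute the same result. The second has the shape if-conversion wants: `a[i]` is loaded on every iteration and the branch becomes a select. That is only valid if the load cannot trap, i.e. if every index stays within the array bounds, which is the property the patch tries to prove from the loop's iteration count.

```c
#include <assert.h>
#include <stddef.h>

#define N 128

/* Before if-conversion: a[i] is read only when cond[i] is nonzero.  */
int sum_before (const int a[N], const int cond[N], size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    if (cond[i])
      sum += a[i];
  return sum;
}

/* After if-conversion (sketch): the load is unconditional.  The assert
   documents the bound the compiler would need to prove.  */
int sum_after (const int a[N], const int cond[N], size_t n)
{
  assert (n <= N);
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      int t = a[i];             /* unconditional load */
      sum += cond[i] ? t : 0;   /* branch-free select */
    }
  return sum;
}
```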

 I think you miss to handle the case optimally where the only
 non-ARRAY_REF idx is the dereference of the
 base-pointer for, say, p->a[i].  In this case we can use
 base_master_dr to see if p is unconditionally dereferenced
>>> Yes, will pick up this case.
>>>
 in the loop.  You also fail to handle the case where we have
 MEM_REF[].a[i] that is, you see a decl base.
>>> I am having difficulty creating this case for ifcvt, any advice?
>>> Thanks.
>>
>> Sth like
>>
>> float a[128];
>> float foo (int n, int i)
>> {
>>   return (*((float(*)[n])a))[i];
>> }
>>
>> should do the trick (w/o the component-ref).  Any other type-punning
>> would do it, too.
>>
 I suppose for_each_index should be fixed for this particular case (to
 return true), same for TARGET_MEM_REF TMR_BASE.

 +  /* The case of nonconstant bounds could be handled, but it would be
 + complicated.  */
 +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
 +  || !high || TREE_CODE (high) != INTEGER_CST)
 +return false;
 +

 handling of a non-zero but constant low bound is important - otherwise
 all this is a no-op for Fortran.  It
 shouldn't be too difficult to handle after all.  In fact I think your
 code does handle it correctly already.

 +  if (!init || TREE_CODE (init) != INTEGER_CST
 +  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
 +return false;

 step == 0 should be easy to handle as well, no?  The index will simply
 always be 'init' ...

 +  /* In case the relevant bound of the array does not fit in type, or
 + it does, but bound + step (in type) still belongs into the range of the
 + array, the index may wrap and still stay within the range of the array
 + (consider e.g. if the array is indexed by the full range of
 + unsigned char).
 +
 + To make things simpler, we require both bounds to fit into type, although
 + there are cases where this would not be strictly necessary.  */
 +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
 +return false;
 +
 +  low = fold_convert (type, low);

 please use wide_int for all of this.
>>> Now I use wi::fits_to_tree_p instead of int_fits_type_p. But I am not
>>> sure what's the meaning by "handle "low = fold_convert (type, low);"
>>> related code in wide_int".   Do you mean to use tree_int_cst_compare
>>> instead of tree_int_cst_compare in the following code?
>>
>> I don't think you need any kind of fits-to-type check here.  You'd simply
>> use to_widest () when operating on / comparing with high/low.
> But what would happen if low/high and init/step are different in type
> sign-ness?  Anything special I need to do before using wi::ltu_p or
> wi::lts_p directly?

 You want to use to_widest (min) which extends according to sign to
 an "infinite" precision signed integer.  So you can then use the new
 operator< overloads as well.

>>> Hi,
>>> Here is the updated patch.  It includes below changes according to
>>> review 

Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-05-09 Thread Richard Biener
On Thu, May 5, 2016 at 2:41 AM, kugan  wrote:
> Hi Richard,
>
>>
>> + int last = ops.length () - 1;
>> + bool negate_result = false;
>>
>> Do
>>
>>oe  = ops.last ();
>>
>
> Done.
>
>>
>> + if (rhs_code == MULT_EXPR
>> + && ops.length () > 1
>> + && ((TREE_CODE (ops[last]->op) == INTEGER_CST
>>
>> and last.op here and below
>>
>> +  && integer_minus_onep (ops[last]->op))
>> + || ((TREE_CODE (ops[last]->op) == REAL_CST)
>> + && real_equal (_REAL_CST
>> (ops[last]->op), 
>>
>
> Done.
>
>> Here the checks !HONOR_SNANS () && (!HONOR_SIGNED_ZEROS ||
>> !COMPLEX_FLOAT_TYPE_P)
>> are missing.  The * -1 might appear literally and you are only allowed
>> to turn it into a negate
>> under the above conditions.
>
>
> Done.
>
>>
>> +   {
>> + ops.unordered_remove (last);
>>
>> use ops.pop ();
>>
> Done.
>
>> + negate_result = true;
>>
>> Please move the whole thing under the else { } case of the ops.length
>> == 0, ops.length == 1 test chain
>> as you did for the actual emit of the negate.
>>
>
> I see your point. However, when we remove the (-1) from the ops list, that
> in turn can result in ops.length becoming 1. Therefore, I moved the
> following if (negate_result) outside the condition.

Ah, indeed.  But now you have to care for ops.length () == 0 and thus
the unconditional ops.last () may now trap.  So I suggest to
do

+ operand_entry *last;
+ bool negate_result = false;
+ if (rhs_code == MULT_EXPR
+ && ops.length () > 1
+ && (last = ops.last ())
+ && ((TREE_CODE (last->op) == INTEGER_CST

to avoid this.

>
>>
>> + if (negate_result)
>> +   {
>> + tree tmp = make_ssa_name (TREE_TYPE (lhs));
>> + gimple_set_lhs (stmt, tmp);
>> + gassign *neg_stmt = gimple_build_assign (lhs, NEGATE_EXPR,
>> +  tmp);
>> + gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>> + gsi_insert_after (, neg_stmt, GSI_NEW_STMT);
>> + update_stmt (stmt);
>>
>> I think that if powi_result is also built you end up using the wrong
>> stmt so you miss a
>>
>>  stmt = SSA_NAME_DEF_STMT (lhs);
>
>
> Yes, indeed. This can happen and I have added this.
>
>>
>> here.  Also see the new_lhs handling of the powi_result case - again
>> you need sth
>> similar here (its handling looks a bit fishy as well - this all needs
>> some comments
>> and possibly a (lot more) testcases).
>>
>> So, please do the above requested changes and verify the 'lhs' issues I
>> pointed
>> out by trying to add a few more testcase that also cover the case where a
>> powi
>> is detected in addition to a negation.  Please also add a testcase that
>> catches
>> (-y) * x * (-z).
>>
>
> Added this to the testcase.
>
> Does this look better now?

Yes - the patch is ok with the above suggested change.

Thanks,
Richard.

>
> Thanks,
> Kugan
>
>
>>> 2016-04-23  Kugan Vivekanandarajah  
>>>
>>>  PR middle-end/40921
>>>  * gcc.dg/tree-ssa/pr40921.c: New test.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-04-23  Kugan Vivekanandarajah  
>>>
>>>  PR middle-end/40921
>>>  * tree-ssa-reassoc.c (try_special_add_to_ops): New.
>>>  (linearize_expr_tree): Call try_special_add_to_ops.
>>>  (reassociate_bb): Convert MULT_EXPR by (-1) to NEGATE_EXPR.
>>>
>
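To make the PR40921 rewrite concrete, here is a minimal C sketch of the source-level identities reassoc relies on: moving the negation off one factor of a multiply chain, and cancelling two negated factors as in the requested (-y) * x * (-z) test. The function names are made up for illustration and are not from the patch or its testcase. The floating-point helper only shows that signed zeros survive x * -1.0 -> -x for real types; as noted in the review, the rewrite additionally requires !HONOR_SNANS and, for complex types, !HONOR_SIGNED_ZEROS.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helpers; small integer inputs are assumed so that the
   signed arithmetic below cannot overflow.  */

/* Source form from PR40921: the negation sits on one factor.  */
long before_reassoc (long x, long y, long z)
{
  return x + (-y * z * z);
}

/* Form after reassoc: one multiply chain, then a subtraction.  */
long after_reassoc (long x, long y, long z)
{
  return x - y * z * z;
}

/* Two negated factors cancel: (-y) * x * (-z) == y * x * z.  */
long two_negations (long x, long y, long z)
{
  return (-y) * x * (-z);
}

/* For real (non-complex) doubles, x * -1.0 and -x agree even for
   signed zeros; signaling NaNs are where they can differ.  */
double times_minus_one (double x)
{
  return x * -1.0;
}
```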


Re: [PING*2][PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-05-09 Thread Richard Biener
On Wed, May 4, 2016 at 4:23 PM, Pierre-Marie de Rodat
 wrote:
> Ping for the patch submitted at
> . It applies just
> fine on the current trunk and still bootstraps and regtests successfully on
> x86_64-linux.
>
> Thank you in advance,

Ok and sorry for the delay.

Thanks,
Richard.

> --
> Pierre-Marie de Rodat


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-09 Thread Bernd Schmidt

On 05/09/2016 12:42 PM, Jakub Jelinek wrote:

On Mon, May 09, 2016 at 12:38:14PM +0200, Bernd Schmidt wrote:

On 05/09/2016 12:23 PM, Bernd Schmidt wrote:

On 05/06/2016 01:38 AM, Dhole wrote:

I've written a test case which fails (when it shouldn't) and I don't see
why.


I think it's setting the env var when executing the test, not when
executing the compiler.


Here's something that seems to work (with an appropriately adjusted
testcase). Cc'ing testsuite maintainers.


Wouldn't it be better to add a more general directive, e.g. one that would
allow setting any env var for the compiler, rather than one specific,
similarly to how dg-set-target-env-var allows any env var to be set for the
execution?  dg-set-compiler-env-var ?


Maybe. Eduard, can you look into that?


Bernd



Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-09 Thread Jonathan Wakely

On 09/05/16 13:41 +0300, Ville Voutilainen wrote:

On 9 May 2016 at 12:31, Jonathan Wakely  wrote:

On 08/05/16 14:43 +0300, Ville Voutilainen wrote:


Tested on Linux-PPC64.



OK for trunk, and we should backport to gcc-6-branch (and
gcc-5-branch?) soon.



gcc-5-branch doesn't have the "Improving pair and tuple"
implementation at all yet, afaik.


OK, I couldn't remember when those changes happened.


I'm somewhat hesitant to backport all that.


Yes, let's not do that. So this fix is only needed for gcc-6.



Re: [PATCH] Fix memory leak in tree-inliner

2016-05-09 Thread Richard Biener
On Fri, May 6, 2016 at 2:35 PM, Martin Liška  wrote:
> On 05/06/2016 12:56 PM, Richard Biener wrote:
>> Hmmm.  But this means debug stmt remapping calls
>> remap_dependence_clique which may end up bumping
>> cfun->last_clique and thus may change code generation.
>>
>> So what debug stmts contain MEM_REFs?  If you put an assert
>> processing_debug_stmt == 0 in
>> remap_dependence_clique I'd like to see a testcase that triggers it.
>>
>> Richard.
>
> Ok, I've placed the suggested assert, which is triggered for the following debug
> statement:
>
> (gdb) p debug_gimple_stmt(stmt)
> # DEBUG D#21 => a_1(D)->dim[0].ubound
>
> (gdb) p debug_tree(*tp)
>   type  size 
> unit size 
> align 64 symtab -160828560 alias set -1 canonical type 0x76a4f000
> fields  0x768a7348>
> unsigned DI file 
> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/actual_array_constructor_1.f90
>  line 21 col 0
> size 
> unit size 
> align 64 offset_align 128
> offset 
> bit offset  context 
>  chain  0x76a47a18 offset>>
> pointer_to_this  reference_to_this 
>  chain >
>
> arg 0  type  array1_unknown>
> public unsigned restrict DI size  
> unit size 
> align 64 symtab 0 alias set -1 canonical type 0x76a53150>
> var def_stmt GIMPLE_NOP
>
> version 1>
> arg 1  
> constant 0>>
>
> for the following test-case:
> gfortran /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/actual_array_constructor_1.f90 -O3 -g

Ok.  I suggest you instead do sth like

Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c   (revision 236021)
+++ gcc/tree-inline.c   (working copy)
@@ -840,7 +840,7 @@ is_parm (tree decl)
 static unsigned short
 remap_dependence_clique (copy_body_data *id, unsigned short clique)
 {
-  if (clique == 0)
+  if (clique == 0 || processing_debug_stmt)
 return 0;
   if (!id->dependence_map)
 id->dependence_map = new hash_map;

does that resolve the issue?  If so, this is ok for trunk and branches.
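The point of the suggested change is that remapping must not allocate new cliques while a debug statement is being copied, since that would make code generation depend on -g. Below is a deliberately simplified C model of that idea; the struct and function names are hypothetical, a fixed array stands in for id->dependence_map and a plain counter for cfun->last_clique.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the inliner state.  */
struct copy_state
{
  unsigned short last_clique;   /* stands in for cfun->last_clique */
  unsigned short map[16];       /* stands in for id->dependence_map */
  bool processing_debug_stmt;
};

/* Mirrors the proposed early return: clique 0 and anything seen while
   copying a debug stmt map to 0 and never allocate a new clique.  */
unsigned short remap_clique (struct copy_state *s, unsigned short clique)
{
  if (clique == 0 || s->processing_debug_stmt)
    return 0;
  assert (clique < 16);
  if (s->map[clique] == 0)
    s->map[clique] = ++s->last_clique;
  return s->map[clique];
}

/* Remap clique 1 twice and clique 2 once; return how many new cliques
   were allocated.  */
unsigned short cliques_after_remap (bool debug)
{
  struct copy_state s = { 0 };
  s.processing_debug_stmt = debug;
  remap_clique (&s, 1);
  remap_clique (&s, 1);
  remap_clique (&s, 2);
  return s.last_clique;
}
```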

Thanks,
Richard.

> Martin
>
>
>


Re: [PATCH GCC]Proving no-trappness for array ref in tree if-conv using loop niter information.

2016-05-09 Thread Richard Biener
On Fri, May 6, 2016 at 11:42 AM, Bin.Cheng  wrote:
> On Fri, May 6, 2016 at 10:40 AM, Bin.Cheng  wrote:
>> On Tue, May 3, 2016 at 11:08 AM, Richard Biener
>>  wrote:
>>> On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng  wrote:
 On Mon, May 2, 2016 at 10:00 AM, Richard Biener
  wrote:
> On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng  wrote:
>> On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
>>  wrote:
>>> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng  wrote:
 Hi,
 Tree if-conversion sometimes cannot convert conditional array 
 reference into unconditional one.  Root cause is GCC conservatively 
 assumes newly introduced array reference could be out of array bound 
 and thus trapping.  This patch improves the situation by proving the 
 converted unconditional array reference is within array bound using 
 loop niter information.  To be specific, it checks every index of 
 array reference to see if it's within bound in 
 ifcvt_memrefs_wont_trap.  This patch also factors out 
 base_object_writable checking if the base object is writable or not.
 Bootstrap and test on x86_64 and aarch64, is it OK?
>>>
>>> I think you miss to handle the case optimally where the only
>>> non-ARRAY_REF idx is the dereference of the
>>> base-pointer for, say, p->a[i].  In this case we can use
>>> base_master_dr to see if p is unconditionally dereferenced
>> Yes, will pick up this case.
>>
>>> in the loop.  You also fail to handle the case where we have
>>> MEM_REF[].a[i] that is, you see a decl base.
>> I am having difficulty creating this case for ifcvt, any advice?
>> Thanks.
>
> Sth like
>
> float a[128];
> float foo (int n, int i)
> {
>   return (*((float(*)[n])a))[i];
> }
>
> should do the trick (w/o the component-ref).  Any other type-punning
> would do it, too.
>
>>> I suppose for_each_index should be fixed for this particular case (to
>>> return true), same for TARGET_MEM_REF TMR_BASE.
>>>
>>> +  /* The case of nonconstant bounds could be handled, but it would be
>>> + complicated.  */
>>> +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
>>> +  || !high || TREE_CODE (high) != INTEGER_CST)
>>> +return false;
>>> +
>>>
>>> handling of a non-zero but constant low bound is important - otherwise
>>> all this is a no-op for Fortran.  It
>>> shouldn't be too difficult to handle after all.  In fact I think your
>>> code does handle it correctly already.
>>>
>>> +  if (!init || TREE_CODE (init) != INTEGER_CST
>>> +  || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
>>> +return false;
>>>
>>> step == 0 should be easy to handle as well, no?  The index will simply
>>> always be 'init' ...
>>>
>>> +  /* In case the relevant bound of the array does not fit in type, or
>>> + it does, but bound + step (in type) still belongs into the range of the
>>> + array, the index may wrap and still stay within the range of the array
>>> + (consider e.g. if the array is indexed by the full range of
>>> + unsigned char).
>>> +
>>> + To make things simpler, we require both bounds to fit into type, although
>>> + there are cases where this would not be strictly necessary.  */
>>> +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
>>> +return false;
>>> +
>>> +  low = fold_convert (type, low);
>>>
>>> please use wide_int for all of this.
>> Now I use wi::fits_to_tree_p instead of int_fits_type_p. But I am not
>> sure what's the meaning by "handle "low = fold_convert (type, low);"
>> related code in wide_int".   Do you mean to use tree_int_cst_compare
>> instead of tree_int_cst_compare in the following code?
>
> I don't think you need any kind of fits-to-type check here.  You'd simply
> use to_widest () when operating on / comparing with high/low.
 But what would happen if low/high and init/step are different in type
 sign-ness?  Anything special I need to do before using wi::ltu_p or
 wi::lts_p directly?
>>>
>>> You want to use to_widest (min) which extends according to sign to
>>> an "infinite" precision signed integer.  So you can then use the new
>>> operator< overloads as well.
>>>
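The bound check being discussed can be sketched in plain C. This is a hypothetical helper, not the patch's code: it assumes niter counts latch executions (so indices init + step * k for k = 0 .. niter are accessed), and it uses GCC's overflow builtins where the updated patch uses widest_int. It covers the review points above: a non-zero constant low bound (Fortran), step == 0, and negative steps.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true if every index INIT + STEP * k, for k = 0 .. NITER, lies
   in [LOW, HIGH].  The index is linear in k, so checking the first and
   last values is enough.  */
bool index_within_bounds (long long low, long long high,
                          long long init, long long step,
                          unsigned long long niter)
{
  long long span, last;

  if (init < low || init > high)
    return false;
  /* STEP == 0: the index is always INIT.  */
  if (step == 0)
    return true;
  /* Give up conservatively if the last index cannot be computed without
     overflow.  */
  if (niter > (unsigned long long) LLONG_MAX
      || __builtin_mul_overflow (step, (long long) niter, &span)
      || __builtin_add_overflow (init, span, &last))
    return false;
  return last >= low && last <= high;
}
```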
>> Hi,
>> Here is the updated patch.  It includes below changes according to
>> review comments:
>>
>> 1) It uses widest_int for all INTEGER_CST tree computations, which
>> simplifies the patch a lot.
>> 2) It covers array with non-zero low bound, which is important for 

Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-09 Thread Jakub Jelinek
On Mon, May 09, 2016 at 12:38:14PM +0200, Bernd Schmidt wrote:
> On 05/09/2016 12:23 PM, Bernd Schmidt wrote:
> >On 05/06/2016 01:38 AM, Dhole wrote:
> >>I've written a test case which fails (when it shouldn't) and I don't see
> >>why.
> >
> >I think it's setting the env var when executing the test, not when
> >executing the compiler.
> 
> Here's something that seems to work (with an appropriately adjusted
> testcase). Cc'ing testsuite maintainers.

Wouldn't it be better to add a more general directive, e.g. one that would
allow setting any env var for the compiler, rather than one specific,
similarly to how dg-set-target-env-var allows any env var to be set for the
execution?  dg-set-compiler-env-var ?

Jakub


Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-09 Thread Ville Voutilainen
On 9 May 2016 at 12:31, Jonathan Wakely  wrote:
> On 08/05/16 14:43 +0300, Ville Voutilainen wrote:
>>
>> Tested on Linux-PPC64.
>
>
> OK for trunk, and we should backport to gcc-6-branch (and
> gcc-5-branch?) soon.


gcc-5-branch doesn't have the "Improving pair and tuple"
implementation at all yet, afaik.
I'm somewhat hesitant to backport all that.


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-09 Thread Bernd Schmidt

On 05/09/2016 12:23 PM, Bernd Schmidt wrote:

On 05/06/2016 01:38 AM, Dhole wrote:

I've written a test case which fails (when it shouldn't) and I don't see
why.


I think it's setting the env var when executing the test, not when
executing the compiler.


Here's something that seems to work (with an appropriately adjusted 
testcase). Cc'ing testsuite maintainers.



Bernd
	* lib/gcc-dg.exp (orig_source_date_epoch_saved,
	orig_source_date_epoch_checked): Initialize to 0.
	(dg-set-source-date-epoch): New function.
	(restore_source_epoch_env_var): New function.
	(cleanup_after_saved_dg_test): Call it.

Index: gcc-dg.exp
===
--- gcc-dg.exp	(revision 236021)
+++ gcc-dg.exp	(working copy)
@@ -31,6 +31,9 @@ load_lib torture-options.exp
 load_lib fortran-modules.exp
 load_lib multiline.exp
 
+set orig_source_date_epoch_saved 0
+set orig_source_date_epoch_checked 0
+
 # We set LC_ALL and LANG to C so that we get the same error messages as expected.
 setenv LC_ALL C
 setenv LANG C
@@ -422,9 +425,24 @@ proc dg-set-target-env-var { args } {
 lappend set_target_env_var [list [lindex $args 1] [lindex $args 2]]
 }
 
+proc dg-set-source-date-epoch { args } {
+  global orig_source_date_epoch
+  global orig_source_date_epoch_saved
+  global orig_source_date_epoch_checked
+  if { $orig_source_date_epoch_checked == 0 } {
+if [info exists env(SOURCE_DATE_EPOCH)] {
+  set orig_source_date_epoch "$env(SOURCE_DATE_EPOCH)"
+  set orig_source_date_epoch_saved 1
+}
+set orig_source_date_epoch_checked 1
+  }
+  setenv SOURCE_DATE_EPOCH [lindex $args 1]
+}
+
 proc set-target-env-var { } {
 global set_target_env_var
 upvar 1 saved_target_env_var saved_target_env_var
+verbose "setting env var"
 foreach env_var $set_target_env_var {
 	set var [lindex $env_var 0]
 	set value [lindex $env_var 1]
@@ -840,6 +858,18 @@ proc output-exists-not { args } {
 }
 }
 
+proc restore_source_epoch_env_var { } {
+  global orig_source_date_epoch_saved
+  global orig_source_date_epoch
+  global env
+
+  if { $orig_source_date_epoch_saved } {
+setenv SOURCE_DATE_EPOCH "$orig_source_date_epoch"
+  } elseif [info exists env(SOURCE_DATE_EPOCH)] {
+unsetenv SOURCE_DATE_EPOCH
+  }
+}
+
 # We need to make sure that additional_* are cleared out after every
 # test.  It is not enough to clear them out *before* the next test run
 # because gcc-target-compile gets run directly from some .exp files
@@ -877,6 +907,7 @@ if { [info procs saved-dg-test] == [list
 	if [info exists keep_saved_temps_suffixes] {
 	unset keep_saved_temps_suffixes
 	}
+	restore_source_epoch_env_var
 	unset_timeout_vars
 	if [info exists compiler_conditional_xfail_data] {
 	unset compiler_conditional_xfail_data

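For context, the compiler-side behaviour these testsuite changes exercise is reading SOURCE_DATE_EPOCH and using it instead of the current time for __DATE__ and __TIME__. The following is only a sketch of that idea with hypothetical helper names, not GCC's implementation: strict parsing of the variable's value, and formatting the result in UTC (the reproducible-builds convention) in the same "Mmm dd yyyy" layout as __DATE__.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse a SOURCE_DATE_EPOCH value.  Returns -1 if
   the string is missing, not a plain non-negative decimal number, or
   out of range.  */
long long parse_source_date_epoch (const char *s)
{
  char *end;
  long long v;

  if (s == NULL || *s == '\0')
    return -1;
  errno = 0;
  v = strtoll (s, &end, 10);
  if (errno != 0 || *end != '\0' || v < 0)
    return -1;
  return v;
}

/* Format EPOCH as UTC in __DATE__ layout (space-padded day).  Assumes
   the default "C" locale for the month name.  Returns a static buffer,
   or NULL on failure.  */
const char *date_string (long long epoch)
{
  static char buf[16];
  time_t t = (time_t) epoch;
  struct tm *tm = gmtime (&t);

  if (tm == NULL || strftime (buf, sizeof buf, "%b %e %Y", tm) == 0)
    return NULL;
  return buf;
}
```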

Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-09 Thread Bernd Schmidt

On 05/06/2016 01:38 AM, Dhole wrote:

On 16-04-29 09:17:44, Jakub Jelinek wrote:

Bernd: I'll see if I can prepare a testcase; first I need to get
familiar with the testing framework and learn how to set environment
variables in tests.  Any tips on that will be really welcome!


grep for dg-set-target-env-var in various tests.


I've been looking at how the test infrastructure works, and I'm having
some difficulties with setting the env var.

I've written a test case which fails (when it shouldn't) and I don't see
why.


I think it's setting the env var when executing the test, not when 
executing the compiler.



Bernd



[patch] libstdc++/71004 fix recursive_directory_iterator default constructor

2016-05-09 Thread Jonathan Wakely

This fixes some uninitialized members in the
filesystem::recursive_directory_iterator default constructor.

Tested x86_64-linux, committed to trunk. Backports to follow soon.


commit 2b1c32ec504886625dae53274b14f86e2082a636
Author: Jonathan Wakely 
Date:   Mon May 9 11:01:59 2016 +0100

libstdc++/71004 fix recursive_directory_iterator default constructor

	PR libstdc++/71004
	* include/experimental/bits/fs_dir.h (recursive_directory_iterator):
	Initialize scalar member variables in default constructor.
	* testsuite/experimental/filesystem/iterators/
	recursive_directory_iterator.cc: Test default construction.

diff --git a/libstdc++-v3/include/experimental/bits/fs_dir.h b/libstdc++-v3/include/experimental/bits/fs_dir.h
index 4e28c8e..f128cce 100644
--- a/libstdc++-v3/include/experimental/bits/fs_dir.h
+++ b/libstdc++-v3/include/experimental/bits/fs_dir.h
@@ -301,8 +301,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 struct _Dir_stack;
 std::shared_ptr<_Dir_stack> _M_dirs;
-directory_options _M_options;
-bool _M_pending;
+directory_options _M_options = {};
+bool _M_pending = false;
   };
 
   inline recursive_directory_iterator
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
index 5d2e45b..a263602 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
@@ -97,6 +97,16 @@ test01()
   remove_all(p, ec);
 }
 
+void
+test02()
+{
+  // libstdc++/71004
+  const fs::recursive_directory_iterator it;
+  assert( it.options() == fs::directory_options{} );
+  assert( it.depth() == 0 );
+  assert(it.recursion_pending() == false);
+}
+
 int
 main()
 {


Re: [PATCH] Take known zero bits into account when checking extraction.

2016-05-09 Thread Dominik Vogt
On Wed, Apr 27, 2016 at 10:24:21PM -0600, Jeff Law wrote:
> On 04/27/2016 02:20 AM, Dominik Vogt wrote:
> >The attached patch is a result of discussing an S/390 issue with
> >"and with complement" in some cases.
> >
> >  https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
> >  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html
> >
> >Combine would merge a ZERO_EXTEND and a SET taking the known zero
> >bits into account, resulting in an AND.  Later on,
> >make_compound_operation() fails to replace that with a ZERO_EXTEND
> >which we get for free on S/390 but leaves the AND, eventually
> >resulting in two consecutive AND instructions.
> >
> >The current code in make_compound_operation() that detects
> >opportunities for ZERO_EXTEND does not work here because it does
> >not take the known zero bits into account:
> >
> >  /* If the constant is one less than a power of two, this might be
> >  representable by an extraction even if no shift is present.
> >  If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
> >  we are in a COMPARE.  */
> >  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
> > new_rtx = make_extraction (mode,
> >make_compound_operation (XEXP (x, 0),
> > next_code),
> >0, NULL_RTX, i, 1, 0, in_code == COMPARE);
> >
> >An attempt to use the zero bits in the above conditions resulted
> >in many situations that generated worse code, so the patch tries
> >to fix this in a more conservative way.  While the effect is
> >completely positive on S/390, this will very likely have
> >unforeseeable consequences on other targets.

> > * combine.c (make_compound_operation): Take known zero bits into
> > account when checking for possible zero_extend.
> I'd strongly recommend writing some tests for this.

This turns out to be quite difficult.  A small test function
effectively just returns the argument:

  unsigned long bar (unsigned long in) 
  { 
if ((in & 1) == 0) 
  in = (in & ~(unsigned long)1); 
 
return in; 
  }

However, GCC does not notice that the AND is a no-op.  As far as I
understand, zero bit tracking is only done in "combine", so when
folding the assignment statement the information that the lowest
bit is zero is not available and therefore the no-op is not
detected?

(I've been trying to trigger the code from the patch with a
function based on the above construct.)
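To make the idea concrete, here is a small C model (a hypothetical helper, not combine.c code) of when an AND with a constant mask is really a zero-extension of the low i bits. The existing check corresponds to known_zero == 0, i.e. mask + 1 being a power of two. Treating bits of X that are known to be zero as "don't care" additionally accepts masks such as 0xfffffffe when bit 0 is known zero, which is the S/390 situation described above.

```c
#include <assert.h>
#include <stdint.h>

/* Return i if (x & MASK) equals the low i bits of x for every x whose
   bits in KNOWN_ZERO are zero, or -1 if no such i < 64 exists.  */
int extraction_width (uint64_t mask, uint64_t known_zero)
{
  for (int i = 1; i < 64; i++)
    {
      uint64_t low = ((uint64_t) 1 << i) - 1;
      /* Every bit where MASK and LOW differ must be known zero in x.  */
      if (((mask ^ low) & ~known_zero) == 0)
        return i;
    }
  return -1;
}
```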

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: Simple bitop reassoc in match.pd (was: Canonicalize X u< X to UNORDERED_EXPR)

2016-05-09 Thread Richard Biener
On Fri, May 6, 2016 at 1:50 PM, Marc Glisse  wrote:
> On Tue, 3 May 2016, Richard Biener wrote:
>
>> On Tue, May 3, 2016 at 3:26 PM, Marc Glisse  wrote:
>>>
>>> On Tue, 3 May 2016, Richard Biener wrote:
>>>
 On Tue, May 3, 2016 at 8:36 AM, Marc Glisse 
 wrote:
>
>
> This removes the duplication. I also removed the case (A)&(A) which
> is
> handled by reassoc. And I need 2 NOP checks, for the case where @0 is a
> constant (that couldn't happen before my patch because canonicalization
> would put the constant as second operand).



 Nicely spotted.  Not sure we want to delay (A)&(A) until re-assoc.
 We
 have
 many patterns that reassoc would also catch, like (A + CST) + CST or (A
 +
 B)- A,
 albeit reassoc only handles the unsigned cases.
>>>
>>>
>>>
>>> (A) seems simple enough for match.pd, I thought (A)&(A) was
>>> starting
>>> to be a bit specialized. I could easily add it back (making it work with
>>> |
>>> at the same time), but then I am not convinced A&(B) is the best
>>> output.
>>> If A or A have several uses, then (A) or B&(A) seem preferable
>>> (and we would still have a transformation for (A) so we
>>> wouldn't
>>> lose in the case where B and C are constants). We may still end up having
>>> to
>>> add some :s to the transformation I just touched.
>>
>>
>> Yeah, these are always tricky questions.  Note that re-assoc won't
>> handle the case
>> with multi-use A or A  The only reason to care for the single-use
>> case is
>> when we change operations for the mixed operand cases.  For the all-&|
>> case
>> there is only the (usual) consideration about SSA lifetime extension.
>>
>> So maybe it makes sense to split out the all-&| cases.
>
>
> Here they are. I did (X) and (X)&(X). The next one would be
> ((X)), but at some point we have to defer to reassoc.
>
> I didn't add the convert?+tree_nop_conversion_p to the existing transform I
> modified. I guess at some point we should make a pass and add them to all
> the transformations on bit operations...
>
> For (X & Y) & Y, I believe that any conversion is fine. For the others,
> tree_nop_conversion_p is probably too strict (narrowing should be fine for
> all), but I was too lazy to look for tighter conditions.
>
> (X ^ Y) ^ Y -> X should probably have (non_lvalue ...) on its output, but in
> a simple test it didn't seem to matter. Is non_lvalue still needed?

I've only added them to patterns where the fold-const.c variant had it.

In theory (hopefully) all of them are no longer necessary since C/C++ both
do delayed folding and thus hopefully figure out lvalue-ness before
entering fold ...

Richard.
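The identities behind these patterns can be sanity-checked by brute force. The sketch below (the function name is hypothetical) tests them over all 8-bit operands. The right-hand sides are one valid simplified form each, not necessarily the exact output match.pd produces, and the check says nothing about single-use or type-conversion conditions discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Exhaustively check the bit-operation identities for 8-bit values.  */
bool bit_identities_hold (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        if (((x & y) & y) != (x & y)
            || ((x | y) | y) != (x | y)
            || ((x ^ y) ^ y) != x
            || ((x ^ y) & y) != (~x & y)
            || ((x & y) ^ y) != (~x & y))
          return false;
        for (unsigned z = 0; z < 256; z++)
          if (((x & y) & (x & z)) != (x & (y & z))
              || ((x | y) | (x | z)) != (x | (y | z))
              || ((x ^ y) ^ (x ^ z)) != (y ^ z))
            return false;
      }
  return true;
}
```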

>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.
>
> 2016-05-06  Marc Glisse  
>
> gcc/
> * fold-const.c (fold_binary_loc) [(X ^ Y) & Y]: Remove and merge
> with...
> * match.pd ((X & Y) ^ Y): ... this.
> ((X & Y) & Y, (X | Y) | Y, (X ^ Y) ^ Y, (X & Y) & (X & Z), (X | Y)
> | (X | Z), (X ^ Y) ^ (X ^ Z)): New transformations.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/bit-assoc.c: New testcase.
> * gcc.dg/tree-ssa/pr69270.c: Adjust.
> * gcc.dg/tree-ssa/vrp59.c: Disable forwprop.
>
> --
> Marc Glisse
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 235933)
> +++ gcc/fold-const.c(working copy)
> @@ -10063,59 +10063,20 @@ fold_binary_loc (location_t loc,
> }
>/* Fold !X & 1 as X == 0.  */
>if (TREE_CODE (arg0) == TRUTH_NOT_EXPR
>   && integer_onep (arg1))
> {
>   tem = TREE_OPERAND (arg0, 0);
>   return fold_build2_loc (loc, EQ_EXPR, type, tem,
>   build_zero_cst (TREE_TYPE (tem)));
> }
>
> -  /* Fold (X ^ Y) & Y as ~X & Y.  */
> -  if (TREE_CODE (arg0) == BIT_XOR_EXPR
> - && operand_equal_p (TREE_OPERAND (arg0, 1), arg1, 0))
> -   {
> - tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
> - return fold_build2_loc (loc, BIT_AND_EXPR, type,
> - fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
> - fold_convert_loc (loc, type, arg1));
> -   }
> -  /* Fold (X ^ Y) & X as ~Y & X.  */
> -  if (TREE_CODE (arg0) == BIT_XOR_EXPR
> - && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)
> - && reorder_operands_p (TREE_OPERAND (arg0, 1), arg1))
> -   {
> - tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
> - return fold_build2_loc (loc, BIT_AND_EXPR, type,
> - fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
> - fold_convert_loc (loc, type, arg1));
> -   }
> -  /* Fold X & (X ^ Y) as X & ~Y.  */
> -  if (TREE_CODE (arg1) == BIT_XOR_EXPR
> - 

Re: [patch] Fix PR tree-optimization/70884

2016-05-09 Thread Richard Biener
On Sat, May 7, 2016 at 11:22 PM, Eric Botcazou  wrote:
> Hi,
>
> this is a tentative fix for the regression introduced in SRA by the machinery
> which deals with the constant pool.  initialize_constant_pool_replacements is
> supposed to issue new loads from the pool for scalarized variables, but it
> fails to do so for variables that are only partially scalarized.
>
> Tested on PowerPC/Linux and x86-64/Linux, OK for mainline and 6 branch?

Hmm, the patch looks obvious if it was the intent to allow constant pool
replacements _not_ only when the whole constant pool entry may go away.
But I think the intent was not to do this, as otherwise it will generate
worse code by forcing all loads from the constant pool to appear at
function start.

So - the "real" issue may be a missing
should_scalarize_away_bitmap/cannot_scalarize_away_bitmap
check somewhere.

Alan?

Thanks,
Richard.

>
> 2016-05-07  Eric Botcazou  
>
> PR tree-optimization/70884
> * tree-sra.c (initialize_constant_pool_replacements): Process all the
> candidate variables.
>
>
> 2016-05-07  Eric Botcazou  
>
> * gcc.dg/pr70884.c: New test.
>
> --
> Eric Botcazou


Re: [PATCH 3/3] Enhance dumps of IVOPTS

2016-05-09 Thread Richard Biener
On Fri, May 6, 2016 at 11:19 AM, Martin Liška  wrote:
> Hi.
>
> Honza asked me to explain the change more verbosely.
> The patch simply enhances the verbose dump of IVOPTS so that the
> number of iterations is printed. Apart from that, it also prints the
> invariant expressions that are used by the algorithm whenever it
> improves the considered set of candidates.
>
> The main motivation for doing this was that the optimization sometimes
> considers a constant integer to be an invariant expression (Bin Cheng
> is working on removing these), and that both IVs and invariant
> expressions are considered by the cost model to occupy a register.
> This is not ideal, and it sometimes tends to introduce more IVs than
> one would expect.
>
> === New format ===:
> Improved to:
>   cost: 27 (complexity 2)
>   cand_cost: 11
>   cand_group_cost: 10 (complexity 2)
>   candidates: 3, 5
>group:0 --> iv_cand:5, cost=(2,0)
>group:1 --> iv_cand:5, cost=(4,1)
>group:2 --> iv_cand:5, cost=(4,1)
>group:3 --> iv_cand:3, cost=(0,0)
>group:4 --> iv_cand:3, cost=(0,0)
>   invariants 1, 6
>   used invariant expressions:
>inv_expr:3:  ((sizetype) _976 - (sizetype) _922) * 4
>inv_expr:6:  ((sizetype) _1335 - (sizetype) _922) * 4
>
>
> Original cost 27 (complexity 2)
>
> Final cost 27 (complexity 2)
>
> Selected IV set for loop 96 at original.f90:820, 5 avg niters, 2 expressions, 2 IVs:
>
> === Before ===:
>
> Improved to:
>   cost: 27 (complexity 2)
>   cand_cost: 11
>   cand_group_cost: 10 (complexity 2)
>   candidates: 3, 5
>group:0 --> iv_cand:5, cost=(2,0)
>group:1 --> iv_cand:5, cost=(4,1)
>group:2 --> iv_cand:5, cost=(4,1)
>group:3 --> iv_cand:3, cost=(0,0)
>group:4 --> iv_cand:3, cost=(0,0)
>   invariants 1, 6
>
> Original cost 27 (complexity 2)
>
> Final cost 27 (complexity 2)
>
> Selected IV set for loop 96 at original.f90:820, 2 IVs:

But it slows down compile time just for enhanced dump files.  Can you
make the new hash-map conditional on dumping?

Richard.

>
> Martin
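One way to address the compile-time concern is to allocate the map only when a detailed dump is requested. In GCC such code is normally guarded by `dump_file && (dump_flags & TDF_DETAILS)`; the C++ sketch below models that with a hypothetical `dump_state` flag. The names `ivopts_dump_sketch` and `note_inv_expr` are illustrative only and are not the actual IVOPTS data structures.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for "dump_file && (dump_flags & TDF_DETAILS)".
struct dump_state
{
  bool details = false;
};

// Hypothetical container for dump-only bookkeeping: the map of invariant
// expressions is created on first use, and only when dumping details.
struct ivopts_dump_sketch
{
  std::unique_ptr<std::unordered_map<int, std::string>> inv_exprs;

  void note_inv_expr (const dump_state &ds, int id, const std::string &text)
  {
    if (!ds.details)
      return;  // Not dumping: no allocation, no hashing.
    if (!inv_exprs)
      inv_exprs.reset (new std::unordered_map<int, std::string> ());
    (*inv_exprs)[id] = text;
  }
};
```

With this shape, a normal compilation never constructs the map, so the extra cost is one flag test per call.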


Re: [PATCH 0/4] RFC: RTL frontend

2016-05-09 Thread Richard Biener
On Wed, May 4, 2016 at 10:49 PM, David Malcolm  wrote:
> This patch kit introduces an RTL frontend, for the purpose
> of unit testing: primarily for unit testing of RTL passes, and
> possibly for unit testing of .md files.
>
> It's very much a work-in-progress; I'm posting it now to get feedback.
> I've successfully bootstrapped patches 1-3 of the kit on
> x86_64-pc-linux-gnu, but patch 4 (which is the heart of the
> implementation) doesn't survive bootstrap yet (dependency issues
> in the Makefile).
>
> The rest of this post is from gcc/rtl/notes.rst from patch 4; I'm
> adding a duplicate copy up-front here to make it easier to get an
> overview.

Thanks for starting this project.

> RTL frontend
> 
>
> Purpose
> *******
>
> Historically GCC testing has been done by providing source files
> to be built with various command-line options (via DejaGnu
> directives), dumping state at pertinent places, and verifying
> properties of the state via these dumps.
>
> A strength of this approach is that we have excellent integration
> testing, as every test case exercises the toolchain as a whole, but
> it has the drawback that when testing a specific pass,
> we have little control of the input to that specific pass.  We
> provide input, and the various passes transform the state
> of the internal representation::
>
>   INPUT -> PASS-1 -> STATE-1 -> PASS-2 -> STATE-2 -> ...
>     -> etc ->
>     -> ... -> PASS-n-1 -> STATE-n-1 -> PASS-n -> STATE-n
>                            ^            ^         ^
>                            |            |         Output from the pass
>                            |            The pass we care about
>                            The actual input to the pass
>
> so the intervening passes before "PASS-n" could make changes to the
> IR that affect the input seen by our pass ("STATE-n-1" above).  This
> can break our test cases, sometimes in a form that's visible,
> sometimes invisibly (e.g. where a test case silently stops providing
> coverage).
>
> The aim of the RTL frontend is to provide a convenient way to test
> individual passes in the backend, by loading dumps of specific RTL
> state (possibly edited by hand), and then running just one specific
> pass on them, so that we effectively have this::
>
>   INPUT -> PASS-n -> OUTPUT
>
> thus fixing the problem above.
>
> My hope is that this makes it easy to write more fine-grained and
> robust test coverage for the RTL phase of GCC.  However, I see this
> as *complementary* to the existing "integrated testing" approach:
> patches should include both RTL frontend tests *and* integrated tests,
> to avoid regressing the great integration testing we currently have.
>
> The idea is to use the existing dump format as an input format, since
> presumably existing GCC developers are very familiar with the dump
> format.
>
> One other potential benefit of this approach is to allow unit-testing
> of machine descriptions - we could provide specific RTL fragments,
> and have the rtl.dg testsuite directly verify that we recognize all
> instructions and addressing modes that a given target ought to support.
>
> Structure
> *********
>
> The RTL frontend is similar to a regular frontend: a gcc/rtl
> subdirectory within the source tree contains frontend-specific hooks.
> These provide a new "rtl" frontend, which can be optionally
> enabled at configuration time within --enable-languages.
>
> If enabled, it builds an rtl1 binary, which is invoked by the
> gcc driver on files with a .rtl extension.
>
> The testsuite is below gcc/testsuite/rtl.dg.  There's also
> a "roundtrip" subdirectory below this, in which every .rtl
> file is loaded and then dumped; roundtrip.exp verifies that
> the dump is identical to the original file, thus ensuring that
> the RTL loaders faithfully rebuild the input dump.
>
> Limitations
> ***********
>
> * It's a work-in-progress.  There will be bugs.
>
> * The existing RTL code is structured around a single function being
>   optimized, so, as a simplification, the RTL frontend can only handle
>   one function per input file.  Also, the dump format currently uses
>   comments to separate functions::
>
> ;; Function test_1 (test_1, funcdef_no=0, decl_uid=1758, cgraph_uid=0, symbol_order=0)
>
> ... various pass-specific things, sometimes expressed as comments,
> sometimes not
>
> ;;
> ;; Full RTL generated for this function:
> ;;
> (note 1 0 6 NOTE_INSN_DELETED)
> ;; etc, insns for function "test_1" go here
> (insn 27 26 0 6 (use (reg/i:SI 0 ax)) ../../src/gcc/testsuite/rtl.dg/test.c:7 -1
>  (nil))
>
> ;; Function test_2 (test_2, funcdef_no=1, decl_uid=1765, cgraph_uid=1, symbol_order=1)
> ... various pass-specific things, sometimes expressed as comments,
> sometimes not
> ;;
> ;; Full RTL generated for this function:
> ;;
> (note 1 0 5 NOTE_INSN_DELETED)
> ;; etc, insns for function "test_2" go here
> (insn 59 

Re: Fix regrename compare-debug issue

2016-05-09 Thread Bernd Schmidt



On 05/05/2016 09:02 AM, Eric Botcazou wrote:

When scanning addresses inside a debug insn, we shouldn't use normal
base/index classes. This shows as a compare-debug issue on Alpha, where
INDEX_REG_CLASS is NO_REGS, and this prevented a chain from being
renamed with debugging turned on.

Uros has reported that this patch resolves the issues he was seeing on
Alpha, and I've bootstrapped and tested it on x86_64-linux. Ok?


OK, thanks.  It might be worthwhile to add a sentence somewhere (maybe at the end
of the head comment of the file) documenting the special treatment applied to
debug insns during the pass.


Committed with the extra hunk below.


Bernd

@@ -61,7 +61,10 @@
  5. If a renaming register has been found, it is substituted in the chain.
 
   Targets can parameterize the pass by specifying a preferred class for the
-  renaming register for a given (super)class of registers to be renamed.  */
+  renaming register for a given (super)class of registers to be renamed.
+
+  DEBUG_INSNs are treated specially, in particular registers occurring inside
+  them are treated as requiring ALL_REGS as a class.  */
 
 #if HOST_BITS_PER_WIDE_INT <= MAX_RECOG_OPERANDS
 #error "Use a different bitmap implementation for untracked_operands."


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-09 Thread Matthias Klose

On 07.05.2016 16:35, Matthias Klose wrote:

On 02.05.2016 17:51, Jeff Law wrote:

On 04/29/2016 05:36 AM, Ilya Verbin wrote:

Hi!

This patch brings the latest libcilkrts from upstream.
Regtested on i686-linux and x86_64-linux.

Abidiff:
Functions changes summary: 0 Removed, 1 Changed (16 filtered out), 2 Added functions
Variables changes summary: 0 Removed, 0 Changed (1 filtered out), 0 Added variable
2 Added functions:
  'function void __cilkrts_resume()'{__cilkrts_resume@@CILKABI1}
  'function void __cilkrts_suspend()'{__cilkrts_suspend@@CILKABI1}
1 function with some indirect sub-type change:
  [C]'function __cilkrts_worker_ptr __cilkrts_bind_thread_1()' at cilk-abi.c:412:1 has some indirect sub-type changes:
Please note that the symbol of this function is
__cilkrts_bind_thread@@CILKABI0
 and it aliases symbol: __cilkrts_bind_thread_1@@CILKABI1
return type changed:
  underlying type '__cilkrts_worker*' changed:
in pointed to type 'struct __cilkrts_worker' at abi.h:161:1:
  1 data member changes (8 filtered):
   type of 'global_state_t* __cilkrts_worker::g' changed:
 in pointed to type 'typedef global_state_t' at abi.h:113:1:
   underlying type 'struct global_state_t' at global_state.h:119:1 changed:
   [...]

OK for trunk?

libcilkrts/
* Makefile.am: Merge from upstream, version 2.0.4420.0.
* README: Likewise.
* configure.ac: Likewise.
* configure.tgt: Likewise.
* include/cilk/cilk.h: Likewise.
* include/cilk/cilk_api.h: Likewise.
* include/cilk/cilk_api_linux.h: Likewise.
* include/cilk/cilk_stub.h: Likewise.
* include/cilk/cilk_undocumented.h: Likewise.
* include/cilk/common.h: Likewise.
* include/cilk/holder.h: Likewise.
* include/cilk/hyperobject_base.h: Likewise.
* include/cilk/metaprogramming.h: Likewise.
* include/cilk/reducer.h: Likewise.
* include/cilk/reducer_file.h: Likewise.
* include/cilk/reducer_list.h: Likewise.
* include/cilk/reducer_max.h: Likewise.
* include/cilk/reducer_min.h: Likewise.
* include/cilk/reducer_min_max.h: Likewise.
* include/cilk/reducer_opadd.h: Likewise.
* include/cilk/reducer_opand.h: Likewise.
* include/cilk/reducer_opmul.h: Likewise.
* include/cilk/reducer_opor.h: Likewise.
* include/cilk/reducer_opxor.h: Likewise.
* include/cilk/reducer_ostream.h: Likewise.
* include/cilk/reducer_string.h: Likewise.
* include/cilktools/cilkscreen.h: Likewise.
* include/cilktools/cilkview.h: Likewise.
* include/cilktools/fake_mutex.h: Likewise.
* include/cilktools/lock_guard.h: Likewise.
* include/internal/abi.h: Likewise.
* include/internal/cilk_fake.h: Likewise.
* include/internal/cilk_version.h: Likewise.
* include/internal/metacall.h: Likewise.
* include/internal/rev.mk: Likewise.
* mk/cilk-version.mk: Likewise.
* runtime/acknowledgements.dox: Likewise.
* runtime/bug.cpp: Likewise.
* runtime/bug.h: Likewise.
* runtime/c_reducers.c: Likewise.
* runtime/cilk-abi-cilk-for.cpp: Likewise.
* runtime/cilk-abi-vla-internal.c: Likewise.
* runtime/cilk-abi-vla-internal.h: Likewise.
* runtime/cilk-abi.c: Likewise.
* runtime/cilk-ittnotify.h: Likewise.
* runtime/cilk-tbb-interop.h: Likewise.
* runtime/cilk_api.c: Likewise.
* runtime/cilk_fiber-unix.cpp: Likewise.
* runtime/cilk_fiber-unix.h: Likewise.
* runtime/cilk_fiber.cpp: Likewise.
* runtime/cilk_fiber.h: Likewise.
* runtime/cilk_malloc.c: Likewise.
* runtime/cilk_malloc.h: Likewise.
* runtime/component.h: Likewise.
* runtime/config/generic/cilk-abi-vla.c: Likewise.
* runtime/config/generic/os-fence.h: Likewise.
* runtime/config/generic/os-unix-sysdep.c: Likewise.
* runtime/config/x86/cilk-abi-vla.c: Likewise.
* runtime/config/x86/os-fence.h: Likewise.
* runtime/config/x86/os-unix-sysdep.c: Likewise.
* runtime/doxygen-layout.xml: Likewise.
* runtime/doxygen.cfg: Likewise.
* runtime/except-gcc.cpp: Likewise.
* runtime/except-gcc.h: Likewise.
* runtime/except.h: Likewise.
* runtime/frame_malloc.c: Likewise.
* runtime/frame_malloc.h: Likewise.
* runtime/full_frame.c: Likewise.
* runtime/full_frame.h: Likewise.
* runtime/global_state.cpp: Likewise.
* runtime/global_state.h: Likewise.
* runtime/jmpbuf.c: Likewise.
* runtime/jmpbuf.h: Likewise.
* runtime/linux-symbols.ver: Likewise.
* runtime/local_state.c: Likewise.
* runtime/local_state.h: Likewise.
* runtime/mac-symbols.txt: Likewise.
* runtime/metacall_impl.c: Likewise.
* runtime/metacall_impl.h: Likewise.
* runtime/os-unix.c: Likewise.
* runtime/os.h: Likewise.
* runtime/os_mutex-unix.c: Likewise.
* runtime/os_mutex.h: Likewise.
* runtime/pedigrees.c: Likewise.
* runtime/pedigrees.h: Likewise.
* 

[PATCH] Fix PR70985

2016-05-09 Thread Richard Biener

I am testing the following followup to my BIT_FIELD_REF simplification
changes; it resolves issues when they are applied to memory BIT_FIELD_REFs.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-05-09  Richard Biener  

PR tree-optimization/70985
* match.pd (BIT_FIELD_REF -> (type)): Disable on GIMPLE when
op0 isn't a gimple register.

* gcc.dg/torture/pr70985.c: New testcase.

Index: gcc/match.pd
===
*** gcc/match.pd(revision 236021)
--- gcc/match.pd(working copy)
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 3244,3249 
--- 3244,3251 
   (view_convert (imagpart @0)
(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
 && INTEGRAL_TYPE_P (type)
+/* On GIMPLE this should only apply to register arguments.  */
+&& (! GIMPLE || is_gimple_reg (@0))
 /* A bit-field-ref that referenced the full argument can be stripped.  */
 && ((compare_tree_int (@1, TYPE_PRECISION (TREE_TYPE (@0))) == 0
&& integer_zerop (@2))
Index: gcc/testsuite/gcc.dg/torture/pr70985.c
===
*** gcc/testsuite/gcc.dg/torture/pr70985.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70985.c  (working copy)
***
*** 0 
--- 1,28 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target int32plus } */
+ 
+ struct
+ {
+   int f0:24;
+ } a, c, d;
+ 
+ int b;
+ 
+ int
+ fn1 ()
+ {
+   return 0;
+ }
+ 
+ void
+ fn2 ()
+ {
+   int e;
+   if (b) 
+ for (; e;)
+   {
+   d = c;
+   if (fn1 (b))
+ b = a.f0;
+   }
+ }


Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-09 Thread Jonathan Wakely

On 08/05/16 14:43 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64.


OK for trunk, and we should backport to gcc-6-branch (and
gcc-5-branch?) soon.

Thanks.



Re: [v3 PATCH] Avoid endless run-time recursion for copying single-element tuples where the element type is by-value constructible from any type

2016-05-09 Thread Jonathan Wakely

On 08/05/16 15:55 +0300, Ville Voutilainen wrote:

For what it's worth, I have the tiniest preference against using decay
here; whenever I see decay, I wonder whether array/function decay is
significant. While it doesn't make a difference here, I still prefer
just doing remove_reference+remove_const here. It's up to Jonathan;
I'll change it to decay if he so advises.


At one point I felt the same about being more precise with
remove_ref+remove_const, but I just use decay now.

I don't have a preference either way (although if we added a __decay_t
alias for use in C++11 then I might be swayed to use that, as it would
be even more concise).
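To make the decay vs. remove_reference+remove_const trade-off concrete, here is a small C++11 sketch; the helper name `decay_agrees` is hypothetical and the patch's actual constraints are not reproduced. For ordinary references to object types the two spellings agree. They differ for arrays, functions and volatile-qualified types, because std::decay also applies array-to-pointer and function-to-pointer conversions and strips volatile as well as const.

```cpp
#include <type_traits>

// True when std::decay<T> and remove_const<remove_reference<T>> name the
// same type.
template <typename T>
constexpr bool decay_agrees ()
{
  return std::is_same<typename std::decay<T>::type,
                      typename std::remove_const<
                        typename std::remove_reference<T>::type>::type>::value;
}

// Agree: ordinary (possibly const) references to object types.
static_assert (decay_agrees<const int &> (), "");
static_assert (decay_agrees<int &&> (), "");

// Differ: decay turns arrays and functions into pointers and also drops
// volatile, which remove_const leaves in place.
static_assert (!decay_agrees<int (&)[3]> (), "");
static_assert (!decay_agrees<void ()> (), "");
static_assert (!decay_agrees<volatile int &> (), "");
```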



Re: [RFA] Remove useless test in bitmap_find_bit.

2016-05-09 Thread Bernd Schmidt

On 05/06/2016 11:18 PM, Jeff Law wrote:


OK for the trunk?


Counts as obvious, doesn't it?


Bernd



Re: Error out on -fvtable-verify without --enable-vtable-verify

2016-05-09 Thread Bernd Schmidt

On 05/08/2016 12:44 PM, Rainer Orth wrote:

With the recent change not to install libvtv without
--enable-vtable-verify, I noticed that gcc/g++ would still accept
-fvtable-verify without errors, only to emit obscure link-time errors
about missing vtv_*.o (which hadn't been installed in that situation
before) and libvtv.

It seems to me a much better user experience to emit a clear error
message in this case, which is what this patch does.


Generally ok, but...


+AC_ARG_ENABLE(vtable-verify,
+[AS_HELP_STRING([--enable-vtable-verify],
+   [enable vtable verification feature])],
+[case "$enableval" in
+ yes) enable_vtable_verify=yes ;;
+ no)  enable_vtable_verify=no ;;
+ *)   enable_vtable_verify=no;;
+ esac],
+[enable_vtable_verify=no])
+vtable_verify=`if test $enable_vtable_verify != no; then echo 1; else echo 0; fi`
+AC_DEFINE_UNQUOTED(ENABLE_VTABLE_VERIFY, $vtable_verify,
+[Define 0/1 if vtable verification feature is enabled.])


That looks a little overly complicated. Don't you get the enable_ 
variables set by autoconf? And if you do need the case statement, you 
might as well set things to 0/1 directly and skip the 
enable_vtable_verify variable entirely.



Bernd


Re: [PATCH][ARM] Add mode to probe_stack set operands

2016-05-09 Thread Richard Earnshaw (lists)
On 03/05/16 12:27, Kyrill Tkachov wrote:
> Hi all,
> 
> When building the arm backend, genrecog complains that the probe_stack
> set expression doesn't specify any modes.  This patch adds the SI mode
> annotation and fixes the warning.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-05-03  Kyrylo Tkachov  
> 
> * config/arm/arm.md (probe_stack): Add modes to set source
> and destination.
> 

OK.

R.

> arm-stack-probe.patch
> 
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 68de70f05ce29f40849297e160d890f033c34487..0d491f7ea41e4fb5fb58bbb3047294abda541a73 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -8091,8 +8091,8 @@ (define_insn "blockage"
>  )
>  
>  (define_insn "probe_stack"
> -  [(set (match_operand 0 "memory_operand" "=m")
> -(unspec [(const_int 0)] UNSPEC_PROBE_STACK))]
> +  [(set (match_operand:SI 0 "memory_operand" "=m")
> +(unspec:SI [(const_int 0)] UNSPEC_PROBE_STACK))]
>"TARGET_32BIT"
>"str%?\\tr0, %0"
>[(set_attr "type" "store1")
> 



Re: Fix handling of negative bitpos in expand_debug_expr

2016-05-09 Thread Bernd Schmidt

On 05/05/2016 04:20 PM, Richard Sandiford wrote:

expand_debug_expr handled negative bit positions using:

 else if (bitpos < 0)
   {
 HOST_WIDE_INT units
   = (-bitpos + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
 op0 = adjust_address_nv (op0, mode1, units);
 bitpos += units * BITS_PER_UNIT;
   }

Here "units" is the negative of the (negative) byte offset, so I think
we should be offsetting OP0 by -units instead.  E.g. a bitpos of -17
would give units==3, so this code would move OP0 up by 3 bytes and set
bitpos to 7, giving a total bitpos of 31.
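The arithmetic can be replayed outside GCC. This is a minimal C++ sketch that assumes 8-bit units and models adjust_address_nv purely as adding a byte offset; `adjust_negative_bitpos` and `total_bits` are hypothetical helpers, not GCC functions.

```cpp
// Assumed value of BITS_PER_UNIT (8-bit bytes).
const long bits_per_unit = 8;

struct adjusted_ref
{
  long byte_offset;  // offset applied to OP0, in bytes
  long bitpos;       // remaining bit position after the adjustment
};

// Replays the bitpos < 0 branch.  With FIXED false the byte offset is
// +units, as in the original code; with FIXED true it is -units.
adjusted_ref adjust_negative_bitpos (long bitpos, bool fixed)
{
  long units = (-bitpos + bits_per_unit - 1) / bits_per_unit;
  adjusted_ref r;
  r.byte_offset = fixed ? -units : units;
  r.bitpos = bitpos + units * bits_per_unit;
  return r;
}

// Total bit position relative to the original OP0.
long total_bits (const adjusted_ref &r)
{
  return r.byte_offset * bits_per_unit + r.bitpos;
}
```

For a bitpos of -17, units is 3 and the new bitpos is 7. The original +3 byte offset gives a total of 3 * 8 + 7 = 31, while the corrected -3 gives -24 + 7 = -17, preserving the requested position.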



gcc/
* cfgexpand.c (expand_debug_expr): Fix address offset for negative
bitpos.


Ok.


Bernd



Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-09 Thread Richard Biener
On Thu, 5 May 2016, Bernd Edlinger wrote:

> Hi!
> 
> this patch is inspired by recent discussion about basic asm:
> 
> Currently a basic asm is an instruction scheduling barrier,
> but not a memory barrier, and most surprising, basic asm
> does _not_ implicitly clobber CC on targets where
> extended asm always implicitly clobbers CC, even if
> nothing is in the clobber section.
> 
> This patch makes basic asm implicitly clobber CC on certain
> targets, and makes the basic asm implicitly clobber memory,
> but no general registers, which is what could be expected.
> 
> This is however only done for basic asm with non-empty
> assembler string, which is in sync with David's proposed
> basic asm warnings patch.
> 
> Due to the change in the RTL representation, where
> ASM_INPUT can now be the first element of a
> PARALLEL block with the implicit clobber elements,
> some changes are necessary.
> 
> Most of the changes in the middle end were necessary
> because extract_asm_operands cannot be used to find out
> if a PARALLEL block is an asm statement, but in most cases
> asm_noperands can be used instead.
> 
> There are also changes necessary in two targets: pa, and ia64.
> I have successfully built cross-compilers for these targets.
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu
> OK for trunk?

I'm generally sympathetic with the change but I wonder if it would
make sense to re-write "basic asm" into general asms to not
need to special case them.  I'd do that during gimplification
for example.

At least it sounds to me that its semantics can be fully expressed
with generic asms?  (Maybe apart from the only-if-ASM_STRING-is-empty
part)

Thanks,
Richard.
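For illustration, under the proposed semantics a basic asm with a non-empty template would behave roughly like an extended asm that is volatile (basic asm already is, implicitly) and clobbers memory, plus "cc" on targets where extended asm clobbers it implicitly anyway. This GNU C++ sketch assumes `nop` is a valid mnemonic on the target; `store_across_barrier` is a hypothetical helper.

```cpp
// Basic asm: no operands, no clobber list.  Currently it is a scheduling
// barrier, but not a memory barrier.
inline void basic_form ()
{
  asm ("nop");
}

// Spelled as extended asm, the proposed semantics become explicit: the
// "memory" clobber makes the compiler complete pending stores before the
// asm and reload memory values after it.
inline void extended_form ()
{
  asm volatile ("nop" : : : "memory");
}

// Hypothetical helper: the store to *P must happen before the asm, and
// *P is read again after it.
int store_across_barrier (int *p, int v)
{
  *p = v;
  extended_form ();
  return *p;
}
```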


Re: [PATCH] Fix PR70937

2016-05-09 Thread Richard Biener
On Fri, 6 May 2016, Richard Biener wrote:

> On Fri, 6 May 2016, Richard Biener wrote:
> 
> > 
> > The following patch fixes another case of missing DECL_EXPR in the FE.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > Ok for trunk?
> 
> Dominique noticed a FAIL early which is fixed by adjusting the patch
> to only handle TYPE_DECL TYPE_NAME like so:
> 
> Index: gcc/fortran/trans-decl.c
> ===
> --- gcc/fortran/trans-decl.c(revision 235945)
> +++ gcc/fortran/trans-decl.c(working copy)
> @@ -3818,6 +3818,12 @@ gfc_trans_vla_type_sizes (gfc_symbol *sy
>  }
>  
>gfc_trans_vla_type_sizes_1 (type, body);
> +  /* gfc_build_qualified_array may have built this type but left TYPE_NAME
> + pointing to the original type whose type sizes we need to expose to
> + the gimplifier unsharing.  */
> +  if (TYPE_NAME (type)
> +  && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL)
> +gfc_add_expr_to_block (body, build1 (DECL_EXPR, type, TYPE_NAME (type)));
>  }
>  
>  
> I've re-started testing.
> 
> Ok with that change?

It doesn't work fully either.  Given also the original problem where
unshared IL exposes the need to visit the "unrelated" VLA types in
the gimplifier unsharing I tested the following patch instead which
resolves the issue as well.  Furthermore it can't regress things
and at this point is the easiest way forward to unbreak fortran.

Thus, bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

I am testing if reverting my first fix is possible after this now.

Richard.

2016-05-09  Richard Biener  

PR fortran/70937
* trans-decl.c: Include gimplify.h for unshare_expr.
(gfc_trans_vla_one_sizepos): Unshare exprs before inserting
them into the IL.

* gfortran.dg/pr70937.f90: New testcase.

Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c(revision 235945)
--- gcc/fortran/trans-decl.c(working copy)
*** along with GCC; see the file COPYING3.
*** 45,50 
--- 45,51 
  /* Only for gfc_trans_code.  Shouldn't need to include this.  */
  #include "trans-stmt.h"
  #include "gomp-constants.h"
+ #include "gimplify.h"
  
  #define MAX_LABEL_VALUE 9
  
*** gfc_trans_vla_one_sizepos (tree *tp, stm
*** 3738,3744 
  
var = gfc_create_var_np (TREE_TYPE (t), NULL);
gfc_add_decl_to_function (var);
!   gfc_add_modify (body, var, val);
if (TREE_CODE (t) == SAVE_EXPR)
  TREE_OPERAND (t, 0) = var;
*tp = var;
--- 3739,3745 
  
var = gfc_create_var_np (TREE_TYPE (t), NULL);
gfc_add_decl_to_function (var);
!   gfc_add_modify (body, var, unshare_expr (val));
if (TREE_CODE (t) == SAVE_EXPR)
  TREE_OPERAND (t, 0) = var;
*tp = var;
 
Index: gcc/testsuite/gfortran.dg/pr70937.f90
===
*** gcc/testsuite/gfortran.dg/pr70937.f90   (revision 0)
--- gcc/testsuite/gfortran.dg/pr70937.f90   (working copy)
***
*** 0 
--- 1,10 
+ ! { dg-do compile }
+ ! { dg-options "-flto" }
+   SUBROUTINE dbcsr_test_read_args(narg, args)
+ CHARACTER(len=*), DIMENSION(:), &
+   INTENT(out) :: args
+ CHARACTER(len=80) :: line
+ DO
+args(narg) = line
+ ENDDO
+   END SUBROUTINE dbcsr_test_read_args
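Why unsharing matters can be shown with a toy model. If two places in the IL point at the same expression node, an in-place rewrite through one of them (as gimplification does) silently changes the other. The C++ sketch below uses a hypothetical `expr_node` type, not GCC's tree; `unshare_copy` only mimics the idea behind unshare_expr, which is more selective about which nodes it copies.

```cpp
#include <memory>

// Hypothetical expression node with a single operand.
struct expr_node
{
  int code;                            // operation code or leaf value
  std::shared_ptr<expr_node> operand;  // null for leaves
};

// Deep-copies N so the result shares no nodes with the original.
std::shared_ptr<expr_node> unshare_copy (const std::shared_ptr<expr_node> &n)
{
  if (!n)
    return nullptr;
  std::shared_ptr<expr_node> copy = std::make_shared<expr_node> (*n);
  copy->operand = unshare_copy (n->operand);
  return copy;
}
```

Rewriting the copy then leaves the original untouched, while a plain pointer copy still aliases every node.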


Re: [PATCH][CilkPlus] Merge libcilkrts from upstream

2016-05-09 Thread Christophe Lyon
On 2 May 2016 at 17:51, Jeff Law  wrote:
> On 04/29/2016 05:36 AM, Ilya Verbin wrote:
>>
>> Hi!
>>
>> This patch brings the latest libcilkrts from upstream.
>> Regtested on i686-linux and x86_64-linux.
>>
>> Abidiff:
>> Functions changes summary: 0 Removed, 1 Changed (16 filtered out), 2 Added functions
>> Variables changes summary: 0 Removed, 0 Changed (1 filtered out), 0 Added variable
>> 2 Added functions:
>>   'function void __cilkrts_resume()'{__cilkrts_resume@@CILKABI1}
>>   'function void __cilkrts_suspend()'{__cilkrts_suspend@@CILKABI1}
>> 1 function with some indirect sub-type change:
>>   [C]'function __cilkrts_worker_ptr __cilkrts_bind_thread_1()' at cilk-abi.c:412:1 has some indirect sub-type changes:
>> Please note that the symbol of this function is
>> __cilkrts_bind_thread@@CILKABI0
>>  and it aliases symbol: __cilkrts_bind_thread_1@@CILKABI1
>> return type changed:
>>   underlying type '__cilkrts_worker*' changed:
>> in pointed to type 'struct __cilkrts_worker' at abi.h:161:1:
>>   1 data member changes (8 filtered):
>>type of 'global_state_t* __cilkrts_worker::g' changed:
>>  in pointed to type 'typedef global_state_t' at abi.h:113:1:
>>underlying type 'struct global_state_t' at global_state.h:119:1 changed:
>>[...]
>>
>> OK for trunk?
>>
>> libcilkrts/
>> * Makefile.am: Merge from upstream, version 2.0.4420.0.
>> * README: Likewise.
>> * configure.ac: Likewise.
>> * configure.tgt: Likewise.
>> * include/cilk/cilk.h: Likewise.
>> * include/cilk/cilk_api.h: Likewise.
>> * include/cilk/cilk_api_linux.h: Likewise.
>> * include/cilk/cilk_stub.h: Likewise.
>> * include/cilk/cilk_undocumented.h: Likewise.
>> * include/cilk/common.h: Likewise.
>> * include/cilk/holder.h: Likewise.
>> * include/cilk/hyperobject_base.h: Likewise.
>> * include/cilk/metaprogramming.h: Likewise.
>> * include/cilk/reducer.h: Likewise.
>> * include/cilk/reducer_file.h: Likewise.
>> * include/cilk/reducer_list.h: Likewise.
>> * include/cilk/reducer_max.h: Likewise.
>> * include/cilk/reducer_min.h: Likewise.
>> * include/cilk/reducer_min_max.h: Likewise.
>> * include/cilk/reducer_opadd.h: Likewise.
>> * include/cilk/reducer_opand.h: Likewise.
>> * include/cilk/reducer_opmul.h: Likewise.
>> * include/cilk/reducer_opor.h: Likewise.
>> * include/cilk/reducer_opxor.h: Likewise.
>> * include/cilk/reducer_ostream.h: Likewise.
>> * include/cilk/reducer_string.h: Likewise.
>> * include/cilktools/cilkscreen.h: Likewise.
>> * include/cilktools/cilkview.h: Likewise.
>> * include/cilktools/fake_mutex.h: Likewise.
>> * include/cilktools/lock_guard.h: Likewise.
>> * include/internal/abi.h: Likewise.
>> * include/internal/cilk_fake.h: Likewise.
>> * include/internal/cilk_version.h: Likewise.
>> * include/internal/metacall.h: Likewise.
>> * include/internal/rev.mk: Likewise.
>> * mk/cilk-version.mk: Likewise.
>> * runtime/acknowledgements.dox: Likewise.
>> * runtime/bug.cpp: Likewise.
>> * runtime/bug.h: Likewise.
>> * runtime/c_reducers.c: Likewise.
>> * runtime/cilk-abi-cilk-for.cpp: Likewise.
>> * runtime/cilk-abi-vla-internal.c: Likewise.
>> * runtime/cilk-abi-vla-internal.h: Likewise.
>> * runtime/cilk-abi.c: Likewise.
>> * runtime/cilk-ittnotify.h: Likewise.
>> * runtime/cilk-tbb-interop.h: Likewise.
>> * runtime/cilk_api.c: Likewise.
>> * runtime/cilk_fiber-unix.cpp: Likewise.
>> * runtime/cilk_fiber-unix.h: Likewise.
>> * runtime/cilk_fiber.cpp: Likewise.
>> * runtime/cilk_fiber.h: Likewise.
>> * runtime/cilk_malloc.c: Likewise.
>> * runtime/cilk_malloc.h: Likewise.
>> * runtime/component.h: Likewise.
>> * runtime/config/generic/cilk-abi-vla.c: Likewise.
>> * runtime/config/generic/os-fence.h: Likewise.
>> * runtime/config/generic/os-unix-sysdep.c: Likewise.
>> * runtime/config/x86/cilk-abi-vla.c: Likewise.
>> * runtime/config/x86/os-fence.h: Likewise.
>> * runtime/config/x86/os-unix-sysdep.c: Likewise.
>> * runtime/doxygen-layout.xml: Likewise.
>> * runtime/doxygen.cfg: Likewise.
>> * runtime/except-gcc.cpp: Likewise.
>> * runtime/except-gcc.h: Likewise.
>> * runtime/except.h: Likewise.
>> * runtime/frame_malloc.c: Likewise.
>> * runtime/frame_malloc.h: Likewise.
>> * runtime/full_frame.c: Likewise.
>> * runtime/full_frame.h: Likewise.
>> * runtime/global_state.cpp: Likewise.
>>