C++ PATCH for non-type constrained-type-specifiers

2015-11-06 Thread Jason Merrill
I started looking at allowing non-type constrained-type-specifiers in 
auto deduction and then realized that we didn't handle them in function 
parameters either.  Fixing that brought home to me the oddity of having 
a type-specifier stand in for a non-type.  Mind weighing in on that on 
the core reflector?


I also wonder why we have two different ways of expressing a 
constrained-type-specifier in the implementation: a constrained 
template-parameter (TYPE_DECL, is_constrained_parameter) and a 
constrained auto (TEMPLATE_TYPE_PARM).  Why not represent them the same way?


This patch doesn't mess with this duality, but extends 
equivalent_placeholder_constraints to deal with both kinds, and then 
uses that for comparing constrained-type-specifiers.  It also handles 
non-type constrained-type-specifiers in abbreviated function templates.
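
As a rough model of the argument comparison that the patch adds to
equivalent_placeholder_constraints, the sketch below compares two argument
lists while optionally ignoring the leading placeholder argument of either
one.  ArgList and equivalent_args are hypothetical names used only for
illustration; real GCC code works on TREE_VEC nodes and cp_tree_equal, not
on strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a constraint's template argument vector.
using ArgList = std::vector<std::string>;

// Compare two argument lists.  When skip1 (or skip2) is true, the first
// element of that list is the placeholder itself and is ignored, which
// mirrors the "i + skip1" / "i + skip2" indexing in the patch.
bool equivalent_args(const ArgList& a1, bool skip1,
                     const ArgList& a2, bool skip2) {
  // Guard added for this sketch only: nothing to skip in an empty list.
  if ((skip1 && a1.empty()) || (skip2 && a2.empty()))
    return false;

  const std::size_t len1 = a1.size() - (skip1 ? 1 : 0);
  const std::size_t len2 = a2.size() - (skip2 ? 1 : 0);
  if (len1 != len2)
    return false;

  for (std::size_t i = 0; i < len1; ++i)
    if (a1[i + (skip1 ? 1 : 0)] != a2[i + (skip2 ? 1 : 0)])
      return false;
  return true;
}
```

With this model, a constrained auto whose arguments are {placeholder, int}
compares equal to a constrained parameter whose extra arguments are {int}.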


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 030c2ba7fc87d48604176697c8dcf1bf48da
Author: Jason Merrill 
Date:   Fri Nov 6 00:17:57 2015 -0500

	Support non-type constrained-type-specifiers.
	* parser.c (check_type_concept): Remove.
	(cp_parser_maybe_constrained_type_specifier): Don't call it.
	(synthesize_implicit_template_parm): Handle non-type and template
	template parameters.  Also compare extra args.  Return the decl.
	(cp_parser_template_argument): Handle constrained-type-specifiers for
	non-type template parameters.
	(finish_constrained_template_template_parm): Split out from
	cp_parser_constrained_template_template_parm.
	(cp_parser_nonclass_name): Move some logic into
	cp_parser_maybe_concept_name.
	(cp_parser_init_declarator): Fix error recovery.
	(get_concept_from_constraint): Remove.
	(cp_parser_simple_type_specifier): Adjust for
	synthesize_implicit_template_parm returning the decl.
	* constraint.cc (placeholder_extract_concept_and_args)
	(equivalent_placeholder_constraints): Also handle TYPE_DECL
	constrained parms.

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a1fbf17..c6eaf75 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1379,12 +1379,21 @@ make_constrained_auto (tree con, tree args)
   return decl;
 }
 
-/* Given the predicate constraint T from a placeholder type, extract its
-   TMPL and ARGS.  */
+/* Given the predicate constraint T from a constrained-type-specifier, extract
+   its TMPL and ARGS.  FIXME why do we need two different forms of
+   constrained-type-specifier?  */
 
 void
 placeholder_extract_concept_and_args (tree t, tree &tmpl, tree &args)
 {
+  if (TREE_CODE (t) == TYPE_DECL)
+{
+  /* A constrained parameter.  */
+  tmpl = DECL_TI_TEMPLATE (CONSTRAINED_PARM_CONCEPT (t));
+  args = CONSTRAINED_PARM_EXTRA_ARGS (t);
+  return;
+}
+
   gcc_assert (TREE_CODE (t) == PRED_CONSTR);
   t = PRED_CONSTR_EXPR (t);
   gcc_assert (TREE_CODE (t) == CALL_EXPR
@@ -1418,9 +1427,10 @@ placeholder_extract_concept_and_args (tree t, tree &tmpl, tree &args)
 bool
 equivalent_placeholder_constraints (tree c1, tree c2)
 {
-  if (TREE_CODE (c1) == TEMPLATE_TYPE_PARM)
+  if (c1 && TREE_CODE (c1) == TEMPLATE_TYPE_PARM)
+/* A constrained auto.  */
 c1 = PLACEHOLDER_TYPE_CONSTRAINTS (c1);
-  if (TREE_CODE (c2) == TEMPLATE_TYPE_PARM)
+  if (c2 && TREE_CODE (c2) == TEMPLATE_TYPE_PARM)
 c2 = PLACEHOLDER_TYPE_CONSTRAINTS (c2);
 
   if (c1 == c2)
@@ -1434,14 +1444,21 @@ equivalent_placeholder_constraints (tree c1, tree c2)
 
   if (t1 != t2)
 return false;
-  int len = TREE_VEC_LENGTH (a1);
-  if (len != TREE_VEC_LENGTH (a2))
-return false;
+
   /* Skip the first argument to avoid infinite recursion on the
  placeholder auto itself.  */
-  for (int i = len-1; i > 0; --i)
-if (!cp_tree_equal (TREE_VEC_ELT (a1, i),
-			TREE_VEC_ELT (a2, i)))
+  bool skip1 = (TREE_CODE (c1) == PRED_CONSTR);
+  bool skip2 = (TREE_CODE (c2) == PRED_CONSTR);
+
+  int len1 = (a1 ? TREE_VEC_LENGTH (a1) : 0) - skip1;
+  int len2 = (a2 ? TREE_VEC_LENGTH (a2) : 0) - skip2;
+
+  if (len1 != len2)
+return false;
+
+  for (int i = 0; i < len1; ++i)
+if (!cp_tree_equal (TREE_VEC_ELT (a1, i + skip1),
+			TREE_VEC_ELT (a2, i + skip2)))
   return false;
   return true;
 }
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c6f5729..d1f4970 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13871,18 +13871,9 @@ cp_parser_constrained_type_template_parm (cp_parser *parser,
 return error_mark_node;
 }
 
-/* Finish parsing/processing a template template parameter by borrowing
-   the template parameter list from the prototype parameter.  */
-
 static tree
-cp_parser_constrained_template_template_parm (cp_parser *parser,
-  tree proto,
-  tree id,
-  cp_parameter_declarator *parmdecl)
+finish_constrained_template_template_parm (tree proto, tree id)
 {
-  if (!cp_parser_check_constrained_t

Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Sebastian Pop
On Fri, Nov 6, 2015 at 6:32 AM, Bernd Schmidt  wrote:
> On 11/06/2015 03:10 PM, Sebastian Pop wrote:
>>
>> On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:
>>>
>>> Formatting problem, here and in a few other places. I didn't fully read the
>>> patch this time around.
>>>
>>> I'm probably not reviewing further patches because I don't see this
>>> progressing to a state where it's acceptable. Others may do so, but as far
>>> as I'm concerned the patch is rejected.
>>
>>
>> Bernd,
>> I would like to ask you to focus on the technical part, and provide a
>> review only based on technical reasons.
>> Please ignore all formatting changes: I will help address all those changes.
>> I will send a patch addressing all the comments you had in the current review.
>
>
> As long as this just has allocation from the normal stack frame as its only
> strategy, I consider it unacceptable (and I think Richard B voiced the same

Understood.

> opinion). If you want a half-finished redzone allocator, I can send you a
> patch.

Yes please.  Let's get it to work.

Thanks,
Sebastian


Move #undef DEF_BUILTIN* to builtins.def

2015-11-06 Thread Richard Sandiford
I was confused at first why tree-core.h was undefining DEF_BUILTIN_CHKP
before defining it, then undefining it again after including builtins.def.
This is because builtins.def provides a default definition of
DEF_BUILTIN_CHKP, but leaves it up to the caller to undefine it where
necessary.  Similarly to the previous internal-fn.def patch, it seems
more obvious for builtins.def to #undef things unconditionally.

One argument might have been that keeping preprocessor stuff
out of the .def files makes it easier for non-cpp parsers.  In practice
though we already have #ifs and multiline #defines, so single-line #undefs
should be easy in comparison.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/ada/
* gcc-interface/utils.c: Don't undef DEF_BUILTIN.

gcc/c-family/
* c-common.c: Don't undef DEF_BUILTIN.

gcc/jit/
* jit-builtins.c: Don't undef DEF_BUILTIN.

gcc/lto/
* lto-lang.c: Don't undef DEF_BUILTIN.

gcc/
* builtins.def: #undef DEF_BUILTIN and DEF_BUILTIN_CHKP.
* builtins.c, genmatch.c, tree-core.h: Don't undef them here.

diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index 8617a87..3b893b8 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -6040,7 +6040,6 @@ install_builtin_functions (void)
BOTH_P, FALLBACK_P, NONANSI_P,   \
built_in_attributes[(int) ATTRS], IMPLICIT);
 #include "builtins.def"
-#undef DEF_BUILTIN
 }
 
 /* --- *
diff --git a/gcc/builtins.c b/gcc/builtins.c
index add9fc8..ad661c1 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -78,7 +78,6 @@ const char * built_in_names[(int) END_BUILTINS] =
 {
 #include "builtins.def"
 };
-#undef DEF_BUILTIN
 
 /* Setup an array of builtin_info_type, make sure each element decl is
initialized to NULL_TREE.  */
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 076da40..ed850df 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -945,3 +945,6 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
 
 /* Pointer Bounds Checker builtins.  */
 #include "chkp-builtins.def"
+
+#undef DEF_BUILTIN_CHKP
+#undef DEF_BUILTIN
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index c87704b..b4663ce 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5735,7 +5735,6 @@ c_define_builtins (tree va_list_ref_type_node, tree va_list_arg_type_node)
   BOTH_P, FALLBACK_P, NONANSI_P,   \
   built_in_attributes[(int) ATTRS], IMPLICIT);
 #include "builtins.def"
-#undef DEF_BUILTIN
 
   targetm.init_builtins ();
 
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index b5a0fff..241a628 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -208,7 +208,6 @@ enum built_in_function {
 #include "builtins.def"
 END_BUILTINS
 };
-#undef DEF_BUILTIN
 
 /* Return true if CODE represents a commutative tree code.  Otherwise
return false.  */
@@ -4598,7 +4597,6 @@ add_operator (VIEW_CONVERT2, "VIEW_CONVERT2", "tcc_unary", 1);
 #define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
   add_builtin (ENUM, # ENUM);
 #include "builtins.def"
-#undef DEF_BUILTIN
 
   /* Parse ahead!  */
   parser p (r);
diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index b28a5de..63ff5af 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -62,7 +62,6 @@ static const struct builtin_data builtin_data[] =
 {
 #include "builtins.def"
 };
-#undef DEF_BUILTIN
 
 /* Helper function for find_builtin_by_name.  */
 
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index be317a4..4805c2a 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -731,7 +731,6 @@ lto_define_builtins (tree va_list_ref_type_node ATTRIBUTE_UNUSED,
 builtin_types[(int) LIBTYPE], BOTH_P, FALLBACK_P,  \
 NONANSI_P, built_in_attributes[(int) ATTRS], IMPLICIT);
 #include "builtins.def"
-#undef DEF_BUILTIN
 }
 
 static GTY(()) tree registered_builtin_types;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 1c6976e..bd4e629 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -159,13 +159,10 @@ enum built_in_function {
 
   BEGIN_CHKP_BUILTINS,
 
-#undef DEF_BUILTIN
 #define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND)
-#undef DEF_BUILTIN_CHKP
 #define DEF_BUILTIN_CHKP(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
   ENUM##_CHKP = ENUM + BEGIN_CHKP_BUILTINS + 1,
 #include "builtins.def"
-#undef DEF_BUILTIN_CHKP
 
   END_CHKP_BUILTINS = BEGIN_CHKP_BUILTINS * 2 + 1,
 
@@ -186,7 +183,6 @@ enum built_in_function {
   /* Upper bound on non-language-specific builtins.  */
   END_BUILTINS
 };
-#undef DEF_BUILTIN
 
 /* Tree code classes.  Each tree_code has an associated code class
represented by a TREE_CODE_CLASS.  */



Re: [PATCH] gcc/config.gcc: fix typo for powerpc e6500 cpu_is_64bit

2015-11-06 Thread David Edelsohn
2015-11-06  Arnout Vandecappelle  
 * gcc/config.gcc: fix typo for powerpc e6500 cpu_is_64bit

For GCC, please don't send ChangeLog entries as diffs.

Applied.  Thanks.

- David


Re: [PATCH] x86 interrupt attribute

2015-11-06 Thread Uros Bizjak
On Fri, Nov 6, 2015 at 3:07 PM, Yulia Koval  wrote:
> Hi,
>
> I updated and reposted the patch. Regtested/bootstraped on
> x86_64/Linux and i686/Linux. Ok for trunk?

This version still emits insns from ix86_function_arg, so NAK.

Uros.


Move const char * -> int/fp folds to fold-const-call.c

2015-11-06 Thread Richard Sandiford
This patch moves to fold-const-call.c the folds that take constant string
arguments and return a constant integer or floating-point value.  For
example, it handles strcmp ("foo", "bar") but not strstr ("foobar", "bar"),
which wouldn't currently be accepted by the gimple folders.
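
To make the strcmp case concrete: when both strings are known at compile
time, the fold can run the host comparison and normalise the result.  The
helper below is a hypothetical sketch of that idea, following the -1/0/1
normalisation done by the code being removed from builtins.c; whether the
new fold-const-call.c code produces exactly these values is not visible in
this excerpt.

```cpp
#include <cstring>

// Hypothetical helper: fold strcmp on two known strings to -1, 0 or 1.
int fold_constant_strcmp(const char *p1, const char *p2) {
  const int r = std::strcmp(p1, p2);
  // Map any positive value to 1 and any negative value to -1.
  return (r > 0) - (r < 0);
}
```

Here fold_constant_strcmp ("foo", "bar") yields 1, because 'f' sorts after
'b'.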

The builtins.c folding for strlen (via c_strlen) is a bit more general
than what the fold-const-call.c code does (and more general than we need
for the gimple folders).  I've therefore left it as-is, even though it
partially duplicates the new code.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_nan): Delete.
(fold_builtin_memcmp): Remove case where both arguments are constant.
(fold_builtin_strcmp, fold_builtin_strncmp): Likewise.
(fold_builtin_strspn, fold_builtin_strcspn): Likewise.
(fold_builtin_1): Remove BUILT_IN_NAN* handling.
* fold-const-call.c: Include fold-const.h.
(host_size_t_cst_p): New function.
(build_cmp_result, fold_const_builtin_nan): Likewise.
(fold_const_call_1): New function, split out from...
(fold_const_call): ...here (for all three interfaces).  Handle
constant nan, nans, strlen, strcmp, strncmp, strspn and strcspn.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3f7fe3b..add9fc8 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -143,7 +143,6 @@ static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
 static tree fold_builtin_strlen (location_t, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
-static tree fold_builtin_nan (tree, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
 static rtx expand_builtin_fabs (tree, rtx, rtx);
@@ -7264,26 +7263,6 @@ fold_builtin_inf (location_t loc, tree type, int warn)
   return build_real (type, real);
 }
 
-/* Fold a call to __builtin_nan or __builtin_nans with argument ARG.  */
-
-static tree
-fold_builtin_nan (tree arg, tree type, int quiet)
-{
-  REAL_VALUE_TYPE real;
-  const char *str;
-
-  if (!validate_arg (arg, POINTER_TYPE))
-return NULL_TREE;
-  str = c_getstr (arg);
-  if (!str)
-return NULL_TREE;
-
-  if (!real_nan (&real, str, quiet, TYPE_MODE (type)))
-return NULL_TREE;
-
-  return build_real (type, real);
-}
-
 /* Fold function call to builtin sincos, sincosf, or sincosl.  Return
NULL_TREE if no simplification can be made.  */
 
@@ -7378,8 +7357,6 @@ fold_builtin_memchr (location_t loc, tree arg1, tree arg2, tree len, tree type)
 static tree
 fold_builtin_memcmp (location_t loc, tree arg1, tree arg2, tree len)
 {
-  const char *p1, *p2;
-
   if (!validate_arg (arg1, POINTER_TYPE)
   || !validate_arg (arg2, POINTER_TYPE)
   || !validate_arg (len, INTEGER_TYPE))
@@ -7394,25 +7371,6 @@ fold_builtin_memcmp (location_t loc, tree arg1, tree arg2, tree len)
   if (operand_equal_p (arg1, arg2, 0))
     return omit_one_operand_loc (loc, integer_type_node, integer_zero_node, len);
 
-  p1 = c_getstr (arg1);
-  p2 = c_getstr (arg2);
-
-  /* If all arguments are constant, and the value of len is not greater
- than the lengths of arg1 and arg2, evaluate at compile-time.  */
-  if (tree_fits_uhwi_p (len) && p1 && p2
-  && compare_tree_int (len, strlen (p1) + 1) <= 0
-  && compare_tree_int (len, strlen (p2) + 1) <= 0)
-{
-  const int r = memcmp (p1, p2, tree_to_uhwi (len));
-
-  if (r > 0)
-   return integer_one_node;
-  else if (r < 0)
-   return integer_minus_one_node;
-  else
-   return integer_zero_node;
-}
-
   /* If len parameter is one, return an expression corresponding to
  (*(const unsigned char*)arg1 - (const unsigned char*)arg2).  */
   if (tree_fits_uhwi_p (len) && tree_to_uhwi (len) == 1)
@@ -7445,8 +7403,6 @@ fold_builtin_memcmp (location_t loc, tree arg1, tree arg2, tree len)
 static tree
 fold_builtin_strcmp (location_t loc, tree arg1, tree arg2)
 {
-  const char *p1, *p2;
-
   if (!validate_arg (arg1, POINTER_TYPE)
   || !validate_arg (arg2, POINTER_TYPE))
 return NULL_TREE;
@@ -7455,21 +7411,8 @@ fold_builtin_strcmp (location_t loc, tree arg1, tree arg2)
   if (operand_equal_p (arg1, arg2, 0))
 return integer_zero_node;
 
-  p1 = c_getstr (arg1);
-  p2 = c_getstr (arg2);
-
-  if (p1 && p2)
-{
-  const int i = strcmp (p1, p2);
-  if (i < 0)
-   return integer_minus_one_node;
-  else if (i > 0)
-   return integer_one_node;
-  else
-   return integer_zero_node;
-}
-
   /* If the second arg is "", return *(const unsigned char*)arg1.  */
+  const char *p2 = c_getstr (arg2);
   if (p2 && *p2 == '\0')
 {
   tree cst_uchar_node = build_type_variant (unsigned_char_type_node, 1, 0);
@@ -7484,6 +7427,7 @@ fold_builtin_strcmp (location_t loc, tree arg1, tree arg2)
 }
 
   /* If the first arg is "", return -*(const un

Move #undef DEF_INTERNAL_FN to internal-fn.def

2015-11-06 Thread Richard Sandiford
In practice the definition of DEF_INTERNAL_FN is never reused after
including internal-fn.def, so we might as well #undef it there.

This becomes more obvious with a later patch that adds other
DEF_INTERNAL_* directives, such as DEF_INTERNAL_OPTAB_FN.
If the includer doesn't care about the information carried in
these new directives, it can simply leave the macro undefined
and internal-fn.def will provide a definition that forwards to
DEF_INTERNAL_FN.  It doesn't make much sense for includers to have
to #undef macros that are defined by internal-fn.def and it seems overly
complicated to get internal-fn.def to undef macros only in the cases
where it provided a definition.  Instead I went with the approach of
#undeffing all the DEF_INTERNAL_* macros unconditionally.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* internal-fn.def: #undef DEF_INTERNAL_FN at the end.
* internal-fn.c: Don't undef it here.
* tree-core.h: Likewise.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a7da373..f9f7746 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -43,7 +43,6 @@ along with GCC; see the file COPYING3.  If not see
 const char *const internal_fn_name_array[] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) #CODE,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   ""
 };
 
@@ -51,7 +50,6 @@ const char *const internal_fn_name_array[] = {
 const int internal_fn_flags_array[] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) FLAGS,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   0
 };
 
@@ -65,7 +63,6 @@ init_internal_fns ()
   if (FNSPEC) internal_fn_fnspec_array[IFN_##CODE] = \
 build_string ((int) sizeof (FNSPEC), FNSPEC ? FNSPEC : "");
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   internal_fn_fnspec_array[IFN_LAST] = 0;
 }
 
@@ -2054,7 +2051,6 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED)
 static void (*const internal_fn_expanders[]) (gcall *) = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   0
 };
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 78266d9..a2da2dd 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".")
 
 /* OpenACC looping abstraction.  See internal-fn.h for usage.  */
 DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL)
+
+#undef DEF_INTERNAL_FN
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 6b17da7..1c6976e 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -768,7 +768,6 @@ enum annot_expr_kind {
 enum internal_fn {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) IFN_##CODE,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   IFN_LAST
 };
 



Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-06 Thread Martin Sebor

On 11/06/2015 05:55 AM, Rainer Orth wrote:

Martin Sebor  writes:


If we use gcc_checking_assert it won't fire in release builds; let's go
with that.


Okay. Attached is an updated patch with that change.


Unfortunately, this breaks i386-pc-solaris2.10 bootstrap:

/vol/gcc/src/hg/trunk/local/gcc/cp/init.c: In function 'void warn_placement_new_too_small(tree, tree, tree, tree)':
/vol/gcc/src/hg/trunk/local/gcc/cp/init.c:2454:17: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'long long unsigned int' [-Werror=format=]
   bytes_avail);
  ^

Printing an unsigned HOST_WIDE_INT with %lu in one case, but %wu in the
other seems like a simple typo, so the following fixes bootstrap for me:


Yes, that was a typo. Sorry about that and thanks to ktkachov for
committing the fix!

Martin


Small C++ PATCH to clean up non-type template parameter handling

2015-11-06 Thread Jason Merrill
The old code here laments not being able to reuse the CONST_DECL created 
by process_template_parm, but we can get to it from the TEMPLATE_PARM_INDEX.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 02af5bca9bd6dbd3080bef614e411c56303d3c66
Author: Jason Merrill 
Date:   Thu Nov 5 16:20:12 2015 -0500

	* pt.c (push_inline_template_parms_recursive): Don't recreate the
	CONST_DECL.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 45eda3a..bfea8e2 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -441,21 +441,8 @@ push_inline_template_parms_recursive (tree parmlist, int levels)
 	  break;
 
 	case PARM_DECL:
-	  {
-	/* Make a CONST_DECL as is done in process_template_parm.
-	   It is ugly that we recreate this here; the original
-	   version built in process_template_parm is no longer
-	   available.  */
-	tree decl = build_decl (DECL_SOURCE_LOCATION (parm),
-CONST_DECL, DECL_NAME (parm),
-TREE_TYPE (parm));
-	DECL_ARTIFICIAL (decl) = 1;
-	TREE_CONSTANT (decl) = 1;
-	TREE_READONLY (decl) = 1;
-	DECL_INITIAL (decl) = DECL_INITIAL (parm);
-	SET_DECL_TEMPLATE_PARM_P (decl);
-	pushdecl (decl);
-	  }
+	  /* Push the CONST_DECL.  */
+	  pushdecl (TEMPLATE_PARM_DECL (DECL_INITIAL (parm)));
 	  break;
 
 	default:


Re: Don't treat rint as setting errno

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 04:02 PM, Richard Sandiford wrote:

builtins.def says that rint sets errno, but it looks like this might
be a mistake.  C99 says that rint doesn't set errno and the builtins.c
expansion code doesn't try to keep errno up to date.


[snip explanation of the history]

FWIW the manpage has this to say:

"SUSv2 and POSIX.1-2001 contain text about overflow (which might set 
errno to ERANGE, or raise an FE_OVERFLOW exception).  In practice, the 
result cannot overflow on any current machine, so this error-handling 
stuff is just nonsense."
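
A small probe along these lines can be used to check the behaviour on a
given C library.  errno_after_rint is a hypothetical helper name, and the
expectation that errno stays 0 is an assumption about a C99-conforming
libm, not something this sketch proves.

```cpp
#include <cerrno>
#include <cmath>

// Clear errno, call rint on X, store the rounded value in *RESULT and
// return whatever errno holds afterwards.
int errno_after_rint(double x, double *result) {
  errno = 0;
  *result = std::rint(x);
  return errno;
}
```

Calling it with 2.5 under the default rounding mode should store 2.0 in
*result.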



Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?


Ok.


Bernd


Re: Move c_getstr to fold-const.c

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 04:05 PM, Richard Sandiford wrote:

Upcoming patches to fold-const-call.c want to use c_getstr, which is
currently defined in builtins.c.  The function doesn't really do anything
related to built-ins, and I'd rather not make fold-const-call.c depend
on builtins.c and builtins.c depend on fold-const-call.c, so this patch
moves the function to fold-const.c instead.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?


Sure.


Bernd


Move constant bitop and bswap folds to fold-const-call.c

2015-11-06 Thread Richard Sandiford
The only bitop and bswap folds left in builtins.c were for constants, so we
can remove the builtins.c handling of them entirely.
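
As an illustration of what folding these built-ins on a constant means,
the sketch below evaluates three of them on a 32-bit value.  The fold_*
names are hypothetical and the code works on plain integers rather than on
GCC's wide_int; clz and ctz are left out because their value at zero is
target-dependent.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value.
inline std::uint32_t fold_bswap32(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Count the set bits by clearing the lowest one until none remain.
inline int fold_popcount(std::uint32_t x) {
  int n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// 1-based index of the least significant set bit, or 0 if X is zero.
inline int fold_ffs(std::uint32_t x) {
  if (x == 0)
    return 0;
  int n = 1;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}
```

For example, fold_bswap32 (0x11223344) gives 0x44332211 and fold_ffs (8)
gives 4.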

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_bitop, fold_builtin_bswap): Delete.
(fold_builtin_1): Don't call them.
* fold-const-call.c: Include tm.h.
(fold_const_call_ss): New variant for integer-to-integer folds.
(fold_const_call): Call it.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 6eefd54..3f7fe3b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -148,7 +148,6 @@ static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
 static rtx expand_builtin_fabs (tree, rtx, rtx);
 static rtx expand_builtin_signbit (tree, rtx);
-static tree fold_builtin_bitop (tree, tree);
 static tree fold_builtin_strchr (location_t, tree, tree, tree);
 static tree fold_builtin_memchr (location_t, tree, tree, tree, tree);
 static tree fold_builtin_memcmp (location_t, tree, tree, tree);
@@ -7332,99 +7331,6 @@ fold_builtin_sincos (location_t loc,
 fold_build1_loc (loc, REALPART_EXPR, type, call)));
 }
 
-/* Fold function call to builtin ffs, clz, ctz, popcount and parity
-   and their long and long long variants (i.e. ffsl and ffsll).  ARG is
-   the argument to the call.  Return NULL_TREE if no simplification can
-   be made.  */
-
-static tree
-fold_builtin_bitop (tree fndecl, tree arg)
-{
-  if (!validate_arg (arg, INTEGER_TYPE))
-return NULL_TREE;
-
-  /* Optimize for constant argument.  */
-  if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
-{
-  tree type = TREE_TYPE (arg);
-  int result;
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-   {
-   CASE_INT_FN (BUILT_IN_FFS):
- result = wi::ffs (arg);
- break;
-
-   CASE_INT_FN (BUILT_IN_CLZ):
- if (wi::ne_p (arg, 0))
-   result = wi::clz (arg);
- else if (! CLZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
-   result = TYPE_PRECISION (type);
- break;
-
-   CASE_INT_FN (BUILT_IN_CTZ):
- if (wi::ne_p (arg, 0))
-   result = wi::ctz (arg);
- else if (! CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
-   result = TYPE_PRECISION (type);
- break;
-
-   CASE_INT_FN (BUILT_IN_CLRSB):
- result = wi::clrsb (arg);
- break;
-
-   CASE_INT_FN (BUILT_IN_POPCOUNT):
- result = wi::popcount (arg);
- break;
-
-   CASE_INT_FN (BUILT_IN_PARITY):
- result = wi::parity (arg);
- break;
-
-   default:
- gcc_unreachable ();
-   }
-
-  return build_int_cst (TREE_TYPE (TREE_TYPE (fndecl)), result);
-}
-
-  return NULL_TREE;
-}
-
-/* Fold function call to builtin_bswap and the short, long and long long
-   variants.  Return NULL_TREE if no simplification can be made.  */
-static tree
-fold_builtin_bswap (tree fndecl, tree arg)
-{
-  if (! validate_arg (arg, INTEGER_TYPE))
-return NULL_TREE;
-
-  /* Optimize constant value.  */
-  if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
-{
-  tree type = TREE_TYPE (TREE_TYPE (fndecl));
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-   {
- case BUILT_IN_BSWAP16:
- case BUILT_IN_BSWAP32:
- case BUILT_IN_BSWAP64:
-   {
- signop sgn = TYPE_SIGN (type);
- tree result =
-   wide_int_to_tree (type,
- wide_int::from (arg, TYPE_PRECISION (type),
- sgn).bswap ());
- return result;
-   }
-   default:
- gcc_unreachable ();
-   }
-}
-
-  return NULL_TREE;
-}
-
 /* Fold function call to builtin memchr.  ARG1, ARG2 and LEN are the
arguments to the call, and TYPE is its return type.
Return NULL_TREE if no simplification can be made.  */
@@ -8364,19 +8270,6 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
 CASE_FLT_FN (BUILT_IN_NANS):
   return fold_builtin_nan (arg0, type, false);
 
-case BUILT_IN_BSWAP16:
-case BUILT_IN_BSWAP32:
-case BUILT_IN_BSWAP64:
-  return fold_builtin_bswap (fndecl, arg0);
-
-CASE_INT_FN (BUILT_IN_FFS):
-CASE_INT_FN (BUILT_IN_CLZ):
-CASE_INT_FN (BUILT_IN_CTZ):
-CASE_INT_FN (BUILT_IN_CLRSB):
-CASE_INT_FN (BUILT_IN_POPCOUNT):
-CASE_INT_FN (BUILT_IN_PARITY):
-  return fold_builtin_bitop (fndecl, arg0);
-
 case BUILT_IN_ISASCII:
   return fold_builtin_isascii (loc, arg0);
 
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index c277d2b..48e05a9 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "options.h"
 #include "fold-const-call.h"
+#include "tm.h" /* For C[LT]Z_DEFINED_AT_ZERO.

Handle constant fp classifications in fold-const-call.c

2015-11-06 Thread Richard Sandiford
Move the constant "is finite", "is infinite" and "is nan" queries
to fold-const-call.c.
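
In terms of ordinary doubles, the three constant queries behave roughly
like the sketch below.  The fold_* names are hypothetical; the signed
result for infinities follows the "arg->sign ? -1 : 1" logic visible in
the patch, and the real code works on REAL_VALUE_TYPE rather than double.

```cpp
#include <cmath>

// 1 for +Inf, -1 for -Inf, 0 for everything else.
int fold_isinf(double x) {
  if (!std::isinf(x))
    return 0;
  return std::signbit(x) ? -1 : 1;
}

// 1 if X is neither infinite nor a NaN, otherwise 0.
int fold_isfinite(double x) { return std::isfinite(x) ? 1 : 0; }

// 1 if X is a NaN, otherwise 0.
int fold_isnan(double x) { return std::isnan(x) ? 1 : 0; }
```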

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_classify): Move constant cases to...
* fold-const-call.c (fold_const_call_ss): ...here.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 69c56e7..6eefd54 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -8018,7 +8018,6 @@ static tree
 fold_builtin_classify (location_t loc, tree fndecl, tree arg, int builtin_index)
 {
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
-  REAL_VALUE_TYPE r;
 
   if (!validate_arg (arg, REAL_TYPE))
 return NULL_TREE;
@@ -8029,16 +8028,6 @@ fold_builtin_classify (location_t loc, tree fndecl, tree arg, int builtin_index)
   if (!HONOR_INFINITIES (arg))
return omit_one_operand_loc (loc, type, integer_zero_node, arg);
 
-  if (TREE_CODE (arg) == REAL_CST)
-   {
- r = TREE_REAL_CST (arg);
- if (real_isinf (&r))
-   return real_compare (GT_EXPR, &r, &dconst0)
-  ? integer_one_node : integer_minus_one_node;
- else
-   return integer_zero_node;
-   }
-
   return NULL_TREE;
 
 case BUILT_IN_ISINF_SIGN:
@@ -8078,24 +8067,12 @@ fold_builtin_classify (location_t loc, tree fndecl, tree arg, int builtin_index)
  && !HONOR_INFINITIES (arg))
return omit_one_operand_loc (loc, type, integer_one_node, arg);
 
-  if (TREE_CODE (arg) == REAL_CST)
-   {
- r = TREE_REAL_CST (arg);
- return real_isfinite (&r) ? integer_one_node : integer_zero_node;
-   }
-
   return NULL_TREE;
 
 case BUILT_IN_ISNAN:
   if (!HONOR_NANS (arg))
return omit_one_operand_loc (loc, type, integer_zero_node, arg);
 
-  if (TREE_CODE (arg) == REAL_CST)
-   {
- r = TREE_REAL_CST (arg);
- return real_isnan (&r) ? integer_one_node : integer_zero_node;
-   }
-
   arg = builtin_save_expr (arg);
   return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
 
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 5af2c63..c277d2b 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -736,6 +736,31 @@ fold_const_call_ss (wide_int *result, built_in_function fn,
   /* Not yet folded to a constant.  */
   return false;
 
+CASE_FLT_FN (BUILT_IN_FINITE):
+case BUILT_IN_FINITED32:
+case BUILT_IN_FINITED64:
+case BUILT_IN_FINITED128:
+case BUILT_IN_ISFINITE:
+  *result = wi::shwi (real_isfinite (arg) ? 1 : 0, precision);
+  return true;
+
+CASE_FLT_FN (BUILT_IN_ISINF):
+case BUILT_IN_ISINFD32:
+case BUILT_IN_ISINFD64:
+case BUILT_IN_ISINFD128:
+  if (real_isinf (arg))
+   *result = wi::shwi (arg->sign ? -1 : 1, precision);
+  else
+   *result = wi::shwi (0, precision);
+  return true;
+
+CASE_FLT_FN (BUILT_IN_ISNAN):
+case BUILT_IN_ISNAND32:
+case BUILT_IN_ISNAND64:
+case BUILT_IN_ISNAND128:
+  *result = wi::shwi (real_isnan (arg) ? 1 : 0, precision);
+  return true;
+
 default:
   return false;
 }



[gomp4, committed] Fix double word typo in tree-inline.c

2015-11-06 Thread Tom de Vries

Hi,

reverting a patch in tree-inline.c in gomp-4_0-branch exposed a typo 
already fixed on trunk.  This patch fixes that.


Committed to gomp-4_0-branch.

Thanks,
- Tom

2015-11-06  Tom de Vries  

	backport from trunk:
	2015-07-12  Aldy Hernandez  

	* tree-inline.c: Fix double word typos.
---
 gcc/tree-inline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 3d06e6e..884131f 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4540,7 +4540,7 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id)
   id->src_cfun = DECL_STRUCT_FUNCTION (fn);
   id->call_stmt = stmt;
 
-  /* If the the src function contains an IFN_VA_ARG, then so will the dst
+  /* If the src function contains an IFN_VA_ARG, then so will the dst
  function after inlining.  */
   if ((id->src_cfun->curr_properties & PROP_gimple_lva) == 0)
 {
-- 
1.9.1



[gomp4, committed] Revert "Add dom_walker::walk_until"

2015-11-06 Thread Tom de Vries

Hi,

this patch reverts the "Add dom_walker::walk_until" patch.

The dom_walker::walk_until functionality is no longer required now that 
we've reverted pass_dominator::sese_mode_p.
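
For context, the plain walk that remains after the revert visits the tree
with an explicit worklist in which a marker entry separates a block from
its children.  The sketch below is modelled on the fragments visible in
the diff and is not the actual dom_walker code: walk_trace, the integer
block ids and the -1 marker are assumptions made for illustration, and the
real walker also deals with unreachable blocks and child ordering.

```cpp
#include <string>
#include <vector>

// Walk a tree given as child lists, recording "+N" when block N is entered
// and "-N" once all of its children have been processed.
std::string walk_trace(const std::vector<std::vector<int>>& children,
                       int root) {
  const int marker = -1;  // plays the role of the NULL worklist entry
  std::string trace;
  std::vector<int> worklist;
  int bb = root;
  while (true) {
    trace += "+" + std::to_string(bb) + " ";
    worklist.push_back(bb);
    worklist.push_back(marker);
    for (int child : children[bb])
      worklist.push_back(child);

    // A marker on top means every child of the block beneath it is done.
    while (!worklist.empty() && worklist.back() == marker) {
      worklist.pop_back();
      const int done = worklist.back();
      worklist.pop_back();
      trace += "-" + std::to_string(done) + " ";
    }
    if (worklist.empty())
      break;
    bb = worklist.back();
    worklist.pop_back();
  }
  return trace;
}
```

Because children are popped from the top of the worklist, they are visited
in the reverse of the order in which they were pushed.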


Committed to gomp-4_0-branch.

Thanks,
- Tom
Revert "Add dom_walker::walk_until"

2015-11-06  Tom de Vries  

	revert:
	2015-10-12  Tom de Vries  

	* domwalk.c (dom_walker::walk): Rename to ...
	(dom_walker::walk_until): ... this.  Add and handle until and
	until_inclusive parameters.
	(dom_walker::walk): Reimplement using dom_walker::walk_until.
	* domwalk.h (dom_walker::walk_until): Declare.
---
 gcc/domwalk.c | 32 +---
 gcc/domwalk.h |  2 --
 2 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/gcc/domwalk.c b/gcc/domwalk.c
index 6a205f0..167fc38 100644
--- a/gcc/domwalk.c
+++ b/gcc/domwalk.c
@@ -143,18 +143,11 @@ cmp_bb_postorder (const void *a, const void *b)
 }
 
 /* Recursively walk the dominator tree.
-   BB is the basic block we are currently visiting.  UNTIL is a basic_block that
-   is the root of a subtree that we won't visit.  If UNTIL_INCLUSIVE, we visit
-   UNTIL, but not it's children.  Otherwise don't visit UNTIL and its
-   children.  */
+   BB is the basic block we are currently visiting.  */
 
 void
-dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
+dom_walker::walk (basic_block bb)
 {
-  bool skip_self = (bb == until && !until_inclusive);
-  if (skip_self)
-return;
-
   basic_block dest;
   basic_block *worklist = XNEWVEC (basic_block,
    n_basic_blocks_for_fn (cfun) * 2);
@@ -188,15 +181,9 @@ dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
 	  worklist[sp++] = NULL;
 
 	  int saved_sp = sp;
-	  bool skip_children = bb == until && until_inclusive;
-	  if (!skip_children)
-	for (dest = first_dom_son (m_dom_direction, bb);
-		 dest; dest = next_dom_son (m_dom_direction, dest))
-	  {
-		bool skip_child = (dest == until && !until_inclusive);
-		if (!skip_child)
-		  worklist[sp++] = dest;
-	  }
+	  for (dest = first_dom_son (m_dom_direction, bb);
+	   dest; dest = next_dom_son (m_dom_direction, dest))
+	worklist[sp++] = dest;
 	  if (m_dom_direction == CDI_DOMINATORS)
 	switch (sp - saved_sp)
 	  {
@@ -230,12 +217,3 @@ dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
 }
   free (worklist);
 }
-
-/* Recursively walk the dominator tree.
-   BB is the basic block we are currently visiting.  */
-
-void
-dom_walker::walk (basic_block bb)
-{
-  walk_until (bb, NULL, true);
-}
diff --git a/gcc/domwalk.h b/gcc/domwalk.h
index 71e6075..71a7c47 100644
--- a/gcc/domwalk.h
+++ b/gcc/domwalk.h
@@ -34,8 +34,6 @@ public:
 
   /* Walk the dominator tree.  */
   void walk (basic_block);
-  /* Walk a part of the dominator tree.  */
-  void walk_until (basic_block, basic_block, bool);
 
   /* Function to call before the recursive walk of the dominator children.  */
   virtual void before_dom_children (basic_block) {}
-- 
1.9.1



Move c_getstr to fold-const.c

2015-11-06 Thread Richard Sandiford
Upcoming patches to fold-const-call.c want to use c_getstr, which is
currently defined in builtins.c.  The function doesn't really do anything
related to built-ins, and I'd rather not make fold-const-call.c depend
on builtins.c and builtins.c depend on fold-const-call.c, so this patch
moves the function to fold-const.c instead.
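
For readers unfamiliar with what c_getstr checks, here is a minimal standalone sketch of the same bounds rule. It is not GCC code: struct str_const and getstr_at are hypothetical stand-ins for a STRING_CST and for c_getstr, and the length is assumed to count the terminating NUL.

```c
#include <stddef.h>

/* Toy stand-in for a string constant.  LEN is assumed to count every
   byte of the constant, including the terminating NUL.  */
struct str_const
{
  const char *ptr;
  size_t len;
};

/* Return a pointer OFFSET bytes into S, or NULL when OFFSET would point
   past the last byte of the constant.  This mirrors the
   "offset > length - 1" rejection in the patch.  */
static const char *
getstr_at (const struct str_const *s, size_t offset)
{
  if (s == NULL || s->len == 0)
    return NULL;
  if (offset > s->len - 1)
    return NULL;
  return s->ptr + offset;
}
```

With a six-byte constant such as "hello" plus its NUL, offset 5 still yields a valid (empty) string, while offset 6 is rejected.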

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.h (c_getstr): Move to...
* fold-const.h (c_getstr): ...here.
* builtins.c (c_getstr): Move to...
* fold-const.c (c_getstr): ...here.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 8f0717c..69c56e7 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -616,27 +616,6 @@ c_strlen (tree src, int only_value)
   return ssize_int (strlen (ptr + offset));
 }
 
-/* Return a char pointer for a C string if it is a string constant
-   or sum of string constant and integer constant.  */
-
-const char *
-c_getstr (tree src)
-{
-  tree offset_node;
-
-  src = string_constant (src, &offset_node);
-  if (src == 0)
-return 0;
-
-  if (offset_node == 0)
-return TREE_STRING_POINTER (src);
-  else if (!tree_fits_uhwi_p (offset_node)
-  || compare_tree_int (offset_node, TREE_STRING_LENGTH (src) - 1) > 0)
-return 0;
-
-  return TREE_STRING_POINTER (src) + tree_to_uhwi (offset_node);
-}
-
 /* Return a constant integer corresponding to target reading
GET_MODE_BITSIZE (MODE) bits from string constant STR.  */
 
diff --git a/gcc/builtins.h b/gcc/builtins.h
index cce9e75..b039632 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -87,7 +87,6 @@ extern bool is_simple_builtin (tree);
 extern bool is_inexpensive_builtin (tree);
 
 extern bool readonly_data_expr (tree exp);
-extern const char *c_getstr (tree);
 extern bool init_target_chars (void);
 extern unsigned HOST_WIDE_INT target_newline;
 extern unsigned HOST_WIDE_INT target_percent;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ee9b349..3b6e898 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14397,3 +14397,24 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree 
ptr, HOST_WIDE_INT off)
   return fold_build2_loc (loc, POINTER_PLUS_EXPR, TREE_TYPE (ptr),
  ptr, size_int (off));
 }
+
+/* Return a char pointer for a C string if it is a string constant
+   or sum of string constant and integer constant.  */
+
+const char *
+c_getstr (tree src)
+{
+  tree offset_node;
+
+  src = string_constant (src, &offset_node);
+  if (src == 0)
+return 0;
+
+  if (offset_node == 0)
+return TREE_STRING_POINTER (src);
+  else if (!tree_fits_uhwi_p (offset_node)
+  || compare_tree_int (offset_node, TREE_STRING_LENGTH (src) - 1) > 0)
+return 0;
+
+  return TREE_STRING_POINTER (src) + tree_to_uhwi (offset_node);
+}
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 97d18cf..94a21b7 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -180,6 +180,7 @@ extern tree exact_inverse (tree, tree);
 extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (enum built_in_function);
+extern const char *c_getstr (tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */



Don't treat rint as setting errno

2015-11-06 Thread Richard Sandiford
builtins.def says that rint sets errno, but it looks like this might
be a mistake.  C99 says that rint doesn't set errno and the builtins.c
expansion code doesn't try to keep errno up to date.

Perhaps this was because earlier versions of POSIX said that
rint sets errno on overflow:

http://pubs.opengroup.org/onlinepubs/009695399/functions/rintf.html

However, this is another instance of the observation that "rounding
functions could never overflow" (because anything using exponents
that large is already integral).  The page above also says that
differences with C99 are unintentional and the ERANGE clause has
been removed from later versions of POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/rint.html
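
The "already integral" argument can be checked with a small sketch. It assumes a radix-2 double with DBL_MANT_DIG significand bits (53 for IEEE binary64); the helper names are made up for illustration and are not part of GCC or libm.

```c
#include <float.h>
#include <stdbool.h>

/* Smallest magnitude from which every finite double is an integer:
   2^(DBL_MANT_DIG - 1), i.e. 2^52 for IEEE binary64.  Above it the
   spacing between adjacent doubles is at least 1.  */
static double
integral_threshold (void)
{
  double t = 1.0;
  for (int i = 1; i < DBL_MANT_DIG; i++)
    t *= 2.0;
  return t;
}

/* True when rounding X to an integer has nothing left to do, so the
   result is X itself and cannot overflow.  NaN compares false.  */
static bool
already_integral (double x)
{
  double mag = x < 0 ? -x : x;
  return mag >= integral_threshold ();
}
```

Values below the threshold round to something no larger than the threshold in magnitude, so neither case can produce a range error.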

Also, the version of POSIX that lists ERANGE for rint does the same
for nearbyint:

http://pubs.opengroup.org/onlinepubs/009695399/functions/nearbyintf.html

and we already treat nearbyint as not setting errno.  This too has been
clarified in later versions of POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/nearbyint.html

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.def (BUILTIN_RINT, BUILTIN_RINTF, BUILTIN_RINTL): Use
ATTR_MATHFN_FPROUNDING rather than ATTR_MATHFN_FPROUNDING_ERRNO.

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 886b45c..076da40 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -475,9 +475,9 @@ DEF_C99_BUILTIN(BUILT_IN_REMAINDERL, "remainderl", 
BT_FN_LONGDOUBLE_LONG
 DEF_C99_BUILTIN(BUILT_IN_REMQUO, "remquo", 
BT_FN_DOUBLE_DOUBLE_DOUBLE_INTPTR, ATTR_MATHFN_FPROUNDING_STORE)
 DEF_C99_BUILTIN(BUILT_IN_REMQUOF, "remquof", 
BT_FN_FLOAT_FLOAT_FLOAT_INTPTR, ATTR_MATHFN_FPROUNDING_STORE)
 DEF_C99_BUILTIN(BUILT_IN_REMQUOL, "remquol", 
BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_INTPTR, ATTR_MATHFN_FPROUNDING_STORE)
-DEF_C99_BUILTIN(BUILT_IN_RINT, "rint", BT_FN_DOUBLE_DOUBLE, 
ATTR_MATHFN_FPROUNDING_ERRNO)
-DEF_C99_BUILTIN(BUILT_IN_RINTF, "rintf", BT_FN_FLOAT_FLOAT, 
ATTR_MATHFN_FPROUNDING_ERRNO)
-DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", BT_FN_LONGDOUBLE_LONGDOUBLE, 
ATTR_MATHFN_FPROUNDING_ERRNO)
+DEF_C99_BUILTIN(BUILT_IN_RINT, "rint", BT_FN_DOUBLE_DOUBLE, 
ATTR_MATHFN_FPROUNDING)
+DEF_C99_BUILTIN(BUILT_IN_RINTF, "rintf", BT_FN_FLOAT_FLOAT, 
ATTR_MATHFN_FPROUNDING)
+DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", BT_FN_LONGDOUBLE_LONGDOUBLE, 
ATTR_MATHFN_FPROUNDING)
 DEF_C99_BUILTIN(BUILT_IN_ROUND, "round", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDF, "roundf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDL, "roundl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)



Add -fno-math-errno to gcc.dg/lto/20110201-1_0.c

2015-11-06 Thread Richard Sandiford
At the moment the ECF_* flags for a gimple call to a built-in
function are derived from the function decl, which in turn is
derived from the global command-line options.  So if the compiler
is run with -fno-math-errno, we always assume functions don't set
errno, regardless of local optimization options.  Similarly if the
compiler is run with -fmath-errno, we always assume functions set errno.

This shows up in gcc.dg/lto/20110201-1_0.c, where we compile
the file with -O0 and use -O2 -ffast-math for a specific function.
-O2 -ffast-math is enough for us to convert cabs to sqrt as hoped,
but because of the global -fmath-errno setting, we assume that the
call to sqrt is not pure or const and create vops for it.  This makes
it appear to the gimple code that a simple sqrt optab isn't enough.

Later patches move more decisions about maths functions to gimple
and think that in this case we should use:

y = sqrt (x);
if (!(x >= 0))
sqrt (x); // to set errno.

This is being tracked as PR68235.  For now the patch adds
-fno-math-errno to the dg-options for this test.
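
As a rough illustration of the split shown above, the sketch below separates a no-errno fast path from an errno-setting call that runs only on a domain error. Everything in it is hypothetical: insn_sqrt stands in for a hardware sqrt instruction (here a plain Newton iteration, adequate only for moderate inputs) and lib_sqrt stands in for the library routine.

```c
#include <errno.h>

/* Counts calls to the errno-setting path, for illustration only.  */
static int lib_sqrt_calls;

/* Stand-in for a sqrt instruction: never touches errno.  */
static double
insn_sqrt (double x)
{
  if (!(x >= 0))
    return 0.0 / 0.0;           /* NaN for negative or NaN input.  */
  if (x == 0)
    return x;
  double r = x > 1 ? x : 1;
  for (int i = 0; i < 64; i++)  /* Newton's iteration.  */
    r = 0.5 * (r + x / r);
  return r;
}

/* Stand-in for the library sqrt: same value, but sets errno on a
   negative argument.  */
static double
lib_sqrt (double x)
{
  lib_sqrt_calls++;
  if (x < 0)
    errno = EDOM;
  return insn_sqrt (x);
}

/* The pattern from the mail: take the value from the fast path and
   make the library call only when X is negative or NaN.  */
static double
sqrt_with_errno (double x)
{
  double y = insn_sqrt (x);
  if (!(x >= 0))
    lib_sqrt (x);               /* Result unused; called to set errno.  */
  return y;
}
```

The point of the shape is that the common case never pays for the library call.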

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard

gcc/testsuite/
PR tree-optimization/68235
* gcc.dg/lto/20110201-1_0.c: Add -fno-math-errno.

diff --git a/gcc/testsuite/gcc.dg/lto/20110201-1_0.c 
b/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
index 068dddc..2144f07 100644
--- a/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
+++ b/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
@@ -1,6 +1,6 @@
 /* { dg-lto-do run } */
-/* { dg-lto-options { { -O0 -flto } } } */
-/* { dg-lto-options { "-O0 -flto -mfloat-abi=softfp -mfpu=neon-vfpv4" } { 
target arm*-*-* } } */
+/* { dg-lto-options { { -O0 -flto -fno-math-errno } } } */
+/* { dg-lto-options { "-O0 -flto -fno-math-errno -mfloat-abi=softfp 
-mfpu=neon-vfpv4" } { target arm*-*-* } } */
 /* { dg-require-linker-plugin "" } */
 /* { dg-require-effective-target sqrt_insn } */
 



Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 03:10 PM, Sebastian Pop wrote:

On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:

Formatting problem, here and in a few other places. I didn't fully read the
patch this time around.

I'm probably not reviewing further patches because I don't see this
progressing to a state where it's acceptable. Others may do so, but as far
as I'm concerned the patch is rejected.


Bernd,
I would like to ask you to focus on the technical part, and provide a
review only based on technical reasons.
Please ignore all formatting changes: I will help address all those changes.
I will send a patch addressing all the comments you had in the current review.


As long as this just has allocation from the normal stack frame as its 
only strategy, I consider it unacceptable (and I think Richard B voiced 
the same opinion). If you want a half-finished redzone allocator, I can 
send you a patch.



Bernd



Re: [PING] Re: [PATCH] c++/67913, 67917 - fix new expression with wrong number of elements

2015-11-06 Thread Jason Merrill

OK.

Jason


Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-06 Thread Kyrill Tkachov


On 06/11/15 00:56, Segher Boessenkool wrote:

On Thu, Nov 05, 2015 at 12:01:26PM +, Kyrill Tkachov wrote:

Thanks, that looks less intrusive. I did try it out on arm and aarch64.
It does work on the aarch64 testcase. However, there's also a correctness
regression, I'll try to explain inline

diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..3bf7ffb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int
in_cond, int unique_copy)
  || GET_CODE (SET_DEST (x)) == PC))
fmt = "ie";
  
+  /* Substituting into the operands of a widening MULT is not likely
+to create RTL matching a machine insn.  */
+  if (code == MULT
+ && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+ && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+   return x;

I think we should also add:
   && REG_P (XEXP (XEXP (x, 0), 0))
   && REG_P (XEXP (XEXP (x, 1), 0))

to the condition. Otherwise I've seen regressions in the arm testsuite, the
gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
(define_insn "*maddhisi4tb"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 (plus:SI (mult:SI (ashiftrt:SI
(match_operand:SI 1 "s_register_operand" "r")
(const_int 16))
   (sign_extend:SI
(match_operand:HI 2 "s_register_operand" "r")))
  (match_operand:SI 3 "s_register_operand" "r")))]


There we have a sign_extend of a shift that we want to convert to the form
that the pattern expects. So adding the checks for REG_P fixes that for me.

I'll have a look at this; I thought it should be handled with the new patch
(attached), but maybe not.


For the correctness issue I saw on aarch64 the shortest case I could reduce
is:
short int a[16], b[16];
void
f5 (void)
{
   a[0] = b[0] / 14;
}

(Without -mcpu=cortex-a53, or you just get a divide insn).


Is there a way that subst can signal some kind of "failed to substitute"
result?

Yep, see new patch.  The "from == to" condition is for when subst is called
just to simplify some code (normally with pc_rtx, pc_rtx).


Indeed, this looks better but it still needs the REG_P checks for the inner
operands of the extends to not screw up the arm case.
Here's my proposed extension. I've also modified the testcase slightly to not use
an uninitialised variable. It still demonstrates the issue we're trying to solve.

Bootstrapped and tested on arm and aarch64.
I'll let you put it through its paces on your setup :)

P.S. Do we want to restrict this to targets that have a widening mul optab like 
I did
in the original patch?
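
To make the shape of the proposed condition concrete, here is a toy model of it outside GCC. The toy_rtx type and the TOY_* codes are invented for this sketch and only mimic the structure of RTL; the predicate accepts a multiply only when both operands are extensions applied directly to registers.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_code
{
  TOY_REG, TOY_ASHIFTRT, TOY_ZERO_EXTEND, TOY_SIGN_EXTEND, TOY_MULT
};

/* Toy expression node with up to two operands.  */
struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[2];
};

/* True for (zero_extend (reg)) or (sign_extend (reg)).  */
static bool
toy_extend_of_reg_p (const struct toy_rtx *x)
{
  return (x != NULL
          && (x->code == TOY_ZERO_EXTEND || x->code == TOY_SIGN_EXTEND)
          && x->op[0] != NULL
          && x->op[0]->code == TOY_REG);
}

/* True for a register-wise widening multiply: a MULT whose operands
   are both extensions of plain registers.  */
static bool
toy_regwise_widening_mult_p (const struct toy_rtx *x)
{
  return (x != NULL
          && x->code == TOY_MULT
          && toy_extend_of_reg_p (x->op[0])
          && toy_extend_of_reg_p (x->op[1]));
}
```

An extend of a shift, as in the arm smlatb case, fails the predicate and so would still be open to substitution.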

Thanks,
Kyrill

2015-11-06  Segher Boessenkool  
Kyrylo Tkachov  

* combine.c (subst): Don't substitute or simplify when
handling register-wise widening multiply.
(force_to_mode): Likewise.

2015-11-06  Kyrylo Tkachov  

* gcc.target/aarch64/umaddl_combine_1.c: New test.


If not, I got it to work by using that check to set the in_dest variable to
the subsequent recursive call to subst, in a similar way to my original patch,
but I was hoping to avoid overloading the meaning of in_dest.

Yes me too :-)


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..c806db9 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,20 @@ subst (rtx x, rtx from, rtx to, int in_dest, int 
in_cond, int unique_copy)
  || GET_CODE (SET_DEST (x)) == PC))
fmt = "ie";
  
+  /* Substituting into the operands of a widening MULT is not likely
+to create RTL matching a machine insn.  */
+  if (code == MULT
+ && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+ && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+   {
+ if (from == to)
+   return x;
+ else
+   return gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);
+   }
+
/* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
 constant.  */
if (fmt[0] == 'e')
@@ -8455,6 +8469,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned 
HOST_WIDE_INT mask,
/* ... fall through ...  */
  
  case MULT:

+  /* Substituting into the operands of a widening MULT is not likely to
+create RTL matching a machine insn.  */
+  if (code == MULT
+ && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+ && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+ || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+   return gen_lowpart_or_truncate (mode, x);
+
/* For PLUS, MINUS and MULT, we need any bits less significant than the
 most significant bit in MASK since carries from those bits will
 affe

Re: [gomp4 05/14] omp-low: set 'omp target entrypoint' only on entypoints

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 03:08 PM, Jakub Jelinek wrote:

On Fri, Nov 06, 2015 at 03:05:05PM +0100, Bernd Schmidt wrote:

This patch creates a new "omp target entrypoint" annotation that appears not
to be used - it would be better to just not annotate a function if it's not
going to need entrypoint treatment. IMO a single type of attribute should be
sufficient for that.


But NVPTX is just one backend, perhaps other backends need different
treatment of the entry points?


If we don't know, then it's not a problem we have to solve now. We can 
change it at any time later. For now, let's just keep it simple - no 
need to invent special annotations that end up unused.



Bernd


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Sebastian Pop
On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:
> Formatting problem, here and in a few other places. I didn't fully read the
> patch this time around.
>
> I'm probably not reviewing further patches because I don't see this
> progressing to a state where it's acceptable. Others may do so, but as far
> as I'm concerned the patch is rejected.

Bernd,
I would like to ask you to focus on the technical part, and provide a
review only based on technical reasons.
Please ignore all formatting changes: I will help address all those changes.
I will send a patch addressing all the comments you had in the current review.

Thanks,
Sebastian


Re: [gomp4 05/14] omp-low: set 'omp target entrypoint' only on entypoints

2015-11-06 Thread Jakub Jelinek
On Fri, Nov 06, 2015 at 03:05:05PM +0100, Bernd Schmidt wrote:
> This patch creates a new "omp target entrypoint" annotation that appears not
> to be used - it would be better to just not annotate a function if it's not
> going to need entrypoint treatment. IMO a single type of attribute should be
> sufficient for that.

But NVPTX is just one backend, perhaps other backends need different
treatment of the entry points?

Jakub


Re: [PATCH] x86 interrupt attribute

2015-11-06 Thread Yulia Koval
Hi,

I updated and reposted the patch. Regtested/bootstrapped on
x86_64/Linux and i686/Linux. Ok for trunk?

Implement x86 interrupt attribute

The interrupt and exception handlers are called by x86 processors.  X86
hardware pushes information onto the stack and calls the handler.  The
requirements are:

1. Both interrupt and exception handlers must use the 'IRET' instruction,
instead of the 'RET' instruction, to return from the handlers.
2. All registers are callee-saved in interrupt and exception handlers.
3. The difference between interrupt and exception handlers is the
exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
instruction.

The design goals of interrupt and exception handlers for x86 processors
are:

1. Support both 32-bit and 64-bit modes.
2. Flexible for compilers to optimize.
3. Easy to use by programmers.

To implement interrupt and exception handlers for x86 processors, a
compiler should support:

'interrupt' attribute

Use this attribute to indicate that the specified function with
mandatory arguments is an interrupt or exception handler.  The compiler
generates function entry and exit sequences suitable for use in an
interrupt handler when this attribute is present.  The 'IRET' instruction,
instead of the 'RET' instruction, is used to return from interrupt or
exception handlers.  All registers, except for the EFLAGS register which
is restored by the 'IRET' instruction, are preserved by the compiler.

Any interruptible-without-stack-switch code must be compiled with
-mno-red-zone since interrupt handlers can and will, because of the
hardware design, touch the red zone.

1. interrupt handler must be declared with a mandatory pointer argument:

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame)
{
...
}

and the user must properly define the structure the pointer points to.

2. exception handler:

The exception handler is very similar to the interrupt handler with
a different mandatory function signature:

typedef unsigned int uword_t __attribute__ ((mode (__word__)));

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame, uword_t error_code)
{
...
}

and the compiler pops the error code off the stack before the 'IRET' instruction.

The exception handler should only be used for exceptions which push an
error code and all other exceptions must use the interrupt handler.
The system will crash if the wrong handler is used.

To be feature complete, the compiler may implement the optional
'no_caller_saved_registers' attribute:

Use this attribute to indicate that the specified function has no
caller-saved registers.  That is, all registers are callee-saved.
The compiler generates proper function entry and exit sequences to
save and restore any modified registers.

The user can call functions specified with 'no_caller_saved_registers'
attribute from an interrupt handler without saving and restoring all
call clobbered registers.

gcc/

PR target/66960
PR target/67630
PR target/67634
PR target/68037
* config/i386/i386-protos.h (ix86_epilogue_uses): New prototype.
* config/i386/i386.c (ix86_frame): Add nbndregs, nmaskregs,
bnd_reg_save_offset and mask_reg_save_offset.
(ix86_conditional_register_usage): Preserve all registers,
except for function return registers if there are no caller-saved
registers.
(ix86_set_current_function): Set no_caller_saved_registers and
func_type.  Call reinit_regs if AX register usage isn't
consistent.
(ix86_function_ok_for_sibcall): Return false if there are no
caller-saved registers.
(type_natural_mode): Don't warn ABI change for MMX in interrupt
handler.
(ix86_function_arg_advance): Skip for callee in interrupt
handler.
(ix86_function_arg): Handle arguments for callee in interrupt
handler.
(ix86_can_use_return_insn_p): Don't use `ret' instruction in
interrupt handler.
(ix86_epilogue_uses): New function.
(ix86_hard_regno_scratch_ok): Likewise.
(ix86_reg_ever_defined_p): Likewise.
(ix86_nsaved_bndregs): Likewise.
(ix86_nsaved_maskregs): Likewise.
(ix86_emit_save_bnd_regs_using_mov): Likewise.
(ix86_emit_save_mask_regs_using_mov): Likewise.
(ix86_emit_restore_bnd_regs_using_mov): Likewise.
(ix86_emit_restore_mask_regs_using_mov): Likewise.
(ix86_handle_no_caller_saved_registers_attribute): Likewise.
(ix86_handle_interrupt_attribute): Likewise.
(ix86_save_reg): Preserve all registers in interrupt function
after reload.  Preserve all registers, except for function
return registers, if there are no caller-saved registers after
reload.
(ix86_nsaved_sseregs): Don't return 0 if there are no
caller-saved registers.
(ix86_compute_frame_layout): Set nbndregs and nmaskregs.  Set
and allocate BND and MASK register save areas.  Allocate space to
save full vector registers if there are no caller-saved registers.
(ix86_emit_save_reg_using_mov): Don't use UNSPEC_STOREU to
SSE registers.
(ix86_emit_save_sse_regs_using_mov): Save full vector registers
if there are no caller-saved registers.
(find_drap_reg): Al

Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

2015-11-06 Thread Jakub Jelinek
On Fri, Nov 06, 2015 at 03:00:30PM +0100, Bernd Schmidt wrote:
> >Sanity-checked by running the libgomp testsuite.  I realize the #ifdef in
> >internal-fn.c is not appropriate: it's there to make the patch smaller, I'll
> >replace it with a target hook if otherwise this approach is ok.
> 
> FWIW, no objections from me regarding the approach.

As I said on IRC, I fear it is not a general solution, and will try to write
a testcase that demonstrates that soon.
That said, as a temporary partial workaround it might be acceptable, but
1) there really should be a target hook
2) it should never be emitted when not in target regions (declare target
   functions or when inside of OpenMP target region)
3) it should be folded away as soon as possible for the non-PTX
   targets (both host and say XeonPhi), so in the openacc pass shortly post
   IPA (or look at if something that will come with the HSA merge could help
   there)

Jakub


Re: [gomp4 05/14] omp-low: set 'omp target entrypoint' only on entypoints

2015-11-06 Thread Bernd Schmidt

On 10/30/2015 05:44 PM, Alexander Monakov wrote:

+  /* Ignore "omp target entrypoint" here: OpenMP target region functions are
+ called from gomp_nvptx_main.  The corresponding kernel entry is emitted
+ from write_omp_entry.  */
  }


I'm probably confused, but didn't we agree that this should be changed 
so that the entry point isn't gomp_nvptx_main but instead something that 
wraps a call to that function?


This patch creates a new "omp target entrypoint" annotation that appears 
not to be used - it would be better to just not annotate a function if 
it's not going to need entrypoint treatment. IMO a single type of 
attribute should be sufficient for that.



Bernd



Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

2015-11-06 Thread Bernd Schmidt

Sanity-checked by running the libgomp testsuite.  I realize the #ifdef in
internal-fn.c is not appropriate: it's there to make the patch smaller, I'll
replace it with a target hook if otherwise this approach is ok.


FWIW, no objections from me regarding the approach.


Bernd


[PATCH] gcc/config.gcc: fix typo for powerpc e6500 cpu_is_64bit

2015-11-06 Thread Arnout Vandecappelle (Essensium/Mind)
Otherwise, it will not be treated as a 64-bit CPU even if the target
tuple specifies powerpc64.
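
The effect of the capital "X" can be reproduced outside config.gcc. The sketch below assumes POSIX fnmatch, which follows the same case-sensitive pattern rules as a shell case statement; matches_any and the shortened pattern lists are hypothetical and are not part of the build system.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return true if "x" followed by WITH_CPU matches any of the N shell
   patterns in PATTERNS.  */
static bool
matches_any (const char *with_cpu, const char *const *patterns, size_t n)
{
  char word[64];
  /* config.gcc matches "x$with_cpu", so prepend the same "x".  */
  int len = snprintf (word, sizeof word, "x%s", with_cpu);
  if (len < 0 || (size_t) len >= sizeof word)
    return false;
  for (size_t i = 0; i < n; i++)
    if (fnmatch (patterns[i], word, 0) == 0)
      return true;
  return false;
}
```

With the pattern spelled "Xe6500", the word "xe6500" never matches, so cpu_is_64bit is left unset for --with-cpu=e6500.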
---
This is a trivial fix so I hope no copyright assignment is required. I
submit this change in the public domain.

This is my first time submitting anything to gcc, so if I should do
something different please let me know.

---
 gcc/ChangeLog  | 3 +++
 gcc/config.gcc | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3966f51..82e4779 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,6 @@
+2015-11-06  Arnout Vandecappelle  
+   * gcc/config.gcc: fix typo for powerpc e6500 cpu_is_64bit
+
 2015-11-06  Kyrylo Tkachov  
 
PR target/68088
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4a7cbd2..9cc765e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -439,7 +439,7 @@ powerpc*-*-*)
cpu_type=rs6000
extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h 
spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
case x$with_cpu in
-   
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[345678]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|Xe6500)
+   
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[345678]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
cpu_is_64bit=yes
;;
esac
-- 
2.6.2



Re: [PATCH] Do not allow irreducible loops/regions in a scop

2015-11-06 Thread Sebastian Pop
On Thu, Nov 5, 2015 at 10:12 PM, Aditya Kumar  wrote:
> Irreducible regions are not going to be optimized by ISL
> so discard them early.
>
> gcc/ChangeLog:
>
> 2015-11-06  Aditya Kumar  
>
> * graphite-scop-detection.c (scop_detection::merge_sese):
> (scop_detection::can_represent_loop_1):
> (scop_detection::harmful_stmt_in_region):

Ok.
I will add "Check flag *_IRREDUCIBLE_LOOP." to the changelog before committing.
Thanks for taking care of this.

Sebastian
>
>
> ---
>  gcc/graphite-scop-detection.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
> index 8d67883..4e19b63 100644
> --- a/gcc/graphite-scop-detection.c
> +++ b/gcc/graphite-scop-detection.c
> @@ -605,7 +605,8 @@ scop_detection::merge_sese (sese_l first, sese_l second) 
> const
>   get_entry_bb (second));
>
>edge entry = get_nearest_dom_with_single_entry (dom);
> -  if (!entry)
> +
> +  if (!entry || (entry->flags & EDGE_IRREDUCIBLE_LOOP))
>  return invalid_sese;
>
>basic_block pdom = nearest_common_dominator (CDI_POST_DOMINATORS,
> @@ -614,7 +615,8 @@ scop_detection::merge_sese (sese_l first, sese_l second) 
> const
>pdom = nearest_common_dominator (CDI_POST_DOMINATORS, dom, pdom);
>
>edge exit = get_nearest_pdom_with_single_exit (pdom);
> -  if (!exit)
> +
> +  if (!exit || (exit->flags & EDGE_IRREDUCIBLE_LOOP))
>  return invalid_sese;
>
>sese_l combined (entry, exit);
> @@ -734,6 +736,7 @@ scop_detection::can_represent_loop_1 (loop_p loop, sese_l 
> scop)
>struct tree_niter_desc niter_desc;
>
>return single_exit (loop)
> +&& !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
>  && number_of_iterations_exit (loop, single_exit (loop), &niter_desc, 
> false)
>  && niter_desc.control.no_overflow
>  && (niter = number_of_latch_executions (loop))
> @@ -864,6 +867,10 @@ scop_detection::harmful_stmt_in_region (sese_l scop) 
> const
>if (!dominated_by_p (CDI_POST_DOMINATORS, bb, exit_bb))
> continue;
>
> +  /* The basic block should not be part of an irreducible loop.  */
> +  if (bb->flags & BB_IRREDUCIBLE_LOOP)
> +return true;
> +
>if (harmful_stmt_in_bb (scop, bb))
> return true;
>  }
> --
> 2.1.0.243.g30d45f7
>


[AArch64] Fix vqtb[lx][234] on big-endian

2015-11-06 Thread Christophe Lyon
Hi,

As mentioned by James a few weeks ago, the vqtb[lx][234] intrinsics
are failing on aarch64_be.

The attached patch fixes them, and rewrites them using new builtins
instead of inline assembly.
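
For reference, the element-wise behaviour the intrinsics are expected to have on either endianness can be written as a plain C model. This is only a sketch: ref_tbl and ref_tbx are made-up names, and the arrays are indexed in element (lane) order as the intrinsics present them, not in register byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Table lookup: each index selects a byte from TABLE; an index outside
   the table yields 0.  */
static void
ref_tbl (uint8_t *dst, const uint8_t *table, size_t table_len,
         const uint8_t *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = idx[i] < table_len ? table[idx[i]] : 0;
}

/* Table lookup extension: as above, but an out-of-range index leaves
   the existing destination element unchanged.  */
static void
ref_tbx (uint8_t *dst, const uint8_t *table, size_t table_len,
         const uint8_t *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (idx[i] < table_len)
      dst[i] = table[idx[i]];
}
```

A testcase can compare the intrinsic results against such a model lane by lane, which is what exposes a register-ordering mistake on big-endian.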

I wondered about the names of the new builtins, I hope I got them
right: qtbl3, qtbl4, qtbx3, qtbx4 with v8qi and v16qi modes.

I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to
use <mode> and share the code with the v16qi variants.

In arm_neon.h, I moved the rewritten intrinsics to the bottom of the
file, in alphabetical order, although the comment says "Start of
optimal implementations in approved order": the previous ones really
seem to be in alphabetical order.

And I added a new testcase, skipped for arm* targets.

This has been tested on aarch64-none-elf and aarch64_be-none-elf
targets, using the Foundation model.

OK?

Christophe.
2015-11-06  Christophe Lyon  

gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test.

gcc/
* config/aarch64/aarch64-simd-builtins.def: Update builtins
tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*.
* config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to...
(aarch64_tbl3) ... this, which supports v16qi too.
(aarch64_tbx4v8qi): Rename to...
(aarch64_tbx4): ... this.
(aarch64_qtbl3): New pattern.
(aarch64_qtbx3): New pattern.
(aarch64_qtbl4): New pattern.
(aarch64_qtbx4): New pattern.
* config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8)
(vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8)
(vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8)
(vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8)
(vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8)
(vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8)
(vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8)
(vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin
functions.
commit dedb311cc98bccd1633b77b60362e97dc8b9ce51
Author: Christophe Lyon 
Date:   Thu Nov 5 22:40:09 2015 +0100

[AArch64] Fix vqtb[lx]X[q] on big-endian.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 654e963..594fc33 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -407,8 +407,26 @@
   VAR1 (BINOPP, crypto_pmull, 0, di)
   VAR1 (BINOPP, crypto_pmull, 0, v2di)
 
-  /* Implemented by aarch64_tbl3v8qi.  */
+  /* Implemented by aarch64_tbl3.  */
   VAR1 (BINOP, tbl3, 0, v8qi)
+  VAR1 (BINOP, tbl3, 0, v16qi)
 
-  /* Implemented by aarch64_tbx4v8qi.  */
+  /* Implemented by aarch64_qtbl3.  */
+  VAR1 (BINOP, qtbl3, 0, v8qi)
+  VAR1 (BINOP, qtbl3, 0, v16qi)
+
+  /* Implemented by aarch64_qtbl4.  */
+  VAR1 (BINOP, qtbl4, 0, v8qi)
+  VAR1 (BINOP, qtbl4, 0, v16qi)
+
+  /* Implemented by aarch64_tbx4.  */
   VAR1 (TERNOP, tbx4, 0, v8qi)
+  VAR1 (TERNOP, tbx4, 0, v16qi)
+
+  /* Implemented by aarch64_qtbx3.  */
+  VAR1 (TERNOP, qtbx3, 0, v8qi)
+  VAR1 (TERNOP, qtbx3, 0, v16qi)
+
+  /* Implemented by aarch64_qtbx4.  */
+  VAR1 (TERNOP, qtbx4, 0, v8qi)
+  VAR1 (TERNOP, qtbx4, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 65a2b6f..f330300 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4777,24 +4777,70 @@
   [(set_attr "type" "neon_tbl2_q")]
 )
 
-(define_insn "aarch64_tbl3v8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=w")
-   (unspec:V8QI [(match_operand:OI 1 "register_operand" "w")
- (match_operand:V8QI 2 "register_operand" "w")]
+(define_insn "aarch64_tbl3"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+   (unspec:VB [(match_operand:OI 1 "register_operand" "w")
+ (match_operand:VB 2 "register_operand" "w")]
  UNSPEC_TBL))]
   "TARGET_SIMD"
-  "tbl\\t%S0.8b, {%S1.16b - %T1.16b}, %S2.8b"
+  "tbl\\t%S0., {%S1.16b - %T1.16b}, %S2."
   [(set_attr "type" "neon_tbl3")]
 )
 
-(define_insn "aarch64_tbx4v8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=w")
-   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
+(define_insn "aarch64_tbx4"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+   (unspec:VB [(match_operand:VB 1 "register_operand" "0")
  (match_operand:OI 2 "register_operand" "w")
- (match_operand:V8QI 3 "register_operand" "w")]
+ (match_operand:VB 3 "register_operand" "w")]
+ UNSPEC_TBX))]
+  "TARGET_SIMD"
+  "tbx\\t%S0., {%S2.16b - %T2.16b}, %S3."
+  [(set_attr "type" "neon_tbl4")]
+)
+
+;; Three source registers.
+
+(define_insn "aarch64_qtbl3"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+   (unspec:VB [(match_operand:CI 1 "register_operand" "w")
+ (match

Re: [OpenACC] declare directive

2015-11-06 Thread James Norris

Jakub,

Ping.

Do you need more information before you can review this patch?

Thanks!
Jim


On 10/27/2015 03:18 PM, James Norris wrote:

Hi!

 This patch adds the processing of OpenACC declare directive in C
 and C++. (Note: Support in Fortran is already in trunk.)
 Commentary on the changes is included as an attachment (NOTES).

 All of the code is in the gomp-4_0-branch.

 Regtested on x86_64-linux.

 Thanks!
 Jim




Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Ramana Radhakrishnan

> Hi!
> 
> I faced the same issue but I had somewhat different RTL for the consumer:
> 
> (insn 20 15 21 2 (set (reg/i:SI 0 r0)
> (minus:SI (subreg:SI (reg:DI 117) 4)
> (mult:SI (reg:SI 123)
> (reg:SI 114 gasman.c:4 48 {*mulsi3subsi})
> 
> where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
> really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
> cycle in this case?

If the accumulator can be forwarded (i.e. a SImode register), there isn't a 
reason why a subreg:SI (reg:DI) will not get forwarded.

The subreg:SI is an artifact before register allocation, thus it's a 
representation issue that the patch is fixing here unless I misunderstand your 
question.

regards
Ramana


> 
> Thanks,
> Nikolai
> 



Re: OpenACC declare directive updates

2015-11-06 Thread James Norris

Jakub,

Ping

Do you need more information before you can review this patch?

Thanks!
Jim


On 11/04/2015 06:32 AM, James Norris wrote:


 This patch updates the processing of OpenACC declare directive for
 Fortran in the following areas:

 1) module support
 2) device_resident and link clauses
 3) clause checking
 4) directive generation

 Commentary on the changes is included as an attachment (NOTES).

 All of the code is in the gomp-4_0-branch.

 Regtested on x86_64-linux.

 Thanks!
 Jim




Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-06 Thread Nathan Sidwell

On 11/06/15 01:50, Jakub Jelinek wrote:

On Thu, Nov 05, 2015 at 06:10:49PM -0800, Cesar Philippidis wrote:

I've applied this patch to trunk. It also includes the fortran and
template changes. Note that there is a new regression in
gfortran.dg/goacc/combined_loop.f90. Basically, the gimplifier is
complaining about reduction variables appearing in multiple clauses.
E.g. 'acc parallel reduction(+:var) copy(var)'. Nathan's upcoming
gimplifier changes should address that.


If you are relying on the OMP_CLAUSE_MAP_PRIVATE flag that I've added
on gomp-4_1-branch and then removed yesterday, feel free to re-add it,
but of course never set it for OpenMP, just OpenACC constructs
(so for OpenMP keep the gimplifier assertion, for OpenACC set it).



FWIW, I have not noticed a problem with my firstprivate reworking, rebased on 
top of yesterday's OpenMP merge.


nathan

--
Nathan Sidwell


Re: [PATCH] Fix memory leaks

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Richard Biener wrote:

> 
> A few, spotted with valgrind.  One is even mine ;)
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

And this is what I committed (one extra leak in postreload-gcse).

Richard.

2015-11-06  Richard Biener  

* tree-ssa-sccvn.c (class sccvn_dom_walker): Add destructor.
* lra.c (init_reg_info): Truncate copy_vec instead of
re-allocating a new one and leaking the old.
* ipa-inline-analysis.c (estimate_function_body_sizes): Free
bb_infos vec.
* sched-deps.c (sched_deps_finish): Free the dn/dl pools.
* postreload-gcse.c (free_mem): Free modify_mem_list and
canon_modify_mem_list.
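
The lra.c entry is probably the least obvious of these fixes, so here is a minimal standalone sketch of that leak pattern. The names toy_vec, live_blocks and blocks_after_reinit are hypothetical and exist only for this illustration; this is not GCC's vec<> implementation. The only assumption taken from the ChangeLog is that re-creating an already allocated vector orphans the old storage, while truncating it reuses the storage.

```cpp
#include <cstddef>
#include <cstdlib>

// Counts live heap blocks so the leak is observable in this sketch.
static int live_blocks = 0;

// Toy stand-in for a heap-allocated vector; NOT GCC's vec<>.
struct toy_vec
{
    int *data = nullptr;
    std::size_t length = 0;
    std::size_t capacity = 0;

    // Allocates fresh storage WITHOUT releasing what was there before,
    // so calling it on an already created vector orphans the old block.
    void create(std::size_t n)
    {
        data = static_cast<int *>(std::malloc(n * sizeof(int)));
        ++live_blocks;
        length = 0;
        capacity = n;
    }

    // Drops elements beyond N but keeps the existing storage.
    void truncate(std::size_t n)
    {
        if (n < length)
            length = n;
    }

    // Frees the block this vector can still reach.
    void release()
    {
        if (data != nullptr)
        {
            std::free(data);
            --live_blocks;
        }
        data = nullptr;
        length = capacity = 0;
    }
};

// Returns how many blocks are live after re-initialising the vector,
// either by truncating it or by calling create () a second time.
int blocks_after_reinit(bool use_truncate)
{
    live_blocks = 0;
    toy_vec v;
    v.create(100);
    int *old = v.data;
    if (use_truncate)
        v.truncate(0);
    else
        v.create(100);
    int live = live_blocks;
    if (v.data != old)
        std::free(old);   // tidy up the orphaned block so the demo itself is clean
    v.release();
    return live;
}
```

In this sketch the truncate path keeps a single live block, while the second create () leaves two, one of which the vector can no longer reach.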

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 229842)
--- gcc/tree-ssa-sccvn.c(working copy)
*** class sccvn_dom_walker : public dom_walk
*** 4154,4159 
--- 4199,4205 
  public:
sccvn_dom_walker ()
  : dom_walker (CDI_DOMINATORS), fail (false), cond_stack (vNULL) {}
+   ~sccvn_dom_walker ();
  
virtual void before_dom_children (basic_block);
virtual void after_dom_children (basic_block);
*** public:
*** 4168,4173 
--- 4214,4224 
  cond_stack;
  };
  
+ sccvn_dom_walker::~sccvn_dom_walker ()
+ {
+   cond_stack.release ();
+ }
+ 
  /* Record a temporary condition for the BB and its dominated blocks.  */
  
  void
Index: gcc/ipa-inline-analysis.c
===
*** gcc/ipa-inline-analysis.c   (revision 229842)
--- gcc/ipa-inline-analysis.c   (working copy)
*** estimate_function_body_sizes (struct cgr
*** 2853,2858 
--- 2853,2859 
inline_summaries->get (node)->self_time = time;
inline_summaries->get (node)->self_size = size;
nonconstant_names.release ();
+   fbi.bb_infos.release ();
if (opt_for_fn (node->decl, optimize))
  {
if (!early)
Index: gcc/sched-deps.c
===
*** gcc/sched-deps.c(revision 229842)
--- gcc/sched-deps.c(working copy)
*** void
*** 4092,4100 
  sched_deps_finish (void)
  {
gcc_assert (deps_pools_are_empty_p ());
!   dn_pool->release_if_empty ();
dn_pool = NULL;
-   dl_pool->release_if_empty ();
dl_pool = NULL;
  
h_d_i_d.release ();
--- 4092,4100 
  sched_deps_finish (void)
  {
gcc_assert (deps_pools_are_empty_p ());
!   delete dn_pool;
!   delete dl_pool;
dn_pool = NULL;
dl_pool = NULL;
  
h_d_i_d.release ();
Index: gcc/lra.c
===
--- gcc/lra.c   (revision 229843)
+++ gcc/lra.c   (working copy)
@@ -1293,7 +1293,7 @@ init_reg_info (void)
   lra_reg_info = XNEWVEC (struct lra_reg, reg_info_size);
   for (i = 0; i < reg_info_size; i++)
 initialize_lra_reg_info_element (i);
-  copy_vec.create (100);
+  copy_vec.truncate (0);
 }
 
 
Index: gcc/postreload-gcse.c
===
--- gcc/postreload-gcse.c   (revision 229842)
+++ gcc/postreload-gcse.c   (working copy)
@@ -348,6 +348,8 @@ free_mem (void)
   BITMAP_FREE (blocks_with_calls);
   BITMAP_FREE (modify_mem_list_set);
   free (reg_avail_info);
+  free (modify_mem_list);
+  free (canon_modify_mem_list);
 }
 
 


Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-06 Thread Nathan Sidwell

On 11/05/15 21:10, Cesar Philippidis wrote:

I've applied this patch to trunk. It also includes the fortran and
template changes. Note that there is a new regression in
gfortran.dg/goacc/combined_loop.f90. Basically, the gimplifier is
complaining about reduction variables appearing in multiple clauses.
E.g. 'acc parallel reduction(+:var) copy(var)'. Nathan's upcoming
gimplifier changes should address that.

Also, because of these reduction problems, I decided not to merge
combined_loops.f90 with combined-directives.f90 yet because the latter
relies on scanning which would fail with the errors detected during
gimplification. I'm planning on adding a couple of more test cases once
acc reductions are working on trunk.


Reductions are already on trunk.  Do you mean one or both of:
1) firstprivate
2) default handling (depends on #1)

I expect to post #1 today.  #2 (a smaller patch) may not make it today, as I 
have to rebase it on the reworking of #1 I did to remove the two enums I 
discussed with Jakub.


nathan
--
Nathan Sidwell


[PATCH][cp][committed] Fix bootstrap on arm due to print format warning

2015-11-06 Thread Kyrill Tkachov

Hi all,

With Martin's patch at https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00492.html
I'm seeing an arm bootstrap error due to a print format warning, which this
patchlet addresses.
bytes_avail is an unsigned HOST_WIDE_INT and so needs the %wu print format 
rather than %lu which
may not be correct on all platforms.
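
For readers outside GCC's diagnostic machinery (where %wu is the HOST_WIDE_INT-aware directive), the same portability problem can be sketched in standalone C++. This is only an illustration: format_bytes_avail is a hypothetical helper, the value is assumed to be a 64-bit unsigned integer, and the standard PRIu64 macro plays the role that %wu plays in GCC.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit unsigned byte count portably.
// "%lu" would only be correct on hosts where unsigned long is 64 bits
// wide; PRIu64 expands to the matching length modifier on every host.
std::string format_bytes_avail(std::uint64_t bytes_avail)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "size at most %" PRIu64, bytes_avail);
    return std::string(buf);
}
```

A value that does not fit in 32 bits, such as 5000000000, is the kind of input that would be misprinted (or trigger a format warning) with a mismatched specifier.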

Bootstrap on arm-none-linux-gnueabihf now passes so I'm committing this to 
trunk as obvious.

Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

* init.c (warn_placement_new_too_small): Use %wu format
rather than %lu when printing bytes_avail.
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 7386499..337797c 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2447,7 +2447,7 @@ warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
 			  "%<%T [%wu]%> and size %qwu in a region of type %qT "
 			  "and size %qwi"
 			  : "placement new constructing an object of type "
-			  "%<%T [%lu]%> and size %qwu in a region of type %qT "
+			  "%<%T [%wu]%> and size %qwu in a region of type %qT "
 			  "and size at most %qwu",
 			  type, tree_to_uhwi (nelts), bytes_need,
 			  TREE_TYPE (oper),


[PATCH, ARM, v2] PR target/68059 libgcc should not use __write for printing fatal error

2015-11-06 Thread Szabolcs Nagy

libgcc/config/arm/linux-atomic-64bit.c uses __write to print an error
message if the 64bit cmpxchg method is not available in the kernel.

__write is not part of the public libc ABI, so use write instead.
(User code may define write in ISO C conforming mode and then the
error message may not be visible before the crash.)

The return type in the declaration of write is fixed too.
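
As a rough standalone illustration of the pattern (not the libgcc code itself), the sketch below emits a message through the public POSIX write () and checks its signed return value. emit_fatal_message and message_survives_pipe are hypothetical helper names; the real code writes to file descriptor 2 and then calls abort (), which is left out here so the sketch can be exercised safely.

```cpp
#include <cstring>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: writes MSG to FD with the public POSIX write ()
// and returns the number of bytes written, or -1 on error.  The return
// type is signed, which is why an unsigned declaration is wrong.
ssize_t emit_fatal_message(int fd, const char *msg)
{
    return write(fd, msg, std::strlen(msg));
}

// Illustration only: sends a message through a pipe and verifies that
// the same bytes come out on the other end.
bool message_survives_pipe(const char *msg)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    ssize_t written = emit_fatal_message(fds[1], msg);
    close(fds[1]);

    char buf[128] = {0};
    ssize_t got = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);

    return written == static_cast<ssize_t>(std::strlen(msg))
           && got == written
           && std::strcmp(buf, msg) == 0;
}
```

In the failure path a caller would invoke emit_fatal_message (2, err) and then abort (), as the existing comment in the file describes.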

OK for trunk and backporting?

libgcc/ChangeLog:

2015-11-06  Szabolcs Nagy  

PR target/68059
* config/arm/linux-atomic-64bit.c (__write): Rename to...
(write): ...this and fix the return type.
diff --git a/libgcc/config/arm/linux-atomic-64bit.c b/libgcc/config/arm/linux-atomic-64bit.c
index cdf713c..894450e 100644
--- a/libgcc/config/arm/linux-atomic-64bit.c
+++ b/libgcc/config/arm/linux-atomic-64bit.c
@@ -33,7 +33,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
kernels; we check for that in an init section and bail out rather
unceremoneously.  */
 
-extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern int write (int fd, const void *buf, unsigned int count);
 extern void abort (void);
 
 /* Kernel helper for compare-and-exchange.  */
@@ -56,7 +56,7 @@ static void __check_for_sync8_kernelhelper (void)
 	 for the user - I'm not sure I can rely on much else being
 	 available at this point, so do the same as generic-morestack.c
 	 write () and abort ().  */
-  __write (2 /* stderr.  */, err, sizeof (err));
+  write (2 /* stderr.  */, err, sizeof (err));
   abort ();
 }
 };


Unreviewed patch

2015-11-06 Thread Rainer Orth
The following patch has remained unreviewed for a month:

[build] Support init priority on Solaris
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00716.html

It needs build and ia64 maintainers and someone familiar with the init
priority support to review.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-06 Thread Yuri Rumyantsev
Richard,

I tried it but 256-bit precision integer type is not yet supported.

Yuri.


2015-11-06 15:56 GMT+03:00 Richard Biener :
> On Mon, Nov 2, 2015 at 4:24 PM, Yuri Rumyantsev  wrote:
>> Hi Richard,
>>
>> I've come back to this optimization and tried to implement your proposal
>> for comparison:
>>> Btw, you didn't try the simpler alternative of
>>>
>>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>>> build2 (EQ_EXPR, boolean_type_node,
>>>  build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>>
>>> ?  That is, use the GIMPLE level equivalent of
>>>  (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>>
>> using the following code:
>>
>>   vectype = TREE_TYPE (mask);
>>   ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)),
>> MODE_INT, 0);
>>   ext_type = lang_hooks.types.type_for_mode (ext_mode , 1);
>>
>> but I've got zero type for it. Am I missing something?
>
> Use ext_type = build_nonstandard_integer_type (GET_MODE_PRECISION
> (ext_mode), 1);
>
> Richard.
>
>> Any help will be appreciated.
>> Yuri.
>>
>>
>> 2015-08-13 14:40 GMT+03:00 Richard Biener :
>>> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev  wrote:
 Hi Richard,

 Did you have a chance to look at updated patch?
>>>
>>> Having a quick look now.  Btw, you didn't try the simpler alternative of
>>>
>>>  tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>>>  build2 (EQ_EXPR, boolean_type_node,
>>>build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>>
>>> ?  That is, use the GIMPLE level equivalent of
>>>
>>>  (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>>>
>>> ?  That should be supported by the expander already, though again not sure if
>>> the target(s) have compares that match this.
>>>
>>> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction
>>> on EQ/NE_EXPR
>>> is missing.  Operand type equality is tested anyway.
>>>
>>> Why do you need to restrict forward_propagate_into_comparison_1?
>>>
>>> Otherwise this looks better, but can you try with the VIEW_CONVERT as well?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
 Thanks.
 Yuri.

 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev :
> Hi All,
>
> Here is an updated patch which implements Richard's proposal to use vector
> comparison with boolean result instead of a target hook. Support for it
> was added to ix86_expand_branch.
>
> Any comments will be appreciated.
>
> Bootstrap and regression testing did not show any new failures.
>
> ChangeLog:
> 2015-08-06  Yuri Rumyantsev  
>
> * config/i386/i386.c (ix86_expand_branch): Implement vector
> comparison with boolean result.
> * config/i386/sse.md (define_expand "cbranch4): Add define
> for vector comparison.
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM.
> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): New macro.
> * tree-cfg.c (verify_gimple_comparison): Add test for vector
> comparison with boolean result.
> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
> propagate vector comparison with boolean result.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h.
> (is_valid_sink): New function.
> (optimize_mask_stores): New function.
> (vectorize_loops): Invoke optimize_mask_stores for loops having masked
> stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> corresponding macros.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>
>
> 2015-07-27 11:48 GMT+03:00 Richard Biener :
>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law  wrote:
>>> On 07/24/2015 03:16 AM, Richard Biener wrote:
>
> Is there any rationale given anywhere for the transformation into
> conditional expressions?  ie, is there any reason why we can't have a
> GIMPLE_COND where the expression is a vector condition?


 No rationale for equality compare which would have the semantic of
 having all elements equal or not equal.  But you can't define a sensible
 ordering (that HW implements) for other compare operators and you
 obviously need a single boolean result, not a vector of element comparison
 results.
>>>
>>> Right.  EQ/NE only as others just don't have any real meaning.
>>>
>>>
 I've already replied that I'm fine allowing ==/!= whole-vector compares.
 But one needs to check whether expansion does anything sensible
 with them (either expand to integer subreg compares or add optabs
 for the compares).
>>>
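
The whole-vector ==/!= comparison discussed in this thread (viewing the vector as one wide integer, the source-level analogue of the VIEW_CONVERT / subreg:TI idea) can be sketched in standalone C++ as follows. This is an illustration under stated assumptions, not the vectorizer code: v4si, whole_vector_equal and mask_is_all_zero are hypothetical names, and the all-zero mask test is only what the PARAM_ZERO_TEST_FOR_STORE_MASK name suggests the optimization checks.

```cpp
#include <cstdint>
#include <cstring>

// Four 32-bit lanes, standing in for a V4SI vector.
struct v4si
{
    std::int32_t lane[4];
};

// Compares two vectors as a single 128-bit quantity and yields ONE
// boolean, rather than a vector of per-lane results.  memcpy is used
// to view the bits as integers without violating aliasing rules.
bool whole_vector_equal(const v4si &a, const v4si &b)
{
    std::uint64_t wa[2], wb[2];
    static_assert(sizeof(v4si) == sizeof wa, "v4si must be 128 bits");
    std::memcpy(wa, &a, sizeof wa);
    std::memcpy(wb, &b, sizeof wb);
    return ((wa[0] ^ wb[0]) | (wa[1] ^ wb[1])) == 0;
}

// Assumed use case: a masked store can be skipped when no lane of the
// mask is set, i.e. when the mask equals the zero vector.
bool mask_is_all_zero(const v4si &mask)
{
    const v4si zero = {{0, 0, 0, 0}};
    return whole_vector_equal(mask, zero);
}
```

Only equality and inequality make sense for such a comparison, which matches the EQ/NE-only restriction mentioned above.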

Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-06 Thread Richard Biener
On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng  wrote:
> Hi,
> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
> prevented from representing address of one memory object using address of
> another memory object.  Also as I commented in that PR, we have two possible
> fixes for this:
> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
> using another one, while not breaking PR50955.
> 2) Add iv candidates with base object stripped.  In this way, we use the
> common base-stripped part to represent all address expressions, in the form
> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>
> In terms of code generation, method 2) is at least as good as 1), actually
> better in my opinion.  The problem of 2) is we need to tell when iv
> candidates should be added for the common part and when shouldn't.  This
> issue can be generalized and described as: We know IVO tries to add
> candidates by deriving from iv uses.  One disadvantage is that candidates
> are derived from iv use independently.  It doesn't take common sub
> expression among different iv uses into consideration.  As a result,
> candidate for common sub expression is not added, while many useless
> candidates are added.
>
> As a matter of fact, a candidate derived from an iv use is useful only if it's
> common enough and could be shared among different uses.  A candidate is most
> likely useless if it's derived from a single use and could not be shared by
> others.  This patch works in this way by first recording all kinds of
> candidates derived from iv uses, then adding candidates for common ones.
>
> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw a regression for
> 400.perlbench and a small regression for 401.bzip on x86_64, but I can confirm
> they are false alarms caused by alignment issues.
> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
> Also the patch causes a 2-3% regression for 459.GemsFDTD, which I think is
> another unrelated issue caused by the heuristic candidate selection algorithm.
> Unfortunately, I don't have a fix for it currently.
>
> This patch may add more candidates in some cases, but generally the number of
> candidates is smaller because we don't need to add useless candidates now.
> Statistics show there are considerably fewer loops with more than 30
> candidates when building spec2k6 on x86_64 using this patch.
>
> Bootstrap and test on x86_64.  I will re-test it against latest trunk on
> AArch64.  Is it OK?

+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+  const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+&& operand_equal_p (ccand1->base, ccand2->base, 0)
+&& operand_equal_p (ccand1->step, ccand2->step, 0)
+&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
+ == TYPE_PRECISION (TREE_TYPE (ccand2->base));

I'm wondering about the TYPE_PRECISION check.  a) Why is that needed?
And b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
unconditionally?

+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+{
+  *slot = XNEW (struct iv_common_cand);

Allocate from the IV obstack instead?  I see we do a lot of heap allocations
in IVOPTs, so we can improve that as a followup as well.

We probably should empty the obstack after each processed loop.
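
To make the allocation suggestion concrete, here is a minimal sketch of per-loop region allocation. It uses a toy bump allocator as a stand-in for an obstack (it is NOT the GNU obstack API), and the names arena, common_cand and peak_usage are hypothetical. The point it tries to show is that emptying the region after each processed loop keeps memory use bounded by the largest single loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical record, loosely modelled on a "common candidate".
struct common_cand
{
    long base;
    long step;
    unsigned hash;
};

// Toy bump allocator standing in for an obstack.
class arena
{
    std::vector<unsigned char> storage_;
    std::size_t used_ = 0;

public:
    explicit arena(std::size_t bytes) : storage_(bytes) {}

    // Returns SIZE suitably aligned bytes, or nullptr when the arena is full.
    void *alloc(std::size_t size)
    {
        const std::size_t align = alignof(std::max_align_t);
        std::size_t start = (used_ + align - 1) / align * align;
        if (start + size > storage_.size())
            return nullptr;
        used_ = start + size;
        return storage_.data() + start;
    }

    // Frees everything at once; valid here because common_cand is
    // trivially destructible.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
};

// Allocates CANDS_PER_LOOP records for each of LOOPS loops, emptying the
// arena after every loop, and returns the peak number of bytes in use.
std::size_t peak_usage(int loops, int cands_per_loop)
{
    arena pool(4096);
    std::size_t peak = 0;
    for (int l = 0; l < loops; ++l)
    {
        for (int c = 0; c < cands_per_loop; ++c)
        {
            void *p = pool.alloc(sizeof(common_cand));
            if (p != nullptr)
                new (p) common_cand{c, 1, 0u};
        }
        peak = std::max(peak, pool.used());
        pool.reset();
    }
    return peak;
}
```

With the reset in place, processing ten loops peaks at the same usage as processing one.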

Thanks,
Richard.


> Thanks,
> bin
>
> 2015-11-03  Bin Cheng  
>
> PR tree-optimization/52272
> * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
> (struct iv_common_cand_hasher): New struct.
> (iv_common_cand_hasher::hash): New function.
> (iv_common_cand_hasher::equal): New function.
> (struct ivopts_data): New fields, iv_common_cand_tab and
> iv_common_cands.
> (tree_ssa_iv_optimize_init): Initialize above fields.
> (record_common_cand, common_cand_cmp): New functions.
> (add_iv_candidate_derived_from_uses): New function.
> (add_iv_candidate_for_use): Record iv_common_cands derived from
> iv use in hash table, instead of adding candidates directly.
> (add_iv_candidate_for_uses): Call
> add_iv_candidate_derived_from_uses.
> (record_important_candidates): Add important candidates to iv uses'
> related_cands.  Always keep related_cands for future use.
> (try_add_cand_for): Use iv uses' related_cands.
> (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
> in struct ivopts_data, iv_common_cand_tab and iv_common_cands.


Re: [PATCH] Fix PR ipa/68035

2015-11-06 Thread Richard Biener
On Fri, Nov 6, 2015 at 10:10 AM, Martin Liška  wrote:
> Hello.
>
> The following patch triggers hash calculation of items (functions and variables)
> in situations where LTO mode is not utilized.
>
> Patch survives regression tests and bootstraps on x86_64-linux-pc.

Why does that make a difference?  Do we have direct ->hash users
that should have used get_hash ()?

> Ready for trunk?
> Thanks,
> Martin


Re: [PATCH] Fix transform_to_exit_first_loop_alt with -g

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Tom de Vries wrote:

> Hi,
> 
> This patch fixes a problem with -g compilation in
> transform_to_exit_first_loop_alt.
> 
> Consider test-case test.c:
> ...
> void
> f (int *a, int n)
> {
>   int i;
>   for (i = 0; i < n; ++i)
> a[i] = 1;
> }
> ...
> 
> If we add a "checking_verify_ssa (true, true)" call at the end of
> transform_to_exit_first_loop_alt, and we compile with "-g -O2
> -ftree-parallelize-loops=4", we run into this ICE:
> ...
> test.c: In function ‘f’:
> test.c:2:1: error: definition in block 5 does not dominate use in block 13
> for SSA_NAME: i_10 in statement:
> # DEBUG i => i_10
> test.c:2:1: internal compiler error: verify_ssa failed
> ...
> 
> Before transform_to_exit_first_loop_alt, the loop looks like:
> ...
>   :
> 
>   :
>   # ivtmp_22 = PHI <0(11), ivtmp_23(7)>
>   i_13 = ivtmp_22;
>   # DEBUG i => i_13
>   _5 = (long unsigned int) i_13;
>   _6 = _5 * 4;
>   _8 = a_7(D) + _6;
>   *_8 = 1;
>   i_10 = i_13 + 1;
>   # DEBUG i => i_10
>   # DEBUG i => i_10
>   if (ivtmp_22 < _1)
> goto ;
>   else
> goto ;
> 
>   :
>   ivtmp_23 = ivtmp_22 + 1;
>   goto ;
> ...
> 
> 
> And after transform_to_exit_first_loop_alt, it looks like:
> ...
>   :
>   goto ;
> 
>   :
>   # ivtmp_22 = PHI 
>   i_13 = ivtmp_22;
>   # DEBUG i => i_13
>   _5 = (long unsigned int) i_13;
>   _6 = _5 * 4;
>   _8 = a_7(D) + _6;
>   *_8 = 1;
>   i_10 = i_13 + 1;
>   goto ;
> 
>   :
>   # ivtmp_25 = PHI 
>   # DEBUG i => i_10
>   # DEBUG i => i_10
>   if (ivtmp_25 < _2)
> goto ;
>   else
> goto ;
> 
>   :
>   ivtmp_23 = ivtmp_22 + 1;
>   goto ;
> ...
> 
> The ICE triggers because the use of i_10 in debug insn 'DEBUG i => i_10' in bb
> 13 is no longer dominated by the definition of i_10 in bb 5.
> 
> The patch fixes the ICE by ensuring that gimple_split_block_before_cond_jump
> really splits before cond_jump, instead of after the last nondebug insn before
> cond_jump, as it does now. This behaviour also better matches the rtl
> implementation of the cfghook. Btw, note that the only user of cfghook
> split_block_before_cond_jump is transform_to_exit_first_loop_alt.
> 
> [ A similar fix for an openacc variant of this ICE was committed on the
> gomp-4_0-branch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00060.html ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?

Ok.

Richard.

> Thanks,
> - Tom
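
The difference between the two split points can be sketched with a toy model of the loop body from the dump above. This is only an illustration of the described behaviour, not the cfghook implementation: kind, example_loop_body and the helper functions are hypothetical names, and a basic block is modelled as a plain list of statement kinds ending in the conditional jump.

```cpp
#include <cstddef>
#include <vector>

enum class kind { normal, debug, cond };

// Loop body from the dump: i_13 = ..., DEBUG, _5, _6, _8, *_8 = 1,
// i_10 = ..., DEBUG, DEBUG, if (...).
std::vector<kind> example_loop_body()
{
    return {kind::normal, kind::debug,  kind::normal, kind::normal,
            kind::normal, kind::normal, kind::normal, kind::debug,
            kind::debug,  kind::cond};
}

// Split point directly in front of the trailing conditional jump.
// Assumes BB is non-empty and ends in the jump.
std::size_t split_before_cond(const std::vector<kind> &bb)
{
    return bb.size() - 1;
}

// Split point right after the last non-debug statement that precedes
// the jump (the behaviour the patch description calls the current one).
std::size_t split_after_last_nondebug(const std::vector<kind> &bb)
{
    std::size_t i = bb.size() - 1;          // position of the jump
    while (i > 0 && bb[i - 1] == kind::debug)
        --i;                                // step back over debug stmts
    return i;
}

// Number of debug statements at or after SPLIT, i.e. those that move
// into the new block together with the jump.
std::size_t debug_stmts_moved(const std::vector<kind> &bb, std::size_t split)
{
    std::size_t n = 0;
    for (std::size_t i = split; i < bb.size(); ++i)
        if (bb[i] == kind::debug)
            ++n;
    return n;
}
```

In this model the old split point carries the two trailing debug statements into the new block, away from the definition they refer to, while splitting before the jump moves none of them.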

Re: Fix 61441

2015-11-06 Thread Joseph Myers
On Fri, 6 Nov 2015, Sujoy Saraswati wrote:

> > Shouldn't real_convert do this rather than the caller needing to do it?
> 
> Yes, it should be. I had started by doing this within real_convert but
> then saw that there are quite a few callers where I should add the
> check for flag_signaling_nans. This was making the patch bigger, so
> instead decided to change the caller in this particular case. I will
> try to make the change in real_convert now that we are planning to
> break the patch.

I think the general principle is:

* The caller decides whether folding is desirable (whether it would lose 
exceptions, for example).

* The real.c code is called only when the caller has decided that folding 
is desirable, and should always produce the correct output (which for a 
conversion means producing a quiet NaN from a signaling NaN).

So both places need changes, but real_convert is where the code that makes 
it a quiet NaN should go.

Another place in the patch that looks incorrect: the changes to 
fold-const-call.c calling real_powi and checking if the result is a 
signaling NaN.  The result of real_powi should never be a signaling NaN.  
Rather, real_powi should produce a quiet NaN if its input is a signaling 
NaN, and the callers should check if the argument is a signaling NaN when 
deciding whether to fold, not if the result is.
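
The division of labour described above can be sketched on raw IEEE 754 binary32 bit patterns. This is an illustration only and not real.c code: the function names are hypothetical, honor_signaling_nans stands in for flag_signaling_nans, and the sketch assumes the common encoding in which a set top fraction bit marks a quiet NaN (a few targets use the opposite convention).

```cpp
#include <cstdint>

// binary32 layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
constexpr std::uint32_t EXP_MASK  = 0x7f800000u;
constexpr std::uint32_t FRAC_MASK = 0x007fffffu;
constexpr std::uint32_t QUIET_BIT = 0x00400000u;  // assumed: set means quiet

// A NaN has all exponent bits set and a nonzero fraction.
bool is_nan_bits(std::uint32_t b)
{
    return (b & EXP_MASK) == EXP_MASK && (b & FRAC_MASK) != 0;
}

bool is_signaling_nan_bits(std::uint32_t b)
{
    return is_nan_bits(b) && (b & QUIET_BIT) == 0;
}

// Callee side: once asked to convert, always produce the correct
// result, which for a signaling NaN means returning a quiet NaN.
std::uint32_t convert_bits(std::uint32_t b)
{
    return is_signaling_nan_bits(b) ? (b | QUIET_BIT) : b;
}

// Caller side: decide whether folding is desirable by looking at the
// OPERAND (not the result), since folding would lose the exception.
bool may_fold(std::uint32_t operand, bool honor_signaling_nans)
{
    return !(honor_signaling_nans && is_signaling_nan_bits(operand));
}
```

For example, the pattern 0x7fa00000 is a signaling NaN under this encoding; the caller refuses to fold it when signaling NaNs are honoured, and the conversion routine quiets it to 0x7fe00000 otherwise.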

-- 
Joseph S. Myers
jos...@codesourcery.com


[gomp4, committed] Remove DEF_GOACC_BUILTIN_FNSPEC

2015-11-06 Thread Tom de Vries

[ was: Re: [gomp4, committed] Revert "Add IFN_GOACC_DATA_END_WITH_ARG" ]

On 06/11/15 13:03, Tom de Vries wrote:


Now that we've got -foffload-alias, we're no longer concerned about
GOACC builtins being alias analysis optimization barriers, so the
IFN_GOACC_DATA_END_WITH_ARG patch has become obsolete.


Likewise, DEF_GOACC_BUILTIN_FNSPEC has become obsolete.

This patch removes DEF_GOACC_BUILTIN_FNSPEC and associated code.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Remove DEF_GOACC_BUILTIN_FNSPEC

2015-11-06  Tom de Vries  

	* builtins.def (DEF_GOACC_BUILTIN_FNSPEC): Remove #undef and #define.
	* omp-builtins.def: Remove DEF_GOACC_BUILTIN_FNSPEC.

	* f95-lang.c (gfc_init_builtin_functions): Remove.
	(DEF_GOACC_BUILTIN_FNSPEC): Remove #undef and #define.
---
 gcc/builtins.def   |  7 ---
 gcc/fortran/f95-lang.c | 32 
 gcc/omp-builtins.def   | 24 ++--
 3 files changed, 10 insertions(+), 53 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index d60b037..886b45c 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -174,13 +174,6 @@ along with GCC; see the file COPYING3.  If not see
 	   false, true, true, ATTRS, false, \
 	   (flag_openacc \
 		|| flag_offload_abi != OFFLOAD_ABI_UNSET))
-/* Like DEF_GOACC_BUILTIN, but with an fn spec attribute.
-   KLUDGE: The ATTRS field needs to be a combination of ATTRS2 and FNSPEC.
-   In this file, we use the ATTRS field, and in gcc/fortran/f95-lang.c, we use
-   ATTRS2 and FNSPEC instead.  */
-#undef DEF_GOACC_BUILTIN_FNSPEC
-#define DEF_GOACC_BUILTIN_FNSPEC(ENUM, NAME, TYPE, ATTRS, ATTRS2, FNSPEC) \
-  DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS)
 #undef DEF_GOACC_BUILTIN_COMPILER
 #define DEF_GOACC_BUILTIN_COMPILER(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 56a30ca..a63ebb3 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -563,27 +563,6 @@ gfc_define_builtin (const char *name, tree type, enum built_in_function code,
   set_builtin_decl (code, decl, true);
 }
 
-/* Like gfc_define_builtin, but with fn spec attribute FNSPEC.  */
-
-static void ATTRIBUTE_UNUSED
-gfc_define_builtin_with_spec (const char *name, tree fntype,
-			  enum built_in_function code,
-			  const char *library_name, int attr,
-			  const char *fnspec)
-{
-  if (fnspec)
-{
-  /* Code copied from build_library_function_decl_1.  */
-  tree attr_args = build_tree_list (NULL_TREE,
-	build_string (strlen (fnspec), fnspec));
-  tree attrs = tree_cons (get_identifier ("fn spec"),
-			  attr_args, TYPE_ATTRIBUTES (fntype));
-  fntype = build_type_attribute_variant (fntype, attrs);
-}
-
-  gfc_define_builtin (name, fntype, code, library_name, attr);
-}
-
 #define DO_DEFINE_MATH_BUILTIN(code, name, argtype, tbase) \
 gfc_define_builtin ("__builtin_" name "l", tbase##longdouble[argtype], \
 			BUILT_IN_ ## code ## L, name "l", \
@@ -1236,12 +1215,6 @@ gfc_init_builtin_functions (void)
 #define DEF_GOACC_BUILTIN(code, name, type, attr) \
   gfc_define_builtin ("__builtin_" name, builtin_types[type], \
 			  code, name, attr);
-/* Like DEF_GOACC_BUILTIN, but with an fn spec attribute.
-   KLUDGE: See gcc/builtins.def DEF_GOACC_BUILTIN_FNSPEC comment.  */
-#undef DEF_GOACC_BUILTIN_FNSPEC
-#define DEF_GOACC_BUILTIN_FNSPEC(code, name, type, attr, attr2, fnspec)	\
-  gfc_define_builtin_with_spec ("__builtin_" name, builtin_types[type], \
-code, name, attr2, fnspec);
 #undef DEF_GOACC_BUILTIN_COMPILER
 #define DEF_GOACC_BUILTIN_COMPILER(code, name, type, attr) \
   gfc_define_builtin (name, builtin_types[type], code, name, attr);
@@ -1249,7 +1222,6 @@ gfc_init_builtin_functions (void)
 #define DEF_GOMP_BUILTIN(code, name, type, attr) /* ignore */
 #include "../omp-builtins.def"
 #undef DEF_GOACC_BUILTIN
-#undef DEF_GOACC_BUILTIN_FNSPEC
 #undef DEF_GOACC_BUILTIN_COMPILER
 #undef DEF_GOMP_BUILTIN
 }
@@ -1258,9 +1230,6 @@ gfc_init_builtin_functions (void)
 {
 #undef DEF_GOACC_BUILTIN
 #define DEF_GOACC_BUILTIN(code, name, type, attr) /* ignore */
-#undef DEF_GOACC_BUILTIN_FNSPEC
-#define DEF_GOACC_BUILTIN_FNSPEC(code, name, type, attr, attr2, fnspec)	\
-  /* Ignore.  */
 #undef DEF_GOACC_BUILTIN_COMPILER
 #define DEF_GOACC_BUILTIN_COMPILER(code, name, type, attr)  /* ignore */
 #undef DEF_GOMP_BUILTIN
@@ -1269,7 +1238,6 @@ gfc_init_builtin_functions (void)
 			  code, name, attr);
 #include "../omp-builtins.def"
 #undef DEF_GOACC_BUILTIN
-#undef DEF_GOACC_BUILTIN_FNSPEC
 #undef DEF_GOACC_BUILTIN_COMPILER
 #undef DEF_GOMP_BUILTIN
 }
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 6908f94..1504a48 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -21,7 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 /* Before including this file, you should define a macro:
 
  DEF_GOAC

Re: [ping] Fix PR debug/66728

2015-11-06 Thread Richard Biener
On Fri, Nov 6, 2015 at 2:34 AM, Mike Stump  wrote:
> On Nov 5, 2015, at 4:32 AM, Richard Biener  wrote:
>> No idea on location lists but maybe this means we should just use the
>> maximum supported integer mode for CONST_WIDE_INTs?
>
> Ah, yeah, that sounds like a fine idea.  Below is that version.  I snuck in 
> one more change, as it was annoying me, and it is a regression from gcc-4.8.  
> It has this effect:
>
> @@ -55,7 +55,7 @@ test:
> .long   0x72# DW_AT_type
> .byte   0x10
> .quad   0   # DW_AT_const_value
> -   .quad   0x8000  # (null)
> +   .quad   0x8000  #
> .byte   0   # end of children of DIE 0x2d
> .uleb128 0x4# (DIE (0x6b) DW_TAG_base_type)
> .byte   0x10# DW_AT_byte_size
>
> This version has the added benefit of reducing all wide_ints to be so 
> shortened.  We do this by changing get_full_len, which changes the world.
>
> If there are no substantial reasons to not check it in now, I’d like to 
> proceed and get it checked in.  People can refine it further in tree if they 
> want.  Any objections?

Ok with a changelog entry and bootstrap/regtest.

Thanks,
Richard.

>
> Index: dwarf2out.c
> ===
> --- dwarf2out.c (revision 229720)
> +++ dwarf2out.c (working copy)
> @@ -368,12 +368,14 @@
>  #endif
>
>  /* Get the number of HOST_WIDE_INTs needed to represent the precision
> -   of the number.  */
> +   of the number.  Some constants have a large uniform precision, so
> +   we get the precision needed for the actual value of the number.  */
>
>  static unsigned int
>  get_full_len (const wide_int &op)
>  {
> -  return ((op.get_precision () + HOST_BITS_PER_WIDE_INT - 1)
> +  int prec = wi::min_precision (op, UNSIGNED);
> +  return ((prec + HOST_BITS_PER_WIDE_INT - 1)
>   / HOST_BITS_PER_WIDE_INT);
>  }
>
> @@ -9010,7 +9012,7 @@
> {
>   dw2_asm_output_data (l, a->dw_attr_val.v.val_wide->elt (i),
>"%s", name);
> - name = NULL;
> + name = "";
> }
> else
>   for (i = 0; i < len; ++i)
> @@ -9017,7 +9019,7 @@
> {
>   dw2_asm_output_data (l, a->dw_attr_val.v.val_wide->elt (i),
>"%s", name);
> - name = NULL;
> + name = "";
> }
>   }
>   break;
> @@ -15593,8 +15595,13 @@
>return true;
>
>  case CONST_WIDE_INT:
> -  add_AT_wide (die, DW_AT_const_value,
> -  std::make_pair (rtl, GET_MODE (rtl)));
> +  {
> +   wide_int w1 = std::make_pair (rtl, MAX_MODE_INT);
> +   unsigned int prec = MIN (wi::min_precision (w1, UNSIGNED),
> +(unsigned int)CONST_WIDE_INT_NUNITS (rtl) * 
> HOST_BITS_PER_WIDE_INT);
> +   wide_int w = wi::zext (w1, prec);
> +   add_AT_wide (die, DW_AT_const_value, w);
> +  }
>return true;
>
>  case CONST_DOUBLE:
> Index: rtl.h
> ===
> --- rtl.h   (revision 229720)
> +++ rtl.h   (working copy)
> @@ -2086,6 +2086,7 @@
>  inline unsigned int
>  wi::int_traits ::get_precision (const rtx_mode_t &x)
>  {
> +  gcc_checking_assert (x.second != BLKmode && x.second != VOIDmode);
>return GET_MODE_PRECISION (x.second);
>  }
>
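
The get_full_len change above replaces "words needed for the declared precision" with "words needed for the actual value". A minimal sketch of that arithmetic follows, assuming 64-bit host words and using a plain vector of words as a toy stand-in for wide_int; the names min_precision_unsigned and words_for_precision are hypothetical and are not GCC functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for HOST_BITS_PER_WIDE_INT; 64 is an assumption of this sketch.
constexpr unsigned kBitsPerWord = 64;

// Smallest number of bits that can hold an unsigned value stored as
// little-endian 64-bit words.  Toy model only; returns 0 for the value zero.
unsigned min_precision_unsigned(const std::vector<std::uint64_t>& words) {
  for (std::size_t i = words.size(); i-- > 0;) {
    if (words[i] != 0) {
      unsigned bits = 0;
      for (std::uint64_t w = words[i]; w != 0; w >>= 1)
        ++bits;  // count bits up to and including the highest set bit
      return static_cast<unsigned>(i) * kBitsPerWord + bits;
    }
  }
  return 0;
}

// Number of host words needed for a given precision, rounding up.
unsigned words_for_precision(unsigned precision) {
  return (precision + kBitsPerWord - 1) / kBitsPerWord;
}
```

With a 128-bit declared precision the old formula always yields two words, while a small value such as 0x8000 needs only one once the precision is taken from the value itself.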


Re: libgo patch committed: Update to Go 1.5 release

2015-11-06 Thread Rainer Orth
Ian Lance Taylor  writes:

> I have committed a patch to libgo to update it to the Go 1.5 release.
>
> As usual for libgo updates, the actual patch is too large to attach to
> this e-mail message.  I've attached the changes to the gccgo-specific
> files.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.
>
> This may cause trouble on non-GNU/Linux operating systems.  Please let
> me know about any problems you encounter.

It does indeed (first tried on i386-pc-solaris2.10):

* 

/vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c: In function 
'__go_ioctl':
/vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c:63:10: error: implicit 
declaration of function 'ioctl' [-Werror=implicit-function-declaration]
   return ioctl (d, request, arg);
  ^

  Needs , the following patch works:

diff --git a/libgo/runtime/go-varargs.c b/libgo/runtime/go-varargs.c
--- a/libgo/runtime/go-varargs.c
+++ b/libgo/runtime/go-varargs.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* The syscall package calls C functions.  The Go compiler can not
represent a C varargs functions.  On some systems it's important

* 

/vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_bsd.go:107:7: error: 
incompatible types in assignment (cannot use type int as type Pid_t)
r1 = raw_getpid()
   ^

I can cast to Pid_t and this works.  The underlying error seems to me to be
that raw_getpid in the generated libcalls.go is wrong: it casts the
c_getpid return value to int, while pid_t can be long.

* 

/vol/gcc/src/hg/trunk/local/libgo/go/net/hook_cloexec.go:13:70: error: 
reference to undefined identifier 'syscall.Accept4'
  accept4Func func(int, int) (int, syscall.Sockaddr, error) = syscall.Accept4
  ^
  
There is no accept4 on Solaris (and certainly on other systems too, hence
the configure test), but it is used unconditionally.

* 

/vol/gcc/src/hg/trunk/local/libgo/go/net/sendfile_solaris.go:78:22: error: 
reference to undefined identifier 'syscall.Sendfile'
   n, err1 := syscall.Sendfile(dst, src, &pos1, n)
  ^

Only in go/syscall/libcall_linux.go!?

* 

/vol/gcc/src/hg/trunk/local/libgo/go/net/tcpsockopt_solaris.go:34:103: error: 
reference to undefined identifier 'syscall.TCP_KEEPALIVE_THRESHOLD'
  return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(fd.sysfd, 
syscall.IPPROTO_TCP, syscall.TCP_KEEPALIVE_THRESHOLD, msecs))

   ^
  
Not in Solaris 10, only Solaris 11 and 12 have it.

  Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR tree-optimization/68145] Fix vectype computation in vectorizable_operation

2015-11-06 Thread Richard Biener
On Thu, Nov 5, 2015 at 5:16 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch fixes the way vectype is computed in vectorizable_operation.
> Currently op0 is always used to compute vectype.  If it is a loop invariant
> then its type is used to get vectype, which is impossible for booleans
> because they require a context to compute vectype correctly.  This patch
> uses the output vectype in such cases; this should always work for
> operations on booleans.  Bootstrapped on x86_64-unknown-linux-gnu.
> Regression testing is in progress.  Ok if no regressions?

Ok.
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-05  Ilya Enkovich  
>
> PR tree-optimization/68145
> * tree-vect-stmts.c (vectorizable_operation): Fix
> determination for booleans.
>
> gcc/testsuite/
>
> 2015-11-05  Ilya Enkovich  
>
> PR tree-optimization/68145
> * g++.dg/vect/pr68145.cc: New test.
>
>
> diff --git a/gcc/testsuite/g++.dg/vect/pr68145.cc 
> b/gcc/testsuite/g++.dg/vect/pr68145.cc
> new file mode 100644
> index 000..51e663a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr68145.cc
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +
> +struct A {
> +  bool operator()(int p1, int p2) { return p1 && p2; }
> +};
> +class B {
> +public:
> +  bool *cbegin();
> +  bool *cend();
> +};
> +template  void operator&&(B p1, T p2) {
> +  B a;
> +  arrayContTransform(p1, p2, a, A());
> +}
> +
> +template  +  typename _BinaryOperation>
> +void myrtransform(_InputIterator1 p1, _OutputIterator p2, T p3,
> +  _BinaryOperation p4) {
> +  _InputIterator1 b;
> +  for (; b != p1; ++b, ++p2)
> +*p2 = p4(*b, p3);
> +}
> +
> +template 
> +void arrayContTransform(L p1, R p2, RES p3, BinaryOperator p4) {
> +  myrtransform(p1.cend(), p3.cbegin(), p2, p4);
> +}
> +
> +class C {
> +public:
> +  B getArrayBool();
> +};
> +class D {
> +  B getArrayBool(const int &);
> +  C lnode_p;
> +};
> +bool c;
> +B D::getArrayBool(const int &) { lnode_p.getArrayBool() && c; }
> +
> +// { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
> i?86-*-* x86_64-*-* } } } }
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index ae14075..9aa2d4e 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -4697,7 +4697,26 @@ vectorizable_operation (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>/* If op0 is an external or constant def use a vector type with
>   the same size as the output vector type.  */
>if (!vectype)
> -vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
> +{
> +  /* For boolean type we cannot determine vectype by
> +invariant value (don't know whether it is a vector
> +of booleans or vector of integers).  We use output
> +vectype because operations on boolean don't change
> +type.  */
> +  if (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE)
> +   {
> + if (TREE_CODE (TREE_TYPE (scalar_dest)) != BOOLEAN_TYPE)
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"not supported operation on bool value.\n");
> + return false;
> +   }
> + vectype = vectype_out;
> +   }
> +  else
> +   vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
> +}
>if (vec_stmt)
>  gcc_assert (vectype);
>if (!vectype)


Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-06 Thread Richard Biener
On Mon, Nov 2, 2015 at 4:24 PM, Yuri Rumyantsev  wrote:
> Hi Richard,
>
> I've come back to this optimization and tried to implement your proposal
> for comparison:
>> Btw, you didn't try the simpler alternative of
>>
>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>> build2 (EQ_EXPR, boolean_type_node,
>>  build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>
>> ?  That is, use the GIMPLE level equivalent of
>>  (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>
> using the following code:
>
>   vectype = TREE_TYPE (mask);
>   ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)),
> MODE_INT, 0);
>   ext_type = lang_hooks.types.type_for_mode (ext_mode , 1);
>
> but I've got zero type for it. Am I missing something?

Use ext_type = build_nonstandard_integer_type (GET_MODE_PRECISION
(ext_mode), 1);

Richard.

> Any help will be appreciated.
> Yuri.
>
>
> 2015-08-13 14:40 GMT+03:00 Richard Biener :
>> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev  wrote:
>>> Hi Richard,
>>>
>>> Did you have a chance to look at updated patch?
>>
>> Having a quick look now.  Btw, you didn't try the simpler alternative of
>>
>>  tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>>  build2 (EQ_EXPR, boolean_type_node,
>>build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>
>> ?  That is, use the GIMPLE level equivalent of
>>
>>  (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>>
>> ?  That should be supported by the expander already, though again not sure if
>> the target(s) have compares that match this.
>>
>> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction
>> on EQ/NE_EXPR
>> is missing.  Operand type equality is tested anyway.
>>
>> Why do you need to restrict forward_propagate_into_comparison_1?
>>
>> Otherwise this looks better, but can you try with the VIEW_CONVERT as well?
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev :
 HI All,

 Here is updated patch which implements Richard proposal to use vector
 comparison with boolean result instead of target hook. Support for it
 was added to ix86_expand_branch.

 Any comments will be appreciated.

 Bootstrap and regression testing did not show any new failures.

 ChangeLog:
 2015-08-06  Yuri Rumyantsev  

 * config/i386/i386.c (ix86_expand_branch): Implement vector
 comparison with boolean result.
 * config/i386/sse.md (define_expand "cbranch4): Add define
 for vector comparison.
 * fold-const.c (fold_relational_const): Add handling of vector
 comparison with boolean result.
 * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM.
 * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): new macros.
 * tree-cfg.c (verify_gimple_comparison): Add test for vector
 comparison with boolean result.
 * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
 propagate vector comparison with boolean result.
 * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
 has_mask_store field of vect_info.
 * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h.
 (is_valid_sink): New function.
 (optimize_mask_stores): New function.
 (vectorize_loops): Invoke optimize_mask_stores for loops having masked
 stores.
 * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
 correspondent macros.

 gcc/testsuite/ChangeLog:
 * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.


 2015-07-27 11:48 GMT+03:00 Richard Biener :
> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law  wrote:
>> On 07/24/2015 03:16 AM, Richard Biener wrote:

 Is there any rationale given anywhere for the transformation into
 conditional expressions?  ie, is there any reason why we can't have a
 GIMPLE_COND where the expression is a vector condition?
>>>
>>>
>>> No rationale for equality compare which would have the semantic of
>>> having all elements equal or not equal.  But you can't define a sensible
>>> ordering (that HW implements) for other compare operators and you
>>> obviously need a single boolean result, not a vector of element
>>> comparison results.
>>
>> Right.  EQ/NE only as others just don't have any real meaning.
>>
>>
>>> I've already replied that I'm fine allowing ==/!= whole-vector compares.
>>> But one needs to check whether expansion does anything sensible
>>> with them (either expand to integer subreg compares or add optabs
>>> for the compares).
>>
>> Agreed, EQ/NE for whole vector compares only would be fine for me too
>> under the same conditions.
>
> Btw, you can already do this on GIMPLE by doing
>
>   TImode vec_as_int = VIEW_CONVERT_EXPR  (vec_2);
>   if (vec_as_int == 0)
> ...
>
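
The VIEW_CONVERT idea in this thread amounts to reinterpreting the bits of a whole vector as one wide integer and comparing that integer. A minimal sketch of the same idea in portable C++ follows, assuming a 128-bit vector of four 32-bit lanes; the names V4SI, Bits128, view_as_integer, all_lanes_zero and whole_vector_equal are hypothetical and only mirror the mode names used above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy stand-in for a V4SI vector: four 32-bit lanes, 128 bits in total.
using V4SI = std::array<std::uint32_t, 4>;

// The same 128 bits viewed as an integer, split into two 64-bit halves
// (toy stand-in for a TImode value).
struct Bits128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Reinterpret the vector's bytes as an integer without changing any bit,
// which is what a view-conversion does.
Bits128 view_as_integer(const V4SI& v) {
  static_assert(sizeof(V4SI) == sizeof(Bits128), "both views must be 128 bits");
  Bits128 b;
  std::memcpy(&b, v.data(), sizeof b);
  return b;
}

// "vec_as_int == 0": true only when every lane is zero.
bool all_lanes_zero(const V4SI& v) {
  const Bits128 b = view_as_integer(v);
  return (b.lo | b.hi) == 0;
}

// Whole-vector EQ: a single boolean, not a vector of per-lane results.
bool whole_vector_equal(const V4SI& a, const V4SI& b) {
  const Bits128 x = view_as_integer(a);
  const Bits128 y = view_as_integer(b);
  return x.lo == y.lo && x.hi == y.hi;
}
```

Only equality and inequality make sense in this view, which matches the EQ/NE restriction agreed on in the quoted discussion.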

Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-06 Thread Richard Biener
On Thu, Nov 5, 2015 at 2:33 PM, Alan Lawrence  wrote:
> On 03/11/15 13:39, Richard Biener wrote:
>> On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence  wrote:
>>>
>>> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
>>> split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1?
>>
>> As said on IRC it should be 4 1 1 1 and 4 1 1 1.
>
> Right. And so, if we have a twelve-element group (0 1 1 1 1 1 1 1 1 1 1 1), by
> the time it became three subgroups, these should each be (8 1 1 1), via an
> intermediate stage of (4 1 1 1 1 1 1 1) (8 1 1 1). This leads to the code in
> the attached patch.
>
>> No, I don't think we can split load groups that way.  So I think if
>> splitting store
>> groups works well (with having larger load groups) then that's the way to go
>> (even for loop vect).
>
> Well, slp-11a.c still fails if I enable the splitting for non-BB SLP; I was
> thinking this was because I needed to split the load groups too, but maybe not
> - maybe this is a separate bug/issue with hybrid SLP. Whatever the reason, I
> still think splitting groups in hybrid SLP is another patch. (Do we really
> want to put off handling the basic-block case until it works for hybrid SLP
> as well? IMHO I would think not.)

No.

> It sounds as if the approach of restricting splitting
> to store groups with appropriate asserts GROUP_GAP == 1 is thus the right
> thing to do in the longer term too (hence, renamed vect_split_slp_store_group
> to emphasize that) - at least until we remove that restriction on SLP generally.
>
> Bootstrapped + check-{gcc,g++,gfortran} on x86_64, AArch64, ARM.
>
> Re. the extra skipping loop, I think it would theoretically be possible for
> the recursive call to vect_slp_analyze to succeed on an element where the
> original failed, because it may have more num_permutes remaining (after
> skipping over the first vector). So there's a second argument (besides code
> complexity) for dropping that part?
>
> gcc/ChangeLog:
>
> * tree-vect-slp.c (vect_split_slp_store_group): New.
> (vect_analyze_slp_instance): Recurse on subgroup(s) if
> vect_build_slp_tree fails during basic block SLP.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic.
> * gcc.dg/vect/bb-slp-subgroups-1.c: New.
> * gcc.dg/vect/bb-slp-subgroups-2.c: New.
> * gcc.dg/vect/bb-slp-subgroups-3.c: New.
> * gcc.dg/vect/bb-slp-subgroups-4.c: New.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-7.c   | 10 +--
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 
>  gcc/tree-vect-slp.c| 87 
> +-
>  6 files changed, 259 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> index ab54a48..b8bef8c 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> @@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y)
>unsigned int *pout = &out[0];
>unsigned int a0, a1, a2, a3;
>
> -  /* Non isomorphic.  */
> +  /* Non isomorphic, even 64-bit subgroups.  */
>a0 = *pin++ + 23;
> -  a1 = *pin++ + 142;
> +  a1 = *pin++ * 142;
>a2 = *pin++ + 2;
>a3 = *pin++ * 31;
> -
> +
>*pout++ = a0 * x;
>*pout++ = a1 * y;
>*pout++ = a2 * x;
> @@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y)
>
>/* Check results.  */
>if (out[0] != (in[0] + 23) * x
> -  || out[1] != (in[1] + 142) * y
> +  || out[1] != (in[1] * 142) * y
>|| out[2] != (in[2] + 2) * x
>|| out[3] != (in[3] * 31) * y)
>  abort();
> @@ -47,4 +47,4 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
> -
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> new file mode 100644
> index 000..39c23c3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> @@ -0,0 +1,44 @@
> +/* { dg-require-effective-target vect_int } */
> +/* PR tree-optimization/67682.  */
> +
> +#include "tree-vect.h"
> +
> +int __attribute__((__aligned__(8))) a[8];
> +int __attribute__((__aligned__(8))) b[4];
> +
> +__attribute__ ((noinline)) void
> +test ()
> +{
> +a[0] = b[0];
> +a[1] = b[1];
> +a[2] = b[2];
> +a[3] = b[3];
> +a[4] = 0;
> +a[5] = 0;
> +a[6] = 0;
> +a[7] = 0;
> +}
> +
> 
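
The gap arithmetic discussed at the top of this message (8 elements becoming two groups of 4 1 1 1, and 12 elements becoming three groups of 8 1 1 1) can be sketched as below. This is only a toy model of the numbers quoted in the thread, assuming consecutive elements so that every inner gap is 1; the names Gaps and split_group are hypothetical, and the actual patch restricts splitting to store groups.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// gaps[0] is the number of elements the group skips outside itself;
// each later entry is the distance to the previous element (1 here).
using Gaps = std::vector<unsigned>;

// Split a group of consecutive elements after `first_size` elements.
// Each half must afterwards also skip the elements given to the other half.
std::pair<Gaps, Gaps> split_group(const Gaps& group, std::size_t first_size) {
  assert(first_size > 0 && first_size < group.size());
  const auto mid = group.begin() + static_cast<std::ptrdiff_t>(first_size);
  Gaps first(group.begin(), mid);
  Gaps second(mid, group.end());
  first[0] = group[0] + static_cast<unsigned>(second.size());
  second[0] = group[0] + static_cast<unsigned>(first.size());
  return {first, second};
}
```

Splitting a twelve-element group at 8 gives (4 1 1 1 1 1 1 1) and (8 1 1 1); splitting the first half again at 4 gives two more (8 1 1 1) groups, which is the intermediate stage described above.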

Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-06 Thread Rainer Orth
Martin Sebor  writes:

>> If we use gcc_checking_assert it won't fire in release builds; let's go
>> with that.
>
> Okay. Attached is an updated patch with that change.

Unfortunately, this breaks i386-pc-solaris2.10 bootstrap:

/vol/gcc/src/hg/trunk/local/gcc/cp/init.c: In function 'void 
warn_placement_new_too_small(tree, tree, tree, tree)':
/vol/gcc/src/hg/trunk/local/gcc/cp/init.c:2454:17: error: format '%lu' expects 
argument of type 'long unsigned int', but argument 5 has type 'long long 
unsigned int' [-Werror=format=]
  bytes_avail);
 ^

Printing an unsigned HOST_WIDE_INT with %lu in one case, but %wu in the
other seems like a simple typo, so the following fixes bootstrap for me:

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2447,7 +2447,7 @@ warn_placement_new_too_small (tree type,
 			  "%<%T [%wu]%> and size %qwu in a region of type %qT "
 			  "and size %qwi"
 			  : "placement new constructing an object of type "
-			  "%<%T [%lu]%> and size %qwu in a region of type %qT "
+			  "%<%T [%wu]%> and size %qwu in a region of type %qT "
 			  "and size at most %qwu",
 			  type, tree_to_uhwi (nelts), bytes_need,
 			  TREE_TYPE (oper),

Rainer


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University
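
The bootstrap failure above comes from pairing a 64-bit argument with a length modifier that is only 64 bits wide on some hosts. GCC's %wu is specific to its own diagnostic machinery; the sketch below shows the analogous issue with standard printf-style formatting, where PRIu64 plays the role of the host-independent modifier. The names uhwi and describe_array are hypothetical.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Toy stand-in for unsigned HOST_WIDE_INT: 64 bits regardless of how wide
// `long` is on the host.
using uhwi = std::uint64_t;

// Format an element count and a byte size.  PRIu64 expands to whichever
// length modifier matches a 64-bit value on this host, whereas a literal
// "%lu" is only correct where `unsigned long` happens to be 64 bits.
std::string describe_array(uhwi nelts, uhwi bytes) {
  char buf[96];
  std::snprintf(buf, sizeof buf,
                "type 'T [%" PRIu64 "]' and size '%" PRIu64 "'",
                nelts, bytes);
  return std::string(buf);
}
```

On an ILP32 host such as the Solaris configuration in this report, the literal "%lu" form would trigger the same kind of -Wformat diagnostic quoted above.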


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-06 Thread Andreas Schwab
I see this failure on m68k:

FAIL: g++.dg/warn/Wplacement-new-size.C  -std=gnu++11 (test for excess errors)
Excess errors:
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:189:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:191:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:194:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:198:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]

That appears to be a 32-bit problem: the test also fails on x86-64
with -m32 <http://gcc.gnu.org/ml/gcc-testresults/2015-11/msg00522.html>
and on powerpc
<http://gcc.gnu.org/ml/gcc-testresults/2015-11/msg00520.html>.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [hsa 10/12] HSAIL BRIG description header file (hopefully not a licensing issue)

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 12:29 PM, Bernd Schmidt wrote:

David Cc'ed so he can take the necessary steps.




Initially, I have created the file by copying out pieces of PDF
documentation but the latest version of the file (describing final
HSAIL 1.0) is actually taken from the HSAIL (dis)assembler developed
by HSA foundation and released by "University of Illinois/NCSA Open
Source License."


Actually there's not just the question of license, but also of copyright 
assignment.



Bernd


Re: [PATCH 5/6]tree-sra.c: Fix completely_scalarize for negative array indices

2015-11-06 Thread Richard Biener
On Thu, Nov 5, 2015 at 2:22 PM, Alan Lawrence  wrote:
> On 30/10/15 10:54, Eric Botcazou wrote:
>> On 30/10/15 10:44, Richard Biener wrote:
>>>
>>> I think you want to use wide-ints here and
>>>
>>> wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
>>> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>>> wide_int maxidx = ...
>>>
>>> you can then simply iterate minidx with ++ and do the final compare
>>> against maxidx
>>> with while (++idx <= maxidx).  For the array ref index we want to use
>>> TYPE_DOMAIN
>>> as type as well, not size_int.  Thus wide_int_to_tree (TYPE_DOMAIN 
>>> (...)..idx).
> [...]
>> But using offset_int should be OK, see for example get_ref_base_and_extent.
>>
>
> Here's a patch using offset_int. (Not as easy to construct as wide_int::from,
> the sign-extend is what appeared to be done elsewhere that constructs
> offset_ints).
>
> Tested by bootstrap+check-{gcc,g++,ada,fortran} with the rest of the patchset
> (which causes an array[-1..1] to be completely scalarized, among others), on
> x86_64 and ARM.
>
> I don't have a test without all that (such a test would have to be in Ada,
> and trigger SRA of such an array but not involving the constant pool); is it
> OK without?

Ok.

Richard.

> gcc/ChangeLog:
>
> * tree-sra.c (completely_scalarize): Properly handle negative array
> indices using offset_int.
> ---
>  gcc/tree-sra.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index e15df1f..6168a7e 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1010,18 +1010,25 @@ completely_scalarize (tree base, tree decl_type, 
> HOST_WIDE_INT offset, tree ref)
> if (maxidx)
>   {
> gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> -   /* MINIDX and MAXIDX are inclusive.  Try to avoid overflow.  */
> -   unsigned HOST_WIDE_INT lenp1 = tree_to_shwi (maxidx)
> -   - tree_to_shwi (minidx);
> -   unsigned HOST_WIDE_INT idx = 0;
> -   do
> +   tree domain = TYPE_DOMAIN (decl_type);
> +   /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> +  DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> +   offset_int idx = wi::to_offset (minidx);
> +   offset_int max = wi::to_offset (maxidx);
> +   if (!TYPE_UNSIGNED (domain))
>   {
> -   tree nref = build4 (ARRAY_REF, elemtype, ref, size_int (idx),
> +   idx = wi::sext (idx, TYPE_PRECISION (domain));
> +   max = wi::sext (max, TYPE_PRECISION (domain));
> + }
> +   for (int el_off = offset; wi::les_p (idx, max); ++idx)
> + {
> +   tree nref = build4 (ARRAY_REF, elemtype,
> +   ref,
> +   wide_int_to_tree (domain, idx),
> NULL_TREE, NULL_TREE);
> -   int el_off = offset + idx * el_size;
> scalarize_elem (base, el_off, el_size, nref, elemtype);
> +   el_off += el_size;
>   }
> -   while (++idx <= lenp1);
>   }
>}
>break;
> --
> 1.9.1
>
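
The loop in the patch above interprets the array bounds in the signedness of the index domain before iterating, so that a range such as [-1..1] visits three elements. A minimal standalone sketch of that idea follows, assuming bounds that fit comfortably in 64 bits (GCC's offset_int is wider); sign_extend, Element and enumerate_elements are hypothetical names, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sign-extend the low `precision` bits of `value` (1 <= precision <= 64).
std::int64_t sign_extend(std::uint64_t value, unsigned precision) {
  if (precision >= 64)
    return static_cast<std::int64_t>(value);
  const std::uint64_t mask = (std::uint64_t{1} << precision) - 1;
  const std::uint64_t sign = std::uint64_t{1} << (precision - 1);
  value &= mask;
  // (value ^ sign) - sign copies the sign bit into all higher bits.
  return static_cast<std::int64_t>((value ^ sign) - sign);
}

struct Element {
  std::int64_t index;   // array index in the domain's own signedness
  std::int64_t offset;  // offset of the element from the start of the base
};

// Enumerate every element of an array whose inclusive bounds are given as
// raw bit patterns.  For a signed domain the bounds are sign-extended first.
// Assumes the bounds stay well inside the int64_t range.
std::vector<Element> enumerate_elements(std::uint64_t raw_min,
                                        std::uint64_t raw_max,
                                        unsigned precision, bool is_unsigned,
                                        std::int64_t offset,
                                        std::int64_t el_size) {
  std::int64_t idx = is_unsigned ? static_cast<std::int64_t>(raw_min)
                                 : sign_extend(raw_min, precision);
  const std::int64_t max = is_unsigned ? static_cast<std::int64_t>(raw_max)
                                       : sign_extend(raw_max, precision);
  std::vector<Element> out;
  for (std::int64_t el_off = offset; idx <= max; ++idx, el_off += el_size)
    out.push_back({idx, el_off});
  return out;
}
```

A lower bound stored as the 32-bit pattern 0xFFFFFFFF is read as -1 in a signed domain, so the range up to 1 yields three elements rather than none.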


Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Nikolai Bozhenov

On 10/28/2015 01:07 PM, Kyrill Tkachov wrote:

Hi all,

This RTL checking error occurs on aarch64 in aarch_accumulator_forwarding
when processing an msubsi insn with subregs:
(insn 15 14 16 3 (set (reg/v:SI 78 [ i ])
(minus:SI (subreg:SI (reg/v:DI 76 [ aul ]) 0)
(mult:SI (subreg:SI (reg:DI 83) 0)
(subreg:SI (reg:DI 75 [ _20 ]) 0 schedice.c:10 357 
{*msubsi}


The register_operand predicate for that pattern allows subregs (I think
correctly).  The code in aarch_accumulator_forwarding doesn't take that into
account and ends up taking a REGNO of a SUBREG, causing a checking error.

This patch fixes that by stripping the subregs off the accumulator rtx before
checking that the inner expression is a REG and taking its REGNO.

The testcase now works fine with an aarch64-none-elf toolchain configured
for RTL checking.


The testcase is taken verbatim from the BZ entry for PR 68088.
Since this function is shared between arm and aarch64 I've bootstrapped and
tested it on both, and I'll need ok's for both ports.

Ok for trunk?

Thanks,
Kyrill

2015-10-28  Kyrylo Tkachov  

PR target/68088
* config/arm/aarch-common.c (aarch_strip_subreg): New function.
(aarch_accumulator_forwarding): Strip subregs from accumulator rtx
when appropriate.

2015-10-28  Kyrylo Tkachov  

* gcc.target/aarch64/pr68088_1.c: New test.


Hi!

I faced the same issue but I had somewhat different RTL for the consumer:

(insn 20 15 21 2 (set (reg/i:SI 0 r0)
(minus:SI (subreg:SI (reg:DI 117) 4)
(mult:SI (reg:SI 123)
(reg:SI 114 gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
cycle in this case?

Thanks,
Nikolai


Re: [PATCH] Fix PR68067

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Alan Lawrence wrote:

> On 06/11/15 10:39, Richard Biener wrote:
> > > ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error:
> > > location
> > > references block not in block tree
> > > l1_279 = PHI <1(28), l1_299(33)>
> > 
> > ^^^
> > 
> > this is the error to look at!  It means that the GC heap will be corrupted
> > quite easily.
> 
> Thanks, I'll have a go at that.
> 
> -fdump-tree-alias is also suspicious, hinting that that does more than just
> print. (How many bugs here??)

Well, it only allocates some more heap memory for extra debug verbosity.
I think this can happen with other dumps as well.

> > Interesting would be for which pass this happens - just print
> > *pass at this point.
> 
> FWIW - unswitch. (A long time after -fdump-tree-alias!)

Ah, unswitch again... :/

> Cheers, Alan
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix PR68067

2015-11-06 Thread Alan Lawrence

On 06/11/15 10:39, Richard Biener wrote:

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
references block not in block tree
l1_279 = PHI <1(28), l1_299(33)>


^^^

this is the error to look at!  It means that the GC heap will be corrupted
quite easily.


Thanks, I'll have a go at that.

-fdump-tree-alias is also suspicious, hinting that that does more than just 
print. (How many bugs here??)



Interesting would be for which pass this happens - just print
*pass at this point.


FWIW - unswitch. (A long time after -fdump-tree-alias!)

Cheers, Alan



[PATCH] Fix memory leaks

2015-11-06 Thread Richard Biener

A few, spotted with valgrind.  One is even mine ;)

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-11-06  Richard Biener  

* tree-ssa-sccvn.c (class sccvn_dom_walker): Add destructor.
* lra.c (init_reg_info): Truncate copy_vec instead of
re-allocating a new one and leaking the old.
* ipa-inline-analysis.c (estimate_function_body_sizes): Free
bb_infos vec.
* sched-deps.c (sched_deps_finish): Free the dn/dl pools.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 229842)
--- gcc/tree-ssa-sccvn.c(working copy)
*** class sccvn_dom_walker : public dom_walk
*** 4154,4159 
--- 4199,4205 
  public:
sccvn_dom_walker ()
  : dom_walker (CDI_DOMINATORS), fail (false), cond_stack (vNULL) {}
+   ~sccvn_dom_walker ();
  
virtual void before_dom_children (basic_block);
virtual void after_dom_children (basic_block);
*** public:
*** 4168,4173 
--- 4214,4224 
  cond_stack;
  };
  
+ sccvn_dom_walker::~sccvn_dom_walker ()
+ {
+   cond_stack.release ();
+ }
+ 
  /* Record a temporary condition for the BB and its dominated blocks.  */
  
  void
Index: gcc/ipa-inline-analysis.c
===
*** gcc/ipa-inline-analysis.c   (revision 229842)
--- gcc/ipa-inline-analysis.c   (working copy)
*** estimate_function_body_sizes (struct cgr
*** 2853,2858 
--- 2853,2859 
inline_summaries->get (node)->self_time = time;
inline_summaries->get (node)->self_size = size;
nonconstant_names.release ();
+   fbi.bb_infos.release ();
if (opt_for_fn (node->decl, optimize))
  {
if (!early)
Index: gcc/sched-deps.c
===
*** gcc/sched-deps.c(revision 229842)
--- gcc/sched-deps.c(working copy)
*** void
*** 4092,4100 
  sched_deps_finish (void)
  {
gcc_assert (deps_pools_are_empty_p ());
!   dn_pool->release_if_empty ();
dn_pool = NULL;
-   dl_pool->release_if_empty ();
dl_pool = NULL;
  
h_d_i_d.release ();
--- 4092,4100 
  sched_deps_finish (void)
  {
gcc_assert (deps_pools_are_empty_p ());
!   delete dn_pool;
!   delete dl_pool;
dn_pool = NULL;
dl_pool = NULL;
  
h_d_i_d.release ();
Index: gcc/lra.c
===
--- gcc/lra.c   (revision 229843)
+++ gcc/lra.c   (working copy)
@@ -1293,7 +1293,7 @@ init_reg_info (void)
   lra_reg_info = XNEWVEC (struct lra_reg, reg_info_size);
   for (i = 0; i < reg_info_size; i++)
 initialize_lra_reg_info_element (i);
-  copy_vec.create (100);
+  copy_vec.truncate (0);
 }
 
 


[PATCH] Fix object init in alloc-pool.h

2015-11-06 Thread Richard Biener

The previous allocator never initialized objects, so switch to
default-initialization (T instead of T ()).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

Index: gcc/alloc-pool.h
===
--- gcc/alloc-pool.h(revision 229804)
+++ gcc/alloc-pool.h(working copy)
@@ -480,7 +480,7 @@ public:
   inline T *
   allocate () ATTRIBUTE_MALLOC
   {
-return ::new (m_allocator.allocate ()) T ();
+return ::new (m_allocator.allocate ()) T;
   }
 
   inline void
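
The one-token change above switches the pool from value-initialization to default-initialization. A minimal sketch of the difference for a type without a user-provided constructor follows; Node, construct_value_init and construct_default_init are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// A type with no user-provided constructor, as pool-allocated objects
// often are.
struct Node {
  int id;
  Node* next;
};

// `T ()` value-initializes: for a type like Node every member is zeroed
// before the object is handed out.
Node* construct_value_init(void* storage) {
  return ::new (storage) Node();
}

// `T` default-initializes: for a type like Node the members are left with
// indeterminate values, so the caller must assign them before reading.
Node* construct_default_init(void* storage) {
  return ::new (storage) Node;
}
```

The default-initialized form does no extra work on the storage, which matches the behaviour described for the previous allocator; callers that relied on zeroed members would have to initialize them explicitly.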


[gomp4, committed] Revert "Add IFN_GOACC_DATA_END_WITH_ARG"

2015-11-06 Thread Tom de Vries

Hi,

I've reverted the patch that added IFN_GOACC_DATA_END_WITH_ARG ( 
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02661.html ).


The patch attempted to fix a test failure, while at the same time 
keeping the GOACC_data_start fnspec attributes to prevent it from 
becoming an alias analysis optimization barrier.


Now that we've got -foffload-alias, we're no longer concerned about 
GOACC builtins being alias analysis optimization barriers, so the 
IFN_GOACC_DATA_END_WITH_ARG patch has become obsolete.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Revert "Add IFN_GOACC_DATA_END_WITH_ARG"

2015-10-05  Tom de Vries  

	revert:
	2015-05-28  Tom de Vries  

	PR tree-optimization/65419
	* cfgexpand.c (pass_data_expand): Add PROP_gimple_lompifn to
	properties_required field.
	* gimplify.c (gimplify_omp_workshare): Use IFN_GOACC_DATA_END_WITH_ARG
	instead of BUILT_IN_GOACC_DATA_END.  Clear PROP_gimple_lompifn in
	curr_properties.
	(gimplify_function_tree): Tentatively set PROP_gimple_lompifn in
	curr_properties.
	* internal-fn.c (expand_GOACC_DATA_END_WITH_ARG): New dummy function.
	* internal-fn.def (GOACC_DATA_END_WITH_ARG): New DEF_INTERNAL_FN.
	* omp-low.c (lower_omp_target): Set argument of GOACC_DATA_END_WITH_ARG.
	(pass_data_late_lower_omp): New pass_data.
	(pass_late_lower_omp): New pass.
	(pass_late_lower_omp::gate, pass_late_lower_omp::execute)
	(make_pass_late_lower_omp): New function.
	* passes.def: Add pass_late_lower_omp.
	* tree-inline.c (expand_call_inline): Handle PROP_gimple_lompifn.
	* tree-pass.h (PROP_gimple_lompifn): Add define.

	* testsuite/libgomp.oacc-c-c++-common/goacc-data-end.c: New test.
---
 gcc/cfgexpand.c|  3 +-
 gcc/gimplify.c | 25 ++-
 gcc/internal-fn.c  |  9 ---
 gcc/internal-fn.def|  1 -
 gcc/omp-low.c  | 86 +-
 gcc/passes.def |  1 -
 gcc/tree-inline.c  | 16 ++--
 gcc/tree-pass.h|  2 -
 .../libgomp.oacc-c-c++-common/goacc-data-end.c | 67 -
 9 files changed, 14 insertions(+), 196 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/goacc-data-end.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index ca52d3d..bfbc958 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6060,8 +6060,7 @@ const pass_data pass_data_expand =
   ( PROP_ssa | PROP_gimple_leh | PROP_cfg
 | PROP_gimple_lcx
 | PROP_gimple_lvec
-| PROP_gimple_lva
-| PROP_gimple_lompifn), /* properties_required */
+| PROP_gimple_lva), /* properties_required */
   PROP_rtl, /* properties_provided */
   ( PROP_ssa | PROP_trees ), /* properties_destroyed */
   0, /* todo_flags_start */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6283f0c..a5e28b4 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8960,32 +8960,20 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
 	pop_gimplify_context (NULL);
   if (ort == ORT_TARGET_DATA)
 	{
+	  enum built_in_function end_ix;
 	  switch (TREE_CODE (expr))
 	{
 	case OACC_DATA:
-	  /* Rather than building a call to BUILT_IN_GOACC_DATA_END, we use
-		 this ifn which is similar, but has a pointer argument, which
-		 will be later set to the &.omp_data_arr of the corresponding
-		 BUILT_IN_GOACC_DATA_START.
-		 This allows us to pretend that the &.omp_data_arr argument of
-		 BUILT_IN_GOACC_DATA_START does not escape.  */
-	  g = gimple_build_call_internal (IFN_GOACC_DATA_END_WITH_ARG, 1,
-	  null_pointer_node);
-	  /* Clear the tentatively set PROP_gimple_lompifn, to indicate that
-		 IFN_GOACC_DATA_END_WITH_ARG needs to be expanded.  The argument
-		 is not abi-compatible with the GOACC_data_end function, which
-		 has no arguments.  */
-	  cfun->curr_properties &= ~PROP_gimple_lompifn;
+	  end_ix = BUILT_IN_GOACC_DATA_END;
 	  break;
 	case OMP_TARGET_DATA:
-	  {
-		tree fn = builtin_decl_explicit (BUILT_IN_GOMP_TARGET_END_DATA);
-		g = gimple_build_call (fn, 0);
-	  }
+	  end_ix = BUILT_IN_GOMP_TARGET_END_DATA;
 	  break;
 	default:
 	  gcc_unreachable ();
 	}
+	  tree fn = builtin_decl_explicit (end_ix);
+	  g = gimple_build_call (fn, 0);
 	  gimple_seq cleanup = NULL;
 	  gimple_seq_add_stmt (&cleanup, g);
 	  g = gimple_build_try (body, cleanup, GIMPLE_TRY_FINALLY);
@@ -10939,9 +10927,6 @@ gimplify_function_tree (tree fndecl)
  if necessary.  */
   cfun->curr_properties |= PROP_gimple_lva;
 
-  /* Tentatively set PROP_gimple_lompifn.  */
-  cfun->curr_properties |= PROP_gimple_lompifn;
-
   for (parm = DECL_ARGUMENTS (fndecl); parm ; parm = DECL_CHAIN (parm))
 {
   /* Preliminarily mark non-addressed complex variables as eligible
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 317149e..6fac752 100644
--- a/gcc/

RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.

2015-11-06 Thread Simon Dardis
Committed r229844.

Thanks,
Simon
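
For readers skimming the quoted patch below: each new reduc_..._scal expander reduces into a temporary vector register and then extracts the low element into the scalar destination. A minimal stand-alone sketch of that two-step shape follows. The names toy_vec_reduc, toy_extract_lo and toy_reduc_scal are hypothetical, and the halving fold is only an assumption about how a vector reduction might be done; it is not taken from mips_expand_vec_reduc.

```cpp
#include <array>
#include <cstddef>

// Step 1 (toy): fold the upper half of the vector onto the lower half until
// lane 0 holds the combined value.  The result is still a vector.
template <typename T, std::size_t N, typename Op>
std::array<T, N> toy_vec_reduc(std::array<T, N> v, Op op) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
  for (std::size_t width = N / 2; width >= 1; width /= 2)
    for (std::size_t i = 0; i < width; ++i)
      v[i] = op(v[i], v[i + width]);
  return v;
}

// Step 2 (toy): pull lane 0 out as a scalar, the role played by the
// extract_lo insn in the quoted patch.
template <typename T, std::size_t N>
T toy_extract_lo(const std::array<T, N> &v) {
  return v[0];
}

// Scalar-result reduction: reduce into a temporary, then extract lane 0.
template <typename T, std::size_t N, typename Op>
T toy_reduc_scal(const std::array<T, N> &v, Op op) {
  return toy_extract_lo(toy_vec_reduc(v, op));
}
```

The point of the split is that the destination of a _scal pattern is a scalar, so the vector-valued reduction needs a separate temporary.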

> -Original Message-
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com]
> Sent: 03 November 2015 14:09
> To: Simon Dardis; Alan Lawrence; Matthew Fortune
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.
> 
> 
> 
> > -Original Message-
> > From: Simon Dardis [mailto:simon.dar...@imgtec.com]
> > Sent: Wednesday, October 07, 2015 6:51 AM
> > To: Alan Lawrence; Matthew Fortune; Moore, Catherine
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.
> >
> > On the change from smin/smax: it was a deliberate change, as I had managed
> > to confuse myself about the mode patterns. Correct version follows.
> > Reverted back to VWHB for smax/smin. Stylistic point addressed.
> >
> > No new regression, ok for commit?
> >
> 
> Yes, OK to commit.  Sorry for the delay in review.
> Catherine
> 
> >
> > Index: config/mips/loongson.md
> >
> > ===================================================================
> > --- config/mips/loongson.md (revision 228282)
> > +++ config/mips/loongson.md (working copy)
> > @@ -852,58 +852,66 @@
> >"dsrl\t%0,%1,%2"
> >[(set_attr "type" "fcvt")])
> >
> > -(define_expand "reduc_uplus_"
> > -  [(match_operand:VWH 0 "register_operand" "")
> > -   (match_operand:VWH 1 "register_operand" "")]
> > +(define_insn "vec_loongson_extract_lo_"
> > +  [(set (match_operand: 0 "register_operand" "=r")
> > +(vec_select:
> > +  (match_operand:VWHB 1 "register_operand" "f")
> > +  (parallel [(const_int 0)])))]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> > -{
> > -  mips_expand_vec_reduc (operands[0], operands[1], gen_add3);
> > -  DONE;
> > -})
> > +  "mfc1\t%0,%1"
> > +  [(set_attr "type" "mfc")])
> >
> > -; ??? Given that we're not describing a widening reduction, we should
> > -; not have separate optabs for signed and unsigned.
> > -(define_expand "reduc_splus_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > +(define_expand "reduc_plus_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VWHB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  emit_insn (gen_reduc_uplus_(operands[0], operands[1]));
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > +  mips_expand_vec_reduc (tmp, operands[1], gen_add3);
> > +  emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_smax_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > +(define_expand "reduc_smax_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VWHB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1], gen_smax3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_smax3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_smin_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > +(define_expand "reduc_smin_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VWHB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1], gen_smin3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_smin3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_umax_"
> > -  [(match_operand:VB 0 "register_operand" "")
> > +(define_expand "reduc_umax_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1], gen_umax3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_umax3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_umin_"
> > -  [(match_operand:VB 0 "register_operand" "")
> > +(define_expand "reduc_umin_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1], gen_umin3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_umin3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> >
> > -Original Message-
> > From: Alan Lawrence [mailto:alan.lawre...@arm.com]
> > Sent: 06 October 2015 11:12
> > To: Simon Dardis; Matthew Fortune; Moore, Catherine
> > Cc: gcc-patches@gcc.gnu.org
> > Su

Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Ramana Radhakrishnan


On 29/10/15 14:14, Kyrill Tkachov wrote:
> 
> On 29/10/15 14:00, Marcus Shawcroft wrote:
>> On 29 October 2015 at 13:50, Kyrill Tkachov  wrote:
>>
> Ok for trunk?
 rtl.h exposes reg_or_subregno() already doesn't that do what we need here?
>>>
>>> reg_or_subregno assumes that what it's passed is REG or a SUBREG.
>>> It will ICE on any other rtx. Here I want to strip the subreg if it is
>>> a subreg, but leave it as it is otherwise.
>> OK, I follow.
>>
 The test case is not aarch64 specific therefore I think convention is
 that it should go into a generic directory.
>>>
>>> Ok, I'll put it in gcc.dg/
>>
>> OK with the test case moved. Thanks /Marcus
>>
> Thanks, but I'd like to do a slight respin.
> The testcase is moved to gcc.dg but I also avoid creating the new
> helper function and just do the SUBREG extraction once at the very end.
> This makes the patch smaller.
> 
> Since you're ok with the approach and this revision is logically equivalent,
> I just need an ok from an arm perspective.

OK, looks good to me and assuming you've tested this with a regression test run 
targeting cortex-a53 to stress the function ;)


regards
Ramana
> 
> Thanks,
> Kyrill
> 
> 2015-10-29  Kyrylo Tkachov  
> 
> PR target/68088
> * config/arm/aarch-common.c (aarch_accumulator_forwarding): Strip
> subregs from accumulator and make sure it's a register.
> 
> 2015-10-29  Kyrylo Tkachov  
> 
> PR target/68088
> * gcc.dg/pr68088_1.c: New test.
> 


Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03170.html

Thanks,
Kyrill

On 29/10/15 14:14, Kyrill Tkachov wrote:


On 29/10/15 14:00, Marcus Shawcroft wrote:

On 29 October 2015 at 13:50, Kyrill Tkachov  wrote:


Ok for trunk?

rtl.h exposes reg_or_subregno() already doesn't that do what we need here?


reg_or_subregno assumes that what it's passed is REG or a SUBREG.
It will ICE on any other rtx. Here I want to strip the subreg if it is
a subreg, but leave it as it is otherwise.

OK, I follow.


The test case is not aarch64 specific therefore I think convention is
that it should go into a generic directory.


Ok, I'll put it in gcc.dg/


OK with the test case moved. Thanks /Marcus


Thanks, but I'd like to do a slight respin.
The testcase is moved to gcc.dg but I also avoid creating the new
helper function and just do the SUBREG extraction once at the very end.
This makes the patch smaller.

Since you're ok with the approach and this revision is logically equivalent,
I just need an ok from an arm perspective.

Thanks,
Kyrill

2015-10-29  Kyrylo Tkachov  

PR target/68088
* config/arm/aarch-common.c (aarch_accumulator_forwarding): Strip
subregs from accumulator and make sure it's a register.

2015-10-29  Kyrylo Tkachov  

PR target/68088
* gcc.dg/pr68088_1.c: New test.
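
The distinction discussed above (an accessor that insists on a REG or SUBREG, versus stripping a SUBREG only when one is present) can be sketched on a toy model. Everything below is hypothetical: ToyRtx is not GCC's rtx, and the function names are made up for illustration.

```cpp
#include <cassert>

// Toy stand-in for an RTL expression; not GCC's real data structure.
enum class ToyCode { Reg, Subreg, Mem };

struct ToyRtx {
  ToyCode code;
  int regno;            // meaningful only for Reg
  const ToyRtx *inner;  // meaningful only for Subreg
};

// Strict accessor in the spirit of reg_or_subregno: the caller must pass a
// Reg or a Subreg of a Reg; anything else trips an assertion.
int toy_reg_or_subregno(const ToyRtx &x) {
  if (x.code == ToyCode::Subreg) {
    assert(x.inner && x.inner->code == ToyCode::Reg);
    return x.inner->regno;
  }
  assert(x.code == ToyCode::Reg);
  return x.regno;
}

// Lenient helper: peel one Subreg wrapper if present, otherwise hand the
// expression back unchanged.
const ToyRtx &toy_strip_subreg(const ToyRtx &x) {
  return (x.code == ToyCode::Subreg && x.inner) ? *x.inner : x;
}

// The accumulator is usable only if, after stripping, it is a register.
bool toy_accumulator_is_reg(const ToyRtx &x) {
  return toy_strip_subreg(x).code == ToyCode::Reg;
}
```

With the lenient helper, a non-register operand simply makes the check return false instead of aborting.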





Re: [PATCH][ARM] Fix checking RTL error in cortex_a9_sched_adjust_cost

2015-11-06 Thread Kyrill Tkachov


On 06/11/15 11:37, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03170.html


My apologies, I meant to ping another patch.
Please disregard that.

Kyrill



Thanks,
Kyrill
On 30/10/15 15:47, Kyrill Tkachov wrote:


On 30/10/15 14:37, Ramana Radhakrishnan wrote:


On 29/10/15 16:02, Kyrill Tkachov wrote:

Hi all,

An arm-none-eabi build with RTL checking and --with-cpu=cortex-a9 fails because
cortex_a9_sched_adjust_cost tries to access the SET_DEST of a PARALLEL.
The correct thing to do is to call single_set on dep, which will return a 
simple SET
that we can take the SET_DEST of or NULL if there's more than one SET.

This patch does that.
The arm-none-eabi build passes.
Bootstrapped and tested on arm-none-linux-gnueabihf.
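
As a rough illustration of the single_set idea described above, here is a toy model in which an insn pattern is a list of elements and the helper returns the one SET, or a null pointer when there is more than one. The types and names (ToyElt, ToyPattern, toy_single_set, toy_dep_sets_reg) are hypothetical and only approximate what GCC's single_set does.

```cpp
#include <vector>

enum class ToyKind { Set, Clobber };

struct ToyElt {
  ToyKind kind;
  int dest_regno;  // register written (Set) or clobbered (Clobber)
};

// One element models a plain pattern; several elements model a PARALLEL.
using ToyPattern = std::vector<ToyElt>;

// Return the only Set in the pattern, or nullptr if there is none or more
// than one.  Clobbers are ignored.
const ToyElt *toy_single_set(const ToyPattern &pat) {
  const ToyElt *found = nullptr;
  for (const ToyElt &e : pat) {
    if (e.kind != ToyKind::Set)
      continue;
    if (found)
      return nullptr;  // second Set seen: not a single set
    found = &e;
  }
  return found;
}

// Does the producer write register regno?  Conservatively answers false when
// the producer is not a single set, rather than reading a destination that
// does not exist on a multi-element pattern.
bool toy_dep_sets_reg(const ToyPattern &dep, int regno) {
  const ToyElt *set = toy_single_set(dep);
  return set != nullptr && set->dest_regno == regno;
}
```

The null check is the part that matters: the destination is only read once the helper has confirmed there is exactly one SET to read it from.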

Ok for trunk?

Thanks,
Kyrill

2015-10-29  Kyrylo Tkachov  

 * config/arm/arm.c (cortex_a9_sched_adjust_cost): Use reg_set_p to
 check for dependencies.

Ok - but I think we also need a patch to improve the comment for reg_set_p, 
probably because it started life as an internal function but now has wider 
visibility.


Thanks,

Here's a patch for the reg_set_p comment.
Committing as obvious.

Kyrill




Thanks,
Ramana









Re: [PATCH][ARM] Fix checking RTL error in cortex_a9_sched_adjust_cost

2015-11-06 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03170.html

Thanks,
Kyrill
On 30/10/15 15:47, Kyrill Tkachov wrote:


On 30/10/15 14:37, Ramana Radhakrishnan wrote:


On 29/10/15 16:02, Kyrill Tkachov wrote:

Hi all,

An arm-none-eabi build with RTL checking and --with-cpu=cortex-a9 fails because
cortex_a9_sched_adjust_cost tries to access the SET_DEST of a PARALLEL.
The correct thing to do is to call single_set on dep, which will return a 
simple SET
that we can take the SET_DEST of or NULL if there's more than one SET.

This patch does that.
The arm-none-eabi build passes.
Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-10-29  Kyrylo Tkachov  

 * config/arm/arm.c (cortex_a9_sched_adjust_cost): Use reg_set_p to
 check for dependencies.

Ok - but I think we also need a patch to improve the comment for reg_set_p, 
probably because it started life as an internal function but now has wider 
visibility.


Thanks,

Here's a patch for the reg_set_p comment.
Committing as obvious.

Kyrill




Thanks,
Ramana







Re: Merge of HSA branch

2015-11-06 Thread Thomas Schwinge
Hi!

On Fri, 6 Nov 2015 12:03:25 +0100, Bernd Schmidt  wrote:
> On 11/06/2015 11:30 AM, Richard Biener wrote:
> > On Fri, 6 Nov 2015, Bernd Schmidt wrote:
> >>
> >> Realistically we're probably not going to reject this work, but I still 
> >> want
> >> to ask whether the approach was acked by the community before you started. 
> >> I'm
> >> really not exactly thrilled about having two different classes of backends 
> >> in
> >> the compiler, and two different ways of handling offloading.
> >
> > > Realistically the other approaches weren't acked either (well, implicitly
> > > by review).
> 
> I think the LTO approach was discussed beforehand. As far as I remember 
> (and Jakub may correct me) it was considered for intelmic, and Jakub had 
> considerable input on it. I heard that it came up at the 2013 Cauldron.
> Writing an rtl backend is the default thing to do for gcc and I would 
> expect any other approach to be discussed beforehand.
> 
> > Not doing an RTL backend for NVPTX would have simplified
> > your life as well.
> 
> I'm not convinced about this. At least I just had to turn off the 
> register allocator, not write a new one.

From the notes of the Accelerator BoF at the GNU Tools Cauldron 2013,
:

| The main issue we discussed in the backend category was how to target
| more than one ISA when generating code (i.e., we need code in the host's
| ISA and in the accelerator(s)' (virtual) ISA(s)).  Multi-target support
| in GCC might be one option, but would probably need quite some time and
| thus depending on it would probably delay the accelerator efforts.  It
| might be simpler to stream code several times to different backends
| using the LTO infrastructure.  [...] A third
| option that SuSE is experimenting with is not writing a new backend but
| instead generating code right after the last GIMPLE pass; however, HSAIL
| needs register allocation, so it was noted that writing a light-weight
| backend might be
| easier.


Grüße
 Thomas




Re: regrename: don't overflow insn_rr_info

2015-11-06 Thread Ramana Radhakrishnan


On 06/11/15 11:31, Bernd Schmidt wrote:
> On 11/06/2015 12:17 PM, Ramana Radhakrishnan wrote:
>> On 06/11/15 11:08, Bernd Schmidt wrote:
>>> This one is a fix for something that could currently only affect c6x, but I 
>>> have code that exposes it on i386.
>>>
>>> When optionally gathering operand info in regrename, we can overflow the 
>>> array in certain situations. This can occur when we have a situation where 
>>> a value is constructed in multiple small registers and then accessed as a 
>>> larger one (CDImode in the testcase I have). In that case we enter the 
>>> "superset" path, which fails the involved chains, but the smaller pieces 
>>> still all get seen by record_operand_use, and there may be more of them 
>>> than MAX_REGS_PER_ADDRESS.
>>>
>>> The following fixes it. Bootstrapped and tested  with -frename-registers 
>>> enabled at -O1 on x86_64-linux. Ok?
>>>
>>>
>>> Bernd
>>
>> This sounds like it will fix http://gcc.gnu.org/PR66785 ...
> 
> Ah, I didn't realize something else was using this functionality:
> 
> gcc/config/aarch64/cortex-a57-fma-steering.c
> 1025:  regrename_init (true);
> 
> Yeah, the description of that bug makes it sound like the same issue.

Yeah looks like the ICE goes away with a quick spin - I've not done any deeper 
analysis but that looks like a fix.

I'll take the opportunity to point out gcc11{3-6} if you need an aarch64 
machine on the compile farm if you wanted access to one.

regards
Ramana



> 
> 
> Bernd


Re: regrename: don't overflow insn_rr_info

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 12:17 PM, Ramana Radhakrishnan wrote:

On 06/11/15 11:08, Bernd Schmidt wrote:

This one is a fix for something that could currently only affect c6x, but I 
have code that exposes it on i386.

When optionally gathering operand info in regrename, we can overflow the array in certain 
situations. This can occur when we have a situation where a value is constructed in 
multiple small registers and then accessed as a larger one (CDImode in the testcase I 
have). In that case we enter the "superset" path, which fails the involved 
chains, but the smaller pieces still all get seen by record_operand_use, and there may be 
more of them than MAX_REGS_PER_ADDRESS.

The following fixes it. Bootstrapped and tested  with -frename-registers 
enabled at -O1 on x86_64-linux. Ok?


Bernd


This sounds like it will fix http://gcc.gnu.org/PR66785 ...


Ah, I didn't realize something else was using this functionality:

gcc/config/aarch64/cortex-a57-fma-steering.c
1025:  regrename_init (true);

Yeah, the description of that bug makes it sound like the same issue.
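
For illustration only, the kind of overflow described above (more pieces recorded for one operand than a fixed-size array can hold) and one way to guard against it can be sketched as follows. This is not the patch itself: ToyOperandInfo, kToyMaxRegs and toy_record_operand_use are hypothetical names, and marking the operand as failed is an assumed policy.

```cpp
#include <array>
#include <cstddef>

// Hypothetical capacity standing in for MAX_REGS_PER_ADDRESS.
constexpr std::size_t kToyMaxRegs = 2;

struct ToyOperandInfo {
  std::size_t n_chains = 0;
  std::array<int, kToyMaxRegs> chain_ids{};
  bool failed = false;  // set once the operand can no longer be tracked
};

// Record one more chain for this operand.  Returns false, and marks the
// operand as failed, instead of writing past the end of the array.
bool toy_record_operand_use(ToyOperandInfo &info, int chain_id) {
  if (info.failed)
    return false;
  if (info.n_chains == kToyMaxRegs) {
    info.failed = true;
    return false;
  }
  info.chain_ids[info.n_chains++] = chain_id;
  return true;
}
```

A consumer of such a record would then need to check the failed flag before trusting the recorded chains.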


Bernd


Re: [hsa 10/12] HSAIL BRIG description header file (hopefully not a licensing issue)

2015-11-06 Thread Bernd Schmidt

On 11/05/2015 11:05 PM, Martin Jambor wrote:

Initially, I have created the file by copying out pieces of PDF
documentation but the latest version of the file (describing final
HSAIL 1.0) is actually taken from the HSAIL (dis)assembler developed
by HSA foundation and released by "University of Illinois/NCSA Open
Source License."

The license is "GPL-compatible" according to FSF
(http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
so I believe that means we can put it inside GCC and I hope I also do
not need any special steering committee approval or whatnot.  At the
same time, the license comes with three restrictions that I hope I
have fulfilled by keeping them in the header comment.  Nevertheless,
if anybody knowledgeable can tell me what is the known right thing to
do (or to confirm this is indeed the right thing to do), I'll be very
happy.


It's not something I as a reviewer would want to decide, so I think this 
really is a question for the Steering Committee - they might not know 
the answer either but they can ask the FSF.


David Cc'ed so he can take the necessary steps.


Bernd


+/* HSAIL and BRIG related macros and definitions.
+   Copyright (c) 2013-2015, Advanced Micro Devices, Inc.
+   Copyright (C) 2013-2015 Free Software Foundation, Inc.
+
+   Majority of contents in this file has originally been distributed under the
+   University of Illinois/NCSA Open Source License.  This license mandates that
+   the following conditions are observed when distributing this file:
+
+ * Redistributions of source code must retain the above copyright notice,
+   this list of conditions and the following disclaimers.
+
+ * Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimers in the
+   documentation and/or other materials provided with the distribution.
+
+ * Neither the names of the HSA Team, HSA Foundation, University of
+   Illinois at Urbana-Champaign, nor the names of its contributors may be
+   used to endorse or promote products derived from this Software without
+   specific prior written permission.
+
+   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+   CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+   WITH THE SOFTWARE.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */




Re: [PATCH] Make BB vectorizer work on sub-BBs

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Kyrill Tkachov wrote:

> Hi Richard,
> 
> On 06/11/15 11:09, Richard Biener wrote:
> > On Fri, 6 Nov 2015, Richard Biener wrote:
> > 
> > > The following patch makes the BB vectorizer not only handle BB heads
> > > (until the first stmt with a data reference it cannot handle) but
> > > arbitrary regions in a BB separated by such stmts.
> > > 
> > > This improves the number of BB vectorizations from 469 to 556
> > > in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> > > 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
> > > 1x481.wrf failing both patched and unpatched (have to update my
> > > config used for such experiments it seems ...)
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
> > > 
> > > I'm currently re-testing for a cosmetic change I made when writing
> > > the changelog.
> > > 
> > > I expected (and there are) some issues with compile-time.  Left
> > > is unpatched and right is patched.
> > > 
> > > '403.gcc': 00:00:54 (54)  | '403.gcc': 00:00:55 (55)
> > > '483.xalancbmk': 00:02:20 (140)   | '483.xalancbmk': 00:02:24 (144)
> > > '416.gamess': 00:02:36 (156)  | '416.gamess': 00:02:37 (157)
> > > '435.gromacs': 00:00:18 (18)  | '435.gromacs': 00:00:19 (19)
> > > '447.dealII': 00:01:31 (91)   | '447.dealII': 00:01:33 (93)
> > > '453.povray': 00:04:54 (294)  | '453.povray': 00:08:54 (534)
> > > '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
> > > '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
> > > 
> > > other benchmarks are unchanged.  I'm double-checking now that a followup
> > > patch I have which re-implements BB vectorization dependence checking
> > > fixes this (that's the only quadraticness I know of).
> > Fixes all but
> > 
> > '453.povray': 00:04:54 (294)  | '453.povray': 00:06:46 (406)
> 
> Note that povray is currently suffering from PR 68198

Ah, yeah.  Seems to run into

/space/rguenther/install-trunk/usr/local/bin/g++ -c -o fnpovfpu.o 
-DSPEC_CPU -DNDEBUG -Ofast -fopt-info-vec -ftime-report 
-Wl,-rpath=/abuild/rguenther/install-trunk/usr/local/lib64   
-DSPEC_CPU_LP64 -Wno-multichar  fnpovfpu.cpp
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
specmake: *** [fnpovfpu.o] Error 4

and dmesg

[7525617.394116] Out of memory: Kill process 31426 (cc1plus) score 832 or sacrifice child
[7525617.394117] Killed process 31426 (cc1plus) total-vm:8399700kB, anon-rss:6790020kB, file-rss:1584kB

for me (and that's the one taking all the time).  I can imagine that
with many basic-blocks the patch might end up as a net slowdown
still.  I'll try to investigate anyway, maybe I'm leaking sth.

Richard.

> Kyrill
> 
> > 
> > it even improves compile-time on some:
> > 
> > '464.h264ref': 00:00:26 (26)  | '464.h264ref': 00:00:21 (21)
> > 
> > it also increases the number of vectorized BBs to 722.
> > 
> > Needs some work still though.
> > 
> > Richard.
> > 
> > > Richard.
> > > 
> > > 2015-11-06  Richard Biener  
> > > 
> > >   * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
> > >   members.
> > >   (vect_stmt_in_region_p): Declare.
> > >   * tree-vect-slp.c (new_bb_vec_info): Work on a region.
> > >   (destroy_bb_vec_info): Likewise.
> > >   (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
> > >   (vect_get_and_check_slp_defs): Likewise.
> > >   (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
> > >   (vect_slp_bb): Likewise.
> > >   * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
> > >   in terms of vect_stmt_in_region_p.
> > >   (vect_pattern_recog): Iterate over the BB region.
> > >   * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
> > >   * tree-vectorizer.c (vect_stmt_in_region_p): New function.
> > >   (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
> > > 
> > >   * config/i386/i386.c: Include gimple-iterator.h.
> > >   * config/aarch64/aarch64.c: Likewise.
> > > 
> > >   * gcc.dg/vect/bb-slp-38.c: New testcase.
> > > 
> > > Index: gcc/tree-vectorizer.h
> > > ===
> > > *** gcc/tree-vectorizer.h.orig2015-11-05 09:52:00.640227178 +0100
> > > --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> > > *** nested_in_vect_loop_p (struct loop *loop
> > > *** 390,395 
> > > --- 390,397 
> > >typedef struct _bb_vec_info : public vec_info
> > >{
> > >  basic_block bb;
> > > +   gimple_stmt_iterator region_begin;
> > > +   gimple_stmt_iterator region_end;
> > >} *bb_vec_info;
> > >   #define BB_VINFO_BB(B)   (B)->bb
> > > *** void vect_pattern_recog (vec_info *);
> > > *** 1085,1089 
> > > --- 1087,1092 
> > >/* In tree-vectorizer

Re: regrename: don't overflow insn_rr_info

2015-11-06 Thread Ramana Radhakrishnan


On 06/11/15 11:08, Bernd Schmidt wrote:
> This one is a fix for something that could currently only affect c6x, but I 
> have code that exposes it on i386.
> 
> When optionally gathering operand info in regrename, we can overflow the 
> array in certain situations. This can occur when we have a situation where a 
> value is constructed in multiple small registers and then accessed as a 
> larger one (CDImode in the testcase I have). In that case we enter the 
> "superset" path, which fails the involved chains, but the smaller pieces 
> still all get seen by record_operand_use, and there may be more of them than 
> MAX_REGS_PER_ADDRESS.
> 
> The following fixes it. Bootstrapped and tested  with -frename-registers 
> enabled at -O1 on x86_64-linux. Ok?
> 
> 
> Bernd

This sounds like it will fix http://gcc.gnu.org/PR66785 ...

Ramana


Re: [PATCH] Merge from gomp-4_5-branch to trunk

2015-11-06 Thread Thomas Schwinge
Hi!

On Thu, 5 Nov 2015 10:41:41 -0500, Nathan Sidwell  wrote:
> On 11/05/15 10:29, Jakub Jelinek wrote:
> > I've merged the current state of gomp-4_5-branch into trunk, after
> > bootstrapping/regtesting it on x86_64-linux and i686-linux.
> >
> > There are
> > +FAIL: gfortran.dg/goacc/private-3.f95   -O  (test for excess errors)
> > +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-v-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
> > +UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-v-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce 
> > executable
> > +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-w-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
> > +UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/loop-red-w-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce 
> > executable
> > +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-v-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
> > +UNRESOLVED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-v-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce 
> > executable
> > +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-w-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
> > +UNRESOLVED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/loop-red-w-2.c 
> > -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 compilation failed to produce 
> > executable
> > regressions, but I really don't know why OpenACC allows reductions against
> > private variables, so either the testcases are wrong, or if OpenACC
> > reduction can work against private vars (automatic vars inside of parallel
> > too?), then perhaps it shouldn't set check_non_private for OpenACC
> > reduction clauses or something similar.  Certainly, if there is private
> > on the target region, returning 1 from omp_check_private is IMNSHO desirable
> > (and required for OpenMP at least).
> 
> I'm working on porting patches for that, and I had noticed the 
> check_non_private 
> anomaly earlier today ...
> 
> I believe the c/c++ test cases are valid OpenACC, FWIW. (not checked the 
> fortran 
> one yet)

If that helps, this functionality ("private variable may also appear
inside a reduction clause"), and the Fortran test case got added by Cesar
in gomp-4_0-branch r215038,
.

> Anyway, thanks for the heads-up, my ball.

Meanwhile, XFAILed in r229841:

commit 6e9b4ab07e26928819f04e39c20cb3cfceda9740
Author: tschwinge 
Date:   Fri Nov 6 11:11:34 2015 +

XFAIL testcases regressed after r229814, "Merge from gomp-4_5-branch to 
trunk"

gcc/testsuite/
* gfortran.dg/goacc/private-3.f95: XFAIL.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: XFAIL.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229841 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog|4 
 gcc/testsuite/gfortran.dg/goacc/private-3.f95  |3 ++-
 libgomp/ChangeLog  |5 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c |2 ++
 libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c |2 ++
 5 files changed, 15 insertions(+), 1 deletion(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index af9bd72..b0e78e9 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-11-06  Thomas Schwinge  
+
+   * gfortran.dg/goacc/private-3.f95: XFAIL.
+
 2015-11-06  Joost VandeVondele  
 
PR middle-end/53852
diff --git gcc/testsuite/gfortran.dg/goacc/private-3.f95 
gcc/testsuite/gfortran.dg/goacc/private-3.f95
index aa12a56..af7d683 100644
--- gcc/testsuite/gfortran.dg/goacc/private-3.f95
+++ gcc/testsuite/gfortran.dg/goacc/private-3.f95
@@ -1,4 +1,6 @@
 ! { dg-do compile }
+! 

+! { dg-xfail-if "TODO" { *-*-* } }
 
 ! test for private variables in a reduction clause
 
@@ -7,7 +9,6 @@ program test
   integer, parameter :: n = 100
   integer :: i, k
 
-!  FIXME: This causes an ICE in the gimplifier.
 !  !$acc parallel private (k) reduction (+:k)
 !  do i = 1, n
 ! k = k + 1
diff --git libgomp/ChangeLog libgomp/ChangeLog
index 26377b6..ab2a25a 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2015-11-06  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: XFAIL.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
+
 2015-11-05  Jakub Jelinek  
Ilya Verbin  
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-

Re: [PATCH] Make BB vectorizer work on sub-BBs

2015-11-06 Thread Kyrill Tkachov

Hi Richard,

On 06/11/15 11:09, Richard Biener wrote:
> On Fri, 6 Nov 2015, Richard Biener wrote:
>
>> The following patch makes the BB vectorizer not only handle BB heads
>> (until the first stmt with a data reference it cannot handle) but
>> arbitrary regions in a BB separated by such stmts.
>>
>> This improves the number of BB vectorizations from 469 to 556
>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> 1x481.wrf failing both patched and unpatched (have to update my
>> config used for such experiments it seems ...)
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>>
>> I'm currently re-testing for a cosmetic change I made when writing
>> the changelog.
>>
>> I expected (and there are) some issues with compile-time.  Left
>> is unpatched and right is patched.
>>
>> '403.gcc': 00:00:54 (54)  | '403.gcc': 00:00:55 (55)
>> '483.xalancbmk': 00:02:20 (140)   | '483.xalancbmk': 00:02:24 (144)
>> '416.gamess': 00:02:36 (156)  | '416.gamess': 00:02:37 (157)
>> '435.gromacs': 00:00:18 (18)  | '435.gromacs': 00:00:19 (19)
>> '447.dealII': 00:01:31 (91)   | '447.dealII': 00:01:33 (93)
>> '453.povray': 00:04:54 (294)  | '453.povray': 00:08:54 (534)
>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>>
>> other benchmarks are unchanged.  I'm double-checking now that a followup
>> patch I have which re-implements BB vectorization dependence checking
>> fixes this (that's the only quadraticness I know of).
>
> Fixes all but
>
> '453.povray': 00:04:54 (294)  | '453.povray': 00:06:46 (406)

Note that povray is currently suffering from PR 68198

Kyrill

> it even improves compile-time on some:
>
> '464.h264ref': 00:00:26 (26)  | '464.h264ref': 00:00:21 (21)
>
> it also increases the number of vectorized BBs to 722.
>
> Needs some work still though.
>
> Richard.
>
>> Richard.

2015-11-06  Richard Biener  

* tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
members.
(vect_stmt_in_region_p): Declare.
* tree-vect-slp.c (new_bb_vec_info): Work on a region.
(destroy_bb_vec_info): Likewise.
(vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
(vect_get_and_check_slp_defs): Likewise.
(vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
(vect_slp_bb): Likewise.
* tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
in terms of vect_stmt_in_region_p.
(vect_pattern_recog): Iterate over the BB region.
* tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
* tree-vectorizer.c (vect_stmt_in_region_p): New function.
(pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.

* config/i386/i386.c: Include gimple-iterator.h.
* config/aarch64/aarch64.c: Likewise.

* gcc.dg/vect/bb-slp-38.c: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h.orig  2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vectorizer.h   2015-11-05 13:20:58.385786476 +0100
*** nested_in_vect_loop_p (struct loop *loop
*** 390,395 
--- 390,397 
   typedef struct _bb_vec_info : public vec_info
   {
 basic_block bb;
+   gimple_stmt_iterator region_begin;
+   gimple_stmt_iterator region_end;
   } *bb_vec_info;
   
   #define BB_VINFO_BB(B)   (B)->bb

*** void vect_pattern_recog (vec_info *);
*** 1085,1089 
--- 1087,1092 
   /* In tree-vectorizer.c.  */
   unsigned vectorize_loops (void);
   void vect_destroy_datarefs (vec_info *);
+ bool vect_stmt_in_region_p (vec_info *, gimple *);
   
   #endif  /* GCC_TREE_VECTORIZER_H  */

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c.orig2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
*** vect_get_and_check_slp_defs (vec_info *v
*** 209,215 
 unsigned int i, number_of_oprnds;
 gimple *def_stmt;
 enum vect_def_type dt = vect_uninitialized_def;
-   struct loop *loop = NULL;
 bool pattern = false;
 slp_oprnd_info oprnd_info;
 int first_op_idx = 1;
--- 209,214 
*** vect_get_and_check_slp_defs (vec_info *v
*** 218,226 
 bool first = stmt_num == 0;
 bool second = stmt_num == 1;
   
-   if (is_a <loop_vec_info> (vinfo))
- loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
-
 if (is_gimple_call (stmt))
   {
 number_of_oprnds = gimple_call_num_args (stmt);
--- 217,222 
*** again:
*** 276,286 
from the pattern.  Check that all the stmts of the node are in the
pattern.  */
 if (def_stmt && gimple_bb (def_stmt)
!   && ((is_a <loop_vec_info> (vinfo)
!  && flow_bb_inside_loop_p (loop, gimple_bb 

Re: [PATCH] Make BB vectorizer work on sub-BBs

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Richard Biener wrote:

> 
> The following patch makes the BB vectorizer not only handle BB heads
> (until the first stmt with a data reference it cannot handle) but
> arbitrary regions in a BB separated by such stmts.
> 
> This improves the number of BB vectorizations from 469 to 556
> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray 
> 1x481.wrf failing both patched and unpatched (have to update my
> config used for such experiments it seems ...)
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
> 
> I'm currently re-testing for a cosmetic change I made when writing
> the changelog.
> 
> I expected (and there are) some issues with compile-time.  Left
> is unpatched and right is patched.
> 
> '403.gcc': 00:00:54 (54)  | '403.gcc': 00:00:55 (55)
> '483.xalancbmk': 00:02:20 (140)   | '483.xalancbmk': 00:02:24 (144)
> '416.gamess': 00:02:36 (156)  | '416.gamess': 00:02:37 (157)
> '435.gromacs': 00:00:18 (18)  | '435.gromacs': 00:00:19 (19)
> '447.dealII': 00:01:31 (91)   | '447.dealII': 00:01:33 (93)
> '453.povray': 00:04:54 (294)  | '453.povray': 00:08:54 (534)
> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
> 
> other benchmarks are unchanged.  I'm double-checking now that a followup
> patch I have which re-implements BB vectorization dependence checking
> fixes this (that's the only quadraticness I know of).

Fixes all but

'453.povray': 00:04:54 (294)  | '453.povray': 00:06:46 (406)

it even improves compile-time on some:

'464.h264ref': 00:00:26 (26)  | '464.h264ref': 00:00:21 (21)

it also increases the number of vectorized BBs to 722.

Needs some work still though.

Richard.
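
To make the "arbitrary regions in a BB separated by such stmts" idea concrete, here is a minimal standalone sketch. It is not GCC code: `toy_stmt`, `toy_region` and `toy_split_regions` are hypothetical names, and the only thing modelled is whether a statement is one the vectorizer could handle. A head-only approach would stop at the first unhandled statement; the sketch instead keeps going and reports every maximal run of handled statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic-block statement: all we model is
   whether the vectorizer could handle its data reference.  */
struct toy_stmt
{
  bool handled;
};

/* Half-open range [begin, end) of statement indices.  */
struct toy_region
{
  size_t begin;
  size_t end;
};

/* Split STMTS[0..N) into maximal runs of handled statements, storing at
   most MAX_REGIONS of them in OUT.  Returns the number of regions found;
   regions beyond MAX_REGIONS are counted but not stored.  */
size_t
toy_split_regions (const struct toy_stmt *stmts, size_t n,
                   struct toy_region *out, size_t max_regions)
{
  size_t count = 0;
  size_t i = 0;

  while (i < n)
    {
      /* Skip separators: statements we cannot handle.  */
      while (i < n && !stmts[i].handled)
        i++;
      if (i == n)
        break;

      /* Extend the region until the next separator or the end.  */
      size_t begin = i;
      while (i < n && stmts[i].handled)
        i++;

      if (count < max_regions)
        {
          out[count].begin = begin;
          out[count].end = i;
        }
      count++;
    }
  return count;
}
```

With nine statements where indices 2, 4 and 5 are unhandled, this yields three regions, [0,2), [3,4) and [6,9), where a head-only scheme would have seen just the first.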

> Richard.
> 
> 2015-11-06  Richard Biener  
> 
>   * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>   members.
>   (vect_stmt_in_region_p): Declare.
>   * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>   (destroy_bb_vec_info): Likewise.
>   (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>   (vect_get_and_check_slp_defs): Likewise.
>   (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>   (vect_slp_bb): Likewise.
>   * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>   in terms of vect_stmt_in_region_p.
>   (vect_pattern_recog): Iterate over the BB region.
>   * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
>   * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>   (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
> 
>   * config/i386/i386.c: Include gimple-iterator.h.
>   * config/aarch64/aarch64.c: Likewise.
> 
>   * gcc.dg/vect/bb-slp-38.c: New testcase.
> 
> Index: gcc/tree-vectorizer.h
> ===
> *** gcc/tree-vectorizer.h.orig2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> *** nested_in_vect_loop_p (struct loop *loop
> *** 390,395 
> --- 390,397 
>   typedef struct _bb_vec_info : public vec_info
>   {
> basic_block bb;
> +   gimple_stmt_iterator region_begin;
> +   gimple_stmt_iterator region_end;
>   } *bb_vec_info;
>   
>   #define BB_VINFO_BB(B)   (B)->bb
> *** void vect_pattern_recog (vec_info *);
> *** 1085,1089 
> --- 1087,1092 
>   /* In tree-vectorizer.c.  */
>   unsigned vectorize_loops (void);
>   void vect_destroy_datarefs (vec_info *);
> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>   
>   #endif  /* GCC_TREE_VECTORIZER_H  */
> Index: gcc/tree-vect-slp.c
> ===
> *** gcc/tree-vect-slp.c.orig  2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vect-slp.c   2015-11-06 10:22:56.707880233 +0100
> *** vect_get_and_check_slp_defs (vec_info *v
> *** 209,215 
> unsigned int i, number_of_oprnds;
> gimple *def_stmt;
> enum vect_def_type dt = vect_uninitialized_def;
> -   struct loop *loop = NULL;
> bool pattern = false;
> slp_oprnd_info oprnd_info;
> int first_op_idx = 1;
> --- 209,214 
> *** vect_get_and_check_slp_defs (vec_info *v
> *** 218,226 
> bool first = stmt_num == 0;
> bool second = stmt_num == 1;
>   
> -   if (is_a  (vinfo))
> - loop = LOOP_VINFO_LOOP (as_a  (vinfo));
> - 
> if (is_gimple_call (stmt))
>   {
> number_of_oprnds = gimple_call_num_args (stmt);
> --- 217,222 
> *** again:
> *** 276,286 
>from the pattern.  Check that all the stmts of the node are in the
>pattern.  */
> if (def_stmt && gimple_bb (def_stmt)
> !   && ((is_a  (vinfo)
> !

regrename: don't overflow insn_rr_info

2015-11-06 Thread Bernd Schmidt
This one is a fix for something that could currently only affect c6x, 
but I have code that exposes it on i386.


When optionally gathering operand info in regrename, we can overflow the 
array in certain situations. This can occur when a value is constructed 
in multiple small registers and then accessed as a larger one (CDImode 
in the testcase I have). In that case we enter the "superset" path, 
which fails the involved chains, but the smaller pieces still all get 
seen by record_operand_use, and there may be more of them than 
MAX_REGS_PER_ADDRESS.
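
As a standalone illustration of the failure mode and of a guard against it, here is a small model. It is a sketch under assumptions, not the regrename code: `toy_operand_info`, `toy_record_operand_use` and `TOY_MAX_CHAINS` are hypothetical names, chains are plain integers, and the `cannot_rename` argument stands in for a chain that has been failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative capacity only; the real limit is MAX_REGS_PER_ADDRESS.  */
#define TOY_MAX_CHAINS 2

struct toy_operand_info
{
  short n_chains;
  bool failed;
  int chains[TOY_MAX_CHAINS];
};

/* Record CHAIN_ID for OP unless OP is absent or has already failed.
   CANNOT_RENAME models a chain that has been failed: once one is seen,
   the operand is marked failed and nothing more is appended.
   Returns true if the chain was stored.  */
bool
toy_record_operand_use (struct toy_operand_info *op, int chain_id,
                        bool cannot_rename)
{
  if (op == NULL || op->failed)
    return false;
  if (cannot_rename)
    {
      op->failed = true;
      return false;
    }
  assert (op->n_chains < TOY_MAX_CHAINS);
  op->chains[op->n_chains++] = chain_id;
  return true;
}
```

Once the operand is marked failed, later pieces are ignored no matter how many arrive, so the fixed-size array cannot be overrun through this path.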


The following fixes it. Bootstrapped and tested with -frename-registers 
enabled at -O1 on x86_64-linux. Ok?



Bernd
	* regrename.c (record_operand_use): Keep track of failed operands
	and stop appending if we see any.
	* regrename.h (struct operand_rr_info): Add a failed field and shrink
	n_chains to short.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 229049)
+++ gcc/regrename.c	(working copy)
@@ -204,8 +204,13 @@ mark_conflict (struct du_head *chains, u
 static void
 record_operand_use (struct du_head *head, struct du_chain *this_du)
 {
-  if (cur_operand == NULL)
+  if (cur_operand == NULL || cur_operand->failed)
 return;
+  if (head->cannot_rename)
+{
+  cur_operand->failed = true;
+  return;
+}
   gcc_assert (cur_operand->n_chains < MAX_REGS_PER_ADDRESS);
   cur_operand->heads[cur_operand->n_chains] = head;
   cur_operand->chains[cur_operand->n_chains++] = this_du;
Index: gcc/regrename.h
===
--- gcc/regrename.h	(revision 229049)
+++ gcc/regrename.h	(working copy)
@@ -68,7 +71,8 @@ struct du_chain
 struct operand_rr_info
 {
   /* The number of chains recorded for this operand.  */
-  int n_chains;
+  short n_chains;
+  bool failed;
   /* Holds either the chain for the operand itself, or for the registers in
  a memory operand.  */
   struct du_chain *chains[MAX_REGS_PER_ADDRESS];


Re: Merge of HSA branch

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 11:30 AM, Richard Biener wrote:
> On Fri, 6 Nov 2015, Bernd Schmidt wrote:
>
>> Realistically we're probably not going to reject this work, but I still want
>> to ask whether the approach was acked by the community before you started. I'm
>> really not exactly thrilled about having two different classes of backends in
>> the compiler, and two different ways of handling offloading.
>
> Realistically the other approaches weren't acked either (well, implicitly
> by review).

I think the LTO approach was discussed beforehand. As far as I remember 
(and Jakub may correct me) it was considered for intelmic, and Jakub had 
considerable input on it. I heard that it came up at the 2013 Cauldron.
Writing an RTL backend is the default thing to do for GCC and I would 
expect any other approach to be discussed beforehand.

> Not doing an RTL backend for NVPTX would have simplified
> your life as well.

I'm not convinced about this. At least I just had to turn off the 
register allocator, not write a new one.



Bernd


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 12:43 AM, Abe wrote:
> Feedback from Bernd has also been applied.

But inconsistently, and I think not quite in the way I meant it in one case.


-/* Return true if a write into MEM may trap or fault.  */
-
  static bool
  noce_mem_write_may_trap_or_fault_p (const_rtx mem)
  {
-  rtx addr;
-
if (MEM_READONLY_P (mem))
  return true;

-  if (may_trap_or_fault_p (mem))
-return true;
-
+  rtx addr;
addr = XEXP (mem, 0);

/* Call target hook to avoid the effects of -fpic etc  */
@@ -2881,6 +2883,18 @@ noce_mem_write_may_trap_or_fault_p (const_rtx mem)
return false;
  }

+/* Return true if a write into MEM may trap or fault
+   without scratchpad support.  */
+
+static bool
+unsafe_address_p (const_rtx mem)
+{
+  if (may_trap_or_fault_p (mem))
+return true;
+
+  return noce_mem_write_may_trap_or_fault_p (mem);
+}


The naming seems backwards from what I suggested. You still haven't 
explained why you want to modify this function, or call the limited one 
even when generating scratchpads. I asked about this last time.



  static basic_block
-find_if_header (basic_block test_bb, int pass)
+find_if_header (basic_block test_bb, int pass, bool just_sz_spad)
  {


Arguments need documentation.


+DEFPARAM (PARAM_FORCE_ENABLE_RTL_IFCVT_SPADS,
+ "force-enable-rtl-ifcvt-spads",
+ "Force-enable the use of scratchpads in RTL if conversion, "
+ "overriding the target and the profile data or lack thereof.",
+ 0, 0, 1)
+

+DEFHOOKPOD
+(rtl_ifcvt_scratchpad_control,
+"*",
+enum rtl_ifcvt_spads_ctl_enum, rtl_ifcvt_spads_as_per_profile)
+
+DEFHOOK
+(rtl_ifcvt_get_spad,
+ "*",
+ rtx, (unsigned short size),
+ default_rtl_ifcvt_get_spad)


That moves the problematic bit into a target hook rather than fixing it. 
Two target hooks and a param is probably a bit much for a change like 
this. Which target do you actually want this for? It would help to 
understand why you're doing all this.



+enum rtl_ifcvt_spads_ctl_enum {


Names are still too verbose ("enum" shouldn't be part of it).


+rtx default_rtl_ifcvt_get_spad (unsigned short size)
+{
+  return assign_stack_local (BLKmode, size, 0);
+}


Formatting problem, here and in a few other places. I didn't fully read 
the patch this time around.


I'm probably not reviewing further patches because I don't see this 
progressing to a state where it's acceptable. Others may do so, but as 
far as I'm concerned the patch is rejected.



Bernd


regrename: Fix for earlyclobber operands

2015-11-06 Thread Bernd Schmidt
I have a patch which makes use of the renamer more often, and this 
exposed a bug with earlyclobber operands. The code that does the 
terminate_write step has the following comment:


  /* Step 5: Close open chains that overlap writes.  Similar to
 step 2, we hide in-out operands, since we do not want to
 close these chains.  We also hide earlyclobber operands,
 since we've opened chains for them in step 1, and earlier
 chains they would overlap with must have been closed at
 the previous insn at the latest, as such operands cannot
 possibly overlap with any input operands.  */

That's all right as far as it goes, but the problem is that this means 
there isn't a terminate_write step for earlyclobbers.
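
A loose toy model of why the missing step matters, under stated assumptions: this is not the regrename data structure, all `toy_*` names are hypothetical, and a "chain" is reduced to an integer id held per register. If a chain is still open on a register when an earlyclobber output opens a new one there, the old chain is never closed unless something terminates it first.

```c
#include <assert.h>

#define TOY_NREGS 4
#define TOY_NO_CHAIN (-1)

struct toy_chains
{
  int open_chain[TOY_NREGS]; /* Chain open on each register, or TOY_NO_CHAIN.  */
  int next_id;               /* Id to give the next chain opened.  */
  int closed;                /* Chains that were properly closed.  */
  int lost;                  /* Open chains overwritten without being closed.  */
};

void
toy_init (struct toy_chains *s)
{
  for (int r = 0; r < TOY_NREGS; r++)
    s->open_chain[r] = TOY_NO_CHAIN;
  s->next_id = 0;
  s->closed = 0;
  s->lost = 0;
}

/* Close the chain currently open on REG, if any.  */
void
toy_terminate_write (struct toy_chains *s, int reg)
{
  if (s->open_chain[reg] != TOY_NO_CHAIN)
    {
      s->open_chain[reg] = TOY_NO_CHAIN;
      s->closed++;
    }
}

/* Open a new chain on REG.  A chain that is still open there is counted
   as lost, which is how this model represents the bug.  */
void
toy_mark_write (struct toy_chains *s, int reg)
{
  if (s->open_chain[reg] != TOY_NO_CHAIN)
    s->lost++;
  s->open_chain[reg] = s->next_id++;
}

/* Handle an earlyclobber output operand on REG.  With TERMINATE_FIRST
   nonzero, any overlapping open chain is closed before the new one is
   opened.  */
void
toy_earlyclobber_out (struct toy_chains *s, int reg, int terminate_first)
{
  if (terminate_first)
    toy_terminate_write (s, reg);
  toy_mark_write (s, reg);
}
```

In this model, opening a chain on a register and then processing an earlyclobber output on the same register loses the first chain when `terminate_first` is 0 and closes it cleanly when it is 1.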


The following seems like the simplest possible fix. It was bootstrapped 
and tested with -frename-registers enabled at -O1 on x86_64-linux. Ok?


(Incidentally there are some AVX tests that fail when their registers 
are renamed, apparently because the scan-assembler patterns don't allow 
register numbers like %zmm10. avx512bw-vptestmb-1.c is one of those).



Bernd
	* regrename.c (record_out_operands): Terminate earlyclobbered
	operands here.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 229049)
+++ gcc/regrename.c	(working copy)
@@ -1513,6 +1525,8 @@ record_out_operands (rtx_insn *insn, boo
 	cur_operand = insn_info->op_info + i;
 
   prev_open = open_chains;
+  if (earlyclobber)
+	scan_rtx (insn, loc, cl, terminate_write, OP_OUT);
   scan_rtx (insn, loc, cl, mark_write, OP_OUT);
 
   /* ??? Many targets have output constraints on the SET_DEST


Re: Merge of HSA branch

2015-11-06 Thread Martin Liška
On 11/06/2015 11:12 AM, Bernd Schmidt wrote:
> On 11/05/2015 10:51 PM, Martin Jambor wrote:
>> Individual changes are described in slightly more detail in their
>> respective messages.  If you are interested in how the HSAIL
>> generation works in general, I encourage you to have a look at my
>> Cauldron slides or presentation, only very few things have changed as
>> far as the general principles are concerned.  Let me just quickly stress
>> here that we do acceleration within a single compiler, as opposed to
>> LTO-ways of all the other accelerator teams.
> 
> Realistically we're probably not going to reject this work, but I still want 
> to ask whether the approach was acked by the community before you started. 
> I'm really not exactly thrilled about having two different classes of 
> backends in the compiler, and two different ways of handling offloading.
> 
>> I also acknowledge that we should add HSA-specific tests to the GCC
>> testsuite but we are only now looking at how to do that and will
>> welcome any guidance in this regard.
> 
> Yeah, I was looking for any kind of new test, because...
> 
>> the class of OpenMP loops we can handle well is small,
> 
> I'd appreciate more information on what this means. Any examples or 
> performance numbers?

Hello.

As mentioned by Martin Jambor, it was explained during his speech at the 
Cauldron this year.
It can be easily explained on the following simple case:

#pragma omp target teams
#pragma omp distribute parallel for private(j)
   for (j=0; j[...]

https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=get&target=mjambor-hsa-slides.pdf

> 
> 
> Bernd



Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-11-06 Thread Bernd Schmidt

On 06/17/2015 07:11 PM, Sandra Loosemore wrote:


Index: gcc/regrename.c
===
--- gcc/regrename.c (revision 224532)
+++ gcc/regrename.c (working copy)
@@ -942,19 +942,22 @@ regrename_do_replace (struct du_head *he
int reg_ptr = REG_POINTER (*chain->loc);

if (DEBUG_INSN_P (chain->insn) && REGNO (*chain->loc) != base_regno)
-   INSN_VAR_LOCATION_LOC (chain->insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+   validate_change (chain->insn, &(INSN_VAR_LOCATION_LOC (chain->insn)),
+gen_rtx_UNKNOWN_VAR_LOC (), true);
else
{
- *chain->loc = gen_raw_REG (GET_MODE (*chain->loc), reg);
+ validate_change (chain->insn, chain->loc,
+  gen_raw_REG (GET_MODE (*chain->loc), reg), true);
  if (regno >= FIRST_PSEUDO_REGISTER)
ORIGINAL_REGNO (*chain->loc) = regno;
  REG_ATTRS (*chain->loc) = attr;


With a patch I'm working on that uses the renamer more often, I found 
that this is causing compare-debug failures. Validating changes to 
debug_insns (the INSN_VAR_LOCATION_LOC in particular) can apparently fail.
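
For readers less familiar with grouped validation, the following is a minimal sketch of the general pattern: changes are applied tentatively, and the whole group is either kept or rolled back. It is a toy under stated assumptions, not the recog.c implementation; `toy_queue_change`, `toy_apply_group` and the `toy_reg_ok` constraint are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHANGES 8

/* One tentative change: where it was made and what was there before.  */
struct toy_change
{
  int *loc;
  int old_val;
};

struct toy_group
{
  struct toy_change changes[TOY_MAX_CHANGES];
  size_t n;
};

/* Hypothetical validity constraint: only register numbers below 16.  */
bool
toy_reg_ok (int regno)
{
  return regno >= 0 && regno < 16;
}

/* Tentatively store NEW_VAL at LOC and remember how to undo it.
   Returns false if the group is full.  */
bool
toy_queue_change (struct toy_group *g, int *loc, int new_val)
{
  if (g->n == TOY_MAX_CHANGES)
    return false;
  g->changes[g->n].loc = loc;
  g->changes[g->n].old_val = *loc;
  g->n++;
  *loc = new_val;
  return true;
}

/* Keep the queued changes if every changed location satisfies VALID;
   otherwise undo all of them, newest first.  Either way the group is
   emptied.  Returns true if the changes were kept.  */
bool
toy_apply_group (struct toy_group *g, bool (*valid) (int))
{
  bool ok = true;

  for (size_t i = 0; i < g->n; i++)
    if (!valid (*g->changes[i].loc))
      ok = false;

  if (!ok)
    for (size_t i = g->n; i-- > 0;)
      *g->changes[i].loc = g->changes[i].old_val;

  g->n = 0;
  return ok;
}
```

The design point this illustrates is that anything placed in the group can veto the whole renaming. Locations that should never veto it are better updated directly, and only after the group has been applied successfully.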


The following fix was bootstrapped and tested with -frename-registers 
enabled at -O1 on x86_64-linux. Ok?



Bernd

	* regrename.c (regrename_do_replace): Do not validate changes to
	debug insns.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 229049)
+++ gcc/regrename.c	(working copy)
@@ -946,10 +951,7 @@ regrename_do_replace (struct du_head *he
   struct reg_attrs *attr = REG_ATTRS (*chain->loc);
   int reg_ptr = REG_POINTER (*chain->loc);
 
-  if (DEBUG_INSN_P (chain->insn) && REGNO (*chain->loc) != base_regno)
-	validate_change (chain->insn, &(INSN_VAR_LOCATION_LOC (chain->insn)),
-			 gen_rtx_UNKNOWN_VAR_LOC (), true);
-  else
+  if (!DEBUG_INSN_P (chain->insn))
 	{
 	  validate_change (chain->insn, chain->loc, 
 			   gen_raw_REG (GET_MODE (*chain->loc), reg), true);
@@ -963,6 +965,16 @@ regrename_do_replace (struct du_head *he
   if (!apply_change_group ())
 return false;
 
+  for (chain = head->first; chain; chain = chain->next_use)
+if (DEBUG_INSN_P (chain->insn))
+  {
+	if (REGNO (*chain->loc) != base_regno)
+	  INSN_VAR_LOCATION_LOC (chain->insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+	else
+	  *chain->loc = gen_raw_REG (GET_MODE (*chain->loc), reg);
+	df_insn_rescan (chain->insn);
+  }
+
   mode = GET_MODE (*head->first->loc);
   head->regno = reg;
   head->nregs = hard_regno_nregs[reg][mode];


[gomp4] Re: [3/3] OpenACC reductions

2015-11-06 Thread Thomas Schwinge
Hi Nathan!

On Mon, 2 Nov 2015 11:38:47 -0500, Nathan Sidwell  wrote:
> This patch is the initial set of tests.  The libgomp tests use an idiom of
> summing thread identifiers and then checking that the expected set of threads
> participated.  They are all derived from the loop tests I recently added for
> the execution model itself.
>
> The Fortran test was duplicated in both the gfortran testsuite and the
> libgomp testsuite.  I deleted it from the former.  It was slightly bogus as
> it asked for a vector-length of 40, and appeared to be working by accident by
> not actually partitioning the loop.  I fixed that up

On gomp-4_0-branch, you had modified/XFAILed (ICE) that test in r228955
-- which still needs to be resolved, so I left that as-is, that is, did
not delete the gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 file in
the merge commit.

> and reworked it to avoid needing a reduction on a reference variable.
> Reference handling will be a later patch.

As that is -- apparently -- functional on gomp-4_0-branch, I also left
the libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 file as-is;
it's also doing more elaborate testing in its gomp-4_0-branch variant.

Merged your trunk r229769 and r229770 into gomp-4_0-branch in r229837,
effectively just adding your new libgomp testsuite files unmodified:

commit a222b569f0234d219fec69cd13b66446f664440d
Merge: 089a022 06d6724
Author: tschwinge 
Date:   Fri Nov 6 09:40:44 2015 +

svn merge -r 229768:229770 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229837 
138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/testsuite/ChangeLog|  4 ++
 libgomp/ChangeLog  | 11 
 .../libgomp.oacc-c-c++-common/loop-red-g-1.c   | 54 
 .../libgomp.oacc-c-c++-common/loop-red-gwv-1.c | 56 
 .../libgomp.oacc-c-c++-common/loop-red-v-1.c   | 56 
 .../libgomp.oacc-c-c++-common/loop-red-v-2.c   | 59 ++
 .../libgomp.oacc-c-c++-common/loop-red-w-1.c   | 54 
 .../libgomp.oacc-c-c++-common/loop-red-w-2.c   | 57 +
 .../libgomp.oacc-c-c++-common/loop-red-wv-1.c  | 54 
 9 files changed, 405 insertions(+)


Grüße
 Thomas




[gomp4] Re: [2/3] OpenACC reductions

2015-11-06 Thread Thomas Schwinge
Hi Nathan!

On Wed, 4 Nov 2015 11:59:28 -0500, Nathan Sidwell  wrote:
> [PTX backend pieces of OpenACC reduction handling]

Merged your trunk r229768 into gomp-4_0-branch in r229836:

commit 089a0224af68e30b55f42734de48adc645eb7370
Merge: 2b76127 78a78aa
Author: tschwinge 
Date:   Fri Nov 6 09:38:10 2015 +

svn merge -r 229767:229768 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229836 
138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/ChangeLog|  23 +++
 gcc/config/nvptx/nvptx.c | 169 +++
 2 files changed, 107 insertions(+), 85 deletions(-)

I hope I did the right thing replacing the existing code on
gomp-4_0-branch with what you committed to trunk: in particular, the
nvptx_lockless_update and nvptx_goacc_reduction_init functions.  That is,
in the merge commit, I effectively applied the following patch
(gomp-4_0-branch before vs. after):

--- gcc/ChangeLog
+++ gcc/ChangeLog
[...]
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -57,21 +57,22 @@
[#include directives reshuffled]
@@ -104,19 +105,18 @@ struct tree_hasher : ggc_cache_ptr_hash<tree_node>
 static GTY((cache)) hash_table<tree_hasher> *declared_fndecls_htab;
 static GTY((cache)) hash_table<tree_hasher> *needed_fndecls_htab;
 
-/* Size of buffer needed to broadcast across workers.  This is used
-   for both worker-neutering and worker broadcasting.   It is shared
-   by all functions emitted.  The buffer is placed in shared memory.
-   It'd be nice if PTX supported common blocks, because then this
-   could be shared across TUs (taking the largest size).  */
+/* Buffer needed to broadcast across workers.  This is used for both
+   worker-neutering and worker broadcasting.  It is shared by all
+   functions emitted.  The buffer is placed in shared memory.  It'd be
+   nice if PTX supported common blocks, because then this could be
+   shared across TUs (taking the largest size).  */
 static unsigned worker_bcast_size;
 static unsigned worker_bcast_align;
 #define worker_bcast_name "__worker_bcast"
 static GTY(()) rtx worker_bcast_sym;
 
-/* Size of buffer needed for worker reductions.  This has to be
-   distinct from the worker broadcast array, as both may be live
-   concurrently.  */
+/* Buffer needed for worker reductions.  This has to be distinct from
+   the worker broadcast array, as both may be live concurrently.  */
 static unsigned worker_red_size;
 static unsigned worker_red_align;
 #define worker_red_name "__worker_red"
@@ -3977,8 +3977,8 @@ nvptx_file_end (void)
 {
   /* Define the reduction buffer.  */
 
-  worker_red_size = (worker_red_size + worker_red_align - 1)
-   & ~(worker_red_align - 1);
+  worker_red_size = ((worker_red_size + worker_red_align - 1)
+& ~(worker_red_align - 1));
   
   fprintf (asm_out_file, "// BEGIN VAR DEF: %s\n", worker_red_name);
   fprintf (asm_out_file, ".shared .align %d .u8 %s[%d];\n",
@@ -3986,7 +3986,7 @@ nvptx_file_end (void)
   worker_red_name, worker_red_size);
 }
 }
-
+
 /* Expander for the shuffle builtins.  */
 
 static rtx
@@ -4046,6 +4046,10 @@ nvptx_expand_worker_addr (tree exp, rtx target,
   return target;
 }
 
+/* Expand the CMP_SWAP PTX builtins.  We have our own versions that do
+   not require taking the address of any object, other than the memory
+   cell being operated on.  */
+
 static rtx
 nvptx_expand_cmp_swap (tree exp, rtx target,
   machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
@@ -4096,7 +4100,7 @@ static GTY(()) tree nvptx_builtin_decls[NVPTX_BUILTIN_MAX];
 /* Return the NVPTX builtin for CODE.  */
 
 static tree
-nvptx_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+nvptx_builtin_decl (unsigned code, bool ARG_UNUSED (initialize_p))
 {
   if (code >= NVPTX_BUILTIN_MAX)
 return error_mark_node;
@@ -4110,10 +4114,10 @@ static void
 nvptx_init_builtins (void)
 {
 #define DEF(ID, NAME, T)   \
-  (nvptx_builtin_decls[NVPTX_BUILTIN_ ## ID] = \
-   add_builtin_function ("__builtin_nvptx_" NAME,  \
-build_function_type_list T,\
-NVPTX_BUILTIN_ ## ID, BUILT_IN_MD, NULL, NULL))
+  (nvptx_builtin_decls[NVPTX_BUILTIN_ ## ID]   \
+   = add_builtin_function ("__builtin_nvptx_" NAME,\
+  build_function_type_list T,  \
+  NVPTX_BUILTIN_ ## ID, BUILT_IN_MD, NULL, NULL))
 #define ST sizetype
 #define UINT unsigned_type_node
 #define LLUINT long_long_unsigned_type_node
@@ -4140,7 +4144,7 @@ nvptx_init_builtins (void)
IGNORE is nonzero if the value is to be ignored.  */
 
 static rtx
-nvptx_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
+nvptx_expand_builtin (tree exp, rtx target, rt

[gomp4] Re: [1/3] OpenACC reductions

2015-11-06 Thread Thomas Schwinge
Hi Nathan!

On Mon, 2 Nov 2015 11:18:37 -0500, Nathan Sidwell  wrote:
> This is the core execution bits of OpenACC reductions.

> One thing not handled by this patch are reductions of variables of reference 
> type.  We have an implementation on gomp4 branch [...]

Trying to keep the existing code on gomp-4_0-branch alive, I merged your
trunk r229767 into gomp-4_0-branch in r229835.  To avoid regressions in
libgomp reduction execution tests, I had to apply one hack; please have a
look.  For your easier review, here is the merge commit in two variants,
first displayed as a three-way diff by Git's --cc option:

commit 2b76127eebddb59d45e5f068324e14efe77bb05c
Merge: bed2efe 641a0fa
Author: tschwinge 
Date:   Fri Nov 6 09:33:40 2015 +

svn merge -r 229764:229767 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229835 
138bc75d-0d04-0410-961f-82ee72b054a4


 gcc/ChangeLog   | 28 +++-
 gcc/omp-low.c   | 58 ++---
 gcc/targhooks.h |  2 +-
 3 files changed, 67 insertions(+), 21 deletions(-)

diff --cc gcc/omp-low.c
index debedb1,6a0915b..da574a9
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@@ -5441,14 -5306,25 +5441,28 @@@ lower_oacc_reductions (location_t loc, 
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
{
tree orig = OMP_CLAUSE_DECL (c);
-   tree var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c);
 -  tree var = maybe_lookup_decl (orig, ctx);
++  tree var;
tree ref_to_res = NULL_TREE;
-   
+   tree incoming, outgoing;
+ 
+   enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
+   if (rcode == MINUS_EXPR)
+ rcode = PLUS_EXPR;
+   else if (rcode == TRUTH_ANDIF_EXPR)
+ rcode = BIT_AND_EXPR;
+   else if (rcode == TRUTH_ORIF_EXPR)
+ rcode = BIT_IOR_EXPR;
+   tree op = build_int_cst (unsigned_type_node, rcode);
+ 
++  var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c);
 +  if (!var)
 +var = maybe_lookup_decl (orig, ctx);
if (!var)
  var = orig;
+   gcc_assert (!is_reference (var));
  
+   incoming = outgoing = var;
+   
if (!inner)
  {
/* See if an outer construct also reduces this variable.  */
@@@ -5490,24 -5365,22 +5503,31 @@@
   see if there's a mapping for it.  */
if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
&& maybe_lookup_field (orig, outer))
- ref_to_res = build_receiver_ref (orig, false, outer);
+ {
+   ref_to_res = build_receiver_ref (orig, false, outer);
+   if (is_reference (orig))
+ ref_to_res = build_simple_mem_ref (ref_to_res);
  
+   outgoing = var;
+   incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+ }
++  /* This is enabled on trunk, but has been disabled in the merge of
++ trunk r229767 into gomp-4_0-branch, as otherwise there were a
++ lot of regressions in libgomp reduction execution tests.  It is
++ unclear if the problem is in the tests themselves, or here, or
++ elsewhere.  Given the usage of "var =
++ OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c)" on gomp-4_0-branch, maybe
++ we have to consider that here, too, instead of "orig"?  */
++#if 0
+   else
+ incoming = outgoing = orig;
++#endif
+ 
  has_outer_reduction:;
  }
-   gcc_assert (!is_reference (var));
+ 
if (!ref_to_res)
  ref_to_res = integer_zero_node;
-   else if (is_reference (orig))
- ref_to_res = build_simple_mem_ref (ref_to_res);
- 
-   enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
-   if (rcode == MINUS_EXPR)
- rcode = PLUS_EXPR;
-   else if (rcode == TRUTH_ANDIF_EXPR)
- rcode = BIT_AND_EXPR;
-   else if (rcode == TRUTH_ORIF_EXPR)
- rcode = BIT_IOR_EXPR;
-   tree op = build_int_cst (unsigned_type_node, rcode);
  
/* Determine position in reduction buffer, which may be used
   by target.  */
diff --cc gcc/targhooks.h
index f8efe47a,c34e4ae..4a4496a
--- gcc/targhooks.h
+++ gcc/targhooks.h
@@@ -109,10 -109,9 +109,10 @@@ extern void default_finish_cost (void *
  extern void default_destroy_cost_data (void *);
  
  /* OpenACC hooks.  */
- extern void default_goacc_reduction (gcall *);
  extern bool default_goacc_validate_dims (tree, int [], int);
 +extern unsigned default_goacc_dim_limit (unsigned);
  extern bool default_goacc_fork_join (gcall *, const int [], bool);
+ extern void default_goacc_reduction (gcall *);
  
  /* These are here, and not in hooks.[ch], because not all users of
 hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */

..., and second, as a "plain patch" (gomp-4_0-branch before vs. after):

--- gcc/ChangeLog
+++ gcc/ChangeLog

[PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem

2015-11-06 Thread Kyrill Tkachov

Hi all,

In this wrong-code PR the vector setmem expansion, and
arm_block_set_aligned_vect in particular, uses the wrong offset when calling
adjust_automodify_address. In the attached testcase, during the initial
zeroing out we get two V16QI stores, but they are both recorded by
adjust_automodify_address as modifying x+0 rather than x+0 and x+12 (the
total size to be written is 28).

This led to the scheduling pass moving the store from "x.g = 2;" to before the
zeroing stores.

This patch fixes the problem by keeping track of the offset at which stores are
emitted and passing it to adjust_automodify_address as appropriate.

From inspection I see that arm_block_set_unaligned_vect also has this issue, so
I performed the same fix in that function as well.
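
To make the x+0 / x+12 arithmetic concrete, here is a minimal standalone model of the V16QI path through arm_block_set_aligned_vect as it appears in the patch. It is only a sketch: the function name v16qi_store_offsets and the std::vector return value are made up for illustration and are not GCC code; only the offset bookkeeping mirrors the hunk, and it assumes a length of at least 16 bytes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, not GCC code: returns the byte offset (relative to
// dstbase) of each 16-byte store emitted on the V16QI path.
// Assumes length >= 16.
std::vector<unsigned> v16qi_store_offsets (unsigned length)
{
  const unsigned nelt_mode = 16, nelt_v8 = 8, nelt_v16 = 16;
  std::vector<unsigned> offsets;
  unsigned offset = 0;
  unsigned i = 0;

  // First 16 bytes: the store covers [0, 16).
  offsets.push_back (offset);
  i += nelt_mode;

  // (8, 16) bytes left over: dst is advanced by length - nelt_mode, so the
  // second store overlaps the first and ends exactly at 'length'.
  if (i + nelt_v8 < length && i + nelt_v16 > length)
    {
      offset += length - nelt_mode;
      offsets.push_back (offset);
    }
  return offsets;
}
```

For length 28 the model records stores at offsets 0 and 12, which is what adjust_automodify_address should be told; before the patch both stores were described as offset 0.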

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

This bug appears on GCC 5 too and I'm currently testing this patch there.
Ok to backport to GCC 5 as well?

Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

PR target/68143
* config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
offset from dstbase and use it appropriately in
adjust_automodify_address.
(arm_block_set_aligned_vect): Likewise.

2015-11-06  Kyrylo Tkachov  

PR target/68143
* gcc.target/arm/pr68143_1.c: New test.
commit 78c6989a7af1df672ea227057180d79d717ed5f3
Author: Kyrylo Tkachov 
Date:   Wed Oct 28 17:29:18 2015 +

[ARM] Properly update memory offsets when expanding setmem

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 66e8afc..adf3143 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29268,7 +29268,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
   rtx (*gen_func) (rtx, rtx);
   machine_mode mode;
   unsigned HOST_WIDE_INT v = value;
-
+  unsigned int offset = 0;
   gcc_assert ((align & 0x3) != 0);
   nelt_v8 = GET_MODE_NUNITS (V8QImode);
   nelt_v16 = GET_MODE_NUNITS (V16QImode);
@@ -29289,7 +29289,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
 return false;
 
   dst = copy_addr_to_reg (XEXP (dstbase, 0));
-  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
 
   v = sext_hwi (v, BITS_PER_WORD);
   val_elt = GEN_INT (v);
@@ -29306,7 +29306,11 @@ arm_block_set_unaligned_vect (rtx dstbase,
 {
   emit_insn ((*gen_func) (mem, reg));
   if (i + 2 * nelt_mode <= length)
-	emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
+	{
+	  emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
+	  offset += nelt_mode;
+	  mem = adjust_automodify_address (dstbase, mode, dst, offset);
+	}
 }
 
   /* If there are not less than nelt_v8 bytes leftover, we must be in
@@ -29317,6 +29321,9 @@ arm_block_set_unaligned_vect (rtx dstbase,
   if (i + nelt_v8 < length)
 {
   emit_insn (gen_add2_insn (dst, GEN_INT (length - i)));
+  offset += length - i;
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
+
   /* We are shifting bytes back, set the alignment accordingly.  */
   if ((length & 1) != 0 && align >= 2)
 	set_mem_align (mem, BITS_PER_UNIT);
@@ -29327,12 +29334,13 @@ arm_block_set_unaligned_vect (rtx dstbase,
   else if (i < length && i + nelt_v8 >= length)
 {
   if (mode == V16QImode)
-	{
-	  reg = gen_lowpart (V8QImode, reg);
-	  mem = adjust_automodify_address (dstbase, V8QImode, dst, 0);
-	}
+	reg = gen_lowpart (V8QImode, reg);
+
   emit_insn (gen_add2_insn (dst, GEN_INT ((length - i)
 	  + (nelt_mode - nelt_v8))));
+  offset += (length - i) + (nelt_mode - nelt_v8);
+  mem = adjust_automodify_address (dstbase, V8QImode, dst, offset);
+
   /* We are shifting bytes back, set the alignment accordingly.  */
   if ((length & 1) != 0 && align >= 2)
 	set_mem_align (mem, BITS_PER_UNIT);
@@ -29359,6 +29367,7 @@ arm_block_set_aligned_vect (rtx dstbase,
   rtx rval[MAX_VECT_LEN];
   machine_mode mode;
   unsigned HOST_WIDE_INT v = value;
+  unsigned int offset = 0;
 
   gcc_assert ((align & 0x3) == 0);
   nelt_v8 = GET_MODE_NUNITS (V8QImode);
@@ -29390,14 +29399,15 @@ arm_block_set_aligned_vect (rtx dstbase,
   /* Handle first 16 bytes specially using vst1:v16qi instruction.  */
   if (mode == V16QImode)
 {
-  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
   emit_insn (gen_movmisalignv16qi (mem, reg));
   i += nelt_mode;
   /* Handle (8, 16) bytes leftover using vst1:v16qi again.  */
   if (i + nelt_v8 < length && i + nelt_v16 > length)
 	{
 	  emit_insn (gen_add2_insn (dst, GEN_INT (length - nelt_mode)));
-	  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+	  offset += length - nelt_mode;
+	  mem = adjust_automodify_address (dstbase, mode, dst, offset);
 	  /* We are shifting bytes back, set the alignment accordingly.  */
 	  if ((length & 0x3) == 0)
 	set_mem_align (mem, BITS_PER_UNIT * 4);
@@ -29419,7 +29429,7 @@ arm_block_set_aligned_vect (r

Re: [PATCH] Fix PR68067

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Alan Lawrence wrote:

> On 28/10/15 13:38, Richard Biener wrote:
> > 
> > Applied as follows.
> > 
> > Bootstrapped / tested on x86_64-unknown-linux-gnu.
> > 
> > Richard.
> > 
> > 2015-10-28  Richard Biener  
> > 
> > * fold-const.c (negate_expr_p): Adjust the division case to
> > properly avoid introducing undefined overflow.
> > (fold_negate_expr): Likewise.
> 
> Since this we've been seeing an ICE compiling polynom.c from 254.gap in
> SPEC2000 on aarch64-linux-gnu with -O3 -ffast-math -mcpu=cortex-a53 (or -Ofast
> -mcpu=cortex-a53), on both native (bootstrapped and --disable-bootstrap) and
> cross-linux builds.
> 
> A number of options prevent the ICE, e.g. any of -fno-thread-jumps,
> -fno-strict-overflow, -fdump-tree-alias or -fdump-tree-ealias (!). Similarly,
> dropping the -mcpu=cortex-a53, or changing to -mcpu=cortex-a57.
> 
> (I have a recent build in a chroot for which -fno-strict-overflow does *not*
> fix the ICE but haven't yet figured out exactly what the difference in the
> chroot environment is.)
> 
> Moreover, preprocessing in a separate step (i.e. piping preprocessed output
> via a file with -E), also avoids the ICE. (This is hindering my efforts to
> reduce the testcase!).  So my hypothesis is that this is a
> front-end/preprocessor bug, rather than anything directly due to this commit.
> 
> The error message in full (line refs from that commit, r229479) is:
> =
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function
> ‘NormalizeCoeffsListx’:
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error:
> incompatible types in PHI argument 0
>  TypHandle NormalizeCoeffsListx ( hdC )
>^
> long int
> 
> int
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
> references block not in block tree
> l1_279 = PHI <1(28), l1_299(33)>

^^^

this is the error to look at!  It means that the GC heap will be corrupted
quite easily.

> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid
> PHI argument

which means this could be a followup error.  We do have a bug report (or
two) about similar issues that were tracked down to different patches.

Somebody needs to sit down and debug this properly ;)

> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal compiler
> error: tree check: expected class ‘type’, have ‘declaration’ (namespace_decl)
> in useless_type_conversion_p, at gimple-expr.c:84
> 0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char
> const*, int, char const*)
> ../../gcc-fsf/gcc/tree.c:9643
> 0x82561b tree_class_check
> ../../gcc-fsf/gcc/tree.h:3042
> 0x82561b useless_type_conversion_p(tree_node*, tree_node*)
> ../../gcc-fsf/gcc/gimple-expr.c:84
> 0xaca043 verify_gimple_phi
> ../../gcc-fsf/gcc/tree-cfg.c:4673
> 0xaca043 verify_gimple_in_cfg(function*, bool)
> ../../gcc-fsf/gcc/tree-cfg.c:4967
> 0x9c2e0b execute_function_todo
> ../../gcc-fsf/gcc/passes.c:1967

It would be interesting to know in which pass this happens - just print
*pass at this point.

> 0x9c360b do_per_function
> ../../gcc-fsf/gcc/passes.c:1659
> 0x9c3807 execute_todo
> ../../gcc-fsf/gcc/passes.c:2022
> Please submit a full bug report,
> with preprocessed source if appropriate.
> =
> which looks like an "incompatible types from PHI argument" from a first call
> to verify_gimple_phi, then a second call to verify_gimple_phi prints "invalid
> phi argument" and ICEs in the test just before possibly printing a second
> incompatible_types message.
> 
> 
> --Alan
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: Re: [PATCH] Fix PRs 66502 and 67167

2015-11-06 Thread Jiong Wang



On 21/08/15 10:47, Jiong Wang wrote:

Richard Biener writes:


I see the following ICE:

t.c:13:1: internal compiler error: in decompose_normal_address, at
rtlanal.c:6090
  }
  ^
0xc94a37 decompose_normal_address
 /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6090
0xc94d25 decompose_address(address_info*, rtx_def**, machine_mode,
unsigned char, rtx_code)
 /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6167
0xc94dc3 decompose_mem_address(address_info*, rtx_def*)
 /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6187
0xb61149 process_address_1
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:2867
0xb61c4e process_address
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:3124
0xb62607 curr_insn_transform
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:3419
0xb65250 lra_constraints(bool)
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:4421

that looks like a latent issue to me in an area of GCC I am not
familiar with.  I suggest to open a bugreport and CC Vladimir.

Thanks for the info. Done https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305


Richard,

  Though the ICE itself is caused by a latent bug in the ARM backend
(PR67305), my further double check shows there is a performance regression
since this patch. The regression is presumably caused by other latent GCC
bugs in the tree-vrp pass. A Bugzilla entry has been created to track it:


  Thanks.

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234

Regards,
Jiong





The r226850 change caused us to eliminate an induction variable
early (I suspect IVOPTs would have done this later anyway, but
I did not verify that):

Replaced redundant PHI node defining bl_2 with c_1
Replaced c_1 + 1 with bl_15 in all uses of c_16 = c_1 + 1;
Removing dead stmt c_16 = c_1 + 1;
Removing dead stmt bl_2 = PHI <0(2), bl_15(3)>

Thanks,
Richard.


   Thanks.





Re: [PATCH] Fix PR68067

2015-11-06 Thread Alan Lawrence

On 28/10/15 13:38, Richard Biener wrote:


Applied as follows.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2015-10-28  Richard Biener  

* fold-const.c (negate_expr_p): Adjust the division case to
properly avoid introducing undefined overflow.
(fold_negate_expr): Likewise.


Since this we've been seeing an ICE compiling polynom.c from 254.gap in SPEC2000 
on aarch64-linux-gnu with -O3 -ffast-math -mcpu=cortex-a53 (or -Ofast 
-mcpu=cortex-a53), on both native (bootstrapped and --disable-bootstrap) and 
cross-linux builds.


A number of options prevent the ICE, e.g. any of -fno-thread-jumps, 
-fno-strict-overflow, -fdump-tree-alias or -fdump-tree-ealias (!). Similarly, 
dropping the -mcpu=cortex-a53, or changing to -mcpu=cortex-a57.


(I have a recent build in a chroot for which -fno-strict-overflow does *not* fix 
the ICE but haven't yet figured out exactly what the difference in the chroot 
environment is.)


Moreover, preprocessing in a separate step (i.e. piping preprocessed output via 
a file with -E), also avoids the ICE. (This is hindering my efforts to reduce 
the testcase!).  So my hypothesis is that this is a front-end/preprocessor bug, 
rather than anything directly due to this commit.


The error message in full (line refs from that commit, r229479) is:
=
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function 
‘NormalizeCoeffsListx’:
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: incompatible 
types in PHI argument 0

 TypHandle NormalizeCoeffsListx ( hdC )
   ^
long int

int

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location 
references block not in block tree

l1_279 = PHI <1(28), l1_299(33)>
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid PHI 
argument


../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal compiler 
error: tree check: expected class ‘type’, have ‘declaration’ (namespace_decl) in 
useless_type_conversion_p, at gimple-expr.c:84
0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char const*, 
int, char const*)

../../gcc-fsf/gcc/tree.c:9643
0x82561b tree_class_check
../../gcc-fsf/gcc/tree.h:3042
0x82561b useless_type_conversion_p(tree_node*, tree_node*)
../../gcc-fsf/gcc/gimple-expr.c:84
0xaca043 verify_gimple_phi
../../gcc-fsf/gcc/tree-cfg.c:4673
0xaca043 verify_gimple_in_cfg(function*, bool)
../../gcc-fsf/gcc/tree-cfg.c:4967
0x9c2e0b execute_function_todo
../../gcc-fsf/gcc/passes.c:1967
0x9c360b do_per_function
../../gcc-fsf/gcc/passes.c:1659
0x9c3807 execute_todo
../../gcc-fsf/gcc/passes.c:2022
Please submit a full bug report,
with preprocessed source if appropriate.
=
which looks like an "incompatible types from PHI argument" from a first call to 
verify_gimple_phi, then a second call to verify_gimple_phi prints "invalid phi 
argument" and ICEs in the test just before possibly printing a second 
incompatible_types message.



--Alan



Re: Merge of HSA branch

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Bernd Schmidt wrote:

> On 11/05/2015 10:51 PM, Martin Jambor wrote:
> > Individual changes are described in slightly more detail in their
> > respective messages.  If you are interested in how the HSAIL
> > generation works in general, I encourage you to have a look at my
> > Cauldron slides or presentation, only very few things have changed as
> > far as the general principles are concerned.  Let me just quickly stress
> > here that we do acceleration within a single compiler, as opposed to
> > LTO-ways of all the other accelerator teams.
> 
> Realistically we're probably not going to reject this work, but I still want
> to ask whether the approach was acked by the community before you started. I'm
> really not exactly thrilled about having two different classes of backends in
> the compiler, and two different ways of handling offloading.

Realistically the other approaches weren't acked either (well, implicitly
by review).  Not doing an RTL backend for NVPTX would have simplified
your life as well.  Not doing an RTL backend practically means not
going the LTO way, as you couldn't easily even build a target without
RTL pieces (not sure how big a "dummy" RTL target would be).

Richard.

> > I also acknowledge that we should add HSA-specific tests to the GCC
> > testsuite but we are only now looking at how to do that and will
> > welcome any guidance in this regard.
> 
> Yeah, I was looking for any kind of new test, because...
>
> > the class of OpenMP loops we can handle well is small,
> 
> I'd appreciate more information on what this means. Any examples or
> performance numbers?
> 
> 
> Bernd
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 6/6] Make SRA replace constant-pool loads

2015-11-06 Thread Eric Botcazou
> Hmm, can you clarify, do you mean I should *not* replace constant pool
> values with their DECL_INITIAL? The attempt to substitute in the
> initial value is what leads to most of the problems. For example, in
> gnat/opt31.adb, create_access finds this expression accessing *.LC0:
> 
> MEM[(interfaces__unsigned_8[(sizetype)  opt31__messages_t___XUP>.P_BOUNDS->LB0: opt31__messages_t___XUP>.P_BOUNDS->UB0 >=  opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype)  struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> .P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}
> 
> this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
> far I haven't extended subst_constant_pool_initial to handle
> ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:
> 
> MEM[(interfaces__unsigned_8[(sizetype)  opt31__messages_t___XUP>.P_BOUNDS->LB0: opt31__messages_t___XUP>.P_BOUNDS->UB0 >=  opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype)  struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> .P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0]
> 
> because the type here has size:
> 
> MIN_EXPR <_GLOBAL.SZ2.ada_opt31 ( opt31__messages_t___XUP>.P_BOUNDS->UB0,  opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>
> 
> inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
> 4-element array (fine). Sadly while the MEM_REF
> type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
> does not

FWIW you are allowed to punt on these kinds of complex expressions that appear
only in Ada.  New optimizations are sort of allowed to work on the C family of
languages first, and be extended or not to the rest of the languages afterwards.

> One possibility is that this whole construct, ARRAY_RANGE_REF that it
> is, should mark *.LC0 in cannot_scalarize_away_bitmap.

ARRAY_RANGE_REF is only used in Ada so you can do that for now (unless this 
introduces regressions in the gnat.dg testsuite but I doubt it).

-- 
Eric Botcazou


Re: [PATCH] PR driver/67613 - spell suggestions for misspelled command line options

2015-11-06 Thread Bernd Schmidt

On 11/04/2015 03:53 PM, David Malcolm wrote:

This patch adds hints to the option-not-found error in the driver,
using the Levenshtein distance implementation posted here:
"[PATCH 0/2] Levenshtein-based suggestions (v3)"
   https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03379.html

It splits out the identifier-based implementation into a new
spellcheck-tree.c, keeping the core in spellcheck.c, since the tree
checking code needed for IDENTIFIER_POINTER etc isn't available in
the driver.


Nice. Ok.


Bernd


Re: Merge of HSA branch

2015-11-06 Thread Bernd Schmidt

On 11/05/2015 10:51 PM, Martin Jambor wrote:

Individual changes are described in slightly more detail in their
respective messages.  If you are interested in how the HSAIL
generation works in general, I encourage you to have a look at my
Cauldron slides or presentation, only very few things have changed as
far as the general principles are concerned.  Let me just quickly stress
here that we do acceleration within a single compiler, as opposed to
LTO-ways of all the other accelerator teams.


Realistically we're probably not going to reject this work, but I still 
want to ask whether the approach was acked by the community before you 
started. I'm really not exactly thrilled about having two different 
classes of backends in the compiler, and two different ways of handling 
offloading.



I also acknowledge that we should add HSA-specific tests to the GCC
testsuite but we are only now looking at how to do that and will
welcome any guidance in this regard.


Yeah, I was looking for any kind of new test, because...


the class of OpenMP loops we can handle well is small,


I'd appreciate more information on what this means. Any examples or 
performance numbers?



Bernd


Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-06 Thread Pedro Alves
On 11/06/2015 01:56 AM, Jonathan Wakely wrote:
> On 5 November 2015 at 23:31, Daniel Gutson

>> The issue is, as I understand it, to do the actual work of operator
>> new, i.e. allocate memory. It should force
>> us to copy most of the code of the original code of operator new,
>> which may change on new versions of the
>> STL, forcing us to keep updated.
> 
> It can just call malloc, and the replacement operator delete can call free.
> 
> That is very unlikely to need to change (which is corroborated by the
> fact that the default definitions in libsupc++ change very rarely).

Or perhaps libsupc++ could provide the default operator new under
a __default_operator_new alias or some such, so that the user-defined
replacement can fall back to calling it.  Likewise for op delete.

Thanks,
Pedro Alves
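
For readers following the thread, a minimal sketch of the replacement Jonathan describes could look as follows. It assumes the program replaces the throwing form as well, so that every allocation and deallocation goes through malloc/free consistently. The decision not to invoke the new-handler in the nothrow form is a choice made for this sketch, not something the thread prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Throwing form: malloc-based, reports failure with std::bad_alloc.
void *operator new (std::size_t size)
{
  if (void *p = std::malloc (size ? size : 1))
    return p;
  throw std::bad_alloc ();
}

// Non-throwing form: returns a null pointer on failure.  Unlike the
// default definition, this sketch never calls the new-handler.
void *operator new (std::size_t size, const std::nothrow_t &) noexcept
{
  return std::malloc (size ? size : 1);
}

// Matching deallocation functions: everything came from malloc.
void operator delete (void *ptr) noexcept { std::free (ptr); }
void operator delete (void *ptr, std::size_t) noexcept { std::free (ptr); }
void operator delete (void *ptr, const std::nothrow_t &) noexcept
{
  std::free (ptr);
}
```

Nothing here copies code from libsupc++, which is the point being made: the replacement only depends on malloc and free.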



Re: [hsa 9/12] Small alloc-pool fix

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Martin Liška wrote:

> On 11/06/2015 10:00 AM, Richard Biener wrote:
> > On Thu, 5 Nov 2015, Martin Jambor wrote:
> > 
> >> Hi,
> >>
> >> we use C++ new operators based on alloc-pools a lot in the subsequent
> >> patches and realized that on the current trunk, such new operators
> >> would needlessly call the placement ::new operator within the allocate
> >> method of pool-alloc.  Fixed below by providing a new allocation
> >> method which does not call placement new, which is only safe to use
> >> from within a new operator.
> >>
> >> The patch also fixes the slightly weird two parameter operator new
> >> (which we do not use in HSA backend) so that it does not do the same.
> > 
> 
> Hi.
> 
> > Why do you need to add the pointer variant then?
> 
> You are right, we originally used the variant in the branch, but it was
> eventually left.
> 
> > 
> > Also isn't the issue with allocate() that it does
> > 
> > return ::new (m_allocator.allocate ()) T ();
> > 
> > which 1) value-initializes and 2) doesn't even work with types like
> > 
> > struct T { T(int); };
> > 
> > thus types without a default constructor.
> 
> You are right, it produces compilation error.
> 
> > 
> > I think the allocator was poorly C++-ified without updating the
> > specification for the cases it is supposed to handle.  And now
> > we have C++ uses that are not working because the allocator is
> > broken.
> > 
> > An incrementally better version (w/o fixing the issue with
> > types w/o default constructor) is
> > 
> > return ::new (m_allocator.allocate ()) T;
> 
> I've tried that, and it also calls default ctor:
> 
> ../../gcc/alloc-pool.h: In instantiation of ‘T* 
> object_allocator<T>::allocate() [with T = et_occ]’:
> ../../gcc/alloc-pool.h:531:22:   required from ‘void* operator new(size_t, 
> object_allocator<T>&) [with T = et_occ; size_t = long unsigned int]’
> ../../gcc/et-forest.c:449:46:   required from here
> ../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private
>et_occ ();
>^
> In file included from ../../gcc/et-forest.c:28:0:
> ../../gcc/alloc-pool.h:483:44: error: within this context
>  return ::new (m_allocator.allocate ()) T;

Yes, but it does slightly cheaper initialization of PODs

> 
> > 
> > thus default-initialize which does no initialization for PODs (without
> > array members...) which is what the old pool allocator did.
> 
> I'm not so familiar with differences related to PODs.
> 
> > 
> > To fix the new operator (how do you even call that?  does it allow
> > specifying constructor args and thus work without a default constructor?)
> > it should indeed use an allocation method not performing the placement
> > new.  But I'd call it allocate_raw rather than vallocate.
> 
> For situations where we do not have a default ctor, one should use the
> helper method defined at the end of alloc-pool.h:
> 
> template <typename T>
> inline void *
> operator new (size_t, object_allocator<T> &a)
> {
>   return a.allocate ();
> }
> 
> For instance:
> et_occ *nw = new (et_occurrences) et_occ (2);

Oh, so it uses placement new syntax...  works for me.

> or as used in the HSA branch:
> 
> /* New operator to allocate convert instruction from pool alloc.  */
> 
> void *
> hsa_insn_cvt::operator new (size_t)
> {
>   return hsa_allocp_inst_cvt->allocate_raw ();
> }
> 
> and
> 
> cvtinsn = new hsa_insn_cvt (reg, *ptmp2);
> 
> 
> I attached patch where I rename the method as suggested.

Ok.

Thanks,
Richard.

> Thanks,
> Martin
> 
> > 
> > Thanks.
> > Richard.
> > 
> >> Thanks,
> >>
> >> Martin
> >>
> >>
> >> 2015-11-05  Martin Liska  
> >>Martin Jambor  
> >>
> >>* alloc-pool.h (object_allocator::vallocate): New method.
> >>(operator new): Call vallocate instead of allocate.
> >>(operator new): New operator.
> >>
> >>
> >> diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
> >> index 0dc05cd..46b6550 100644
> >> --- a/gcc/alloc-pool.h
> >> +++ b/gcc/alloc-pool.h
> >> @@ -483,6 +483,12 @@ public:
> >>  return ::new (m_allocator.allocate ()) T ();
> >>}
> >>  
> >> +  inline void *
> >> +  vallocate () ATTRIBUTE_MALLOC
> >> +  {
> >> +return m_allocator.allocate ();
> >> +  }
> >> +
> >>inline void
> >>remove (T *object)
> >>{
> >> @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
> >>  };
> >>  
> >>  /* Helper for classes that do not provide default ctor.  */
> >> -
> >>  template <typename T>
> >>  inline void *
> >>  operator new (size_t, object_allocator<T> &a)
> >>  {
> >> -  return a.allocate ();
> >> +  return a.vallocate ();
> >> +}
> >> +
> >> +/* Helper for classes that do not provide default ctor.  */
> >> +template <typename T>
> >> +inline void *
> >> +operator new (size_t, object_allocator<T> *a)
> >> +{
> >> +  return a->vallocate ();
> >>  }
> >>  
> >>  /* Hashtable mapping alloc_pool names to descriptors.  */
> >>
> >>
> > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [hsa 9/12] Small alloc-pool fix

2015-11-06 Thread Martin Liška
On 11/06/2015 10:00 AM, Richard Biener wrote:
> On Thu, 5 Nov 2015, Martin Jambor wrote:
> 
>> Hi,
>>
>> we use C++ new operators based on alloc-pools a lot in the subsequent
>> patches and realized that on the current trunk, such new operators
>> would needlessly call the placement ::new operator within the allocate
>> method of pool-alloc.  Fixed below by providing a new allocation
>> method which does not call placement new, which is only safe to use
>> from within a new operator.
>>
>> The patch also fixes the slightly weird two parameter operator new
>> (which we do not use in HSA backend) so that it does not do the same.
> 

Hi.

> Why do you need to add the pointer variant then?

You are right, we originally used the variant in the branch, but it was
eventually left.

> 
> Also isn't the issue with allocate() that it does
> 
> return ::new (m_allocator.allocate ()) T ();
> 
> which 1) value-initializes and 2) doesn't even work with types like
> 
> struct T { T(int); };
> 
> thus types without a default constructor.

You are right, it produces compilation error.

> 
> I think the allocator was poorly C++-ified without updating the
> specification for the cases it is supposed to handle.  And now
> we have C++ uses that are not working because the allocator is
> broken.
> 
> An incrementally better version (w/o fixing the issue with
> types w/o default constructor) is
> 
> return ::new (m_allocator.allocate ()) T;

I've tried that, and it also calls default ctor:

../../gcc/alloc-pool.h: In instantiation of ‘T* object_allocator<T>::allocate() 
[with T = et_occ]’:
../../gcc/alloc-pool.h:531:22:   required from ‘void* operator new(size_t, 
object_allocator<T>&) [with T = et_occ; size_t = long unsigned int]’
../../gcc/et-forest.c:449:46:   required from here
../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private
   et_occ ();
   ^
In file included from ../../gcc/et-forest.c:28:0:
../../gcc/alloc-pool.h:483:44: error: within this context
 return ::new (m_allocator.allocate ()) T;


> 
> thus default-initialize which does no initialization for PODs (without
> array members...) which is what the old pool allocator did.

I'm not so familiar with differences related to PODs.
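
Since the difference between the two placement-new spellings is the crux here, a small standalone sketch may help. The struct pod_pair and the two helper functions are hypothetical names used only for illustration; they are not part of alloc-pool.h.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// A POD type with a usable (implicit) default constructor.
struct pod_pair { int a; int b; };

// "T ()": value-initialization.  For a POD this zero-fills the object,
// i.e. it costs stores for every member.
inline pod_pair *construct_value_init (void *storage)
{
  return ::new (storage) pod_pair ();
}

// "T": default-initialization.  For a POD this performs no stores at all;
// the members hold indeterminate values until they are assigned, so they
// must not be read before that.
inline pod_pair *construct_default_init (void *storage)
{
  return ::new (storage) pod_pair;
}
```

Both forms still require an accessible default constructor, which is why neither helps for a type like et_occ whose default constructor is private.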

> 
> To fix the new operator (how do you even call that?  does it allow
> specifying constructor args and thus work without a default constructor?)
> it should indeed use an allocation method not performing the placement
> new.  But I'd call it allocate_raw rather than vallocate.

For situations where we do not have a default ctor, one should use the helper
method defined at the end of alloc-pool.h:

template <typename T>
inline void *
operator new (size_t, object_allocator<T> &a)
{
  return a.allocate ();
}

For instance:
et_occ *nw = new (et_occurrences) et_occ (2);

or as used in the HSA branch:

/* New operator to allocate convert instruction from pool alloc.  */

void *
hsa_insn_cvt::operator new (size_t)
{
  return hsa_allocp_inst_cvt->allocate_raw ();
}

and

cvtinsn = new hsa_insn_cvt (reg, *ptmp2);
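
To show how the pieces fit together outside of GCC, here is a self-contained sketch. The class toy_allocator is a hypothetical malloc-backed stand-in for object_allocator, and occ stands in for a type such as et_occ with no default constructor. Only the split between allocate () and allocate_raw () follows the discussion; the live counter exists purely for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for GCC's object_allocator; not the real class.
template <typename T>
class toy_allocator
{
public:
  // Allocate and value-initialize.  Only instantiated if called, and then
  // requires T to have an accessible default constructor.
  T *allocate () { return ::new (allocate_raw ()) T (); }

  // Return uninitialized storage; the caller's new-expression runs the
  // constructor, so no placement new happens here.
  void *allocate_raw ()
  {
    void *p = std::malloc (sizeof (T));
    if (!p)
      std::abort ();
    ++m_live;
    return p;
  }

  void remove (T *object)
  {
    object->~T ();
    std::free (object);
    --m_live;
  }

  std::size_t live () const { return m_live; }

private:
  std::size_t m_live = 0;
};

// Helper for classes without a default ctor: "new (pool) T (args)".
// No matching placement delete is provided, so storage would leak if the
// constructor threw; acceptable for a sketch.
template <typename T>
inline void *
operator new (std::size_t, toy_allocator<T> &a)
{
  return a.allocate_raw ();
}

// A type with no default constructor.
struct occ
{
  explicit occ (int d) : depth (d) {}
  int depth;
};
```

With this, "new (pool) occ (2)" compiles even though occ has no default constructor, because allocate () is never instantiated for it.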


I attached a patch where I rename the method as suggested.

Thanks,
Martin

> 
> Thanks.
> Richard.
> 
>> Thanks,
>>
>> Martin
>>
>>
>> 2015-11-05  Martin Liska  
>>  Martin Jambor  
>>
>>  * alloc-pool.h (object_allocator::vallocate): New method.
>>  (operator new): Call vallocate instead of allocate.
>>  (operator new): New operator.
>>
>>
>> diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
>> index 0dc05cd..46b6550 100644
>> --- a/gcc/alloc-pool.h
>> +++ b/gcc/alloc-pool.h
>> @@ -483,6 +483,12 @@ public:
>>  return ::new (m_allocator.allocate ()) T ();
>>}
>>  
>> +  inline void *
>> +  vallocate () ATTRIBUTE_MALLOC
>> +  {
>> +return m_allocator.allocate ();
>> +  }
>> +
>>inline void
>>remove (T *object)
>>{
>> @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
>>  };
>>  
>>  /* Helper for classes that do not provide default ctor.  */
>> -
>>  template <typename T>
>>  inline void *
>>  operator new (size_t, object_allocator<T> &a)
>>  {
>> -  return a.allocate ();
>> +  return a.vallocate ();
>> +}
>> +
>> +/* Helper for classes that do not provide default ctor.  */
>> +template <typename T>
>> +inline void *
>> +operator new (size_t, object_allocator<T> *a)
>> +{
>> +  return a->vallocate ();
>>  }
>>  
>>  /* Hashtable mapping alloc_pool names to descriptors.  */
>>
>>
> 

diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
index 0dc05cd..8b8c023 100644
--- a/gcc/alloc-pool.h
+++ b/gcc/alloc-pool.h
@@ -477,11 +477,22 @@ public:
 m_allocator.release_if_empty ();
   }
 
+  /* Allocate memory for instance of type T and call a default constructor.  */
+
   inline T *
   allocate () ATTRIBUTE_MALLOC
   {
 return ::new (m_allocator.allocate ()) T ();
   }
+  /* Allocate memory for instance of type T and return void * that
+ could be used in situations where a default constructor is not provided
+ by the class T.  */
+
+  inline void *
+  allocate_raw () ATTRIBUTE_MALLOC
+  {
+return m_allocator.allocat

RE: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-06 Thread VandeVondele Joost
Thanks Paul. I believe PR53852 won't be fixed on 4.9/5 as it seems to depend on 
the recent graphite cleanup work and recent isl. As such I'll commit to trunk 
only.

[PATCH] Make BB vectorizer work on sub-BBs

2015-11-06 Thread Richard Biener

The following patch makes the BB vectorizer not only handle BB heads
(until the first stmt with a data reference it cannot handle) but
arbitrary regions in a BB separated by such stmts.
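
As a rough illustration of the idea (not the code in the patch), the splitting can be modelled on a plain array of flags. The function split_into_regions and the bool-per-statement encoding are assumptions made for this sketch, as is the choice to exclude the separating statement from both neighbouring regions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open [begin, end) range of statement indices within one BB.
typedef std::pair<std::size_t, std::size_t> region;

// unhandled[i] is true when statement i has a data reference the
// vectorizer cannot handle.  Returns every maximal run of handled
// statements; previously only the first such run (the BB head) was
// considered.
std::vector<region>
split_into_regions (const std::vector<bool> &unhandled)
{
  std::vector<region> regions;
  std::size_t begin = 0;
  for (std::size_t i = 0; i <= unhandled.size (); ++i)
    if (i == unhandled.size () || unhandled[i])
      {
	if (i > begin)
	  regions.push_back (region (begin, i));
	begin = i + 1;
      }
  return regions;
}
```

For seven statements where statements 2, 4 and 5 are unhandled, this yields the regions [0, 2), [3, 4) and [6, 7), each of which would then be analysed on its own.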

This improves the number of BB vectorizations from 469 to 556
in a quick test on SPEC CPU 2006 with -Ofast on x86_64, with
1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
1x481.wrf failing both patched and unpatched (it seems I have to update
the config I use for such experiments ...)

Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.

I'm currently re-testing for a cosmetic change I made when writing
the changelog.

I expected (and there are) some issues with compile-time.  Left
is unpatched and right is patched.

benchmark         unpatched         patched
403.gcc           00:00:54 (54)     00:00:55 (55)
483.xalancbmk     00:02:20 (140)    00:02:24 (144)
416.gamess        00:02:36 (156)    00:02:37 (157)
435.gromacs       00:00:18 (18)     00:00:19 (19)
447.dealII        00:01:31 (91)     00:01:33 (93)
453.povray        00:04:54 (294)    00:08:54 (534)
454.calculix      00:00:34 (34)     00:00:52 (52)
481.wrf           00:01:57 (117)    00:01:59 (119)

other benchmarks are unchanged.  I'm double-checking now that a followup
patch I have which re-implements BB vectorization dependence checking
fixes this (that's the only quadraticness I know of).

Richard.

2015-11-06  Richard Biener  

	* tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
	members.
	(vect_stmt_in_region_p): Declare.
	* tree-vect-slp.c (new_bb_vec_info): Work on a region.
	(destroy_bb_vec_info): Likewise.
	(vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
	(vect_get_and_check_slp_defs): Likewise.
	(vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
	(vect_slp_bb): Likewise.
	* tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
	in terms of vect_stmt_in_region_p.
	(vect_pattern_recog): Iterate over the BB region.
	* tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
	* tree-vectorizer.c (vect_stmt_in_region_p): New function.
	(pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.

	* config/i386/i386.c: Include gimple-iterator.h.
	* config/aarch64/aarch64.c: Likewise.

	* gcc.dg/vect/bb-slp-38.c: New testcase.
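
The ChangeLog mentions a region membership predicate and resetting all statement UIDs to -1. One plausible way these fit together is sketched below; this is an assumption on my part, since the body of vect_stmt_in_region_p is not visible in the diff as quoted here. `Stmt`, `reset_uids`, `enter_region` and `stmt_in_region_p` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of a statement: a UID plus a flag for PHI nodes.
struct Stmt
{
  int uid = -1;        // -1 means "not part of the region being analyzed"
  bool is_phi = false;
};

// Counterpart of "Initialize all stmt UIDs to -1": reset every statement.
void
reset_uids (std::vector<Stmt> &bb)
{
  for (Stmt &s : bb)
    s.uid = -1;
}

// Mark the statements of the half-open region [begin, end) as members.
void
enter_region (std::vector<Stmt> &bb, std::size_t begin, std::size_t end)
{
  assert (begin <= end && end <= bb.size ());
  for (std::size_t i = begin; i < end; ++i)
    bb[i].uid = 0;
}

// Assumed membership test: a statement belongs to the region if its UID was
// set when the region was entered and it is not a PHI node.
bool
stmt_in_region_p (const Stmt &s)
{
  return s.uid != -1 && !s.is_phi;
}
```

The point of the design, under this assumption, is that membership becomes a constant-time check on the statement itself instead of a comparison against the whole basic block, so statements outside the current sub-BB region are treated as external definitions.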

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h.orig  2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vectorizer.h   2015-11-05 13:20:58.385786476 +0100
*** nested_in_vect_loop_p (struct loop *loop
*** 390,395 
--- 390,397 
  typedef struct _bb_vec_info : public vec_info
  {
basic_block bb;
+   gimple_stmt_iterator region_begin;
+   gimple_stmt_iterator region_end;
  } *bb_vec_info;
  
  #define BB_VINFO_BB(B)   (B)->bb
*** void vect_pattern_recog (vec_info *);
*** 1085,1089 
--- 1087,1092 
  /* In tree-vectorizer.c.  */
  unsigned vectorize_loops (void);
  void vect_destroy_datarefs (vec_info *);
+ bool vect_stmt_in_region_p (vec_info *, gimple *);
  
  #endif  /* GCC_TREE_VECTORIZER_H  */
Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c.orig  2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
*** vect_get_and_check_slp_defs (vec_info *v
*** 209,215 
unsigned int i, number_of_oprnds;
gimple *def_stmt;
enum vect_def_type dt = vect_uninitialized_def;
-   struct loop *loop = NULL;
bool pattern = false;
slp_oprnd_info oprnd_info;
int first_op_idx = 1;
--- 209,214 
*** vect_get_and_check_slp_defs (vec_info *v
*** 218,226 
bool first = stmt_num == 0;
bool second = stmt_num == 1;
  
-   if (is_a <loop_vec_info> (vinfo))
- loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
- 
if (is_gimple_call (stmt))
  {
number_of_oprnds = gimple_call_num_args (stmt);
--- 217,222 
*** again:
*** 276,286 
   from the pattern.  Check that all the stmts of the node are in the
   pattern.  */
if (def_stmt && gimple_bb (def_stmt)
!   && ((is_a <loop_vec_info> (vinfo)
!  && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
! || (is_a <bb_vec_info> (vinfo)
! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
! && gimple_code (def_stmt) != GIMPLE_PHI))
&& vinfo_for_stmt (def_stmt)
&& STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
  && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
--- 272,278 
   from the pattern.  Check that all the stmts of the node are in the
   pattern.  */
if (def_stmt && gimp

Re: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-06 Thread Paul Richard Thomas
Dear Joost,

These two testcases look fine to me. PR53852 is marked as a
4.9/5/6 regression. If I have understood correctly, it has only been
fixed on trunk. Do you know if there is any intention to fix it on the
other branches?

OK for trunk and, subject to the previous question being answered
affirmatively, for 4.9/5 branches.

Thanks

Paul

On 6 November 2015 at 08:04, VandeVondele Joost wrote:
>> Attached testcases for two previously fixed PRs (and thanks to Dominique who 
>> was quicker for PR67982).
> ping ?



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-06 Thread Christophe Lyon
On 4 November 2015 at 16:37, James Greenhalgh wrote:
>
> On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
>> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
>> >+ i = tmp_i; <- Should be cleaned up
>>
>> Maybe reword as "Subsequent passes are expected to clean up the
>> extra moves", otherwise it sounds like a TODO item.
>>
>> >+   read back in anotyher SET, as might occur in a swap idiom or
>>
>> Typo.
>>
>> >+  if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
>> >+{
>> >+  /* The write to targets[i] is only live until the read
>> >+ here.  As the condition codes match, we can propagate
>> >+ the set to here.  */
>> >+   new_val = SET_SRC (single_set (unmodified_insns[i]));
>> >+}
>>
>> Shouldn't use braces around single statements (also goes for the
>> surrounding for loop).
>>
>> >+  /* We must have at least one real insn to convert, or there will
>> >+ be trouble!  */
>> >+  unsigned count = 0;
>>
>> The comment seems a bit strange in this context - I think it's left
>> over from the earlier version?
>>
>> As far as I'm concerned this is otherwise ok.
>
> Thanks,
>
> I've updated the patch with those issues addressed. As the cost model was
> controversial in an earlier revision, I'll leave this on list for 24 hours
> and, if nobody jumps in to object, commit it tomorrow.
>
> I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
> just to check that I got the braces right, with no issues.
>

The new test does not pass on some ARM configurations; I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Christophe.

> Thanks,
> James
>
> ---
> gcc/
>
> 2015-11-04  James Greenhalgh  
>
> * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
> (noce_convert_multiple_sets): Likewise.
> (noce_process_if_block): Call them.
>
> gcc/testsuite/
>
> 2015-11-04  James Greenhalgh  
>
> * gcc.dg/ifcvt-4.c: New.
>
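
For readers without the patch at hand, the transformation discussed in this thread can be pictured at source level roughly as follows. This is only an illustration in plain C++ built from the quoted comment fragments (the `tmp_i` temporaries and the set that is read back by a later set); the real code operates on RTL, and all function names here are invented.

```cpp
#include <cassert>
#include <utility>

// Branchy form: two simple sets guarded by one condition.
std::pair<int, int>
branchy (bool test, int i, int j, int a, int b)
{
  if (test)
    {
      i = a;
      j = b;
    }
  return {i, j};
}

// If-converted form: each set becomes a select into a temporary, followed
// by a plain move.  Subsequent passes are expected to clean up the moves.
std::pair<int, int>
converted (bool test, int i, int j, int a, int b)
{
  int tmp_i = test ? a : i;
  int tmp_j = test ? b : j;
  i = tmp_i;
  j = tmp_j;
  return {i, j};
}

// A target written by one set and read back in another set.
std::pair<int, int>
branchy_chain (bool test, int i, int j, int a)
{
  if (test)
    {
      i = a;
      j = i;
    }
  return {i, j};
}

// The later select must read the temporary holding the new value of i.
std::pair<int, int>
converted_chain (bool test, int i, int j, int a)
{
  int tmp_i = test ? a : i;
  int tmp_j = test ? tmp_i : j;
  i = tmp_i;
  j = tmp_j;
  return {i, j};
}
```

Both pairs of functions compute the same results for either value of `test`; the converted forms simply contain no branch.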


[PATCH] Fix PR ipa/68035

2015-11-06 Thread Martin Liška
Hello.

The following patch triggers hash calculation of items (functions and
variables) in situations where LTO mode is not used.
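
As a generic illustration of why forcing the calculation can matter, consider a lazily computed hash whose cached field is read directly by later code. This sketch does not claim to reproduce the actual bug; `Item`, `cached_hash`, `count_buckets` and the FNV-1a hash are assumptions made for the example and are not taken from ipa-icf.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item with a lazily computed, cached hash.
class Item
{
public:
  explicit Item (std::string body) : m_body (std::move (body)) {}

  // Compute the hash (FNV-1a, 64-bit) on first use and cache it.
  std::uint64_t get_hash ()
  {
    if (!m_hash_valid)
      {
        std::uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : m_body)
          h = (h ^ c) * 1099511628211ULL;
        m_hash = h;
        m_hash_valid = true;
      }
    return m_hash;
  }

  // Read the cached value without computing it; 0 if never computed.
  std::uint64_t cached_hash () const { return m_hash; }

private:
  std::string m_body;
  std::uint64_t m_hash = 0;
  bool m_hash_valid = false;
};

// Group items by cached hash and return the number of groups.  Unless the
// hash is forced first, items nobody has hashed yet all share the value 0.
std::size_t
count_buckets (std::vector<Item> &items, bool force_hash)
{
  std::map<std::uint64_t, int> buckets;
  for (Item &item : items)
    {
      if (force_hash)
        item.get_hash ();
      ++buckets[item.cached_hash ()];
    }
  return buckets.size ();
}
```

In this toy setup, three distinct items fall into a single bucket when the hash is never forced and into three buckets once it is, which is the kind of difference an explicit `get_hash ()` call guards against.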

The patch bootstraps and survives regression tests on x86_64-linux-pc.

Ready for trunk?
Thanks,
Martin
From 62266e21a89777c6dbd680f7c87f15abe603c024 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 18:31:31 +0100
Subject: [PATCH] Fix PR ipa/68035

gcc/testsuite/ChangeLog:

2015-11-05  Martin Liska  

	* gcc.dg/ipa/pr68035.c: New test.

gcc/ChangeLog:

2015-11-05  Martin Liska  

	PR ipa/68035
	* ipa-icf.c (sem_item_optimizer::build_graph): Force building
	of a hash value for an item if we are not running in LTO mode.
---
 gcc/ipa-icf.c  |   4 ++
 gcc/testsuite/gcc.dg/ipa/pr68035.c | 108 +
 2 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr68035.c

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 7bb3af5..09c42a1 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -2744,6 +2744,10 @@ sem_item_optimizer::build_graph (void)
 {
   sem_item *item = m_items[i];
   m_symtab_node_map.put (item->node, item);
+
+  /* Initialize hash values if we are not in LTO mode.  */
+  if (!in_lto_p)
+	item->get_hash ();
 }
 
   for (unsigned i = 0; i < m_items.length (); i++)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr68035.c b/gcc/testsuite/gcc.dg/ipa/pr68035.c
new file mode 100644
index 000..a8cb779
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr68035.c
@@ -0,0 +1,108 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf"  } */
+
+static const unsigned short list_0[] = { 777, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_1[] = { 0, 777, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_2[] = { 0, 1, 777, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_3[] = { 0, 1, 2, 777, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_4[] = { 0, 1, 2, 3, 777, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_5[] = { 0, 1, 2, 3, 4, 777, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_6[] = { 0, 1, 2, 3, 4, 5, 777, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_7[] = { 0, 1, 2, 3, 4, 5, 6, 777, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_8[] = { 0, 1, 2, 3, 4, 5, 6, 7, 777, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_9[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 777, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_10[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 777, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_11[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 777, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_12[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 777, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_13[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 777, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_14[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 777, 15, 1
