Re: [PATCH, rs6000] Fix PR target/80210: ICE in extract_insn

2017-08-18 Thread Peter Bergner
On 8/18/17 6:27 PM, Segher Boessenkool wrote:
> On Thu, Aug 17, 2017 at 07:02:14PM -0500, Peter Bergner wrote:
>> This is also broken in GCC 7, GCC 6 and GCC 5.  Ok for those after this
>> has been on trunk for a little while and assuming testing passes?
> 
> Okay for trunk and all branches.  Thanks!

Ok, committed to trunk and the back port bootstraps/regtesting
are running.  If they're clean, I'll commit this sometime next
week after the trunk version has baked a little bit.
Thanks!

Peter



[PATCH 1/2] c-family/c/c++: pass optional vec&lt;location_t&gt; to c-format.c

2017-08-18 Thread David Malcolm
This patch passes along the vec<location_t> of argument locations
at a callsite from the C frontend to check_function_arguments and
from there to c-format.c, so that we can underline the pertinent
argument to mismatched format codes even for tree codes like decls
and constants which lack a location_t for their usage sites.

This takes e.g.:

printf("hello %i %i %i ", foo, bar, baz);
                    ~^
                    %s

to:

printf("hello %i %i %i ", foo, bar, baz);
                    ~^              ~~~
                    %s

which is useful for cases where there's more than one variadic argument.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
* c-common.c (check_function_arguments): Add "arglocs" param; pass
it to check_function_format.
* c-common.h (check_function_arguments): Add vec<location_t> *
param.
(check_function_format): Likewise.
* c-format.c (struct format_check_context): Add field "arglocs".
(check_function_format): Add param "arglocs"; pass it to
check_format_info.
(check_format_info):  Add param "arglocs"; use it to initialize
new field of format_ctx.
(check_format_arg): Pass format_ctx->arglocs to new param of
check_format_info_main.
(class argument_parser): New field "arglocs".
(argument_parser::argument_parser): Add "arglocs_" param and use
it to initialize new field.
(argument_parser::check_argument_type): Pass new arglocs field to
check_format_types.
(check_format_info_main): Add param "arglocs", and use it when
constructing arg_parser.
(check_format_types): Add param "arglocs"; use it if non-NULL when
!EXPR_HAS_LOCATION (cur_param) to get at location information.

gcc/c/ChangeLog:
* c-typeck.c (build_function_call_vec): Pass arg_loc to call
to check_function_arguments.

gcc/cp/ChangeLog:
* call.c (build_over_call): Pass NULL for new parameter to
check_function_arguments.
* typeck.c (cp_build_function_call_vec): Likewise.

gcc/testsuite/ChangeLog:
* gcc.dg/format/diagnostic-ranges.c: Update expected results
to show underlining of all pertinent params.
* gcc.dg/format/pr72858.c: Likewise.
---
 gcc/c-family/c-common.c |   4 +-
 gcc/c-family/c-common.h |   4 +-
 gcc/c-family/c-format.c |  52 
 gcc/c/c-typeck.c|   2 +-
 gcc/cp/call.c   |   2 +-
 gcc/cp/typeck.c |   2 +-
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c |  41 --
 gcc/testsuite/gcc.dg/format/pr72858.c   | 102 +---
 8 files changed, 98 insertions(+), 111 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4dc3b33..156c89d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5539,7 +5539,7 @@ attribute_fallthrough_p (tree attr)
diagnostics.  Return true if -Wnonnull warning has been diagnosed.  */
 bool
 check_function_arguments (location_t loc, const_tree fndecl, const_tree fntype,
- int nargs, tree *argarray)
+ int nargs, tree *argarray, vec<location_t> *arglocs)
 {
   bool warned_p = false;
 
@@ -5553,7 +5553,7 @@ check_function_arguments (location_t loc, const_tree 
fndecl, const_tree fntype,
   /* Check for errors in format strings.  */
 
   if (warn_format || warn_suggest_attribute_format)
-check_function_format (TYPE_ATTRIBUTES (fntype), nargs, argarray);
+check_function_format (TYPE_ATTRIBUTES (fntype), nargs, argarray, arglocs);
 
   if (warn_format)
 check_function_sentinel (fntype, nargs, argarray);
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 980d396..8e36768 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -808,7 +808,7 @@ extern tree fname_decl (location_t, unsigned, tree);
 
 extern int check_user_alignment (const_tree, bool);
 extern bool check_function_arguments (location_t loc, const_tree, const_tree,
- int, tree *);
+ int, tree *, vec<location_t> *);
 extern void check_function_arguments_recurse (void (*)
  (void *, tree,
   unsigned HOST_WIDE_INT),
@@ -816,7 +816,7 @@ extern void check_function_arguments_recurse (void (*)
  unsigned HOST_WIDE_INT);
extern bool check_builtin_function_arguments (location_t, vec<location_t>,
  tree, int, tree *);
-extern void check_function_format (tree, int, tree *);
+extern void check_function_format (tree, int, tree *, vec<location_t> *);
 extern bool attribute_fallthrough_p (tree);
 extern tree handle_format_attribute (tree *, tree, tree, 

[PATCH 2/2] C: use full locations within c_parser_expr_list's vec&lt;location_t&gt;

2017-08-18 Thread David Malcolm
The previous patch uncovered a bug in how c_parser_expr_list builds the
vec<location_t>: it was only using the location of the first token
within each assignment-expression in the expr-list.

This shows up in e.g. this -Wformat warning, where only part of the
2nd param is underlined:

   printf("hello %i", (long)0);
                 ~^   ~
                 %li

This patch fixes c_parser_expr_list to use the full range of
each assignment-expression in the list for the vec<location_t>, so
that for the above we print:

   printf("hello %i", (long)0);
                 ~^   ~~~~~~~
                 %li

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c/ChangeLog:
* c-parser.c (c_parser_expr_list): Use c_expr::get_location ()
rather than peeking the location of the first token.
* c-tree.h (c_expr::get_location): New method.

gcc/testsuite/ChangeLog:
* gcc.dg/format/diagnostic-ranges.c (test_mismatching_types):
Update expected result to show all of "(long)0" being underlined.
* gcc.dg/plugin/diagnostic-test-string-literals-1.c
(test_multitoken_macro): Update expected underlining.
---
 gcc/c/c-parser.c  | 11 +--
 gcc/c/c-tree.h|  8 
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c   |  2 +-
 .../gcc.dg/plugin/diagnostic-test-string-literals-1.c |  2 +-
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1402ba6..dfc75e0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -8933,7 +8933,6 @@ c_parser_expr_list (c_parser *parser, bool convert_p, 
bool fold_p,
   vec<tree, va_gc> *ret;
   vec<tree, va_gc> *orig_types;
   struct c_expr expr;
-  location_t loc = c_parser_peek_token (parser)->location;
   unsigned int idx = 0;
 
   ret = make_tree_vector ();
@@ -8946,14 +8945,14 @@ c_parser_expr_list (c_parser *parser, bool convert_p, 
bool fold_p,
 c_parser_check_literal_zero (parser, literal_zero_mask, 0);
   expr = c_parser_expr_no_commas (parser, NULL);
   if (convert_p)
-expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+expr = convert_lvalue_to_rvalue (expr.get_location (), expr, true, true);
   if (fold_p)
 expr.value = c_fully_fold (expr.value, false, NULL);
   ret->quick_push (expr.value);
   if (orig_types)
 orig_types->quick_push (expr.original_type);
   if (locations)
-locations->safe_push (loc);
+locations->safe_push (expr.get_location ());
   if (sizeof_arg != NULL
   && expr.original_code == SIZEOF_EXPR)
 {
@@ -8963,19 +8962,19 @@ c_parser_expr_list (c_parser *parser, bool convert_p, 
bool fold_p,
   while (c_parser_next_token_is (parser, CPP_COMMA))
 {
   c_parser_consume_token (parser);
-  loc = c_parser_peek_token (parser)->location;
   if (literal_zero_mask)
c_parser_check_literal_zero (parser, literal_zero_mask, idx + 1);
   expr = c_parser_expr_no_commas (parser, NULL);
   if (convert_p)
-   expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+   expr = convert_lvalue_to_rvalue (expr.get_location (), expr, true,
+true);
   if (fold_p)
expr.value = c_fully_fold (expr.value, false, NULL);
   vec_safe_push (ret, expr.value);
   if (orig_types)
vec_safe_push (orig_types, expr.original_type);
   if (locations)
-   locations->safe_push (loc);
+   locations->safe_push (expr.get_location ());
   if (++idx < 3
  && sizeof_arg != NULL
  && expr.original_code == SIZEOF_EXPR)
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 92bcc70..5182cc5 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -147,6 +147,14 @@ struct c_expr
   location_t get_start () const { return src_range.m_start; }
   location_t get_finish () const { return src_range.m_finish; }
 
+  location_t get_location () const
+  {
+if (CAN_HAVE_LOCATION_P (value))
+  return EXPR_LOCATION (value);
+else
+  return make_location (get_start (), get_start (), get_finish ());
+  }
+
   /* Set the value to error_mark_node whilst ensuring that src_range
  is initialized.  */
   void set_error ()
diff --git a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c 
b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
index bc6f665..e56e159 100644
--- a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
+++ b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
@@ -25,7 +25,7 @@ void test_mismatching_types (const char *msg)
   printf("hello %i", (long)0);  /* { dg-warning "format '%i' expects argument 
of type 'int', but argument 2 has type 'long int' " } */
 /* { dg-begin-multiline-output "" }
    printf("hello %i", (long)0);
-                 ~^   ~
+                 ~^   ~~~~~~~
                  %li
{ dg-end-multiline-output "" } */
 }
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c 

Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-18 Thread Segher Boessenkool
On Fri, Aug 18, 2017 at 04:49:44PM -0500, Steven Munroe wrote:
> On Thu, 2017-08-17 at 00:47 -0500, Segher Boessenkool wrote:
> > On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote:
> > > This is part 3/3 for contributing PPC64LE support for X86 SSE
> > > intrinsics. This patch includes testsuite/gcc.target tests for the
> > > intrinsics included by xmmintrin.h. 
> > 
> > > +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT)   \
> > 
> > Should that be UNION_TYPE?
> 
> It is spelled 'UINON_TYPE' in
> ./gcc/testsuite/gcc.target/i386/m128-check.h, which is the source for the
> powerpc version.
> 
> There is no obvious reason why it could not be spelled UNION_TYPE.
> Unless there is some symbol collision further up the SSE/AVX stack.
> 
> Bingo:
> 
> avx512f-helper.h:#define UNION_TYPE(SIZE, NAME) EVAL(union, SIZE, NAME)
> 
> I propose not to change this.

Heh.  Okay :-)


Segher


[committed] jit: fix segfault with autovectorization (PR tree-optimization/46805)

2017-08-18 Thread David Malcolm
On Thu, 2017-08-17 at 11:51 -0400, David Malcolm wrote:
> On Thu, 2017-08-17 at 09:15 +1200, Michael Cree wrote:
> > On Wed, Aug 16, 2017 at 10:01:57AM -0400, David Malcolm wrote:
> > > On Wed, 2017-08-16 at 21:58 +1200, Michael Cree wrote:
> > > > 
> > > > But I have hit a problem which I suspect is a bug in the gcc
> > > > optimiser.
> > > > 
> > > > In the vein of your example above, but working on uint8_t pixel
> > > > data
> > > > and adding saturation, the jit compiler segfaults in the
> > > > optimiser. I
> > > > provide below the gimple produced by the function that causes
> > > > the
> > > > problem (I presume that is more useful than the code calling
> > > > the
> > > > gcc_jit routines), 
> > > 
> > > There's actually a handy entrypoint for generating minimal
> > > reproducers
> > > for such crashes:
> > >   gcc_jit_context_dump_reproducer_to_file
> > > 
> > > https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#gcc_jit_c
> > > on
> > > text_dump_reproducer_to_file
> > > 
> > > Can you add a call to that to your code (after the context is
> > > fully
> > > populated), and see if the resulting .c file leads to the crash
> > > when
> > > run?  If so, can you post the .c file here please (or attach it
> > > to
> > > bugzilla), and hopefully I can then reproduce it at my end.
> > 
> > Attached.
> > 
> > Cheers
> > Michael.
> 
> Thanks.  I'm able to reproduce the crash using that; am
> investigating.
> 
> Dave

libgccjit ran into its own version of PR tree-optimization/46805 (seen
with the Go frontend); this patch fixes it in the same way.

Successfully bootstrapped on x86_64-pc-linux-gnu.
Takes jit.sum from 9759 passes to 9769.

Committed to trunk as r251192.

gcc/jit/ChangeLog:
PR tree-optimization/46805
* dummy-frontend.c (jit_langhook_parse_file): Handle vector types.

gcc/testsuite/ChangeLog:
PR tree-optimization/46805
* jit.dg/all-non-failing-tests.h: Add test-autovectorize.c.
* jit.dg/test-autovectorize.c: New test case.
---
 gcc/jit/dummy-frontend.c |  11 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  10 +
 gcc/testsuite/jit.dg/test-autovectorize.c| 375 +++
 3 files changed, 396 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-autovectorize.c

diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
index d7d2172..b217290 100644
--- a/gcc/jit/dummy-frontend.c
+++ b/gcc/jit/dummy-frontend.c
@@ -165,6 +165,17 @@ jit_langhook_parse_file (void)
 static tree
 jit_langhook_type_for_mode (machine_mode mode, int unsignedp)
 {
+  /* Build any vector types here (see PR 46805).  */
+  if (VECTOR_MODE_P (mode))
+{
+  tree inner;
+
+  inner = jit_langhook_type_for_mode (GET_MODE_INNER (mode), unsignedp);
+  if (inner != NULL_TREE)
+   return build_vector_type_for_mode (inner, mode);
+  return NULL_TREE;
+}
+
   if (mode == TYPE_MODE (float_type_node))
 return float_type_node;
 
diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h 
b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index 4af704a..acfcc40 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -50,6 +50,13 @@
 #undef create_code
 #undef verify_code
 
+/* test-autovectorize.c */
+#define create_code create_code_autovectorize
+#define verify_code verify_code_autovectorize
+#include "test-autovectorize.c"
+#undef create_code
+#undef verify_code
+
 /* test-calling-external-function.c */
 #define create_code create_code_calling_external_function
 #define verify_code verify_code_calling_external_function
@@ -267,6 +274,9 @@ const struct testcase testcases[] = {
   {"arrays",
create_code_arrays,
verify_code_arrays},
+  {"autovectorize",
+   create_code_autovectorize,
+   verify_code_autovectorize},
   {"calling_external_function",
create_code_calling_external_function,
verify_code_calling_external_function},
diff --git a/gcc/testsuite/jit.dg/test-autovectorize.c 
b/gcc/testsuite/jit.dg/test-autovectorize.c
new file mode 100644
index 000..9cfd0b2
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-autovectorize.c
@@ -0,0 +1,375 @@
+#include 
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void * user_data)
+{
+  gcc_jit_type *type_void = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID);
+  gcc_jit_type *type_int = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *type_unsigned_char = gcc_jit_context_get_type (ctxt, 
GCC_JIT_TYPE_UNSIGNED_CHAR);
+  gcc_jit_type *type_void_ptr =
+gcc_jit_type_get_pointer (type_void);
+  gcc_jit_type *type_void_ptr_ptr =
+gcc_jit_type_get_pointer (type_void_ptr);
+  gcc_jit_type *type_unsigned_char__ =
+gcc_jit_type_get_pointer (type_unsigned_char);
+  gcc_jit_field *field_x =
+gcc_jit_context_new_field (ctxt,
+   NULL, /* gcc_jit_location *loc */
+   type_int, /* gcc_jit_type *type, */
+

[committed] jit: make simpler reproducers

2017-08-18 Thread David Malcolm
The C reproducers generated by gcc_jit_context_dump_reproducer_to_file
contain numerous pointer values (from %p) to ensure uniqueness of the
identifiers, but this makes them less readable than they could be.

This patch updates reproducer::make_identifier so that the pointer
is only added if it's necessary for uniqueness.

Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r251191.

gcc/jit/ChangeLog:
* jit-recording.c (class gcc::jit::reproducer): Rename field
"m_identifiers" to "m_map_memento_to_identifier".  Add field
"m_set_identifiers" and struct hash_traits for it.
(gcc::jit::reproducer::reproducer): Update for above.
(convert_to_identifier): New function.
(gcc::jit::reproducer::ensure_identifier_is_unique): New method.
(gcc::jit::reproducer::make_identifier): Avoid appending the %p
unless necessary for uniqueness.  Update for field renaming.
(gcc::jit::reproducer::get_identifier): Update for field renaming.
---
 gcc/jit/jit-recording.c | 62 +++--
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index ea4ebb1..0e7f46e0 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -236,7 +236,16 @@ class reproducer : public dump
 GNU_PRINTF(2, 3);
 
  private:
-  hash_map<recording::memento *, const char *> m_identifiers;
+  const char * ensure_identifier_is_unique (const char *candidate, void *ptr);
+
+ private:
+  hash_map<recording::memento *, const char *> m_map_memento_to_identifier;
+
+  struct hash_traits : public string_hash
+  {
+static void remove (const char *) {}
+  };
+  hash_set<const char *, hash_traits> m_set_identifiers;
   allocator m_allocator;
 };
 
@@ -245,7 +254,8 @@ class reproducer : public dump
reproducer::reproducer (recording::context &ctxt,
const char *filename) :
   dump (ctxt, filename, 0),
-  m_identifiers (),
+  m_map_memento_to_identifier (),
+  m_set_identifiers (),
   m_allocator ()
 {
 }
@@ -286,6 +296,35 @@ reproducer::write_args (const vec  
)
 }
 }
 
+/* Ensure that STR is a valid C identifier by overwriting
+   any invalid chars in-place with underscores.
+
+   This doesn't special-case the first character.  */
+
+static void
+convert_to_identifier (char *str)
+{
+  for (char *p = str; *p; p++)
+if (!ISALNUM (*p))
+  *p = '_';
+}
+
+/* Given CANDIDATE, a possible C identifier for use in a reproducer,
+   ensure that it is unique within the generated source file by
+   appending PTR to it if necessary.  Return the resulting string.
+
+   The reproducer will eventually clean up the buffer in its dtor.  */
+
+const char *
+reproducer::ensure_identifier_is_unique (const char *candidate, void *ptr)
+{
+  if (m_set_identifiers.contains (candidate))
+candidate = m_allocator.xstrdup_printf ("%s_%p", candidate, ptr);
+  gcc_assert (!m_set_identifiers.contains (candidate));
+  m_set_identifiers.add (candidate);
+  return candidate;
+}
+
 /* Generate a C identifier for the given memento, associating the generated
buffer with the memento (for future calls to get_identifier et al).
 
@@ -293,21 +332,20 @@ reproducer::write_args (const vec  
)
 const char *
 reproducer::make_identifier (recording::memento *m, const char *prefix)
 {
-  char *result;
+  const char *result;
   if (strlen (m->get_debug_string ()) < 100)
 {
-  result = m_allocator.xstrdup_printf ("%s_%s_%p",
-  prefix,
-  m->get_debug_string (),
-  (void *) m);
-  for (char *p = result; *p; p++)
-   if (!ISALNUM (*p))
- *p = '_';
+  char *buf = m_allocator.xstrdup_printf ("%s_%s",
+ prefix,
+ m->get_debug_string ());
+  convert_to_identifier (buf);
+  result = buf;
 }
   else
 result = m_allocator.xstrdup_printf ("%s_%p",
 prefix, (void *) m);
-  m_identifiers.put (m, result);
+  result = ensure_identifier_is_unique (result, m);
+  m_map_memento_to_identifier.put (m, result);
   return result;
 }
 
@@ -350,7 +388,7 @@ reproducer::get_identifier (recording::memento *m)
 if (!loc->created_by_user ())
   return "NULL";
 
-  const char **slot = m_identifiers.get (m);
+  const char **slot = m_map_memento_to_identifier.get (m);
   if (!slot)
 {
   get_context ().add_error (NULL,
-- 
1.8.5.3



Re: [PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-18 Thread Segher Boessenkool
On Thu, Aug 17, 2017 at 08:40:34PM -0500, Steven Munroe wrote:
> > > +/* Convert the lower SPFP value to a 32-bit integer according to the 
> > > current
> > > +   rounding mode.  */
> > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > > __artificial__))
> > > +_mm_cvtss_si32 (__m128 __A)
> > > +{
> > > +  __m64 res = 0;
> > > +#ifdef _ARCH_PWR8
> > > +  __m128 vtmp;
> > > +  __asm__(
> > > +  "xxsldwi %x1,%x2,%x2,3;\n"
> > > +  "xscvspdp %x1,%x1;\n"
> > > +  "fctiw  %1,%1;\n"
> > > +  "mfvsrd  %0,%x1;\n"
> > > +  : "=r" (res),
> > > +  "=&wa" (vtmp)
> > > +  : "wa" (__A)
> > > +  : );
> > > +#endif
> > > +  return (res);
> > > +}
> > 
> > Maybe it could do something better than return the wrong answer for non-p8?
> 
> Ok this gets tricky. Before _ARCH_PWR8 the vector to scalar transfer
> would go through storage. But that is not the worst of it.

Float to int conversion goes through storage on older systems, too.

> The semantic of cvtss requires rint or llrint. But __builtin_rint will
> generate a call to libm unless we assert -ffast-math.

Yeah, we should fix that some day.  If we can.

> And we don't have
> builtins to generate fctiw/fctid directly.

Yup.  Well, __builtin_rint*, but that currently calls out to libm.

> So I will add the #else using __builtin_rint if that libm dependency is
> ok (this will pop up in the DG test for older machines).

Another option is to not support this intrinsic for < POWER8.

I don't have a big (or well-informed) opinion on which is best; but I
doubt always returning 0 is the best we can do ;-)


Segher


Re: [PATCH, rs6000] Fix PR target/80210: ICE in extract_insn

2017-08-18 Thread Segher Boessenkool
On Thu, Aug 17, 2017 at 07:02:14PM -0500, Peter Bergner wrote:
> PR target/80210 exposes a problem in rs6000_set_current_function() where
> it fails to correctly clear the rs6000_previous_fndecl cache.
> With the test case, we notice that rs6000_previous_fndecl is set (when it
> shouldn't be) and we end up restoring options from it.  In this case,
> we end up disabling HW sqrt (because of the pragma) when we earlier
> decided we could generate HW sqrts which leads to an ICE.
> 
> The current code in rs6000_set_current_function() is kind of a rat's nest,
> so I threw it out and rewrote it, modeling it after how S390 and i386
> handle it, which correctly clears the *_previous_fndecl caches.
> It also makes the code much more readable in my view.

Yeah, it almost looks easy this way.  Treacherous :-)

> This passed bootstrap and regtesting with no regressions and it fixes
> the ICE.  Ok for trunk?
> 
> This is also broken in GCC 7, GCC 6 and GCC 5.  Ok for those after this
> has been on trunk for a little while and assuming testing passes?

Okay for trunk and all branches.  Thanks!


Segher


> gcc/
>   * config/rs6000/rs6000.c (rs6000_activate_target_options): New function.
>   (rs6000_set_current_function): Rewrite function to use it.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/pr80210.c: New test.


Re: [PATCH], Delete some PowerPC debug switches

2017-08-18 Thread Segher Boessenkool
Hi!

On Thu, Aug 17, 2017 at 07:48:54PM -0400, Michael Meissner wrote:
> This patch deletes some of the debug switches that I've added over the years,
> and now don't use any more.
> 
> I did bootstrap builds and make check runs on a little endian power8 system.
> There were no regressions.  Can I check this into the trunk?

Maybe some (or all) of them can be deleted, not leaving a stub?  The
options were undocumented after all, no one should have been using them.

Leaving stubs doesn't hurt much of course, just a bit of clutter.

> +; This option existed in the past, but it hadn't been used in a while
>  mallow-df-permute
> -Target Undocumented Var(TARGET_ALLOW_DF_PERMUTE) Save
> -; Allow permutation of DF/DI vectors
> +Target RejectNegative Undocumented Ignore

That comment isn't very enlightening, just use the same text as for
the others?

The patch is okay for trunk with or without that change.  Thanks!


Segher


Re: [PATCH] Fix fallout from VRP strict-overflow changes

2017-08-18 Thread Andreas Schwab
On Aug 17 2017, Richard Biener <rguent...@suse.de> wrote:

> I was notifed I broke proper handling of undefined overflow in
> multiplicative ops handling.  The following resurrects previous
> behavior (and adds a testcase).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

This breaks gfortran.dg/alloc_comp_auto_array_2.f90 on aarch64 with
-mabi=ilp32 (only for -O3):

FAIL: gfortran.dg/alloc_comp_auto_array_2.f90   -O3 -g  (test for excess errors)
Excess errors:
/opt/gcc/gcc-20170818/gcc/testsuite/gfortran.dg/alloc_comp_auto_array_2.f90:33:0:
 Warning: '__builtin_memcpy' specified size between 2147483648 and 4294967295 
exceeds maximum object size 2147483647 [-Wstringop-overflow=]

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-18 Thread Steven Munroe
On Thu, 2017-08-17 at 00:47 -0500, Segher Boessenkool wrote:
> On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote:
> > This is part 3/3 for contributing PPC64LE support for X86 SSE
> > intrinsics. This patch includes testsuite/gcc.target tests for the
> > intrinsics included by xmmintrin.h. 
> 
> > +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT) \
> 
> Should that be UNION_TYPE?

It is spelled 'UINON_TYPE' in
./gcc/testsuite/gcc.target/i386/m128-check.h, which is the source for the
powerpc version.

There is no obvious reason why it could not be spelled UNION_TYPE.
Unless there is some symbol collision further up the SSE/AVX stack.

Bingo:

avx512f-helper.h:#define UNION_TYPE(SIZE, NAME) EVAL(union, SIZE, NAME)

I propose not to change this.




Re: [PATCH, rs6000] testcase coverage for vec_perm built-ins

2017-08-18 Thread Segher Boessenkool
Hi Will,

On Thu, Aug 17, 2017 at 09:19:23AM -0500, Will Schmidt wrote:
> Add some testcase coverage for the vector permute intrinsics.
> 
> Tested across power platforms.  OK for trunk?

>   * gcc.target/powerpc/fold-vec-perm-char.c: New.
>   * gcc.target/powerpc/fold-vec-perm-double.c: New.
>   * gcc.target/powerpc/fold-vec-perm-float.c: New.
>   * gcc.target/powerpc/fold-vec-perm-int.c: New.
>   * gcc.target/powerpc/fold-vec-perm-longlong.c: New.
>   * gcc.target/powerpc/fold-vec-perm-pixel.c: New.
>   * gcc.target/powerpc/fold-vec-perm-short.c: New.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-double.c
> @@ -0,0 +1,16 @@
> +/* Verify that overloaded built-ins for vec_perm with 
> +   double inputs produce the right results.  */

That suggests it is a run test, but it's not.  s/results/code/ maybe?
(Same in other tests).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-float.c
> @@ -0,0 +1,16 @@
> +/* Verify that overloaded built-ins for vec_perm with float
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-maltivec -O2" } */

vsx vs. altivec again.  You probably just need to add a comment what this
is about, or you can use -mvsx instead.

> new file mode 100644
> index 000..9f5c786
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-pixel.c
> @@ -0,0 +1,16 @@
> +/* Verify that overloaded built-ins for vec_perm with pixel
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-mvsx -O2" } */

Why vsx for pixel?  It's an altivec thing.


Segher


Re: [PATCH, rs6000] testcase coverage for vec_ld built-ins

2017-08-18 Thread Segher Boessenkool
On Thu, Aug 17, 2017 at 09:19:17AM -0500, Will Schmidt wrote:
> Add testcase coverage for the vec_ld intrinsic builtins.
> 
> Tested across power platforms (p6 and newer). OK for trunk?

>   * gcc.target/powerpc/fold-vec-ld-char.c: New.
>   * gcc.target/powerpc/fold-vec-ld-double.c: New.
>   * gcc.target/powerpc/fold-vec-ld-float.c: New.
>   * gcc.target/powerpc/fold-vec-ld-int.c: New.
>   * gcc.target/powerpc/fold-vec-ld-longlong.c: New.
>   * gcc.target/powerpc/fold-vec-ld-short.c: New.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-ld-float.c
> @@ -0,0 +1,23 @@
> +/* Verify that overloaded built-ins for vec_ld with float
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-maltivec -O2" } */

This is vsx_ok but you're only using -maltivec?  Should it use
altivec_ok instead?

> @@ -0,0 +1,28 @@
> +/* Verify that overloaded built-ins for vec_ld* with long long
> +   inputs produce the right results.  */
> +
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mvsx -mpower8-vector -O2" } */

-mvsx is redundant with -mpower8-vector; remove -mvsx?

Okay for trunk with those two things taken care of.  Thanks,


Segher


Re: [PATCH, rs6000] Add testcase coverage for vec_sums built-in

2017-08-18 Thread Segher Boessenkool
On Thu, Aug 17, 2017 at 09:18:54AM -0500, Will Schmidt wrote:
> Add some testcase coverage for the vec_sums() built-in.
> 
> Tested across power platforms (p6 and newer). OK for trunk?

Okay, thanks!


Segher


>   * gcc.target/powerpc/fold-vec-sums-int.c: New.


Re: [PATCH], Enable -mfloat128 by default on PowerPC VSX systems

2017-08-18 Thread Segher Boessenkool
On Wed, Aug 16, 2017 at 06:59:50PM -0400, Michael Meissner wrote:
> This patch enables -mfloat128 to be the default on PowerPC Linux VSX systems.
> 
> This patch depends on the libquadmath/81848 patch being approved and
> installed:
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00977.html

That patch is still waiting, but I'll review this one already.

> I've checked this on a big endian power7 system (both 32-bit and 64-bit) and a
> little endian power8 system.

It may be good to test it on a system without VSX as well, if you can?

> +The default for @option{-mfloat128} is enabled on PowerPC Linux
> +systems using the VSX instruction set, and disabled on other systems.
> +
> +The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7},
> +@option{-mcpu=power8}) must be enabled to use the IEEE 128-bit
> +floating point support.  The IEEE 128-bit floating point support only
> +works on PowerPC Linux systems.

Maybe swap these two?  More logical that way I think.

> +The default for @option{-mfloat128-hardware} is enabled on PowerPC
> +Linux systems using the ISA 3.0 instruction set, and disabled on other
> +systems.

You cannot enable it without ISA 3.0, it looks like it does not say that?


The patch looks fine (maybe improve the doc a bit), it's approved for
trunk.  Thanks!


Segher


libgo patch commited: Don't use little-endian code to dump big-endian PPC regs

2017-08-18 Thread Ian Lance Taylor
According to GCC PR 81893 the code that dumps the registers for PPC
only works for little-endian.  This patch fixes it to only be used in
that case.  Bootstrapped on x86_64-pc-linux-gnu, for what that's
worth.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251182)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-28e49825162465172ed706283628bf5cc1996260
+2c4a2bd826e58c8c8c51b9196c8d2c67abc4037e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-signal.c
===
--- libgo/runtime/go-signal.c   (revision 251127)
+++ libgo/runtime/go-signal.c   (working copy)
@@ -343,7 +343,7 @@ dumpregs(siginfo_t *info __attribute__((
   #endif
 #endif
 
-#ifdef __PPC__
+#if defined(__PPC__) && defined(__LITTLE_ENDIAN__)
   #ifdef __linux__
  {
mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
@@ -359,6 +359,9 @@ dumpregs(siginfo_t *info __attribute__((
runtime_printf("xer %X\n", m->regs->xer);
  }
   #endif
+#endif
+
+#ifdef __PPC__
   #ifdef _AIX
  {
mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;


Re: std::list optimizations

2017-08-18 Thread Jonathan Wakely

On 28/07/17 18:42 +0200, François Dumont wrote:

Hi

   Completing execution of the testsuite revealed a bug. So here is 
the correct version of this patch.


François

On 21/07/2017 19:14, François Dumont wrote:

Hi

   Here is a proposal for 2 optimizations in the std::list 
implementation.


   First, an optimization of the move constructor taking an allocator, 
for always-equal allocators.  Compared to the version in my previous 
std::list patch, I am now doing it at the std::list level rather than 
at the _List_base level.  This way we won't instantiate the insert 
call and we won't check for an empty list when the allocators always 
compare equal.


   Second, I replace the _S_distance method with the std::distance 
algorithm, which benefits from the nice [begin(), end()) range 
optimization when the cxx11 ABI is being used.


   Note that I am proposing the two changes in one patch to save some 
review time, but I can commit them separately.


Tested under x86_64 Linux normal mode.


   * include/bits/stl_list.h
   (_List_base<>::_S_distance): Remove.
   (_List_impl(_List_impl&&, _Node_alloc_type&&)): New.
   (_List_base(_List_base&&, _Node_alloc_type&&)): Use latter.
   (_List_base(_Node_alloc_type&&)): New.
   (_List_base<>::_M_distance, _List_base<>::_M_node_count): Move...
   (list<>::_M_distance, list<>::_M_node_count): ...here. Replace 
calls to

   _S_distance with calls to std::distance.
   (list(list&&, const allocator_type&, true_type)): New.
   (list(list&&, const allocator_type&, false_type)): New.
   (list(list&&, const allocator_type&)): Adapt to call latters.

Ok to commit ?

François






diff --git a/libstdc++-v3/include/bits/stl_list.h 
b/libstdc++-v3/include/bits/stl_list.h
index cef94f7..eb71a5e 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -364,19 +364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
rebind<_List_node<_Tp> >::other _Node_alloc_type;
  typedef __gnu_cxx::__alloc_traits<_Node_alloc_type> _Node_alloc_traits;

-  static size_t
-  _S_distance(const __detail::_List_node_base* __first,
- const __detail::_List_node_base* __last)
-  {
-   size_t __n = 0;
-   while (__first != __last)
- {
-   __first = __first->_M_next;
-   ++__n;
- }
-   return __n;
-  }
-


Removing this function will break code that expects it to exist in an
explicit instantiation. The function could be left there, but unused.


  struct _List_impl
  : public _Node_alloc_type
  {
@@ -393,6 +380,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
#if __cplusplus >= 201103L
_List_impl(_List_impl&&) = default;

+   _List_impl(_List_impl&& __x, _Node_alloc_type&& __a)
+   : _Node_alloc_type(std::move(__a)), _M_node(std::move(__x._M_node))
+   { }
+


This is fine.


_List_impl(_Node_alloc_type&& __a) noexcept
: _Node_alloc_type(std::move(__a))
{ }
@@ -409,28 +400,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  void _M_inc_size(size_t __n) { _M_impl._M_node._M_size += __n; }

  void _M_dec_size(size_t __n) { _M_impl._M_node._M_size -= __n; }
-
-  size_t
-  _M_distance(const __detail::_List_node_base* __first,
- const __detail::_List_node_base* __last) const
-  { return _S_distance(__first, __last); }
-
-  // return the stored size
-  size_t _M_node_count() const { return _M_get_size(); }
#else
  // dummy implementations used when the size is not stored
  size_t _M_get_size() const { return 0; }
  void _M_set_size(size_t) { }
  void _M_inc_size(size_t) { }
  void _M_dec_size(size_t) { }
-  size_t _M_distance(const void*, const void*) const { return 0; }
-
-  // count the number of nodes
-  size_t _M_node_count() const
-  {
-   return _S_distance(_M_impl._M_node._M_next,
-  std::__addressof(_M_impl._M_node));
-  }
#endif


Same problem again: these will no longer be defined by explicit
instantiations.



  typename _Node_alloc_traits::pointer
@@ -466,12 +441,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  _List_base(_List_base&&) = default;

  _List_base(_List_base&& __x, _Node_alloc_type&& __a)
+  : _M_impl(std::move(__x._M_impl), std::move(__a))
+  { }
+
+  _List_base(_Node_alloc_type&& __a)
  : _M_impl(std::move(__a))
-  {
-   if (__x._M_get_Node_allocator() == _M_get_Node_allocator())
- _M_move_nodes(std::move(__x));
-   // else caller must move individual elements.
-  }
+  { }



I like this change in principle, but it alters the behaviour of an
existing constructor. Existing code might use the constructor and get
broken by this.

You can avoid that by leaving the existing constructor alone and
adding two new ones for new code to use. Reordering the parameters
will make the new one distinct from the old one:

 // Used when is_always_equal
 _List_base(_Node_alloc_type&& __a, _List_base&& __x)
 : 

Re: [PATCH] detect incompatible aliases (PR c/81854)

2017-08-18 Thread Martin Sebor

On 08/18/2017 10:48 AM, Jonathan Wakely wrote:

On 18/08/17 08:54 -0600, Martin Sebor wrote:

On 08/18/2017 07:10 AM, Jonathan Wakely wrote:

On 17/08/17 21:21 -0600, Martin Sebor wrote:

Joseph, while looking into implementing your enhancement request
pr81824 I noticed that GCC silently accepts incompatible alias
declarations (pr81854), so as a sort of proof of concept for the
former I enhanced the checking already done for other kinds of
incompatibilities to also detect those mentioned in the latter
bug.  Attached is this patch, tested on x86_64-linux.

Jonathan, the patch requires suppressing the warning in libstdc++
compatibility symbol definitions in compatibility.cc.  I couldn't
find a way to do it without the suppression but I'd be happy to
try again if you have an idea for how.


Doing it that way is fine, but ...


diff --git a/libstdc++-v3/src/c++98/compatibility.cc
b/libstdc++-v3/src/c++98/compatibility.cc
index 381f4c4..5f56b9e 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -213,6 +213,11 @@
_ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv
_ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv
*/

+// Disable warning about declaring aliases between functions with
+// incompatible types.
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattributes"
+


Could this be moved closer to the point where it's needed?

It's not needed until after line 361, right?


Sure.  The other possibility that I forgot to mention is to
declare the alias without a prototype, which in C++ looks
like this:

 void foo (...);

The patch would then look like this.  Do you have a preference
between these two approaches?


If this doesn't change the generated code, but avoids the warnings
then I think I prefer this. i.e. fix the code, not just suppress the
warnings.


It's the same as calling a function without a prototype in C.
I rebuilt libstdc++ with this change and reran the test suite
with no unexpected failures so I'll go ahead and commit this
change instead.

To be clear, though, this is just a suppression mechanism not
unlike a cast.  The ideal solution would be to declare the
aliases to have the right type, e.g., using __typeof__ or
decltype.  I just couldn't find a way to make it work with
the macros.

Martin





Martin

diff --git a/libstdc++-v3/src/c++98/compatibility.cc
b/libstdc++-v3/src/c++98/compatibility.cc
index 381f4c4..b49a5ca 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -367,13 +367,13 @@ _GLIBCXX_END_NAMESPACE_VERSION

#define _GLIBCXX_3_4_SYMVER(XXname, name) \
   extern "C" void \
-   _X##name() \
+   _X##name(...) \
   __attribute__ ((alias(#XXname))); \
   asm (".symver " "_X" #name "," #name "@GLIBCXX_3.4");

#define _GLIBCXX_3_4_5_SYMVER(XXname, name) \
   extern "C" void \
-   _Y##name() \
+   _Y##name(...) \
   __attribute__ ((alias(#XXname))); \
   asm (".symver " "_Y" #name  "," #name "@@GLIBCXX_3.4.5");






[PATCH/AARCH64] Generate FRINTZ for (double)(long) under -ffast-math on aarch64

2017-08-18 Thread Andrew Pinski
Like https://gcc.gnu.org/ml/gcc-patches/2010-09/msg00060.html for
PowerPC, we should do something similar for aarch64.  This pattern
does show up in SPEC CPU 2006 in astar, though I did not measure the
performance improvement from it.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64.md (*frintz): New pattern.

testsuite/ChangeLog:
* testsuite/gcc.target/aarch64/floattointtofloat-1.c: New testcase.
commit 9cef5e196729df5a197b81b72192d687683a057a
Author: Andrew Pinski 
Date:   Thu Aug 17 22:31:15 2017 -0700

Fix 19869: (double)(long)double_var could be better optimized

Use FRINTZ to optimize this.

Signed-off-by: Andrew Pinski 

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5e6b027..1439bd0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4645,6 +4645,16 @@
   [(set_attr "type" "f_cvti2f")]
 )
 
+(define_insn "*frintz"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+(float:GPF (fix:GPI (match_operand:GPF 1 "register_operand" "w"))))]
+  "TARGET_FLOAT && flag_unsafe_math_optimizations && !flag_trapping_math"
+  "frintz %0, %1"
+  [(set_attr "simd" "yes")
+   (set_attr "fp" "no")
+   (set_attr "type" "neon_int_to_fp_")]
+)
+
 ;; If we do not have ARMv8.2-A 16-bit floating point extensions, the
 ;; midend will arrange for an SImode conversion to HFmode to first go
 ;; through DFmode, then to HFmode.  But first it will try converting
diff --git a/gcc/testsuite/gcc.target/aarch64/floattointtofloat-1.c 
b/gcc/testsuite/gcc.target/aarch64/floattointtofloat-1.c
new file mode 100644
index 000..c1209c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/floattointtofloat-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+double dll(double a)
+{
+  return (long long)a;
+}
+double di(double a)
+{
+  return (int)a;
+}
+float fll(float a)
+{
+  return (long long)a;
+}
+float fi(float a)
+{
+  return (int)a;
+}
+/* { dg-final { scan-assembler-times "frintz" 4 } } */
+


[PATCH] Fix PR81488

2017-08-18 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81488 reports a problem with SLSR 
where
too many memory resources are required to complete SLSR processing of 
conditional
candidates.  The code in question contains large trees of PHI dependencies that 
must
be examined in order to find all candidates for replacement.  Not only are the
dependency chains deep, but many PHIs contain duplicate source operands 
arriving by
different paths, and SLSR isn't currently smart enough to avoid tracing them 
more
than once.  This leads to exponential behavior and a bad ending.

Even removing the exponential behavior is not sufficient to fix the problem.  
The
dependencies are just too complex.  So it is also necessary to put a limit on 
how
much time we want to spend examining PHI nodes before giving up.  I've 
arbitrarily
chosen 16 as the maximum number of PHI nodes to visit for each candidate, which
seems likely to be sufficient in most cases.

A side benefit of removing the exponential behavior is better accuracy in making
cost-model decisions.  With tracing through the same PHI dependencies more than
once, the insertion (+) and replacement (-) costs are overcounted.  This should
now be improved.

The original bug went latent on the trunk after it was reported, but I was able
to reproduce with an older revision and verify that the following patch fixes
the problem.  I've also bootstrapped and tested it on powerpc64le-linux-gnu with
no regressions.  Is this ok for trunk?

Thanks,
Bill


2017-08-18  Bill Schmidt  

PR tree-optimization/81488
* gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add visited
and cached_basis fields.
(MAX_SPREAD): New constant.
(alloc_cand_and_find_basis): Initialize new fields.
(clear_visited): New function.
(create_phi_basis_1): Rename from create_phi_basis, set visited
and cached_basis fields.
(create_phi_basis): New wrapper function.
(phi_add_costs_1): Rename from phi_add_costs, add spread
parameter, set visited field, short-circuit when limits reached.
(phi_add_costs): New wrapper function.
(record_phi_increments_1): Rename from record_phi_increments, set
visited field.
(record_phi_increments): New wrapper function.
(phi_incr_cost_1): Rename from phi_incr_cost, set visited field.
(phi_incr_cost): New wrapper function.
(all_phi_incrs_profitable_1): Rename from
all_phi_incrs_profitable, set visited field.
(all_phi_incrs_profitable): New wrapper function.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 251159)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -281,6 +281,14 @@ struct slsr_cand_d
   /* Savings that can be expected from eliminating dead code if this
  candidate is replaced.  */
   int dead_savings;
+
+  /* For PHI candidates, use a visited flag to keep from processing the
+ same PHI twice from multiple paths.  */
+  int visited;
+
+  /* We sometimes have to cache a phi basis with a phi candidate to
+ avoid processing it twice.  Valid only if visited==1.  */
+  tree cached_basis;
 };
 
 typedef struct slsr_cand_d slsr_cand, *slsr_cand_t;
@@ -369,7 +377,11 @@ enum count_phis_status
   DONT_COUNT_PHIS = 0,
   COUNT_PHIS = 1
 };
- 
+
+/* Constrain how many PHI nodes we will visit for a conditional
+   candidate (depth and breadth).  */
+const int MAX_SPREAD = 16;
+
 /* Pointer map embodying a mapping from statements to candidates.  */
 static hash_map *stmt_cand_map;
 
@@ -671,6 +683,8 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi
   c->sibling = 0;
   c->def_phi = kind == CAND_MULT ? find_phi_def (base) : 0;
   c->dead_savings = savings;
+  c->visited = 0;
+  c->cached_basis = NULL_TREE;
 
   cand_vec.safe_push (c);
 
@@ -2317,19 +2331,33 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
   return lhs;
 }
 
-/* Given a candidate C with BASIS_NAME being the LHS of C's basis which
-   is hidden by the phi node FROM_PHI, create a new phi node in the same
-   block as FROM_PHI.  The new phi is suitable for use as a basis by C,
-   with its phi arguments representing conditional adjustments to the
-   hidden basis along conditional incoming paths.  Those adjustments are
-   made by creating add statements (and sometimes recursively creating
-   phis) along those incoming paths.  LOC is the location to attach to
-   the introduced statements.  KNOWN_STRIDE is true iff C's stride is a
-   constant.  */
+/* Clear the visited field for a tree of PHI candidates.  */
 
+static void
+clear_visited (gphi *phi)
+{
+  unsigned i;
+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
+
+  if (phi_cand->visited)
+{
+  phi_cand->visited = 0;
+
+  for (i = 0; i < gimple_phi_num_args (phi); i++)
+   {
+ tree arg = gimple_phi_arg_def (phi, 

[PATCH] Simplify allocator usage in unordered containers

2017-08-18 Thread Jonathan Wakely

While fixing PR 81891 I noticed that _Hashtable always creates a
__value_alloc_type for constructing and destroying the elements, which
is unnecessary. https://wg21.link/lwg2218 confirmed that.

We can avoid constructing new allocators just for these calls and can
call them directly on the stored allocator.

* include/bits/hashtable_policy.h (_ReuseOrAllocNode): Remove
__value_alloc_type and __value_alloc_traits typedefs.
(_ReuseOrAllocNode::operator()): Call construct and destroy on the
node allocator.
(_Hashtable_alloc): Simplify __value_alloc_traits typedef.
(_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&&...)): Call
construct on the node allocator.
(_Hashtable_alloc<_NodeAlloc>::_M_deallocate_node(__node_type*)): Call
destroy on the node allocator.

Tested powerpc64le-linux, committed to trunk.

commit caa7283396a9f74a42c979697b0a53115cee
Author: Jonathan Wakely 
Date:   Fri Aug 18 19:12:00 2017 +0100

Simplify allocator usage in unordered containers

* include/bits/hashtable_policy.h (_ReuseOrAllocNode): Remove
__value_alloc_type and __value_alloc_traits typedefs.
(_ReuseOrAllocNode::operator()): Call construct and destroy on the
node allocator.
(_Hashtable_alloc): Simplify __value_alloc_traits typedef.
(_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&&...)): Call
construct on the node allocator.
(_Hashtable_alloc<_NodeAlloc>::_M_deallocate_node(__node_type*)): 
Call
destroy on the node allocator.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index a3a31d1fb11..5f2d8776aaa 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -111,9 +111,6 @@ namespace __detail
 private:
   using __node_alloc_type = _NodeAlloc;
   using __hashtable_alloc = _Hashtable_alloc<__node_alloc_type>;
-  using __value_alloc_type = typename 
__hashtable_alloc::__value_alloc_type;
-  using __value_alloc_traits =
-   typename __hashtable_alloc::__value_alloc_traits;
   using __node_alloc_traits =
typename __hashtable_alloc::__node_alloc_traits;
   using __node_type = typename __hashtable_alloc::__node_type;
@@ -135,18 +132,17 @@ namespace __detail
  __node_type* __node = _M_nodes;
  _M_nodes = _M_nodes->_M_next();
  __node->_M_nxt = nullptr;
- __value_alloc_type __a(_M_h._M_node_allocator());
- __value_alloc_traits::destroy(__a, __node->_M_valptr());
+ auto& __a = _M_h._M_node_allocator();
+ __node_alloc_traits::destroy(__a, __node->_M_valptr());
  __try
{
- __value_alloc_traits::construct(__a, __node->_M_valptr(),
- std::forward<_Arg>(__arg));
+ __node_alloc_traits::construct(__a, __node->_M_valptr(),
+std::forward<_Arg>(__arg));
}
  __catch(...)
{
  __node->~__node_type();
- __node_alloc_traits::deallocate(_M_h._M_node_allocator(),
- __node, 1);
+ __node_alloc_traits::deallocate(__a, __node, 1);
  __throw_exception_again;
}
  return __node;
@@ -2000,10 +1996,8 @@ namespace __detail
   // Use __gnu_cxx to benefit from _S_always_equal and al.
   using __node_alloc_traits = __gnu_cxx::__alloc_traits<__node_alloc_type>;
 
-  using __value_type = typename __node_type::value_type;
-  using __value_alloc_type =
-   __alloc_rebind<__node_alloc_type, __value_type>;
-  using __value_alloc_traits = std::allocator_traits<__value_alloc_type>;
+  using __value_alloc_traits = typename __node_alloc_traits::template
+   rebind_traits;
 
   using __node_base = __detail::_Hash_node_base;
   using __bucket_type = __node_base*;  
@@ -2057,10 +2051,10 @@ namespace __detail
__node_type* __n = std::__addressof(*__nptr);
__try
  {
-   __value_alloc_type __a(_M_node_allocator());
::new ((void*)__n) __node_type;
-   __value_alloc_traits::construct(__a, __n->_M_valptr(),
-   std::forward<_Args>(__args)...);
+   __node_alloc_traits::construct(_M_node_allocator(),
+  __n->_M_valptr(),
+  std::forward<_Args>(__args)...);
return __n;
  }
__catch(...)
@@ -2076,8 +2070,7 @@ namespace __detail
 {
   typedef typename __node_alloc_traits::pointer _Ptr;
   auto __ptr = std::pointer_traits<_Ptr>::pointer_to(*__n);

Re: Ping on target independent stack clash protection patches

2017-08-18 Thread Eric Botcazou
> #01 of #08:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01971.html
> 
> #02 of #08:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01972.html
> 
> #03 of #08:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01974.html
> 
> Need to reach some kind of closure on these, then I can start pinging
> the target maintainers for the rest of the bits...

All OK with me, thanks for your attention to the interaction with Ada.

Minor nit:

+   Stack checking is designed to detect infinite recursion for Ada
+   programs.  Furthermore stack checking tries to ensure that scenario
+   that enough stack space is left to run a signal handler.

Let's use the same wording as in invoke.texi: "...infinite recursion and stack 
overflows for...".  Missing "in" before "that scenario'.

[Ada folks in the embedded world are mainly scared about the possibility of 
tasks (threads) overwriting each other's stack; in that case, their only 
requirement is to be able to run a last chance handler to terminate the 
task properly.  But the ACATS testsuite contains a handful of tests that 
literally play with stack overflows and this complicates the implementation 
for artificial reasons].

-- 
Eric Botcazou


[PATCH] PR libstdc++/81891 fix double-free in hashtable constructor

2017-08-18 Thread Jonathan Wakely

We introduced a regression in r214986 when changing the _Hashtable
range constructor to delegate to another constructor. That change
means that the object has completed construction after the target
constructor completes, and so if an exception is thrown in the
delegating constructor then the destructor will run. This results in
calling _M_deallocate_buckets twice, causing a double-free.

The fix is to simply omit the try-catch in the delegating constructor,
so that the destructor takes care of the clean up.

PR libstdc++/81891
* include/bits/hashtable.h (_Hashtable(_InputIterator, _InputIterator,
size_type, const _H1&, const _H2&, const _Hash&, const _Equal&,
const _ExtractKey&, const allocator_type&)): Let destructor do clean
up if an exception is thrown.
* testsuite/23_containers/unordered_map/cons/81891.cc: New.

Tested powerpc64le-linux, committed to trunk.

As a regression since 4.9 this needs to be backported to all active
branches.


commit 30ce5842ed9b91813b97b756960afe6369f1a568
Author: Jonathan Wakely 
Date:   Fri Aug 18 18:03:51 2017 +0100

PR libstdc++/81891 fix double-free in hashtable constructor

PR libstdc++/81891
* include/bits/hashtable.h (_Hashtable(_InputIterator, 
_InputIterator,
size_type, const _H1&, const _H2&, const _Hash&, const _Equal&,
const _ExtractKey&, const allocator_type&)): Let destructor do clean
up if an exception is thrown.
* testsuite/23_containers/unordered_map/cons/81891.cc: New.

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index bc7448bcf13..e0806dc93a1 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -973,17 +973,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_bucket_count = __bkt_count;
  }
 
-   __try
- {
-   for (; __f != __l; ++__f)
- this->insert(*__f);
- }
-   __catch(...)
- {
-   clear();
-   _M_deallocate_buckets();
-   __throw_exception_again;
- }
+   for (; __f != __l; ++__f)
+ this->insert(*__f);
   }
 
    template

[new file testsuite/23_containers/unordered_map/cons/81891.cc begins here; its license header, ending with http://www.gnu.org/licenses/>., is elided]
+
+// { dg-do run { target c++11 } }
+
+#include <unordered_map>
+#include <testsuite_hooks.h>
+#include <testsuite_allocator.h>
+
+struct fails_on_copy {
+  fails_on_copy() = default;
+  fails_on_copy(const fails_on_copy&) { throw 0; };
+};
+
+using value_type = std::pair<const int, fails_on_copy>;
+
+void
+test01()
+{
+  value_type p;
+  try
+  {
+    std::unordered_map<int, fails_on_copy> umap(&p, &p + 1);
+  }
+  catch(...)
+  { }
+}
+
+void
+test02()
+{
+  using Alloc = __gnu_test::tracker_allocator;
+  using std::hash;
+  using std::equal_to;
+
+  value_type p;
+  try
+  {
+    std::unordered_map<int, fails_on_copy, hash<int>, equal_to<int>, Alloc>
+	umap(&p, &p + 1);
+  }
+  catch(...)
+  { }
+
+  using counter = __gnu_test::tracker_allocator_counter;
+  VERIFY(counter::get_allocation_count() == counter::get_deallocation_count());
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}


Re: [PATCH] detect incompatible aliases (PR c/81854)

2017-08-18 Thread Jonathan Wakely

On 18/08/17 08:54 -0600, Martin Sebor wrote:

On 08/18/2017 07:10 AM, Jonathan Wakely wrote:

On 17/08/17 21:21 -0600, Martin Sebor wrote:

Joseph, while looking into implementing your enhancement request
pr81824 I noticed that GCC silently accepts incompatible alias
declarations (pr81854), so as a sort of proof of concept for the
former I enhanced the checking already done for other kinds of
incompatibilities to also detect those mentioned in the latter
bug.  Attached is this patch, tested on x86_64-linux.

Jonathan, the patch requires suppressing the warning in libstdc++
compatibility symbol definitions in compatibility.cc.  I couldn't
find a way to do it without the suppression but I'd be happy to
try again if you have an idea for how.


Doing it that way is fine, but ...


diff --git a/libstdc++-v3/src/c++98/compatibility.cc
b/libstdc++-v3/src/c++98/compatibility.cc
index 381f4c4..5f56b9e 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv
_ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv
*/

+// Disable warning about declaring aliases between functions with
+// incompatible types.
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattributes"
+


Could this be moved closer to the point where it's needed?

It's not needed until after line 361, right?


Sure.  The other possibility that I forgot to mention is to
declare the alias without a prototype, which in C++ looks
like this:

 void foo (...);

The patch would then look like this.  Do you have a preference
between these two approaches?


If this doesn't change the generated code, but avoids the warnings
then I think I prefer this. i.e. fix the code, not just suppress the
warnings.



Martin

diff --git a/libstdc++-v3/src/c++98/compatibility.cc 
b/libstdc++-v3/src/c++98/compatibility.cc

index 381f4c4..b49a5ca 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -367,13 +367,13 @@ _GLIBCXX_END_NAMESPACE_VERSION

#define _GLIBCXX_3_4_SYMVER(XXname, name) \
   extern "C" void \
-   _X##name() \
+   _X##name(...) \
   __attribute__ ((alias(#XXname))); \
   asm (".symver " "_X" #name "," #name "@GLIBCXX_3.4");

#define _GLIBCXX_3_4_5_SYMVER(XXname, name) \
   extern "C" void \
-   _Y##name() \
+   _Y##name(...) \
   __attribute__ ((alias(#XXname))); \
   asm (".symver " "_Y" #name  "," #name "@@GLIBCXX_3.4.5");




Re: [PATCH] PR libstdc++/79162 ambiguity in string assignment due to string_view overload (LWG 2946)

2017-08-18 Thread Daniel Krügler
Hi,

This is a friendly reminder asking for a review of the suggested patch!

Thanks,

- Daniel

2017-07-30 15:01 GMT+02:00 Daniel Krügler :
> 2017-07-28 22:40 GMT+02:00 Daniel Krügler :
>> 2017-07-28 22:29 GMT+02:00 Daniel Krügler :
>>> 2017-07-28 22:25 GMT+02:00 Tim Song :
 On Fri, Jul 28, 2017 at 4:10 PM, Daniel Krügler
  wrote:
> +  // Performs an implicit conversion from _Tp to __sv_type.
> +  template
> +static __sv_type _S_to_string_view(const _Tp& __svt)
> +{
> +  return __svt;
> +}

 I might have gone for

 +static __sv_type _S_to_string_view(__sv_type __svt) noexcept
 +{
 +  return __svt;
 +}

 With that, we can also use noexcept(_S_to_string_view(__t)) to make up
 for the absence of is_nothrow_convertible (basically the same thing I
 did in LWG 2993's PR).
>>>
>>> Agreed, that makes very much sense. I will adjust the P/R, but before
>>> I resubmit I would like to get feedback whether the other two compare
>>> functions also should become conditionally noexcept.
>>
>> Locally I have now performed the sole change of the _S_to_string_view
>> declaration getting rid of the template, but would also like to gather
>> feedback from the maintainers whether I should also change the form of
>> the conditional noexcept to use the expression
>>
>> noexcept(_S_to_string_view(__t))
>>
>> instead of the current
>>
>> is_same<_Tp, __sv_type>::value
>>
>> as suggested by Tim Song.
>>
>> I'm asking also, because I have a paper proposing to standardize
>> is_nothrow_convertible submitted for the upcoming C++ mailing - This
>> would be one of the first applications in the library ;-)
>
> A slightly revised patch update: It replaces the _S_to_string_view
> template by a simpler _S_to_string_view function as of Tim Song's
> suggestion, but still uses the simplified noexcept specification
> deferring it to a future application case for is_nothrow_convertible.
> Furthermore now all three compare function templates are now
> (conditionally) noexcept by an (off-list) suggestion from Jonathan
> Wakely.
>
> Thanks,
>
> - Daniel



-- 





Re: [PATCH] detect incompatible aliases (PR c/81854)

2017-08-18 Thread Martin Sebor

On 08/18/2017 07:10 AM, Jonathan Wakely wrote:

On 17/08/17 21:21 -0600, Martin Sebor wrote:

Joseph, while looking into implementing your enhancement request
pr81824 I noticed that GCC silently accepts incompatible alias
declarations (pr81854), so as a sort of proof of concept for the
former I enhanced the checking already done for other kinds of
incompatibilities to also detect those mentioned in the latter
bug.  Attached is this patch, tested on x86_64-linux.

Jonathan, the patch requires suppressing the warning in libstdc++
compatibility symbol definitions in compatibility.cc.  I couldn't
find a way to do it without the suppression but I'd be happy to
try again if you have an idea for how.


Doing it that way is fine, but ...


diff --git a/libstdc++-v3/src/c++98/compatibility.cc
b/libstdc++-v3/src/c++98/compatibility.cc
index 381f4c4..5f56b9e 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv
_ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv
 */

+// Disable warning about declaring aliases between functions with
+// incompatible types.
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattributes"
+


Could this be moved closer to the point where it's needed?

It's not needed until after line 361, right?


Sure.  The other possibility that I forgot to mention is to
declare the alias without a prototype, which in C++ looks
like this:

  void foo (...);

The patch would then look like this.  Do you have a preference
between these two approaches?

Martin

diff --git a/libstdc++-v3/src/c++98/compatibility.cc 
b/libstdc++-v3/src/c++98/compatibility.cc

index 381f4c4..b49a5ca 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -367,13 +367,13 @@ _GLIBCXX_END_NAMESPACE_VERSION

 #define _GLIBCXX_3_4_SYMVER(XXname, name) \
extern "C" void \
-   _X##name() \
+   _X##name(...) \
__attribute__ ((alias(#XXname))); \
asm (".symver " "_X" #name "," #name "@GLIBCXX_3.4");

 #define _GLIBCXX_3_4_5_SYMVER(XXname, name) \
extern "C" void \
-   _Y##name() \
+   _Y##name(...) \
__attribute__ ((alias(#XXname))); \
asm (".symver " "_Y" #name  "," #name "@@GLIBCXX_3.4.5");




[PATCH] Asm memory constraints

2017-08-18 Thread Alan Modra
This patch adds some documentation on asm memory constraints, aimed
especially at constraints for arrays.  I may have invented something
new here as I've never seen "=m" (*(T (*)[]) ptr) used before.
So this isn't simply a documentation patch.  It needs blessing from a
global maintainer, I think, as to whether this is a valid approach and
something that gcc ought to continue supporting.  My poking around the
code and looking at dumps convinced me that it's OK.

PR inline-asm/81890
* doc/extend.texi (Clobbers): Correct vax example.  Delete old
example of a memory input for a string of known length.  Move
commentary out of table.  Add a number of new examples
covering array memory inputs and outputs.
testsuite/
* gcc.target/i386/asm-mem.c: New test.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 93d542d..224518f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8755,7 +8755,7 @@ registers:
 asm volatile ("movc3 %0, %1, %2"
: /* No outputs. */
: "g" (from), "g" (to), "g" (count)
-   : "r0", "r1", "r2", "r3", "r4", "r5");
+   : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
 @end example
 
 Also, there are two special clobber arguments:
@@ -8786,14 +8786,75 @@ Note that this clobber does not prevent the 
@emph{processor} from doing
 speculative reads past the @code{asm} statement. To prevent that, you need 
 processor-specific fence instructions.
 
-Flushing registers to memory has performance implications and may be an issue 
-for time-sensitive code.  You can use a trick to avoid this if the size of 
-the memory being accessed is known at compile time. For example, if accessing 
-ten bytes of a string, use a memory input like: 
+@end table
 
-@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}.
+Flushing registers to memory has performance implications and may be
+an issue for time-sensitive code.  You can provide better information
+to GCC to avoid this, as shown in the following examples.  At a
+minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
+need to be flushed.  Also, if GCC can prove that all of the outputs of
+a non-volatile @code{asm} statement are unused, then the @code{asm}
+may be deleted.  Removal of otherwise dead @code{asm} statements will
+not happen if they clobber @code{"memory"}.
 
-@end table
+Here is a fictitious sum of squares instruction, that takes two
+pointers to floating point values in memory and produces a floating
+point register output.
+Notice that @code{x} and @code{y} both appear twice in the @code{asm}
+parameters, once to specify memory accessed, and once to specify a
+base register used by the @code{asm}.  You won't normally be wasting a
+register by doing this as GCC can use the same register for both
+purposes.  However, it would be foolish to use both @code{%1} and
+@code{%3} for @code{x} in this @code{asm} and expect them to be the
+same.  In fact, @code{%3} may well not even be a register.  It might
+be a symbolic memory reference to the object pointed to by @code{x}.
+
+@smallexample
+asm ("sumsq %0, %1, %2"
+ : "+f" (result)
+ : "r" (x), "r" (y), "m" (*x), "m" (*y));
+@end smallexample
+
+Here is a fictitious @code{*z++ = *x++ * *y++} instruction.
+Notice that the @code{x}, @code{y} and @code{z} pointer registers
+must be specified as input/output because the @code{asm} modifies
+them.
+
+@smallexample
+asm ("vecmul %0, %1, %2"
+ : "+r" (z), "+r" (x), "+r" (y), "=m" (*z)
+ : "m" (*x), "m" (*y));
+@end smallexample
+
+An x86 example where the string memory argument is of unknown length.
+
+@smallexample
+asm("repne scasb"
+: "=c" (count), "+D" (p)
+: "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
+@end smallexample
+
+If you know the above will only be reading a ten byte array then you
+could instead use a memory input like:
+@code{"m" (*(const char (*)[10]) p)}.
+
+Here is an example of a PowerPC vector scale implemented in assembly,
+complete with vector and condition code clobbers, and some initialized
+offset registers that are unchanged by the @code{asm}.
+
+@smallexample
+void
+dscal (size_t n, double *x, double alpha)
+@{
+  asm ("/* lots of asm here */"
+   : "+m" (*(double (*)[n]) x), "+r" (n), "+b" (x)
+   : "d" (alpha), "b" (32), "b" (48), "b" (64),
+ "b" (80), "b" (96), "b" (112)
+   : "cr0",
+ "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39",
+ "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47");
+@}
+@end smallexample
 
 @anchor{GotoLabels}
 @subsubsection Goto Labels
diff --git a/gcc/testsuite/gcc.target/i386/asm-mem.c b/gcc/testsuite/gcc.target/i386/asm-mem.c
new file mode 100644
index 000..01522fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/asm-mem.c
@@ -0,0 +1,58 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+/* Check that "m" array references are effective in 

constexpr basic_string_view as required by C++17

2017-08-18 Thread Benjamin Buch
The current implementation misses a lot of constexpr in
basic_string_view. (Bug 70483)

The patch adds the missing constexpr specifiers as required by C++17.
I ran the libstdc++-v3 testsuite and got no errors. I didn't change
the basic_string_view test cases, because I couldn't find a pattern in
them for testing whether the function declarations are constexpr.
(This is my first patch for GCC.)
diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index fd9a6afb..86f1a72 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2017-08-18  Benjamin Buch  
+
+	* include/std/string_view: Added constexpr as required by C++17.
+	* include/bits/string_view.tcc: Likewise.
+
 2017-08-11  Jonathan Wakely  
 
 	PR libstdc++/81808
@@ -2783,7 +2788,7 @@
 2017-01-01  Jakub Jelinek  
 
 	Update copyright years.
-
+
 Copyright (C) 2017 Free Software Foundation, Inc.
 
 Copying and distribution of this file, with or without modification,
diff --git a/libstdc++-v3/include/bits/string_view.tcc b/libstdc++-v3/include/bits/string_view.tcc
index f4bd50f..b8ab78c 100644
--- a/libstdc++-v3/include/bits/string_view.tcc
+++ b/libstdc++-v3/include/bits/string_view.tcc
@@ -45,7 +45,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find(const _CharT* __str, size_type __pos, size_type __n) const noexcept
 {
@@ -66,7 +66,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find(_CharT __c, size_type __pos) const noexcept
 {
@@ -82,7 +82,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 rfind(const _CharT* __str, size_type __pos, size_type __n) const noexcept
 {
@@ -102,7 +102,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 rfind(_CharT __c, size_type __pos) const noexcept
 {
@@ -119,7 +119,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find_first_of(const _CharT* __str, size_type __pos, size_type __n) const
 {
@@ -135,7 +135,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find_last_of(const _CharT* __str, size_type __pos, size_type __n) const
 {
@@ -156,7 +156,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find_first_not_of(const _CharT* __str, size_type __pos, size_type __n) const
 {
@@ -168,7 +168,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find_first_not_of(_CharT __c, size_type __pos) const noexcept
 {
@@ -179,7 +179,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find_last_not_of(const _CharT* __str, size_type __pos, size_type __n) const
 {
@@ -200,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _CharT, typename _Traits>
-typename basic_string_view<_CharT, _Traits>::size_type
+constexpr typename basic_string_view<_CharT, _Traits>::size_type
 basic_string_view<_CharT, _Traits>::
 find_last_not_of(_CharT __c, size_type __pos) const noexcept
 {
diff --git a/libstdc++-v3/include/std/string_view b/libstdc++-v3/include/std/string_view
index 88a7686..8b4ac05 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -106,7 +106,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_str{__str}
   { }
 
-  basic_string_view&
+  constexpr basic_string_view&
   operator=(const basic_string_view&) noexcept = default;
 
   // [string.view.iterators], iterators

RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-08-18 Thread Tsimbalist, Igor V
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Friday, August 18, 2017 3:53 PM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> 
> On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V
>  wrote:
> >> -Original Message-
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: Tuesday, August 15, 2017 3:43 PM
> >> To: Tsimbalist, Igor V 
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >>
> >> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V
> >>  wrote:
> >> > Part#1. Add generic part for Intel CET enabling.
> >> >
> >> > The spec is available at
> >> >
> >> > https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
> >> >
> >> > High-level design.
> >> > --
> >> >
> >> > A proposal is to introduce a target independent flag
> >> > -finstrument-control-flow with a semantic to instrument a code to
> >> > control validness or integrity of control-flow transfers using jump
> >> > and call instructions. The main goal is to detect and block a
> >> > possible malware execution through transfer the execution to
> >> > unknown target address. Implementation could be either software or
> >> > target based. Any target platforms can provide their implementation
> >> > for instrumentation under this option.
> >> >
> >> > When the -finstrument-control-flow flag is set each implementation
> >> > has to check if a support exists for a target platform and report
> >> > an error if no support is found.
> >> >
> >> > The compiler should instrument any control-flow transfer points in
> >> > a program (ex. call/jmp/ret) as well as any landing pads, which are
> >> > targets of for control-flow transfers.
> >> >
> >> > A new 'notrack' attribute is introduced to provide hand tuning support.
> >> > The attribute directs the compiler to skip a call to a function and
> >> > a function's landing pad from instrumentation (tracking). The
> >> > attribute can be used for function and pointer to function types,
> >> > otherwise it will be ignored. The attribute is saved in a type and
> >> > propagated to a GIMPLE call statement and later to a call instruction.
> >> >
> >> > Currently all platforms except i386 will report the error and do no
> >> > instrumentation. i386 will provide the implementation based on a
> >> > specification published by Intel for a new technology called
> >> > Control-flow Enforcement Technology (CET).
> >>
> >> diff --git a/gcc/gimple.c b/gcc/gimple.c index 479f90c..2e4ab2d
> >> 100644
> >> --- a/gcc/gimple.c
> >> +++ b/gcc/gimple.c
> >> @@ -378,6 +378,23 @@ gimple_build_call_from_tree (tree t)
> >>gimple_set_no_warning (call, TREE_NO_WARNING (t));
> >>gimple_call_set_with_bounds (call, CALL_WITH_BOUNDS_P (t));
> >>
> >> +  if (fndecl == NULL_TREE)
> >> +{
> >> +  /* Find the type of an indirect call.  */
> >> +  tree addr = CALL_EXPR_FN (t);
> >> +  if (TREE_CODE (addr) != FUNCTION_DECL)
> >> +   {
> >> + tree fntype = TREE_TYPE (addr);
> >> + gcc_assert (POINTER_TYPE_P (fntype));
> >> + fntype = TREE_TYPE (fntype);
> >> +
> >> + /* Check if its type has the no-track attribute and propagate
> >> +it to the CALL insn.  */
> >> + if (lookup_attribute ("notrack", TYPE_ATTRIBUTES (fntype)))
> >> +   gimple_call_set_with_notrack (call, TRUE);
> >> +   }
> >> +}
> >>
> >> this means notrack is not recognized if fndecl is not NULL.  Note
> >> that only the two callers know the real function type in effect (they
> >> call gimple_call_set_fntype with it).  I suggest to pass down that
> >> type to gimple_build_call_from_tree and move the
> >> gimple_call_set_fntype call there as well.  And simply use the type for the
> above.
> >
> > The best way to say is notrack is not propagated if fndecl is not NULL.
> > Fndecl, if not NULL, is a direct call and notrack is not applicable for such
> > calls.  I will add a comment before the if.
> 
> Hmm.  But what does this mean?  I guess direct calls are always 'notrack' then
> and thus we're fine to ignore it.

Yes, that's correct - direct calls are always 'notrack' and we are ignoring it 
in calls.

> > I would like to propose modifying the existing code without changing
> > interfaces.  The idea is that at the time the notrack is propagated (the
> > code snippet above) the gimple call was created and the correct type was
> > assigned to the 'call' exactly by gimple_call_set_fntype.  My proposal is
> > to get the type out of the gimple 'call' (like gimple_call_fntype) instead
> > of the tree 't'.  Is it right?
> 
> Yes, but note that the type on the gimple 'call' isn't yet correctly set
> (only the
> 

Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space

2017-08-18 Thread Georg-Johann Lay

On 18.08.2017 12:01, Richard Biener wrote:

On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay  wrote:

On 28.07.2017 09:34, Richard Biener wrote:

On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay  wrote:

On 27.07.2017 14:34, Richard Biener wrote:

On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay  wrote:


For some targets, the best place to put read-only lookup tables as
generated by -ftree-switch-conversion is not the generic address space
but some target specific address space.

This is the case for AVR, where for most devices .rodata must be
located in RAM.

Part #1 adds a new, optional target hook that queries the back-end
about the desired address space.  There is currently only one user:
tree-switch-conversion.c::build_one_array() which re-builds value_type
and array_type if the address space returned by the backend is not
the generic one.

Part #2 is the AVR part that implements the new hook and adds some
sugar around it.


Given that switch-conversion just creates a constant initializer doesn't
AVR
benefit from handling those uniformly (at RTL expansion?).  Not sure but
I think it goes through the regular constant pool handling.

Richard.


avr doesn't use constant pools because they are not profitable for
several reasons.

Moreover, it's not sufficient to just patch the pool's section, you'd
also have to patch *all* accesses to also use the exact same address
space so that they use the correct instruction (LPM instead of LD).


Sure.  So then not handle it in constant pool handling but in expanding
the initializers of global vars.  I think the entry for this is
varasm.c:assemble_variable - that sets DECL_RTL for the variable.


That cannot work, and here is why; just for completeness:

cgraphunit.c::symbol_table::compile()

runs
   ...
   expand_all_functions ();
   output_variables (); // runs assemble_variable
   ...

So any patching of DECL_RTL will result in wrong code: The address
space of the decl won't match the address space used by the code
accessing decl.


Ok, so it's more tricky to hack it that way (you'd need to catch the
time the decl gets its DECL_RTL set, not sure where that happens
for globals).


Too late, IMO.  Problem is that any reference must also have the
same AS.  Hence if somewhere, before patching decl_rtl, some code
uses TREE_TYPE on respective decl, that type will miss the AS,
same for all variables / pointers created with that type.

Been there, seen it...


That leaves doing a more broad transform.  One convenient place
to hook in would be the ipa_single_use pass which computes
the varpool_node.used_by_single_function flag.  All such variables
would be suitable for promotion (with some additional constraints
I suppose).  You'd add a transform phase to the pass that rewrites
the decls and performs GIMPLE IL adjustments (I think you need
to patch memory references to use the address-space).


Rewriting DECLs is not enough.  Only the place that cooks up
the decl (implicitly) knows whether it's appropriate to use
different AS and whether is has control over diffusion into
TREE_TYPEs. And as we have strong filter (see below) which
includes DECL_ARTIFICIAL, almost nothing will remain to be
transformed anyway...


Of course there would need to be a target hook determining
if/what section/address-space a variable should be promoted to
which somehow would allow switch-conversion to use that
as well.  Ho humm.

That said, do you think the switch-conversion created decl is
the only one that benefits from compiler-directed promotion
to different address-space?


Yes. Only compiler-generated lookup-tables may be transformed,
and the only such tables I know of are CSWTCH and vtables.

The conditions so far are:

* TREE_STATIC
* !TREE_PUBLIC
* TREE_READONLY (at least for the case this PR is after)
* DECL_ARTIFICIAL (or otherwise exclude inline asm)
* decl must not be cooked up by non-C FE (i.e. vtables are out).
* Not weak, comdat or linkonce (again, vtables are out).
* Not for debug purpose (like what dwarf2asm does).
* No section attribute (actually also CSWTCH could be mentioned
  in linker script, but we can argue that artificials are
  managed by the compiler + user has option to control CSWTCH).

What remains is CSWTCH.

Johann


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 03/08 - V3

2017-08-18 Thread Richard Biener
On Mon, Jul 31, 2017 at 7:41 AM, Jeff Law  wrote:
> This patch introduces some routines for logging of stack clash
> protection actions.
>
> I don't think this patch changed at all relative to V2.

Ok.

Richard.

>
> Jeff
>
>
> * function.c (dump_stack_clash_frame_info): New function.
> * function.h (dump_stack_clash_frame_info): Prototype.
> (enum stack_clash_probes): New enum.
>
> diff --git a/gcc/function.c b/gcc/function.c
> index f625489..ca48b3f 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -5695,6 +5695,58 @@ get_arg_pointer_save_area (void)
>return ret;
>  }
>
> +
> +/* If debugging dumps are requested, dump information about how the
> +   target handled -fstack-check=clash for the prologue.
> +
> +   PROBES describes what if any probes were emitted.
> +
> +   RESIDUALS indicates if the prologue had any residual allocation
> +   (i.e. total allocation was not a multiple of PROBE_INTERVAL).  */
> +
> +void
> +dump_stack_clash_frame_info (enum stack_clash_probes probes, bool residuals)
> +{
> +  if (!dump_file)
> +return;
> +
> +  switch (probes)
> +{
> +case NO_PROBE_NO_FRAME:
> +  fprintf (dump_file,
> +  "Stack clash no probe no stack adjustment in prologue.\n");
> +  break;
> +case NO_PROBE_SMALL_FRAME:
> +  fprintf (dump_file,
> +  "Stack clash no probe small stack adjustment in prologue.\n");
> +  break;
> +case PROBE_INLINE:
> +  fprintf (dump_file, "Stack clash inline probes in prologue.\n");
> +  break;
> +case PROBE_LOOP:
> +  fprintf (dump_file, "Stack clash probe loop in prologue.\n");
> +  break;
> +}
> +
> +  if (residuals)
> +fprintf (dump_file, "Stack clash residual allocation in prologue.\n");
> +  else
> +fprintf (dump_file, "Stack clash no residual allocation in prologue.\n");
> +
> +  if (frame_pointer_needed)
> +fprintf (dump_file, "Stack clash frame pointer needed.\n");
> +  else
> +fprintf (dump_file, "Stack clash no frame pointer needed.\n");
> +
> +  if (TREE_THIS_VOLATILE (cfun->decl))
> +fprintf (dump_file,
> +"Stack clash noreturn prologue, assuming no implicit"
> +" probes in caller.\n");
> +  else
> +fprintf (dump_file,
> +"Stack clash not noreturn prologue.\n");
> +}
> +
>  /* Add a list of INSNS to the hash HASHP, possibly allocating HASHP
> for the first time.  */
>
> diff --git a/gcc/function.h b/gcc/function.h
> index 0f34bcd..87dac80 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -553,6 +553,14 @@ do { 
>   \
>((TARGET_PTRMEMFUNC_VBIT_LOCATION == ptrmemfunc_vbit_in_pfn)  \
> ? MAX (FUNCTION_BOUNDARY, 2 * BITS_PER_UNIT) : FUNCTION_BOUNDARY)
>
> +enum stack_clash_probes {
> +  NO_PROBE_NO_FRAME,
> +  NO_PROBE_SMALL_FRAME,
> +  PROBE_INLINE,
> +  PROBE_LOOP
> +};
> +
> +extern void dump_stack_clash_frame_info (enum stack_clash_probes, bool);
>
>
>  extern void push_function_context (void);
>


libgo patch committed: Fix cgo tests for AIX

2017-08-18 Thread Ian Lance Taylor
This libgo patch by Tony Reix fixes the cgo tests for AIX.
Bootstrapped and ran testsuite on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251179)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9ff49c64ea6dbb5e08d1fa859b99b06049413279
+28e49825162465172ed706283628bf5cc1996260
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/misc/cgo/test/cthread_unix.c
===
--- libgo/misc/cgo/test/cthread_unix.c  (revision 250873)
+++ libgo/misc/cgo/test/cthread_unix.c  (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
 
 #include 
 #include "_cgo_export.h"
Index: libgo/misc/cgo/test/issue18146.go
===
--- libgo/misc/cgo/test/issue18146.go   (revision 250873)
+++ libgo/misc/cgo/test/issue18146.go   (working copy)
@@ -50,6 +50,8 @@ func test18146(t *testing.T) {
nproc = 6
case "darwin", "dragonfly", "freebsd", "netbsd", "openbsd":
nproc = 7
+   case "aix":
+   nproc = 9
}
if setNproc {
var rlim syscall.Rlimit


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 02/08 - V3

2017-08-18 Thread Richard Biener
On Mon, Jul 31, 2017 at 7:38 AM, Jeff Law  wrote:
>
> This patch introduces generic mechanisms to protect the dynamically
> allocated stack space against stack-clash attacks.
>
> Changes since V2:
>
> Dynamic allocations can be emitted as unrolled inlined probes or with a
> rotated loop.  Blockage insns are also properly emitted for the dynamic
> area probes and the dynamic area probing now supports targets that may
> make optimistic assumptions in their prologues.  Finally it uses the new
> param to control the probing interval.
>
> Tests were updated to explicitly specify the guard and probing interval.
>  New test to check inline/unrolled probes as well as rotated loop.
>
>
>
> * explow.c: Include "params.h".
> (anti_adjust_stack_and_probe_stack_clash): New function.
> (get_stack_check_protect): Likewise.
> (compute_stack_clash_protection_loop_data): Likewise.
> (emit_stack_clash_protection_loop_start): Likewise.
> (emit_stack_clash_protection_loop_end): Likewise.
> (allocate_dynamic_stack_space): Use get_stack_check_protect.
> Use anti_adjust_stack_and_probe_stack_clash.
> * explow.h (compute_stack_clash_protection_loop_data): Prototype.
> (emit_stack_clash_protection_loop_start): Likewise.
> (emit_stack_clash_protection_loop_end): Likewise.
> * rtl.h (get_stack_check_protect): Prototype.
> * defaults.h (STACK_CLASH_PROTECTION_NEEDS_FINAL_DYNAMIC_PROBE):
> Define new default.
> * doc/tm.texi.in (STACK_CLASH_PROTECTION_NEEDS_FINAL_DYNAMIC_PROBE):
> Define.

Please make this a hook instead of a target macro.

Besides this it looks good (I trust you on the RTL details).

Thanks,
Richard.

> * doc/tm.texi: Rebuilt.
>
> * config/aarch64/aarch64.c (aarch64_expand_prologue): Use
> get_stack_check_protect.
> * config/alpha/alpha.c (alpha_expand_prologue): Likewise.
> * config/arm/arm.c (arm_expand_prologue): Likewise.
> * config/i386/i386.c (ix86_expand_prologue): Likewise.
> * config/ia64/ia64.c (ia64_expand_prologue): Likewise.
> * config/mips/mips.c (mips_expand_prologue): Likewise.
> * config/powerpcspe/powerpcspe.c (rs6000_emit_prologue): Likewise.
> * config/rs6000/rs6000.c (rs6000_emit_prologue): Likewise.
> * config/sparc/sparc.c (sparc_expand_prologue): Likewise.
>
>
> testsuite
>
> * gcc.dg/stack-check-3.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index ef1b5a8..0a8b40a 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3676,12 +3676,14 @@ aarch64_expand_prologue (void)
>  {
>if (crtl->is_leaf && !cfun->calls_alloca)
> {
> - if (frame_size > PROBE_INTERVAL && frame_size > STACK_CHECK_PROTECT)
> -   aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT,
> -   frame_size - STACK_CHECK_PROTECT);
> + if (frame_size > PROBE_INTERVAL
> + && frame_size > get_stack_check_protect ())
> +   aarch64_emit_probe_stack_range (get_stack_check_protect (),
> +   (frame_size
> +- get_stack_check_protect ()));
> }
>else if (frame_size > 0)
> -   aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
> -   aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
>  }
>
>aarch64_sub_sp (IP0_REGNUM, initial_adjust, true);
> diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
> index 00a69c1..91f3d7c 100644
> --- a/gcc/config/alpha/alpha.c
> +++ b/gcc/config/alpha/alpha.c
> @@ -7741,7 +7741,7 @@ alpha_expand_prologue (void)
>
>probed_size = frame_size;
>if (flag_stack_check)
> -probed_size += STACK_CHECK_PROTECT;
> +probed_size += get_stack_check_protect ();
>
>if (probed_size <= 32768)
>  {
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c6101ef..9822ca7 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -21680,13 +21680,13 @@ arm_expand_prologue (void)
>
>if (crtl->is_leaf && !cfun->calls_alloca)
> {
> - if (size > PROBE_INTERVAL && size > STACK_CHECK_PROTECT)
> -   arm_emit_probe_stack_range (STACK_CHECK_PROTECT,
> -   size - STACK_CHECK_PROTECT,
> + if (size > PROBE_INTERVAL && size > get_stack_check_protect ())
> +   arm_emit_probe_stack_range (get_stack_check_protect (),
> +   size - get_stack_check_protect (),
> regno, live_regs_mask);
> }
>else if (size > 0)
> -   arm_emit_probe_stack_range (STACK_CHECK_PROTECT, size,
> +   arm_emit_probe_stack_range (get_stack_check_protect (), size,
> 

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 - V3

2017-08-18 Thread Richard Biener
On Mon, Jul 31, 2017 at 7:35 AM, Jeff Law  wrote:
> This patch introduces the stack clash protection options
>
> Changes since V2:
>
> Adds two new params.  The first controls the size of the guard area.
> This controls the threshold for when a function prologue requires
> probes.  The second controls the probing interval -- ie, once probes are
> needed, how often do we emit them.  These are really meant more for
> developers to experiment with than users.  Regardless I did go ahead and
> document them./PARAM
>
> It also adds some sanity checking WRT combining stack clash protection
> with -fstack-check.

diff --git a/gcc/params.c b/gcc/params.c
index fab0ffa..8afe4c4 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -209,6 +209,11 @@ set_param_value (const char *name, int value,
 error ("maximum value of parameter %qs is %u",
compiler_params[i].option,
compiler_params[i].max_value);
+  else if ((strcmp (name, "stack-clash-protection-guard-size") == 0
+|| strcmp (name, "stack-clash-protection-probe-interval") == 0)
+   && exact_log2 (value) == -1)
+error ("value of parameter %qs must be a power of 2",
+   compiler_params[i].option);
   else
 set_param_value_internal ((compiler_param) i, value,
   params, params_set, true);

I don't like this.  Either use them as if they were power-of-two
(floor_log2/ceil_log2 as appropriate) or simply make them take
the logarithm instead (like -mincoming-stack-boundary and friends).

Both -fstack-clash-protection and -fstack-check cannot be turned
off per function.  This means they would need merging in lto-wrapper.
The alternative is to mark them with 'Optimization' and allow per-function
specification (like we do for -fstack-protector).

Otherwise ok.

Richard.

> Jeff
>
>
> * common.opt (-fstack-clash-protection): New option.
> * flag-types.h (enum stack_check_type): Note difference between
> -fstack-check= and -fstack-clash-protection.
> * params.h (set_param_value): Verify stack-clash-protection
> params are powers of two.
> * params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): New PARAM.
> (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Likewise.
> * toplev.c (process_options): Issue warnings/errors for cases
> not handled with -fstack-clash-protection.
>
>
>
> * doc/invoke.texi (-fstack-clash-protection): Document new option.
> (-fstack-check): Note additional problem with -fstack-check=generic.
> Note that -fstack-check is primarily for Ada and refer users
> to -fstack-clash-protection for stack-clash-protection.
> Document new params for stack clash protection.
>
>
>
>
> * toplev.c (process_options): Handle -fstack-clash-protection.
>
> testsuite/
>
> * gcc.dg/stack-check-2.c: New test.
> * lib/target-supports.exp
> (check_effective_target_supports_stack_clash_protection): New function.
> (check_effective_target_frame_pointer_for_non_leaf): Likewise.
> (check_effective_target_caller_implicit_probes): Likewise.
>
>
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e81165c..cfaf2bc 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2306,6 +2306,11 @@ fstack-check
>  Common Alias(fstack-check=, specific, no)
>  Insert stack checking code into the program.  Same as -fstack-check=specific.
>
> +fstack-clash-protection
> +Common Report Var(flag_stack_clash_protection)
> +Insert code to probe each page of stack space as it is allocated to protect
> +from stack-clash style attacks
> +
>  fstack-limit
>  Common Var(common_deferred_options) Defer
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 3e5cee8..8095dc2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10128,6 +10128,20 @@ compilation without.  The value for compilation with profile feedback
>  needs to be more conservative (higher) in order to make tracer
>  effective.
>
> +@item stack-clash-protection-guard-size
> +The size (in bytes) of the stack guard.  Acceptable values are powers of 2
> +between 4096 and 134217728 and defaults to 4096.  Higher values may reduce the
> +number of explicit probes, but a value larger than the operating system
> +provided guard will leave code vulnerable to stack clash style attacks.
> +
> +@item stack-clash-protection-probe-interval
> +Stack clash protection involves probing stack space as it is allocated.  This
> +param controls the maximum distance (in bytes) between probes into the stack.
> +Acceptable values are powers of 2 between 4096 and 65536 and defaults to
> +4096.  Higher values may reduce the number of explicit probes, but a value
> +larger than the operating system provided guard will leave code vulnerable to
> +stack clash style attacks.
> +
>  @item max-cse-path-length
>
>  The maximum number of basic blocks on path that CSE considers.
> @@ -11333,7 +11347,8 @@ target support in the compiler but comes 

Re: [PATCH 4/3] improve detection of attribute conflicts (PR 81544)

2017-08-18 Thread Jonathan Wakely

On 16/08/17 16:38 -0600, Martin Sebor wrote:

Jon,

Attached is the libstdc++ only patch to remove the pointless
const attribute from __pool::_M_destroy_thread_key(void*).

 https://gcc.gnu.org/ml/gcc/2017-08/msg00027.html

I only belatedly now broke it out of the larger patch under
review here:

 https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00599.html

Thanks
Martin



libstdc++-v3/ChangeLog:

PR c/81544
* include/ext/mt_allocator.h (_M_destroy_thread_key): Remove
pointless attribute const.


OK.


diff --git a/libstdc++-v3/include/ext/mt_allocator.h 
b/libstdc++-v3/include/ext/mt_allocator.h
index effb13b..f349ff8 100644
--- a/libstdc++-v3/include/ext/mt_allocator.h
+++ b/libstdc++-v3/include/ext/mt_allocator.h
@@ -355,7 +355,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }

  // XXX GLIBCXX_ABI Deprecated
-  _GLIBCXX_CONST void 
+  void

  _M_destroy_thread_key(void*) throw ();

  size_t 




Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V
 wrote:
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, August 15, 2017 3:43 PM
>> To: Tsimbalist, Igor V 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
>>
>> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V
>>  wrote:
>> > Part#1. Add generic part for Intel CET enabling.
>> >
>> > The spec is available at
>> >
>> > https://software.intel.com/sites/default/files/managed/4d/2a/control-f
>> > low-enforcement-technology-preview.pdf
>> >
>> > High-level design.
>> > --
>> >
>> > A proposal is to introduce a target independent flag
>> > -finstrument-control-flow with a semantic to instrument a code to
>> > control validness or integrity of control-flow transfers using jump
>> > and call instructions. The main goal is to detect and block a possible
>> > malware execution through transfer the execution to unknown target
>> > address. Implementation could be either software or target based. Any
>> > target platforms can provide their implementation for instrumentation
>> > under this option.
>> >
>> > When the -finstrument-control-flow flag is set each implementation has
>> > to check whether support exists for the target platform and report an error
>> > if no support is found.
>> >
>> > The compiler should instrument any control-flow transfer points in a
>> > program (ex. call/jmp/ret) as well as any landing pads, which are
>> > targets of control-flow transfers.
>> >
>> > A new 'notrack' attribute is introduced to provide hand tuning support.
>> > The attribute directs the compiler to skip a call to a function and a
>> > function's landing pad from instrumentation (tracking). The attribute
>> > can be used for function and pointer to function types, otherwise it
>> > will be ignored. The attribute is saved in a type and propagated to a
>> > GIMPLE call statement and later to a call instruction.
>> >
>> > Currently all platforms except i386 will report the error and do no
>> > instrumentation. i386 will provide the implementation based on a
>> > specification published by Intel for a new technology called
>> > Control-flow Enforcement Technology (CET).
>>
>> diff --git a/gcc/gimple.c b/gcc/gimple.c index 479f90c..2e4ab2d 100644
>> --- a/gcc/gimple.c
>> +++ b/gcc/gimple.c
>> @@ -378,6 +378,23 @@ gimple_build_call_from_tree (tree t)
>>gimple_set_no_warning (call, TREE_NO_WARNING (t));
>>gimple_call_set_with_bounds (call, CALL_WITH_BOUNDS_P (t));
>>
>> +  if (fndecl == NULL_TREE)
>> +{
>> +  /* Find the type of an indirect call.  */
>> +  tree addr = CALL_EXPR_FN (t);
>> +  if (TREE_CODE (addr) != FUNCTION_DECL)
>> +   {
>> + tree fntype = TREE_TYPE (addr);
>> + gcc_assert (POINTER_TYPE_P (fntype));
>> + fntype = TREE_TYPE (fntype);
>> +
>> + /* Check if its type has the no-track attribute and propagate
>> +it to the CALL insn.  */
>> + if (lookup_attribute ("notrack", TYPE_ATTRIBUTES (fntype)))
>> +   gimple_call_set_with_notrack (call, TRUE);
>> +   }
>> +}
>>
>> this means notrack is not recognized if fndecl is not NULL.  Note that only 
>> the
>> two callers know the real function type in effect (they call
>> gimple_call_set_fntype with it).  I suggest to pass down that type to
>> gimple_build_call_from_tree and move the gimple_call_set_fntype call
>> there as well.  And simply use the type for the above.
>
> A better way to put it: notrack is not propagated if fndecl is not NULL.
> Fndecl, if not NULL, denotes a direct call, and notrack is not applicable
> to such calls. I will add a comment before the if.

Hmm.  But what does this mean?  I guess direct calls are always
'notrack' then and thus we're fine
to ignore it.

> I would like to propose modifying the existing code without changing 
> interfaces. The idea is that at the time the notrack is propagated (the code 
> snippet above) the gimple call was created and the correct type was assigned 
> to the 'call' exactly by gimple_call_set_fntype. My proposal is to get the 
> type out of the gimple 'call' (like gimple_call_fntype) instead of the tree 
> 't'. Is it right?

Yes, but note that the type on the gimple 'call' isn't yet correctly
set (only the caller does that).  The gimple_build_call_from_tree
is really an ugly thing and it should be privatized inside the gimplifier...

>> +static inline bool
>> +gimple_call_with_notrack_p (const gimple *gs) {
>> +  const gcall *gc = GIMPLE_CHECK2 (gs);
>> +  return gimple_call_with_notrack_p (gc); }
>>
>> please do not add gimple * overloads for new APIs, instead make sure to
>> pass down gcalls at callers.
>
> Ok, I will remove.
>
>> Please change the names to omit 'with_', thus just notrack and
>> GF_CALL_NOTRACK.
>
> Ok, I will rename.
>
>> I think 

Re: [PATCH v2] Simplify pow with constant

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 2:47 PM, Wilco Dijkstra  wrote:
> Alexander Monakov wrote:
>>
>> Note this changes the outcome for C == +Inf, x == 0 (pow is specified to
>> return 1.0 in that case, but x * C1 == NaN).  There's another existing
>> transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is
>> not a new problem.
>>
>> The whole set of these match.pd transforms is guarded by
>> flag_unsafe_math_optimizations, which is a bit strange, on the one hand
>> it does not include -ffinite-math-only, but on the other hand it's
>> defined broadly enough to imply that.
>
> Yes I was assuming that unsafe_math_optimizations was enough for these
> transformations to be... safe. I've added an isfinite check so that case 
> works fine.
> It looks like we need to go through the more complex transformations (especially
> given pow has so many special cases) and add more finite_math checks.
> Here's the new version:
>
>
> This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
> Do this only for fast-math as accuracy is reduced.  This is much faster
> since pow is more complex than exp - with current GLIBC the speedup is
> more than 7 times for this transformation.
>
> The worst-case ULP error of the transformation for powf (10.0, x) in SPEC
> was 2.5.  If we allow use of exp10 in match.pd, the ULP error would be lower.

You can use exp10/pow10 in match.pd but it will only match if the programmer
already uses those functions as they are not C standard.  There's also exp2
which is a C99 function.  Is there any good reason to convert exp (C * x)
to exp2 (C' * x)?  (besides if C' becomes 1)

So eventually we can match pow (10., x) -> exp10 and pow (2., x) -> exp2
(though I believe in canonicalization as well -- all exp calls could
be canonicalized
to pow calls to simplify the cases we need to handle...).

The patch is ok.

Thanks,
Richard.

> ChangeLog:
> 2017-08-18  Wilco Dijkstra  
>
> * match.pd: Add pow (C, x) simplification.
> --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 0e36f46b914bc63c257cef47152ab1aa507963e5..a5552c5096de5100a882d52add6b620ba87c1f72
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3622,6 +3622,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (logs (pows @0 @1))
> (mult @1 (logs @0
>
> + /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
> + (for pows (POW)
> +  exps (EXP)
> +  logs (LOG)
> +  (simplify
> +   (pows REAL_CST@0 @1)
> +(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> +&& real_isfinite (TREE_REAL_CST_PTR (@0)))
> + (exps (mult (logs @0) @1)
> +
>   (for sqrts (SQRT)
>cbrts (CBRT)
>pows (POW)
>
>


RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-08-18 Thread Tsimbalist, Igor V
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, August 15, 2017 3:43 PM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> 
> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V
>  wrote:
> > Part#1. Add generic part for Intel CET enabling.
> >
> > The spec is available at
> >
> > https://software.intel.com/sites/default/files/managed/4d/2a/control-f
> > low-enforcement-technology-preview.pdf
> >
> > High-level design.
> > --
> >
> > A proposal is to introduce a target independent flag
> > -finstrument-control-flow with a semantic to instrument a code to
> > control validness or integrity of control-flow transfers using jump
> > and call instructions. The main goal is to detect and block a possible
> > malware execution through transferring execution to an unknown target
> > address. Implementation could be either software or target based. Any
> > target platforms can provide their implementation for instrumentation
> > under this option.
> >
> > When the -finstrument-control-flow flag is set each implementation has
> > to check whether support exists for the target platform and report an error
> > if no support is found.
> >
> > The compiler should instrument any control-flow transfer points in a
> > program (ex. call/jmp/ret) as well as any landing pads, which are
> > targets of control-flow transfers.
> >
> > A new 'notrack' attribute is introduced to provide hand tuning support.
> > The attribute directs the compiler to skip a call to a function and a
> > function's landing pad from instrumentation (tracking). The attribute
> > can be used for function and pointer to function types, otherwise it
> > will be ignored. The attribute is saved in a type and propagated to a
> > GIMPLE call statement and later to a call instruction.
> >
> > Currently all platforms except i386 will report the error and do no
> > instrumentation. i386 will provide the implementation based on a
> > specification published by Intel for a new technology called
> > Control-flow Enforcement Technology (CET).
> 
> diff --git a/gcc/gimple.c b/gcc/gimple.c index 479f90c..2e4ab2d 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -378,6 +378,23 @@ gimple_build_call_from_tree (tree t)
>gimple_set_no_warning (call, TREE_NO_WARNING (t));
>gimple_call_set_with_bounds (call, CALL_WITH_BOUNDS_P (t));
> 
> +  if (fndecl == NULL_TREE)
> +{
> +  /* Find the type of an indirect call.  */
> +  tree addr = CALL_EXPR_FN (t);
> +  if (TREE_CODE (addr) != FUNCTION_DECL)
> +   {
> + tree fntype = TREE_TYPE (addr);
> + gcc_assert (POINTER_TYPE_P (fntype));
> + fntype = TREE_TYPE (fntype);
> +
> + /* Check if its type has the no-track attribute and propagate
> +it to the CALL insn.  */
> + if (lookup_attribute ("notrack", TYPE_ATTRIBUTES (fntype)))
> +   gimple_call_set_with_notrack (call, TRUE);
> +   }
> +}
> 
> this means notrack is not recognized if fndecl is not NULL.  Note that only 
> the
> two callers know the real function type in effect (they call
> gimple_call_set_fntype with it).  I suggest to pass down that type to
> gimple_build_call_from_tree and move the gimple_call_set_fntype call
> there as well.  And simply use the type for the above.

A better way to put it: notrack is not propagated if fndecl is not NULL.
Fndecl, if not NULL, denotes a direct call, and notrack is not applicable to
such calls. I will add a comment before the if.

I would like to propose modifying the existing code without changing 
interfaces. The idea is that at the time the notrack is propagated (the code 
snippet above) the gimple call was created and the correct type was assigned to 
the 'call' exactly by gimple_call_set_fntype. My proposal is to get the type 
out of the gimple 'call' (like gimple_call_fntype) instead of the tree 't'. Is 
it right?

> +static inline bool
> +gimple_call_with_notrack_p (const gimple *gs) {
> +  const gcall *gc = GIMPLE_CHECK2 (gs);
> +  return gimple_call_with_notrack_p (gc); }
> 
> please do not add gimple * overloads for new APIs, instead make sure to
> pass down gcalls at callers.

Ok, I will remove.

> Please change the names to omit 'with_', thus just notrack and
> GF_CALL_NOTRACK.

Ok, I will rename.

> I think 'notrack' is somewhat unspecific of a name, what prevented you to
> use 'nocet'?

Actually it's specific. The HW will have a prefix with exactly this name and
the same meaning. And, more importantly, 'track/notrack' gives better
semantics for a user. CET is a name bound to Intel-specific technology.

> Any idea how to implement a software-based solution efficiently?
> Creating a table of valid destination addresses in a special section should be
> possible without too much work, am I right in that only indirect 

Re: [PATCH] detect incompatible aliases (PR c/81854)

2017-08-18 Thread Jonathan Wakely

On 17/08/17 21:21 -0600, Martin Sebor wrote:

Joseph, while looking into implementing your enhancement request
pr81824 I noticed that GCC silently accepts incompatible alias
declarations (pr81854), so as a sort of proof of concept for the
former I enhanced the checking already done for other kinds of
incompatibilities to also detect those mentioned in the latter
bug.  Attached is this patch, tested on x86_64-linux.

Jonathan, the patch requires suppressing the warning in libstdc++
compatibility symbol definitions in compatibility.cc.  I couldn't
find a way to do it without the suppression but I'd be happy to
try again if you have an idea for how.


Doing it that way is fine, but ...


diff --git a/libstdc++-v3/src/c++98/compatibility.cc 
b/libstdc++-v3/src/c++98/compatibility.cc
index 381f4c4..5f56b9e 100644
--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -213,6 +213,11 @@ _ZNSt19istreambuf_iteratorIcSt11char_traitsIcEEppEv
_ZNSt19istreambuf_iteratorIwSt11char_traitsIwEEppEv
 */

+// Disable warning about declaring aliases between functions with
+// incompatible types.
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattributes"
+


Could this be moved closer to the point where it's needed?

It's not needed until after line 361, right?


namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -509,6 +514,9 @@ _GLIBCXX_MATHL_WRAPPER1 (tan, GLIBCXX_3.4);
#endif
#endif // _GLIBCXX_LONG_DOUBLE_COMPAT

+// Restore disable -Wattributes
+#pragma GCC diagnostic pop
+
#endif

#ifdef _GLIBCXX_LONG_DOUBLE_COMPAT





Re: [PATCH v2] Simplify pow with constant

2017-08-18 Thread Wilco Dijkstra
Alexander Monakov wrote:
>
> Note this changes the outcome for C == +Inf, x == 0 (pow is specified to
> return 1.0 in that case, but x * C1 == NaN).  There's another existing
> transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is
> not a new problem.
>
> The whole set of these match.pd transforms is guarded by
> flag_unsafe_math_optimizations, which is a bit strange, on the one hand
> it does not include -ffinite-math-only, but on the other hand it's
> defined broadly enough to imply that.

Yes I was assuming that unsafe_math_optimizations was enough for these
transformations to be... safe. I've added an isfinite check so that case works 
fine.
It looks like we need to go through the more complex transformations (especially
given pow has so many special cases) and add more finite_math checks.
Here's the new version:


This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
Do this only for fast-math as accuracy is reduced.  This is much faster
since pow is more complex than exp - with current GLIBC the speedup is
more than 7 times for this transformation.

The worst-case ULP error of the transformation for powf (10.0, x) in SPEC
was 2.5.  If we allow use of exp10 in match.pd, the ULP error would be lower.

ChangeLog:
2017-08-18  Wilco Dijkstra  

* match.pd: Add pow (C, x) simplification.
--
diff --git a/gcc/match.pd b/gcc/match.pd
index 
0e36f46b914bc63c257cef47152ab1aa507963e5..a5552c5096de5100a882d52add6b620ba87c1f72
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3622,6 +3622,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(logs (pows @0 @1))
(mult @1 (logs @0
 
+ /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
+ (for pows (POW)
+  exps (EXP)
+  logs (LOG)
+  (simplify
+   (pows REAL_CST@0 @1)
+(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
+&& real_isfinite (TREE_REAL_CST_PTR (@0)))
+ (exps (mult (logs @0) @1)
+
  (for sqrts (SQRT)
   cbrts (CBRT)
   pows (POW)



Re: C PATCH to remove unused block of code

2017-08-18 Thread Marek Polacek
On Thu, Aug 17, 2017 at 02:17:31PM +, Joseph Myers wrote:
> On Thu, 17 Aug 2017, Marek Polacek wrote:
> 
> > I've been itching to remove this code for some time now.  The comment 
> > suggests
> > that the code is actually unused, so I replaced the body of that "else if" 
> > with
> > gcc_unreachable (); and ran regtest/bootstrap and nothing broke, so I 
> > propose
> > to do away with it.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> OK, with an appropriate change to the comment above the 
> c_parser_postfix_expression function to say that compound literals are not 
> handled here and callers have to call 
> c_parser_postfix_expression_after_paren_type on encountering them.  (I've 
> reviewed all paths to c_parser_postfix_expression in the current parser 
> and don't think any of them can send compound literals to it, because 
> either they are parsing a specific more restricted syntax incompatible 
> with compond literals (OMP cases), are parsing an expression that might be 
> a cast expression so need to parse the (type) first to distinguish, or are 
> parsing a context such as sizeof where a parenthesized type name is 
> allowed as well as a postfix expression and again need to parse the (type) 
> first to distinguish.)

Thanks for verifying.  Here's what I'm going to install:

2017-08-18  Marek Polacek  

* c-parser.c (c_parser_postfix_expression): Remove unused code.  Update
commentary.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 1402ba67204..5c965d4420d 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -7752,7 +7752,8 @@ c_parser_generic_selection (c_parser *parser)
 }
 
 /* Parse a postfix expression (C90 6.3.1-6.3.2, C99 6.5.1-6.5.2,
-   C11 6.5.1-6.5.2).
+   C11 6.5.1-6.5.2).  Compound literals aren't handled here; callers have to
+   call c_parser_postfix_expression_after_paren_type on encountering them.
 
postfix-expression:
  primary-expression
@@ -7792,7 +7793,7 @@ c_parser_generic_selection (c_parser *parser)
  __builtin_types_compatible_p ( type-name , type-name )
  __builtin_complex ( assignment-expression , assignment-expression )
  __builtin_shuffle ( assignment-expression , assignment-expression )
- __builtin_shuffle ( assignment-expression , 
+ __builtin_shuffle ( assignment-expression ,
 assignment-expression ,
 assignment-expression, )
 
@@ -7943,28 +7944,6 @@ c_parser_postfix_expression (c_parser *parser)
   set_c_expr_source_range (&expr, loc, close_loc);
  mark_exp_read (expr.value);
}
-  else if (c_token_starts_typename (c_parser_peek_2nd_token (parser)))
-   {
- /* A compound literal.  ??? Can we actually get here rather
-than going directly to
-c_parser_postfix_expression_after_paren_type from
-elsewhere?  */
- location_t loc;
- struct c_type_name *type_name;
- c_parser_consume_token (parser);
- loc = c_parser_peek_token (parser)->location;
- type_name = c_parser_type_name (parser);
- c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
-"expected %<)%>");
- if (type_name == NULL)
-   {
- expr.set_error ();
-   }
- else
-   expr = c_parser_postfix_expression_after_paren_type (parser,
-type_name,
-loc);
-   }
   else
{
  /* A parenthesized expression.  */

Marek


Re: Add a full_integral_type_p helper function

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 1:04 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Aug 18, 2017 at 10:10 AM, Richard Sandiford
>>  wrote:
>>> There are several places that test whether:
>>>
>>> TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t))
>>>
>>> for some integer type T.  With SVE variable-length modes, this would
>>> need to become:
>>>
>>> TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t))
>>>
>>> (or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case).
>>> But rather than add the "SCALAR_" everywhere, it seemed neater to
>>> introduce a new helper function that tests whether T is an integral
>>> type that has the same number of bits as its underlying mode.  This
>>> patch does that, calling it full_integral_type_p.
>>>
>>> It isn't possible to use TYPE_MODE in tree.h because vector_type_mode
>>> is defined in stor-layout.h, so for now the function accesses the mode
>>> field directly.  After the 77-patch machine_mode series (thanks again
>>> Jeff for the reviews) it would use SCALAR_TYPE_MODE instead.
>>>
>>> Of the changes that didn't previously have an INTEGRAL_TYPE_P check:
>>>
>>> - for fold_single_bit_test_into_sign_test it is obvious from the
>>>   integer_foop tests that this is restricted to integral types.
>>>
>>> - vect_recog_vector_vector_shift_pattern is inherently restricted
>>>   to integral types.
>>>
>>> - the register_edge_assert_for_2 hunk is dominated by:
>>>
>>>   TREE_CODE (val) == INTEGER_CST
>>>
>>> - the ubsan_instrument_shift hunk is preceded by an early exit:
>>>
>>>   if (!INTEGRAL_TYPE_P (type0))
>>> return NULL_TREE;
>>>
>>> - the second and third match.pd hunks are from:
>>>
>>> /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>>> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>>if the new mask might be further optimized.  */
>>>
>>> I'm a bit confused about:
>>>
>>> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>>>when profitable.
>>>For bitwise binary operations apply operand conversions to the
>>>binary operation result instead of to the operands.  This allows
>>>to combine successive conversions and bitwise binary operations.
>>>We combine the above two cases by using a conditional convert.  */
>>> (for bitop (bit_and bit_ior bit_xor)
>>>  (simplify
>>>   (bitop (convert @0) (convert? @1))
>>>   (if (((TREE_CODE (@1) == INTEGER_CST
>>>  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>>>  && int_fits_type_p (@1, TREE_TYPE (@0)))
>>> || types_match (@0, @1))
>>>/* ???  This transform conflicts with fold-const.c doing
>>>   Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>   constants (if x has signed type, the sign bit cannot be set
>>>   in c).  This folds extension into the BIT_AND_EXPR.
>>>   Restrict it to GIMPLE to avoid endless recursions.  */
>>>&& (bitop != BIT_AND_EXPR || GIMPLE)
>>>&& (/* That's a good idea if the conversion widens the operand, thus
>>>   after hoisting the conversion the operation will be narrower. 
>>>  */
>>>TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
>>>/* It's also a good idea if the conversion is to a non-integer
>>>   mode.  */
>>>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
>>>/* Or if the precision of TO is not the same as the precision
>>>   of its mode.  */
>>>|| TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE 
>>> (type
>>>(convert (bitop @0 (convert @1))
>>>
>>> though.  The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't
>>> rely on @0 and @1 being integral (although conversions from float would
>>> use FLOAT_EXPR), but then what is:
>>
>> bit_and is valid on POINTER_TYPE and vector integer types
>>
>>>
>>>/* It's also a good idea if the conversion is to a non-integer
>>>   mode.  */
>>>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
>>>
>>> letting through?  MODE_PARTIAL_INT maybe, but that's a sort of integer
>>> mode too.  MODE_COMPLEX_INT or MODE_VECTOR_INT?  I thought for those
>>> it would be better to apply the scalar rules to the element type.
>>
>> I suppose extra caution ;)  I think I have seen BLKmode for not naturally
>> aligned integer types at least on strict-align targets?  The code is
>> a copy from original code in tree-ssa-forwprop.c.
>>
>>> Either way, having allowed all non-INT modes, using full_integral_type_p
>>> for the remaining condition seems correct.
>>>
>>> If the feeling is that this isn't a useful abstraction, I can just update
>>> each site individually to cope with variable-sized modes.
>>
>> I think "full_integral_type_p" is a name from which I cannot infer
>> its meaning.  Maybe type_has_mode_precision_p? 

Re: [PATCH] detect incompatible aliases (PR c/81854)

2017-08-18 Thread Joseph Myers
On Thu, 17 Aug 2017, Martin Sebor wrote:

> PPS To make use of this checking (and compile without the new
> warnings) Glibc needs the following patch:

This should be submitted to libc-alpha (independently of the GCC patch, 
presuming existing GCC versions accept the correct types).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Add a full_integral_type_p helper function

2017-08-18 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Aug 18, 2017 at 10:10 AM, Richard Sandiford
>  wrote:
>> There are several places that test whether:
>>
>> TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t))
>>
>> for some integer type T.  With SVE variable-length modes, this would
>> need to become:
>>
>> TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t))
>>
>> (or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case).
>> But rather than add the "SCALAR_" everywhere, it seemed neater to
>> introduce a new helper function that tests whether T is an integral
>> type that has the same number of bits as its underlying mode.  This
>> patch does that, calling it full_integral_type_p.
>>
>> It isn't possible to use TYPE_MODE in tree.h because vector_type_mode
>> is defined in stor-layout.h, so for now the function accesses the mode
>> field directly.  After the 77-patch machine_mode series (thanks again
>> Jeff for the reviews) it would use SCALAR_TYPE_MODE instead.
>>
>> Of the changes that didn't previously have an INTEGRAL_TYPE_P check:
>>
>> - for fold_single_bit_test_into_sign_test it is obvious from the
>>   integer_foop tests that this is restricted to integral types.
>>
>> - vect_recog_vector_vector_shift_pattern is inherently restricted
>>   to integral types.
>>
>> - the register_edge_assert_for_2 hunk is dominated by:
>>
>>   TREE_CODE (val) == INTEGER_CST
>>
>> - the ubsan_instrument_shift hunk is preceded by an early exit:
>>
>>   if (!INTEGRAL_TYPE_P (type0))
>> return NULL_TREE;
>>
>> - the second and third match.pd hunks are from:
>>
>> /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
>> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>>if the new mask might be further optimized.  */
>>
>> I'm a bit confused about:
>>
>> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>>when profitable.
>>For bitwise binary operations apply operand conversions to the
>>binary operation result instead of to the operands.  This allows
>>to combine successive conversions and bitwise binary operations.
>>We combine the above two cases by using a conditional convert.  */
>> (for bitop (bit_and bit_ior bit_xor)
>>  (simplify
>>   (bitop (convert @0) (convert? @1))
>>   (if (((TREE_CODE (@1) == INTEGER_CST
>>  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>>  && int_fits_type_p (@1, TREE_TYPE (@0)))
>> || types_match (@0, @1))
>>/* ???  This transform conflicts with fold-const.c doing
>>   Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>   constants (if x has signed type, the sign bit cannot be set
>>   in c).  This folds extension into the BIT_AND_EXPR.
>>   Restrict it to GIMPLE to avoid endless recursions.  */
>>&& (bitop != BIT_AND_EXPR || GIMPLE)
>>&& (/* That's a good idea if the conversion widens the operand, thus
>>   after hoisting the conversion the operation will be narrower.  
>> */
>>TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
>>/* It's also a good idea if the conversion is to a non-integer
>>   mode.  */
>>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
>>/* Or if the precision of TO is not the same as the precision
>>   of its mode.  */
>>|| TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE 
>> (type
>>(convert (bitop @0 (convert @1))
>>
>> though.  The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't
>> rely on @0 and @1 being integral (although conversions from float would
>> use FLOAT_EXPR), but then what is:
>
> bit_and is valid on POINTER_TYPE and vector integer types
>
>>
>>/* It's also a good idea if the conversion is to a non-integer
>>   mode.  */
>>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
>>
>> letting through?  MODE_PARTIAL_INT maybe, but that's a sort of integer
>> mode too.  MODE_COMPLEX_INT or MODE_VECTOR_INT?  I thought for those
>> it would be better to apply the scalar rules to the element type.
>
> I suppose extra caution ;)  I think I have seen BLKmode for not naturally
> aligned integer types at least on strict-align targets?  The code is
> a copy from original code in tree-ssa-forwprop.c.
>
>> Either way, having allowed all non-INT modes, using full_integral_type_p
>> for the remaining condition seems correct.
>>
>> If the feeling is that this isn't a useful abstraction, I can just update
>> each site individually to cope with variable-sized modes.
>
> I think "full_integral_type_p" is a name from which I cannot infer
> its meaning.  Maybe type_has_mode_precision_p?  Or
> type_matches_mode_p?

type_has_mode_precision_p sounds good.  With that name I guess it
should be written to cope with all types (even those with variable-
width modes), so I think we'd need 

Re: Add a full_integral_type_p helper function

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 10:10 AM, Richard Sandiford
 wrote:
> There are several places that test whether:
>
> TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t))
>
> for some integer type T.  With SVE variable-length modes, this would
> need to become:
>
> TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t))
>
> (or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case).
> But rather than add the "SCALAR_" everywhere, it seemed neater to
> introduce a new helper function that tests whether T is an integral
> type that has the same number of bits as its underlying mode.  This
> patch does that, calling it full_integral_type_p.
>
> It isn't possible to use TYPE_MODE in tree.h because vector_type_mode
> is defined in stor-layout.h, so for now the function accesses the mode
> field directly.  After the 77-patch machine_mode series (thanks again
> Jeff for the reviews) it would use SCALAR_TYPE_MODE instead.
>
> Of the changes that didn't previously have an INTEGRAL_TYPE_P check:
>
> - for fold_single_bit_test_into_sign_test it is obvious from the
>   integer_foop tests that this is restricted to integral types.
>
> - vect_recog_vector_vector_shift_pattern is inherently restricted
>   to integral types.
>
> - the register_edge_assert_for_2 hunk is dominated by:
>
>   TREE_CODE (val) == INTEGER_CST
>
> - the ubsan_instrument_shift hunk is preceded by an early exit:
>
>   if (!INTEGRAL_TYPE_P (type0))
> return NULL_TREE;
>
> - the second and third match.pd hunks are from:
>
> /* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
> (X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
>if the new mask might be further optimized.  */
>
> I'm a bit confused about:
>
> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>when profitable.
>For bitwise binary operations apply operand conversions to the
>binary operation result instead of to the operands.  This allows
>to combine successive conversions and bitwise binary operations.
>We combine the above two cases by using a conditional convert.  */
> (for bitop (bit_and bit_ior bit_xor)
>  (simplify
>   (bitop (convert @0) (convert? @1))
>   (if (((TREE_CODE (@1) == INTEGER_CST
>  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>  && int_fits_type_p (@1, TREE_TYPE (@0)))
> || types_match (@0, @1))
>/* ???  This transform conflicts with fold-const.c doing
>   Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>   constants (if x has signed type, the sign bit cannot be set
>   in c).  This folds extension into the BIT_AND_EXPR.
>   Restrict it to GIMPLE to avoid endless recursions.  */
>&& (bitop != BIT_AND_EXPR || GIMPLE)
>&& (/* That's a good idea if the conversion widens the operand, thus
>   after hoisting the conversion the operation will be narrower.  
> */
>TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
>/* It's also a good idea if the conversion is to a non-integer
>   mode.  */
>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
>/* Or if the precision of TO is not the same as the precision
>   of its mode.  */
>|| TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE (type
>(convert (bitop @0 (convert @1))
>
> though.  The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't
> rely on @0 and @1 being integral (although conversions from float would
> use FLOAT_EXPR), but then what is:

bit_and is valid on POINTER_TYPE and vector integer types

>
>/* It's also a good idea if the conversion is to a non-integer
>   mode.  */
>|| GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
>
> letting through?  MODE_PARTIAL_INT maybe, but that's a sort of integer
> mode too.  MODE_COMPLEX_INT or MODE_VECTOR_INT?  I thought for those
> it would be better to apply the scalar rules to the element type.

I suppose extra caution ;)  I think I have seen BLKmode for non-naturally-aligned
integer types at least on strict-align targets?  The code is
a copy of the original code in tree-ssa-forwprop.c.

> Either way, having allowed all non-INT modes, using full_integral_type_p
> for the remaining condition seems correct.
>
> If the feeling is that this isn't a useful abstraction, I can just update
> each site individually to cope with variable-sized modes.

I think "full_integral_type_p" is a name from which I cannot infer
its meaning.  Maybe type_has_mode_precision_p?  Or
type_matches_mode_p?  Does TYPE_PRECISION == GET_MODE_PRECISION
imply TYPE_SIZE == GET_MODE_BITSIZE btw?
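(As an aside, the suggested predicate is easy to model outside the tree machinery. The struct and field names below are invented for illustration only; they are not GCC's real tree representation:)

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a scalar integer type: the precision the type advertises
   and the precision of the machine mode backing it.  The fields mirror
   TYPE_PRECISION and GET_MODE_PRECISION (TYPE_MODE (t)), but this is an
   illustration only, not GCC's tree representation.  */
struct toy_type
{
  unsigned int precision;
  unsigned int mode_precision;
};

/* The predicate under discussion ("type_has_mode_precision_p"): true
   iff the type uses every bit of its underlying mode.  */
static bool
type_has_mode_precision_p (const struct toy_type *t)
{
  return t->precision == t->mode_precision;
}
```

With this model a 32-bit int backed by a 32-bit mode satisfies the predicate, while a 3-bit bit-field type backed by a 32-bit mode does not.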

Richard.

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2017-08-18  Richard Sandiford  
> Alan Hayward  
> David Sherwood  

Re: PR81635: Use chrecs to help find related data refs

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
 wrote:
> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
  wrote:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

 Didn't read the change in detail, so sorry if I mis-understood the 
 issue.
 I made changes in scev to better fold type conversion by various 
 sources
 of information, for example, vrp, niters, undefined overflow behavior 
 etc.
 In theory this information should be available to other
 optimizers without
 querying scev.  For the mentioned test, vrp should compute accurate
 range
 information for "i" so that we can fold (intptr_t) (i + 1) without
 worrying about
 overflow.  Note we don't do it in generic folding because
 (intptr_t) (i) + 1
 could be more expensive (especially in case of (T)(i + j)), or because 
 the
 CST part is in bigger precision after conversion.
 But such folding is wanted in several places, e.g, IVOPTs.  To provide 
 such
 an interface, we changed tree-affine and made it do aggressive fold.  
 I am
 curious if it's possible to use aff_tree to implement 
 strip_constant_offset
 here since aggressive folding is wanted.  After all, using additional 
 chrec
 looks a little heavy wrt the simple test.
>>>
>>> Yeah, using aff_tree does work here when the bounds are constant.
>>> It doesn't look like it works for things like:
>>>
>>> double p[1000];
>>> double q[1000];
>>>
>>> void
>>> f4 (unsigned int n)
>>> {
>>>   for (unsigned int i = 0; i < n; i += 4)
>>> {
>>>   double a = q[i] + p[i];
>>>   double b = q[i + 1] + p[i + 1];
>>>   q[i] = a;
>>>   q[i + 1] = b;
>>> }
>>> }
>>>
>>> though, where the bounds on the global arrays guarantee that [i + 1] 
>>> can't
>>> overflow, even though "n" is unconstrained.  The patch as posted handles
>>> this case too.
>> BTW is this a missed optimization in value range analysis?  The range
>> information for i should flow in a way like: array boundary -> niters
>> -> scev/vrp.
>> I think that's what niters/scev do in analysis.
>
> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
> the previous VRP pass came before loop header copying, so the (single)
> header has to cope with n == 0 case.  Thus we get:
 Ah, there are several passes in between vrp and pass_ch, not sure if
 any such pass depends on vrp intensively.  I would suggest reordering
 the two passes, or a standalone VRP interface updating information for
 the loop region after the header is copied?   This is a non-trivial issue that
 needs to be fixed.  The niters analyzer relies on
 simplify_using_initial_conditions heavily to get the same information,
 which in my opinion should be provided by VRP.  Though this won't be
 able to obsolete simplify_using_initial_conditions because VRP is weak
 in symbolic range...

>
>   Visiting statement:
>   i_15 = ASSERT_EXPR ;
>   Intersecting
> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
> [0, 0]
>   to
> [0, 0]  EQUIVALENCES: { i_6 } 

Re: PR81635: Use chrecs to help find related data refs

2017-08-18 Thread Richard Biener
On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>>  wrote:
 The first loop in the testcase regressed after my recent changes to
 dr_analyze_innermost.  Previously we would treat "i" as an iv even
 for bb analysis and end up with:

DR_BASE_ADDRESS: p or q
DR_OFFSET: 0
DR_INIT: 0 or 4
DR_STEP: 16

 We now always keep the step as 0 instead, so for an int "i" we'd have:

DR_BASE_ADDRESS: p or q
DR_OFFSET: (intptr_t) i
DR_INIT: 0 or 4
DR_STEP: 0

 This is also what we'd like to have for the unsigned "i", but the
 problem is that strip_constant_offset thinks that the "i + 1" in
 "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
 The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
 name that holds "(intptr_t) (i + 1)", meaning that the accesses no
 longer seem to be related to the [i] ones.
>>>
>>> Didn't read the change in detail, so sorry if I mis-understood the 
>>> issue.
>>> I made changes in scev to better fold type conversion by various sources
>>> of information, for example, vrp, niters, undefined overflow behavior 
>>> etc.
>>> In theory this information should be available to other
>>> optimizers without
>>> querying scev.  For the mentioned test, vrp should compute accurate 
>>> range
>>> information for "i" so that we can fold (intptr_t) (i + 1) without
>>> worrying about
>>> overflow.  Note we don't do it in generic folding because
>>> (intptr_t) (i) + 1
>>> could be more expensive (especially in case of (T)(i + j)), or because 
>>> the
>>> CST part is in bigger precision after conversion.
>>> But such folding is wanted in several places, e.g, IVOPTs.  To provide 
>>> such
>>> an interface, we changed tree-affine and made it do aggressive fold.  I 
>>> am
>>> curious if it's possible to use aff_tree to implement 
>>> strip_constant_offset
>>> here since aggressive folding is wanted.  After all, using additional 
>>> chrec
>>> looks a little heavy wrt the simple test.
>>
>> Yeah, using aff_tree does work here when the bounds are constant.
>> It doesn't look like it works for things like:
>>
>> double p[1000];
>> double q[1000];
>>
>> void
>> f4 (unsigned int n)
>> {
>>   for (unsigned int i = 0; i < n; i += 4)
>> {
>>   double a = q[i] + p[i];
>>   double b = q[i + 1] + p[i + 1];
>>   q[i] = a;
>>   q[i + 1] = b;
>> }
>> }
>>
>> though, where the bounds on the global arrays guarantee that [i + 1] 
>> can't
>> overflow, even though "n" is unconstrained.  The patch as posted handles
>> this case too.
> BTW is this a missed optimization in value range analysis?  The range
> information for i should flow in a way like: array boundary -> niters
> -> scev/vrp.
> I think that's what niters/scev do in analysis.

 Yeah, maybe :-)  It looks like the problem is that when SLP runs,
 the previous VRP pass came before loop header copying, so the (single)
 header has to cope with n == 0 case.  Thus we get:
>>> Ah, there are several passes in between vrp and pass_ch, not sure if
>>> any such pass depends on vrp intensively.  I would suggest reordering
>>> the two passes, or a standalone VRP interface updating information for
>>> the loop region after the header is copied?   This is a non-trivial issue that
>>> needs to be fixed.  The niters analyzer relies on
>>> simplify_using_initial_conditions heavily to get the same information,
>>> which in my opinion should be provided by VRP.  Though this won't be
>>> able to obsolete simplify_using_initial_conditions because VRP is weak
>>> in symbolic range...
>>>

   Visiting statement:
   i_15 = ASSERT_EXPR ;
   Intersecting
 [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
   and
 [0, 0]
   to
 [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
   Intersecting
 [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
   and
 [0, 1000]
   to
 [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
   Found new range for 

Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-18 Thread Martin Liška
On 08/18/2017 11:30 AM, Richard Biener wrote:
> On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška  wrote:
>> On 08/14/2017 10:32 AM, Richard Biener wrote:
>>> Hmm, but the existing "lowering" part is called from the
>>> switch-conversion pass.  So
>>> I'm not sure a new file is good.
>>
>> Good, I'm not against having that in a single file. So new version of the 
>> patch
>> does that.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> Hmm, I see you duplicate add_case_node for example.  Is that just temporary?
> If not can you please factor out the data structure and common code?
> (case.[Ch]?)

You are right. As we'll generate just the jump table in stmt.c, the proper fix is to 
remove
all usages of 'case_node' in the file because a simple iteration over the labels will 
work fine.
Let me do it incrementally to minimize the fallout :)

Martin

> 
> Thanks,
> Richard.
> 
>> Martin



Re: [PATCH] Fix file find utils and add unit tests (PR driver/81829).

2017-08-18 Thread Martin Liška
On 08/15/2017 02:45 PM, Martin Liška wrote:
> Hi.
> 
> As shown in the PR, the remove_prefix function is written wrongly. It does not 
> distinguish
> between a node in the linked list and pprefix->plist. So I decided to rewrite 
> it.
> Apart from that, I identified a discrepancy between do_add_prefix and 
> prefix_from_string
> where the latter always appends DIR_SEPARATOR (if missing), so I also fixed the 
> former function.
> And I'm adding unit tests for all functions in file-find.c.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2017-08-14  Martin Liska  
> 
>   PR driver/81829
>   * file-find.c (do_add_prefix): Always append DIR_SEPARATOR
>   at the end of a prefix.
>   (remove_prefix): Properly remove elements and accept also
>   path without a trailing DIR_SEPARATOR.
>   (purge): New function.
>   (file_find_verify_prefix_creation): Likewise.
>   (file_find_verify_prefix_add): Likewise.
>   (file_find_verify_prefix_removal): Likewise.
>   (file_find_c_tests): Likewise.
>   * selftest-run-tests.c (selftest::run_tests): Add new
>   file_find_c_tests.
>   * selftest.h (file_find_c_tests): Likewise.
> ---
>  gcc/file-find.c  | 182 
> ++-
>  gcc/selftest-run-tests.c |   1 +
>  gcc/selftest.h   |   1 +
>  3 files changed, 167 insertions(+), 17 deletions(-)
> 
> 

Hi.

As doko pointed out, the first version was not correct. Let me describe two 
scenarios that should
be supported:

1) my original motivation, where I configure gcc w/ --prefix=/my_bin and I 
manually create these symlinks in /my_bin/bin:

$ ln -s gcc-ar ar
$ ln -s gcc-nm nm
$ ln -s gcc-ranlib ranlib

and then set PATH=/my_bin/bin:... and LD_LIBRARY_PATH, and I don't have to 
care about NM=gcc-nm and so on.
That's how I usually test a newly built GCC.

2) using NM=gcc-nm and so on will force usage of the LTO plugin. That's what was 
broken in the first version of the patch.

In order to support both cases we need to identify the real name of the GCC wrapper 
(like /my_bin/bin/gcc-ar); when
find_a_file (, real_exe_name, X_OK) points to the same file, the 
directory should be ignored from the prefix.
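The separator normalization at the heart of the fix can be sketched standalone (a simplified sketch only; the real code in file-find.c works on prefix lists and uses GCC's allocators):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DIR_SEPARATOR '/'

/* Simplified sketch of what do_add_prefix now guarantees: return a
   heap-allocated copy of PREFIX that always ends in DIR_SEPARATOR,
   appending one only when it is missing.  */
static char *
normalize_prefix (const char *prefix)
{
  size_t len = strlen (prefix);
  int append = len == 0 || prefix[len - 1] != DIR_SEPARATOR;
  char *dup = malloc (len + append + 1);
  memcpy (dup, prefix, len);
  if (append)
    dup[len] = DIR_SEPARATOR;
  dup[len + append] = '\0';
  return dup;
}
```

This matches the intent of both do_add_prefix and prefix_from_string, which previously disagreed about the trailing separator.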

Can you please test attached patch in your scenario?
Thanks,
Martin
From f8029ed6d3dd444ee2608146118f2189cf9ef0d8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Aug 2017 13:56:32 +0200
Subject: [PATCH] Fix file find utils and add unit tests (PR driver/81829).

gcc/ChangeLog:

2017-08-14  Martin Liska  

	PR driver/81829
	* file-find.c (do_add_prefix): Always append DIR_SEPARATOR
	at the end of a prefix.
	(remove_prefix): Properly remove elements and accept also
	path without a trailing DIR_SEPARATOR.
	(purge): New function.
	(file_find_verify_prefix_creation): Likewise.
	(file_find_verify_prefix_add): Likewise.
	(file_find_verify_prefix_removal): Likewise.
	(file_find_c_tests): Likewise.
	* selftest-run-tests.c (selftest::run_tests): Add new
	file_find_c_tests.
	* selftest.h (file_find_c_tests): Likewise.
---
 gcc/file-find.c  | 182 ++-
 gcc/gcc-ar.c |  19 +++--
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 4 files changed, 179 insertions(+), 24 deletions(-)

diff --git a/gcc/file-find.c b/gcc/file-find.c
index b072a4993d7..23a883042a2 100644
--- a/gcc/file-find.c
+++ b/gcc/file-find.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "filenames.h"
 #include "file-find.h"
+#include "selftest.h"
 
 static bool debug = false;
 
@@ -126,11 +127,22 @@ do_add_prefix (struct path_prefix *pprefix, const char *prefix, bool first)
   /* Keep track of the longest prefix.  */
 
   len = strlen (prefix);
+  bool append_separator = !IS_DIR_SEPARATOR (prefix[len - 1]);
+  if (append_separator)
+len++;
+
   if (len > pprefix->max_len)
 pprefix->max_len = len;
 
   pl = XNEW (struct prefix_list);
-  pl->prefix = xstrdup (prefix);
+  char *dup = XCNEWVEC (char, len + 1);
+  memcpy (dup, prefix, append_separator ? len - 1 : len);
+  if (append_separator)
+{
+  dup[len - 1] = DIR_SEPARATOR;
+  dup[len] = '\0';
+}
+  pl->prefix = dup;
 
   if (*prev)
 pl->next = *prev;
@@ -212,34 +224,170 @@ prefix_from_string (const char *p, struct path_prefix *pprefix)
 void
 remove_prefix (const char *prefix, struct path_prefix *pprefix)
 {
-  struct prefix_list *remove, **prev, **remove_prev = NULL;
+  char *dup = NULL;
   int max_len = 0;
+  size_t len = strlen (prefix);
+  if (prefix[len - 1] != DIR_SEPARATOR)
+{
+  char *dup = XNEWVEC (char, len + 2);
+  memcpy (dup, prefix, len);
+  dup[len] = DIR_SEPARATOR;
+  dup[len + 1] = '\0';
+  prefix = dup;
+}
 
   if (pprefix->plist)
 {
-  prev = >plist;
-  for (struct prefix_list *pl = pprefix->plist; 

Re: [PATCH] [PR79542][Ada] Fix ICE in dwarf2out.c with nested func. inlining

2017-08-18 Thread Richard Biener
On Tue, Aug 15, 2017 at 1:16 PM, Richard Biener
 wrote:
> On Sat, Aug 12, 2017 at 11:09 AM, Pierre-Marie de Rodat
>  wrote:
>> On 08/11/2017 11:29 PM, Jason Merrill wrote:
>>>
>>> OK.
>>
>>
>> Committed. Thank you for your sustained review effort, Jason. :-)
>
> The way you use decl_ultimate_origin conflicts with the early LTO
> debug patches which
> make dwarf2out_abstract_function call set_decl_origin_self and thus the assert
> in gen_typedef_die triggers (and the rest probably misbehaves).
>
> Now I wonder whether we at any point need that self-origin?
>
> Currently it's set via
>
> static dw_die_ref
> gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
>   dw_die_ref context_die)
> {
> ...
> case FUNCTION_DECL:
> #if 0
>   /* FIXME */
>   /* This doesn't work because the C frontend sets DECL_ABSTRACT_ORIGIN
>  on local redeclarations of global functions.  That seems broken.  */
>   if (current_function_decl != decl)
> /* This is only a declaration.  */;
> #endif
>
>   /* If we're emitting a clone, emit info for the abstract instance.  */
>   if (origin || DECL_ORIGIN (decl) != decl)
> dwarf2out_abstract_function (origin
>  ? DECL_ORIGIN (origin)
>  : DECL_ABSTRACT_ORIGIN (decl));
>
>   /* If we're emitting an out-of-line copy of an inline function,
>  emit info for the abstract instance and set up to refer to it.  */
>   else if (cgraph_function_possibly_inlined_p (decl)
>&& ! DECL_ABSTRACT_P (decl)
>&& ! class_or_namespace_scope_p (context_die)
>/* dwarf2out_abstract_function won't emit a die if this is just
>   a declaration.  We must avoid setting DECL_ABSTRACT_ORIGIN 
> in
>   that case, because that works only if we have a die.  */
>&& DECL_INITIAL (decl) != NULL_TREE)
> {
>   dwarf2out_abstract_function (decl);
>   set_decl_origin_self (decl);
> }
>
> ok, not doing this at all doesn't work, doing it only in the above case 
> neither.
>
> Bah.
>
> Can anyone explain to me why we do the set_decl_origin_self dance?

Ok, so I need the following incremental patch to fix the fallout.

This allows Ada LTO bootstrap to succeed with the early LTO debug patches.

I assume this change is ok ontop of the LTO debug patches unless I
hear otherwise
til Monday (when I then finally will commit the series).

Full bootstrap/testing running now.

Thanks,
Richard.

2017-08-18  Richard Biener  

* dwarf2out.c (modified_type_die): Check for self origin before
recursing.
(gen_type_die_with_usage): Likewise.
(gen_typedef_die): Allow self origin.
* tree.c (variably_modified_type_p): Guard against Ada recursive
pointer types.


p
Description: Binary data


Re: [patch 0/2] PR49847: Add hook to place read-only lookup-tables in named address-space

2017-08-18 Thread Richard Biener
On Wed, Aug 16, 2017 at 3:32 PM, Georg-Johann Lay  wrote:
> On 28.07.2017 09:34, Richard Biener wrote:
>>
>> On Thu, Jul 27, 2017 at 3:32 PM, Georg-Johann Lay  wrote:
>>>
>>> On 27.07.2017 14:34, Richard Biener wrote:


 On Thu, Jul 27, 2017 at 2:29 PM, Georg-Johann Lay  wrote:
>
>
> For some targets, the best place to put read-only lookup tables as
> generated by -ftree-switch-conversion is not the generic address space
> but some target specific address space.
>
> This is the case for AVR, where for most devices .rodata must be
> located in RAM.
>
> Part #1 adds a new, optional target hook that queries the back-end
> about the desired address space.  There is currently only one user:
> tree-switch-conversion.c::build_one_array() which re-builds value_type
> and array_type if the address space returned by the backend is not
> the generic one.
>
> Part #2 is the AVR part that implements the new hook and adds some
> sugar around it.



 Given that switch-conversion just creates a constant initializer doesn't
 AVR
 benefit from handling those uniformly (at RTL expansion?).  Not sure but
 I think it goes through the regular constant pool handling.

 Richard.
>>>
>>>
>>>
>>> avr doesn't use constant pools because they are not profitable for
>>> several reasons.
>>>
>>> Moreover, it's not sufficient to just patch the pool's section, you'd
>>> also have to patch *all* accesses to also use the exact same address
>>> space so that they use the correct instruction (LPM instead of LD).
>>
>>
>> Sure.  So then not handle it in constant pool handling but in expanding
>> the initializers of global vars.  I think the entry for this is
>> varasm.c:assemble_variable - that sets DECL_RTL for the variable.
>
>
> That cannot work, and here is why; just for completeness:
>
> cgraphunit.c::symbol_table::compile()
>
> runs
>
>   ...
>   expand_all_functions ();
>   output_variables (); // runs assemble_variable
>   ...
>
> So any patching of DECL_RTL will result in wrong code: The address
> space of the decl won't match the address space used by the code
> accessing decl.

Ok, so it's more tricky to hack it that way (you'd need to catch the
point where the decl gets its DECL_RTL set; not sure where that happens
for globals).

That leaves doing a more broad transform.  One convenient place
to hook in would be the ipa_single_use pass which computes
the varpool_node.used_by_single_function flag.  All such variables
would be suitable for promotion (with some additional constraints
I suppose).  You'd add a transform phase to the pass that rewrites
the decls and performs GIMPLE IL adjustments (I think you need
to patch memory references to use the address-space).

Of course there would need to be a target hook determining
if/what section/address-space a variable should be promoted to
which somehow would allow switch-conversion to use that
as well.  Ho humm.

That said, do you think the switch-conversion created decl is
the only one that benefits from compiler-directed promotion
to different address-space?

Richard.

> Johann


Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-18 Thread Richard Biener
On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška  wrote:
> On 08/14/2017 10:32 AM, Richard Biener wrote:
>> Hmm, but the existing "lowering" part is called from the
>> switch-conversion pass.  So
>> I'm not sure a new file is good.
>
> Good, I'm not against having that in a single file. So new version of the 
> patch
> does that.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Hmm, I see you duplicate add_case_node for example.  Is that just temporary?
If not can you please factor out the data structure and common code?
(case.[Ch]?)

Thanks,
Richard.

> Martin


Re: [GCC RFC]Expensive internal function calls.

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 10:45 AM, Bin Cheng  wrote:
> Hi,
> As a followup patch to the fix for PR81832, this patch considers internal 
> function calls to
> IFN_LOOP_DIST_ALIAS and IFN_LOOP_VECTORIZED as expensive.  Or, interpreted
> another way: it returns false since we shouldn't be asked the question?  Any 
> comments?
> BTW, I have no problem to drop this if not appropriate.

I think, as the meaning is to cost the actual call, making it expensive
is wrong given it
will end up as a constant.  So, no, this doesn't look correct.

Richard.

> Thanks,
> bin
> 2017-08-16  Bin Cheng  
>
> * gimple.c (gimple_inexpensive_call_p): Consider IFN_LOOP_DIST_ALIAS
> and IFN_LOOP_VECTORIZED as expensive calls.


[GCC RFC]Expensive internal function calls.

2017-08-18 Thread Bin Cheng
Hi,
As a followup patch to the fix for PR81832, this patch considers internal function 
calls to
IFN_LOOP_DIST_ALIAS and IFN_LOOP_VECTORIZED as expensive.  Or, interpreted
another way: it returns false since we shouldn't be asked the question?  Any 
comments?
BTW, I have no problem to drop this if not appropriate.

Thanks,
bin
2017-08-16  Bin Cheng  

* gimple.c (gimple_inexpensive_call_p): Consider IFN_LOOP_DIST_ALIAS
and IFN_LOOP_VECTORIZED as expensive calls.

diff --git a/gcc/gimple.c b/gcc/gimple.c
index c4e6f81..6d4e376 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -3038,7 +3038,16 @@ bool
 gimple_inexpensive_call_p (gcall *stmt)
 {
   if (gimple_call_internal_p (stmt))
-return true;
+{
+  /* Some internal function calls are only meant to indicate temporary
+arrangement of optimization and are never used in code generation.
+We always consider these calls expensive.  */
+  if (gimple_call_internal_p (stmt, IFN_LOOP_DIST_ALIAS)
+ || gimple_call_internal_p (stmt, IFN_LOOP_VECTORIZED))
+   return false;
+
+  return true;
+}
   tree decl = gimple_call_fndecl (stmt);
   if (decl && is_inexpensive_builtin (decl))
 return true;


Re: [PATCH] PR c++/80287 add new testcase

2017-08-18 Thread Yvan Roux
On 4 August 2017 at 15:52, Yvan Roux  wrote:
> On 13 July 2017 at 14:02, Yvan Roux  wrote:
>> Hi,
>>
>> as discussed in the PR, this patch adds a new reduced testcase which
>> doesn't rely on C++17 features; this is a prereq to the backport of
>> the fix into the GCC 6 branch, which is impacted by this issue.
>>
>> Validated on x86, ARM and AArch64 targets.
>>
>> Ok for trunk ? and maybe on gcc-7-branch ?
>
> ping

ping

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00730.html

>> Thanks,
>> Yvan
>>
>> gcc/testsuite
>> 2017-07-13  Yvan Roux  
>>
>> PR c++/80287
>> * g++.dg/pr80287.C: New test.


Re: [Libgomp, Fortran] Fix canadian cross build

2017-08-18 Thread Yvan Roux
On 4 August 2017 at 15:52, Yvan Roux  wrote:
> On 11 July 2017 at 12:25, Yvan Roux  wrote:
>> On 3 July 2017 at 11:21, Yvan Roux  wrote:
>>> On 23 June 2017 at 15:44, Yvan Roux  wrote:
 Hello,

 Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are
 missing in a Canadian cross build, at least when the target gfortran
 compiler comes from PATH and not from GFORTRAN_FOR_TARGET.

 Back in 2010, an executability test of GFORTRAN was added to fix the libgomp
 build on cygwin, but when the executable name doesn't contain the path,
 "test -x" fails and parts of the library are not built.

 This patch fixes the issue by using the M4 macro AC_PATH_PROG (which
 returns the absolute name) instead of AC_CHECK_PROG in the function
 defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS.  I renamed it
 to NCN_STRICT_PATH_TARGET_TOOLS to keep the semantics used in M4.

 Tested by building cross and Canadian cross toolchains (host:
 i686-w64-mingw32) for arm-linux-gnueabihf, without the issue and with a
 complete libgomp.

 ok for trunk ?
>>>
>>> ping?
>>
>> ping?
>
> ping

ping

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html

>>>
 Thanks
 Yvan

 config/ChangeLog
 2017-06-23  Yvan Roux  

 * acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
 (NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the 
 replacement
 of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the
 program.
 (ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.

 ChangeLog
 2017-06-23  Yvan Roux  

 * configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
 NCN_STRICT_CHECK_TARGET_TOOLS.
 * configure: Regenerate.


Add a partial_integral_type_p helper function

2017-08-18 Thread Richard Sandiford
This patch adds a partial_integral_type_p function, to go along
with the full_integral_type_p added by the previous patch.

Of the changes that didn't previously have an INTEGRAL_TYPE_P check:

- the convert_to_integer_1 hunks are dominated by a case version
  of INTEGRAL_TYPE_P.

- the merge_ranges hunk is dominated by an ENUMERAL_TYPE case.

- vectorizable_reduction has the comment:

  /* Do not try to vectorize bit-precision reductions.  */

  and so I think was only concerned with integers.

- vectorizable_assignment has the comment:

  /* We do not handle bit-precision changes.  */

  and the later:

  /* But a conversion that does not change the bit-pattern is ok.  */
  && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
> TYPE_PRECISION (TREE_TYPE (op)))
   && TYPE_UNSIGNED (TREE_TYPE (op)))

  would only make sense if OP is also an integral type.

- vectorizable_shift is inherently restricted to integers.
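For what it's worth, the bit-precision cases these checks guard against can be seen in plain C (a sketch independent of GCC internals): a bit-field's type has a precision narrower than its mode, so stores must be reduced to the field's precision:

```c
struct s
{
  unsigned int x : 3;	/* precision 3, but backed by a full int mode */
};

/* Store VAL into the 3-bit field and return what it reads back: the
   value reduced to the field's precision, i.e. VAL modulo 8.  */
static unsigned int
store_to_bitfield (unsigned int val)
{
  struct s v;
  v.x = val;
  return v.x;
}
```

This is exactly the situation partial_integral_type_p describes: TYPE_PRECISION of the field's type is smaller than the precision of its underlying mode.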

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-08-17  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (partial_integral_type_p): New function.
* convert.c (convert_to_integer_1): Use it.
* expr.c (store_field, expand_expr_real_2, expand_expr_real_1): Likewise.
* fold-const.c (merge_ranges): Likewise.
* tree-ssa-math-opts.c (convert_mult_to_fma): Likewise.
* tree-tailcall.c (process_assignment): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_conversion): Likewise.
(vectorizable_assignment, vectorizable_shift): Likewise.

Index: gcc/tree.h
===
--- gcc/tree.h  2017-08-18 08:35:58.031690315 +0100
+++ gcc/tree.h  2017-08-18 08:36:07.208306339 +0100
@@ -5414,4 +5414,13 @@ full_integral_type_p (const_tree t)
   return INTEGRAL_TYPE_P (t) && scalar_type_is_full_p (t);
 }
 
+/* Return true if T is an integral type that has fewer bits than
+   its underlying mode.  */
+
+inline bool
+partial_integral_type_p (const_tree t)
+{
+  return INTEGRAL_TYPE_P (t) && !scalar_type_is_full_p (t);
+}
+
 #endif  /* GCC_TREE_H  */
Index: gcc/convert.c
===
--- gcc/convert.c   2017-08-10 14:36:09.015436664 +0100
+++ gcc/convert.c   2017-08-18 08:36:07.203306339 +0100
@@ -711,8 +711,7 @@ convert_to_integer_1 (tree type, tree ex
 the signed-to-unsigned case the high-order bits have to
 be cleared.  */
  if (TYPE_UNSIGNED (type) != TYPE_UNSIGNED (TREE_TYPE (expr))
- && (TYPE_PRECISION (TREE_TYPE (expr))
- != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (expr)
+ && partial_integral_type_p (TREE_TYPE (expr)))
code = CONVERT_EXPR;
  else
code = NOP_EXPR;
@@ -725,7 +724,7 @@ convert_to_integer_1 (tree type, tree ex
 type corresponding to its mode, then do a nop conversion
 to TYPE.  */
   else if (TREE_CODE (type) == ENUMERAL_TYPE
-  || outprec != GET_MODE_PRECISION (TYPE_MODE (type)))
+  || partial_integral_type_p (type))
{
  expr = convert (lang_hooks.types.type_for_mode
  (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr);
Index: gcc/expr.c
===
--- gcc/expr.c  2017-08-03 10:40:54.807600276 +0100
+++ gcc/expr.c  2017-08-18 08:36:07.204306339 +0100
@@ -6834,8 +6834,7 @@ store_field (rtx target, HOST_WIDE_INT b
   if (nop_def)
{
  tree type = TREE_TYPE (exp);
- if (INTEGRAL_TYPE_P (type)
- && TYPE_PRECISION (type) < GET_MODE_BITSIZE (TYPE_MODE (type))
+ if (partial_integral_type_p (type)
  && bitsize == TYPE_PRECISION (type))
{
  tree op = gimple_assign_rhs1 (nop_def);
@@ -8243,8 +8242,7 @@ #define REDUCE_BIT_FIELD(expr)(reduce_b
   /* An operation in what may be a bit-field type needs the
  result to be reduced to the precision of the bit-field type,
  which is narrower than that of the type's mode.  */
-  reduce_bit_field = (INTEGRAL_TYPE_P (type)
- && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
+  reduce_bit_field = partial_integral_type_p (type);
 
   if (reduce_bit_field && modifier == EXPAND_STACK_PARM)
 target = 0;
@@ -9669,9 +9667,7 @@ expand_expr_real_1 (tree exp, rtx target
   /* An operation in what may be a bit-field type needs the
  result to be reduced to the precision of the bit-field type,
  which is narrower than that of the type's mode.  */
-  reduce_bit_field = (!ignore
- && INTEGRAL_TYPE_P (type)
- && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
+  

Add a full_integral_type_p helper function

2017-08-18 Thread Richard Sandiford
There are several places that test whether:

TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t))

for some integer type T.  With SVE variable-length modes, this would
need to become:

TYPE_PRECISION (t) == GET_MODE_PRECISION (SCALAR_TYPE_MODE (t))

(or SCALAR_INT_TYPE_MODE, it doesn't matter which in this case).
But rather than add the "SCALAR_" everywhere, it seemed neater to
introduce a new helper function that tests whether T is an integral
type that has the same number of bits as its underlying mode.  This
patch does that, calling it full_integral_type_p.
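A standard-C way to observe a type for which this equality fails (and for which full_integral_type_p would therefore be false) is _Bool: it occupies a whole byte of storage but has precision 1, so conversions reduce any nonzero value to 1. This is only an analogy at the C level, not GCC internals:

```c
/* Converting to _Bool keeps only the "is nonzero" information: the
   type's precision (1 bit) is narrower than its byte-sized storage.  */
static int
reduce_to_bool (int v)
{
  _Bool b = v;
  return b;
}
```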

It isn't possible to use TYPE_MODE in tree.h because vector_type_mode
is defined in stor-layout.h, so for now the function accesses the mode
field directly.  After the 77-patch machine_mode series (thanks again
Jeff for the reviews) it would use SCALAR_TYPE_MODE instead.

Of the changes that didn't previously have an INTEGRAL_TYPE_P check:

- for fold_single_bit_test_into_sign_test it is obvious from the
  integer_foop tests that this is restricted to integral types.

- vect_recog_vector_vector_shift_pattern is inherently restricted
  to integral types.

- the register_edge_assert_for_2 hunk is dominated by:

  TREE_CODE (val) == INTEGER_CST

- the ubsan_instrument_shift hunk is preceded by an early exit:

  if (!INTEGRAL_TYPE_P (type0))
return NULL_TREE;

- the second and third match.pd hunks are from:

/* Fold (X << C1) & C2 into (X << C1) & (C2 | ((1 << C1) - 1))
(X >> C1) & C2 into (X >> C1) & (C2 | ~((type) -1 >> C1))
   if the new mask might be further optimized.  */

I'm a bit confused about:

/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
   For bitwise binary operations apply operand conversions to the
   binary operation result instead of to the operands.  This allows
   to combine successive conversions and bitwise binary operations.
   We combine the above two cases by using a conditional convert.  */
(for bitop (bit_and bit_ior bit_xor)
 (simplify
  (bitop (convert @0) (convert? @1))
  (if (((TREE_CODE (@1) == INTEGER_CST
 && INTEGRAL_TYPE_P (TREE_TYPE (@0))
 && int_fits_type_p (@1, TREE_TYPE (@0)))
|| types_match (@0, @1))
   /* ???  This transform conflicts with fold-const.c doing
  Convert (T)(x & c) into (T)x & (T)c, if c is an integer
  constants (if x has signed type, the sign bit cannot be set
  in c).  This folds extension into the BIT_AND_EXPR.
  Restrict it to GIMPLE to avoid endless recursions.  */
   && (bitop != BIT_AND_EXPR || GIMPLE)
   && (/* That's a good idea if the conversion widens the operand, thus
  after hoisting the conversion the operation will be narrower.  */
   TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (type)
   /* It's also a good idea if the conversion is to a non-integer
  mode.  */
   || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT
   /* Or if the precision of TO is not the same as the precision
  of its mode.  */
   || TYPE_PRECISION (type) != GET_MODE_PRECISION (TYPE_MODE (type))))
   (convert (bitop @0 (convert @1))))))

though.  The "INTEGRAL_TYPE_P (TREE_TYPE (@0))" suggests that we can't
rely on @0 and @1 being integral (although conversions from float would
use FLOAT_EXPR), but then what is:

   /* It's also a good idea if the conversion is to a non-integer
  mode.  */
   || GET_MODE_CLASS (TYPE_MODE (type)) != MODE_INT

letting through?  MODE_PARTIAL_INT maybe, but that's a sort of integer
mode too.  MODE_COMPLEX_INT or MODE_VECTOR_INT?  I thought for those
it would be better to apply the scalar rules to the element type.

Either way, having allowed all non-INT modes, using full_integral_type_p
for the remaining condition seems correct.

If the feeling is that this isn't a useful abstraction, I can just update
each site individually to cope with variable-sized modes.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-08-18  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree.h (scalar_type_is_full_p): New function.
(full_integral_type_p): Likewise.
* fold-const.c (fold_single_bit_test_into_sign_test): Likewise.
* match.pd: Likewise.
* tree-ssa-forwprop.c (simplify_rotate): Likewise.
* tree-vect-patterns.c (vect_recog_vector_vector_shift_pattern)
(vect_recog_mult_pattern, vect_recog_divmod_pattern): Likewise.
(adjust_bool_pattern): Likewise.
* tree-vrp.c (register_edge_assert_for_2): Likewise.
* ubsan.c (instrument_si_overflow): Likewise.

gcc/c-family/
* c-ubsan.c (ubsan_instrument_shift): Use full_integral_type_p.

diff --git a/gcc/c-family/c-ubsan.c b/gcc/c-family/c-ubsan.c
index b1386db..20f78e7 100644
--- 

Re: [PATCH v2] Simplify pow with constant

2017-08-18 Thread Richard Biener
On Thu, Aug 17, 2017 at 5:43 PM, Alexander Monakov  wrote:
> On Thu, 17 Aug 2017, Wilco Dijkstra wrote:
>
>> This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
>
> Note this changes the outcome for C == +Inf, x == 0 (pow is specified to
> return 1.0 in that case, but x * C1 == NaN).  There's another existing
> transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is
> not a new problem.
>
> The whole set of these match.pd transforms is guarded by
> flag_unsafe_math_optimizations, which is a bit strange, on the one hand
> it does not include -ffinite-math-only, but on the other hand it's
> defined broadly enough to imply that.

No, flag_unsafe_math_optimizations doesn't imply -ffinite-math-only.
-funsafe-math-optimizations these days should guard transforms that
affect the value of the result (but not its classification).  There are
more specific variants like -fassociative-math which also affects
results but in a more specific way.

> (to be clear, I'm not objecting to the patch, just pointing out an
> existing inconsistency in case someone can offer clarifications)

We shouldn't add such an issue knowingly; I guess for this case it's easy
to check that C is not +Inf.

For the existing transforms with the same issue guarding with
-ffinite-math-only is an appropriate fix.

Richard.

> Alexander


Re: [PATCH] correct documentation of attribute ifunc (PR 81882)

2017-08-18 Thread Andreas Schwab
On Aug 17 2017, Martin Sebor  wrote:

> -static void (*resolve_memcpy (void)) (void)
> +static void* (*resolve_memcpy (void))(void *, const void *, size_t)

Please use consistent spacing.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."