Re: Rb_tree constructor optimization

2017-10-18 Thread François Dumont

Hi

    Any feedback regarding this patch ?

Thanks,
François

On 14/09/2017 22:04, François Dumont wrote:
I realized there was no test on the noexcept qualification of the move 
constructor with allocator.


I added some and found out that patch was missing a noexcept 
qualification at _Rb_tree level.


Here is the updated patch fully tested, ok to commit ?

François


On 13/09/2017 21:57, François Dumont wrote:

On 08/09/2017 17:50, Jonathan Wakely wrote:


Since we know __a == __x.get_allocator() we could just do:

 _Rb_tree(_Rb_tree&& __x, _Node_allocator&&, true_type)
noexcept(is_nothrow_move_constructible<_Rb_tree_impl<_Compare>>::value)
 : _M_impl(std::move(__x._M_impl))
 { }

This means we don't need the new constructor.


    You want to consider that a always equal allocator is stateless 
and so that the provided allocator rvalue reference do not need to be 
moved. IMHO you can have allocator with state that do not participate 
in comparison like some monitoring info.


    I'm not confortable with that and prefer to keep current behavior 
so I propose this new patch considering all your other remarks.


    I change noexcept qualification on [multi]map/set constructors to 
just rely on _Rep_type constructor noexcept qualification to not 
duplicate it.


François








diff --git a/libstdc++-v3/include/bits/stl_map.h b/libstdc++-v3/include/bits/stl_map.h
index bad6020..e343f04 100644
--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -235,8 +235,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   map(map&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Pair_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Pair_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_multimap.h b/libstdc++-v3/include/bits/stl_multimap.h
index 6f5cb7a..8c7ae2c 100644
--- a/libstdc++-v3/include/bits/stl_multimap.h
+++ b/libstdc++-v3/include/bits/stl_multimap.h
@@ -232,8 +232,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   multimap(multimap&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Pair_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Pair_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h
index 517e77e..9ab4ab7 100644
--- a/libstdc++-v3/include/bits/stl_multiset.h
+++ b/libstdc++-v3/include/bits/stl_multiset.h
@@ -244,8 +244,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   multiset(multiset&& __m, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Key_alloc_type>())) )
   : _M_t(std::move(__m._M_t), _Key_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
index e804a7c..6b64bcd 100644
--- a/libstdc++-v3/include/bits/stl_set.h
+++ b/libstdc++-v3/include/bits/stl_set.h
@@ -248,8 +248,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /// Allocator-extended move constructor.
   set(set&& __x, const allocator_type& __a)
-  noexcept(is_nothrow_copy_constructible<_Compare>::value
-	   && _Alloc_traits::_S_always_equal())
+  noexcept( noexcept(
+	_Rep_type(declval<_Rep_type>(), declval<_Key_alloc_type>())) )
   : _M_t(std::move(__x._M_t), _Key_alloc_type(__a)) { }
 
   /// Allocator-extended initialier-list constructor.
diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index c2417f1..ebfc3f9 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -704,6 +704,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #else
 	  _Rb_tree_impl(_Rb_tree_impl&&) = default;
 
+	  _Rb_tree_impl(_Rb_tree_impl&& __x, _Node_allocator&& __a)
+	  : _Node_allocator(std::move(__a)),
+	_Base_key_compare(std::move(__x)),
+	_Rb_tree_header(std::move(__x))
+	  { }
+
 	  _Rb_tree_impl(const _Key_compare& __comp, _Node_allocator&& __a)
 	  : _Node_allocator(std::move(__a)), _Base_key_compare(__comp)
 	  { }
@@ -944,10 +950,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Rb_tree(_Rb_tree&&) = default;
 
   _Rb_tree(_Rb_tree&& __x, const allocator_type& __a)
+  noexcept( noexcept(
+	_Rb_tree(declval<_Rb_tree>(), declval<_Node_allocator>())) )
   : 

Re: [patch] implement generic debug() functions for wide_int's

2017-10-18 Thread Martin Sebor

On 10/18/2017 11:08 AM, Aldy Hernandez wrote:

It looks like the generic debug() function for wide_int's is missing.
Instead, we have a wi->dump() method.  I have implemented debug() for
generic wide_int and for widest_int, which should cover the common
cases.  Anything else, can continue using the ->dump() method
templated methods.


Do you by any chance also plan to add support to the pretty printer
for wide_int (and offset_int)?  It would obviate having to convert
them to {s,u}hwi.

Martin


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-18 Thread Martin Sebor

On 10/18/2017 01:15 PM, Sam van Kampen wrote:

On Wed, Oct 18, 2017 at 09:46:08AM -0600, Martin Sebor wrote:

Fair enough, I didn't know whether to change the way it currently was
triggered. Do you think it should fall under -Wextra (I don't think it
falls under -Wall, since it isn't "easy to avoid or modify to prevent
the warning" because it may be valid and wanted behavior), or should it
be enabled by no other flag?


I think it depends on the implementation of the warning.  With
the current (fairly restrictive) behavior I'd say it should be
disabled by default.  But if it were to be changed to more closely
match the Clang behavior and only warn for bit-field declarations
that cannot represent all enumerators of the enumerated type, then
including it in -Wall would seem helpful to me.

I.e., Clang doesn't warn on this and IIUC that's what the reporter
of the bug also expects:

  enum E: unsigned { E3 = 15 };

  struct S { E i: 4; };

(There is value in warning on this as well, but I think most users
will not be interested in it, so making the warning a two-level one
where level 1 warns same as Clang and level 2 same as GCC does now
might give us the best of both worlds).


I see what you mean - that is the behavior I wanted to implement in the
first place, but Jonathan Wakely rightly pointed out that when an
enumeration is scoped, all values of its underlying type are valid
enumeration values, and so the bit-field you declare in 'S' _is_ too
small to hold all values of 'enum E'.

Here's the corresponding text from draft N4659 of the C++17 standard,
§10.2/8 [dcl.enum]

For an enumeration whose underlying type is fixed, the values of the
enumeration are the values of the underlying type. [...] It is possible
to define an enumeration that has values not defined by any of its
enumerators.

Still, warning when a bit-field can't hold all enumerators instead of
all values may be a good idea. I've looked into it, and it does require
recalculating the maximum and minimum enumerator value, since the bounds
of the underlying type are saved in TYPE_MIN_VALUE and TYPE_MAX_VALUE
when the enumeration is scoped, instead of the min/max enumerator value.

Would adding that separate warning level be part of a separate patch, or
should I add it to this one?


I think it makes sense to submit the feature it in its final form,
whatever you decide that will be.

Martin


Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread Jonathan Wakely

On 18/10/17 18:40 -0400, Hans-Peter Nilsson wrote:

On Wed, 18 Oct 2017, FX wrote:


Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
macOS 10.13 often fail (failure rate for ?make -j2? to ?make -j8? is about 60% 
from my own builds and results reported by others): 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797
This is reproducible with several versions of GNU make.

Changing libstdc++?s makefile to mark install-headers with .NOTPARALLEL fixes 
the issue. We've carried that patch in Homebrew (https://brew.sh) for a few 
months now, and have had no report of build issues since then.

Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
platforms). OK to commit?


Maybe moot now, but .NOTPARALLEL doesn't work the way you imply
in the patch.  From TFM of GNU Make 3.81 and 4: "Any
prerequisites on this target are ignored."

To wit, it seems you'd get the effect, but for *every target* in
this Makefile.  (If I were you and the patch is still proposed,
I'd remove the unused target and add a comment on the intent.)

Assuming the documentation and my reading is correct.


I think it is.



Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread Jonathan Wakely

On 18/10/17 22:05 +0100, Jonathan Wakely wrote:

On 18/10/17 16:51 +0200, FX wrote:

Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
macOS 10.13 often fail (failure rate for “make -j2” to “make -j8” is about 60% 
from my own builds and results reported by others): 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797
This is reproducible with several versions of GNU make.

Changing libstdc++’s makefile to mark install-headers with .NOTPARALLEL fixes 
the issue. We've carried that patch in Homebrew (https://brew.sh) for a few 
months now, and have had no report of build issues since then.

Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
platforms). OK to commit?


Could you test using .PHONY: install-headers instead?

That target *is* phony, so telling make that seems sensible.

Presumably the same change is needed for install-freestanding-headers,
since it's very similar.



.NOTPARALLEL ignores any prerequisites, so there's no point listing
install-headers as the prerequisite. Your patch disables parallelism
for all targets in the libstdc++-v3/install directory.

There must be another solution.




[PATCH] Use dlsym to check if libdl is needed for plugin

2017-10-18 Thread H.J. Lu
config/plugins.m4 has

 if test "$plugins" = "yes"; then
AC_SEARCH_LIBS([dlopen], [dl])
  fi

Plugin uses dlsym, but libasan.so only intercepts dlopen, not dlsym:

[hjl@gnu-tools-1 binutils-text]$ nm -D /lib64/libasan.so.4| grep " dl"
00038580 W dlclose
 U dl_iterate_phdr
0004dc50 W dlopen
 U dlsym
 U dlvsym
[hjl@gnu-tools-1 binutils-text]$

Testing dlopen for libdl leads to false negative when -fsanitize=address
is used.  It results in link failure:

../bfd/.libs/libbfd.a(plugin.o): undefined reference to symbol 
'dlsym@@GLIBC_2.16'

dlsym should be used to check if libdl is needed for plugin.

OK for master?

H.J.
---
bfd/

PR gas/22318
* configure: Regenerated.

binutils/

PR gas/22318
* configure: Regenerated.

config/

* plugins.m4 (AC_PLUGINS): Use dlsym to check if libdl is needed.

gas/

PR gas/22318
* configure: Regenerated.

gprof/

PR gas/22318
* configure: Regenerated.

ld/

PR gas/22318
* configure: Regenerated.
---
 bfd/configure  | 24 
 binutils/configure | 24 
 config/plugins.m4  |  2 +-
 gas/configure  | 24 
 gprof/configure| 24 
 ld/configure   | 24 
 6 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/bfd/configure b/bfd/configure
index 32ee062e80..f019a61fb0 100755
--- a/bfd/configure
+++ b/bfd/configure
@@ -11826,9 +11826,9 @@ else
 fi
 
   if test "$plugins" = "yes"; then
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing 
dlopen" >&5
-$as_echo_n "checking for library containing dlopen... " >&6; }
-if test "${ac_cv_search_dlopen+set}" = set; then :
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing 
dlsym" >&5
+$as_echo_n "checking for library containing dlsym... " >&6; }
+if test "${ac_cv_search_dlsym+set}" = set; then :
   $as_echo_n "(cached) " >&6
 else
   ac_func_search_save_LIBS=$LIBS
@@ -11841,11 +11841,11 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 #ifdef __cplusplus
 extern "C"
 #endif
-char dlopen ();
+char dlsym ();
 int
 main ()
 {
-return dlopen ();
+return dlsym ();
   ;
   return 0;
 }
@@ -11858,25 +11858,25 @@ for ac_lib in '' dl; do
 LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
   fi
   if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_search_dlopen=$ac_res
+  ac_cv_search_dlsym=$ac_res
 fi
 rm -f core conftest.err conftest.$ac_objext \
 conftest$ac_exeext
-  if test "${ac_cv_search_dlopen+set}" = set; then :
+  if test "${ac_cv_search_dlsym+set}" = set; then :
   break
 fi
 done
-if test "${ac_cv_search_dlopen+set}" = set; then :
+if test "${ac_cv_search_dlsym+set}" = set; then :
 
 else
-  ac_cv_search_dlopen=no
+  ac_cv_search_dlsym=no
 fi
 rm conftest.$ac_ext
 LIBS=$ac_func_search_save_LIBS
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_dlopen" >&5
-$as_echo "$ac_cv_search_dlopen" >&6; }
-ac_res=$ac_cv_search_dlopen
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_dlsym" >&5
+$as_echo "$ac_cv_search_dlsym" >&6; }
+ac_res=$ac_cv_search_dlsym
 if test "$ac_res" != no; then :
   test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
 
diff --git a/binutils/configure b/binutils/configure
index 22e1b1736e..9691ec23f0 100755
--- a/binutils/configure
+++ b/binutils/configure
@@ -11622,9 +11622,9 @@ else
 fi
 
   if test "$plugins" = "yes"; then
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing 
dlopen" >&5
-$as_echo_n "checking for library containing dlopen... " >&6; }
-if test "${ac_cv_search_dlopen+set}" = set; then :
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing 
dlsym" >&5
+$as_echo_n "checking for library containing dlsym... " >&6; }
+if test "${ac_cv_search_dlsym+set}" = set; then :
   $as_echo_n "(cached) " >&6
 else
   ac_func_search_save_LIBS=$LIBS
@@ -11637,11 +11637,11 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 #ifdef __cplusplus
 extern "C"
 #endif
-char dlopen ();
+char dlsym ();
 int
 main ()
 {
-return dlopen ();
+return dlsym ();
   ;
   return 0;
 }
@@ -11654,25 +11654,25 @@ for ac_lib in '' dl; do
 LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
   fi
   if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_search_dlopen=$ac_res
+  ac_cv_search_dlsym=$ac_res
 fi
 rm -f core conftest.err conftest.$ac_objext \
 conftest$ac_exeext
-  if test "${ac_cv_search_dlopen+set}" = set; then :
+  if test "${ac_cv_search_dlsym+set}" = set; then :
   break
 fi
 done
-if test "${ac_cv_search_dlopen+set}" = set; then :
+if test "${ac_cv_search_dlsym+set}" = set; then :
 
 else
-  ac_cv_search_dlopen=no
+  ac_cv_search_dlsym=no
 fi
 rm conftest.$ac_ext
 LIBS=$ac_func_search_save_LIBS
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_dlopen" >&5
-$as_echo "$ac_cv_search_dlopen" >&6; }

Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-10-18 Thread Martin Sebor

On 10/18/2017 04:48 AM, Richard Biener wrote:

On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor  wrote:

While testing my latest -Wrestrict changes I noticed a number of
opportunities to improve the -Warray-bounds warning.  Attached
is a patch that implements a solution for the following subset
of these:

PR tree-optimization/82596 - missing -Warray-bounds on an out-of
  bounds index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on an excessively
  large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
  inner indices

The patch also adds more detail to the -Warray-bounds diagnostics
to make it easier to see the cause of the problem.

Richard, since the changes touch tree-vrp.c, I look in particular
for your comments.


+  /* Accesses to trailing arrays via pointers may access storage
+beyond the types array bounds.  For such arrays, or for flexible
+array members as well as for other arrays of an unknown size,
+replace the upper bound with a more permissive one that assumes
+the size of the largest object is SSIZE_MAX.  */

I believe handling those cases are somewhat academic, but ...


Thanks for the quick review!

I agree the SSIZE_MAX tests handle corner cases and there are
arguably more important gaps here to plug (e.g., VLAs).  Then
again, most bugs tend to lurk in corner cases of one kind or
another and these seemed like a good way for me to come up to
speed on the implementation before tackling those.  If you
have suggestions for which to dive into next I'm happy to take
them.


+  tree eltype = TREE_TYPE (ref);
+  tree eltsize = TYPE_SIZE_UNIT (eltype);

needs to use array_ref_element_size.  Note that this size can be
non-constant in which case you now ICE so you have to check
this is an INTEGER_CST.


Thanks.  I did reproduce a few ICEs due to VLAs.  I've fixed
the problems and added tests for them.  One-dimensional VLAs
are now handled the same way arrays of unknown bound are.
Handling them more intelligently (i.e., taking into account
the ranges their bounds are in) and especially handling
multidimensional VLAs will take some effort.



+  tree maxbound = TYPE_MAX_VALUE (ssizetype);

please don't use ssizetype - sizetype can be of different precision
than pointers (and thus objects).  size_type_node would come
close (maps to size_t), eventually ptrdiff_type_node is also
defined by all frontends.


Okay, I've changed it to sizetype (it would be nice if there
were a cleaner way of doing it rather than by bit-twiddling.)

ptrdiff_type would have been my first choice but it's only defined
by the C/C++ front ends and not available in the middle-end, so
the other warnings that consider object sizes deal with ssizetype
(e.g., -Walloc-size-larger-than, -Wstringop- overflow, and
-Wformat-overflow).

That said, I'm not sure I understand under what conditions
ssizetype is not the signed equivalent of sizetype.  Can you
clarify?

As an aside, at some point I would like to get away from a type
based limit in all these warnings and instead use one that can
be controlled by an option so that a user can impose a lower limit
on the maximum size of an object and have all size-related warnings
(and perhaps even optimizations) enforce it and benefit from it.


+  up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize);
+

int_const_binop if you insist on using trees...


Sure.  (I think offset_int would be more convenient but the rest
of the function uses trees so I just stuck to those to avoid
converting things back and forth or disrupting more of the code
than I had to.)



+  tree arg = TREE_OPERAND (ref, 0);
+  tree_code code = TREE_CODE (arg);
+  if (code == COMPONENT_REF)
+   {
+ HOST_WIDE_INT off;
+ if (tree base = get_addr_base_and_unit_offset (ref, ))
+   up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
+  TYPE_SIZE_UNIT (TREE_TYPE (base)));
+ else
+   return;

so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false).
simply not subtracting anyhing instead of returning would be conservatively
correct, no?  Likewise subtracting the offset of the array for all "previous"
variably indexed components with assuming the lowest value for the index.
But as above I think compensating for the offset of the array within the object
is academic ... ;)


I was going to say yes (it gives up) but on second thought I don't
think it does.  Only the major index can be unbounded and the code
does consider the size of the sub-array when checking the major
index.  So, IIUC, I think this works correctly as is (*).  What
doesn't work is VLAs but those are a separate problem.  Let me
know if I misunderstood your question.


+  else if (code == STRING_CST)
+   up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));

that one is more interesting -- why's 

Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread Hans-Peter Nilsson
On Wed, 18 Oct 2017, FX wrote:

> Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
> macOS 10.13 often fail (failure rate for ?make -j2? to ?make -j8? is about 
> 60% from my own builds and results reported by others): 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797
> This is reproducible with several versions of GNU make.
>
> Changing libstdc++?s makefile to mark install-headers with .NOTPARALLEL fixes 
> the issue. We've carried that patch in Homebrew (https://brew.sh) for a few 
> months now, and have had no report of build issues since then.
>
> Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
> platforms). OK to commit?

Maybe moot now, but .NOTPARALLEL doesn't work the way you imply
in the patch.  From TFM of GNU Make 3.81 and 4: "Any
prerequisites on this target are ignored."

To wit, it seems you'd get the effect, but for *every target* in
this Makefile.  (If I were you and the patch is still proposed,
I'd remove the unused target and add a comment on the intent.)

Assuming the documentation and my reading is correct.

brgds, H-P


Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-18 Thread Richard Sandiford
Aldy Hernandez  writes:
> On Tue, Oct 17, 2017 at 6:05 PM, Richard Sandiford
>  wrote:
>> Andrew MacLeod  writes:
>>> On 10/17/2017 08:18 AM, Richard Sandiford wrote:
 Aldy Hernandez  writes:
> Hi folks!
>
> Calling print_hex() on a widest_int with the most significant bit turned
> on can lead to a leading zero being printed (0x0). This produces
> confusing dumps to say the least, especially when you incorrectly assume
> an integer is NOT signed :).
 That's the intended behaviour though.  wide_int-based types only use as
 many HWIs as they need to store their current value, with any other bits
 in the value being a sign extension of the top bit.  So if the most
 significant HWI in a widest_int is zero, that HWI is there to say that
 the previous HWI should be zero- rather than sign-extended.

 So:

 0x0  -> (1 << 32) - 1 to infinite precision
 (i.e. a positive value)
 0x   -> -1

 Thanks,
 Richard
>>>
>>> I for one find this very confusing.  If I have a 128 bit value, I don't
>>> expect to see a 132 bits.  And there are enough 0's its not obvious when
>>> I look.
>>
>> But Aldy was talking about widest_int, which is wider than 128 bits.
>> It's an approximation of infinite precision.
>
> IMO, we should document this leading zero in print_hex, as it's not
> inherently obvious.
>
> But yes, I was talking about widest_int.  I should explain what I am
> trying to accomplish, since perhaps there is a better way.
>
> I am printing a a wide_int (bounds[i] below), but I really don't want
> to print the sign extension nonsense, since it's a detail of the
> underlying representation.

Ah!  OK.  Yeah, I agree it doesn't make sense to print sign-extension
bits above the precision.  I think it'd work if print_hex used
extract_uhwi insteead of elt, which would also remove the need
to handle "negative" numbers specially.  I'll try that tomorrow.

> Currently I'm doing this to chop off the unnecessary bits:
>
> /* Wide ints may be sign extended to the full extent of the
>underlying HWI storage, even if the precision we care about
>is smaller.  Chop off the excess bits for prettier output.  */
> signop sign = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
> widest_int val = widest_int::from (bounds[i], sign);
> val &= wi::mask (bounds[i].get_precision (), false);
>
> if (val > 0x)
>   print_hex (val, pp_buffer (buffer)->digit_buffer);
> else
>   print_dec (val, pp_buffer (buffer)->digit_buffer, sign);
>
> Since I am calling print_hex() on the widest_int, I get the leading 0.
>
> Do you recommend another way of accomplishing this?

Is it just print_hex that's the problem?  Does print_dec handle
big negative numbers properly?

I agree we should make it so that wide-int-print.cc does the
right thing for this without the need to switch to widest_int.

Thanks,
Richard

>
>>
>> wide_int is the type to use if you want an N-bit number (for some N).
>>
>>> I don't think a leading 0 should be printed if "precision" bits have
>>> already been printed.
>>
>> Does 0 get printed in that case though?  Aldy's patch skips an upper
>
> No.
>
> Thanks for your input Richard.
> Aldy


[PATCH] v3: C/C++: more stdlib header hints (PR c/81404)

2017-10-18 Thread David Malcolm
On Wed, 2017-10-18 at 20:40 +, Joseph Myers wrote:
> On Wed, 18 Oct 2017, David Malcolm wrote:
> 
> > +{"WINT_MAX", {"", NULL} },
> > +{"WINT_MIN", {"", NULL} }
> 
> These are in  / , not .

Thanks; here's an updated version of the patch which fixes that.

OK for trunk once the prereqs are in place? (assuming bootstrap and
usual testing passes)

gcc/ChangeLog:
PR c/81404
* Makefile.in (C_COMMON_OBJS): Add c-family/known-headers.o.

gcc/c-family/ChangeLog:
PR c/81404
* known-headers.cc: New file, based on material from c/c-decl.c.
(suggest_missing_header): Copied as-is.
(get_stdlib_header_for_name): New, based on get_c_name_hint but
heavily edited to add C++ support.  Add some knowledge about
, , and .
* known-headers.h: Likewise.

gcc/c/ChangeLog:
PR c/81404
* c-decl.c: Include "c-family/known-headers.h".
(get_c_name_hint): Rename to get_stdlib_header_for_name and move
to known-headers.cc.
(class suggest_missing_header): Move to known-header.h.
(lookup_name_fuzzy): Call get_c_stdlib_header_for_name rather
than get_c_name_hint.

gcc/cp/ChangeLog:
PR c/81404
* name-lookup.c: Include "c-family/known-headers.h"
(lookup_name_fuzzy): Call get_cp_stdlib_header_for_name and
potentially return a new suggest_missing_header hint.

gcc/testsuite/ChangeLog:
PR c/81404
* g++.dg/spellcheck-stdlib.C: New.
* gcc.dg/spellcheck-stdlib.c (test_INT_MAX): New.
---
 gcc/Makefile.in  |   2 +-
 gcc/c-family/known-headers.cc| 167 +++
 gcc/c-family/known-headers.h |  41 
 gcc/c/c-decl.c   |  82 +--
 gcc/cp/name-lookup.c |  11 ++
 gcc/testsuite/g++.dg/spellcheck-stdlib.C |  84 
 gcc/testsuite/gcc.dg/spellcheck-stdlib.c |   9 ++
 7 files changed, 317 insertions(+), 79 deletions(-)
 create mode 100644 gcc/c-family/known-headers.cc
 create mode 100644 gcc/c-family/known-headers.h
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-stdlib.C

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2809619..9855919 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1190,7 +1190,7 @@ C_COMMON_OBJS = c-family/c-common.o 
c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/c-semantics.o c-family/c-ada-spec.o \
   c-family/c-cilkplus.o \
   c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o \
-  c-family/c-attribs.o c-family/c-warn.o
+  c-family/c-attribs.o c-family/c-warn.o c-family/known-headers.o
 
 # Language-independent object files.
 # We put the *-match.o and insn-*.o files first so that a parallel make
diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
new file mode 100644
index 000..e01f4e8
--- /dev/null
+++ b/gcc/c-family/known-headers.cc
@@ -0,0 +1,167 @@
+/* Support for suggestions about missing #include directives.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "c-family/c-common.h"
+#include "c-family/known-headers.h"
+#include "gcc-rich-location.h"
+
+/* An enum for distinguishing between the C and C++ stdlibs.  */
+
+enum stdlib
+{
+  STDLIB_C,
+  STDLIB_CPLUSPLUS,
+
+  NUM_STDLIBS
+};
+
+/* A struct for associating names in a standard library with the header
+   that should be included to locate them, for each of the C and C++ stdlibs
+   (or NULL, for names that aren't in a header for a particular stdlib).  */
+
+struct stdlib_hint
+{
+  const char *name;
+  const char *header[NUM_STDLIBS];
+};
+
+/* Given non-NULL NAME, return the header name defining it within either
+   the standard library (with '<' and '>'), or NULL.
+   Only handles a subset of the most common names within the stdlibs.  */
+
+static const char *
+get_stdlib_header_for_name (const char *name, enum stdlib lib)
+{
+  gcc_assert (name);
+  gcc_assert (lib < NUM_STDLIBS);
+
+  static const stdlib_hint hints[] = {
+/*  and .  */
+{"errno", {"", ""} },
+
+/*  and .  */
+{"CHAR_BIT", {"", ""} },
+{"CHAR_MAX", {"", ""} },
+{"CHAR_MIN", {"", ""} },
+{"INT_MAX", {"", ""} },
+{"INT_MIN", {"", ""} },
+{"LLONG_MAX", {"", ""} 

Re: Add an alternative vector loop iv mechanism

2017-10-18 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>  wrote:
>> Normally we adjust the vector loop so that it iterates:
>>
>>(original number of scalar iterations - number of peels) / VF
>>
>> times, enforcing this using an IV that starts at zero and increments
>> by one each iteration.  However, dividing by VF would be expensive
>> for variable VF, so this patch adds an alternative in which the IV
>> increments by VF each iteration instead.  We then need to take care
>> to handle possible overflow in the IV.
>
> Hmm, why do you need to handle possible overflow?  Doesn't the
> original loop have a natural IV that evolves like this?  After all we
> can compute an expression for niters of the scalar loop.

The problem comes with loops like:

   unsigned char i = 0;
   do
 {
   ...
   i--;
 }
   while (i != 0);

The loop statements execute 256 times and the latch executes 255 times.
LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
unsigned char) is 0.

This leads to things like:

  /* Constant case.  */
  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
{
  tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
  tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);

  gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
  gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
  if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
return true;
}

in loop_niters_no_overflow.

>> The new mechanism isn't used yet; a later patch replaces the
>> "if (1)" with a check for variable VF.  If the patch is OK, I'll
>> hold off applying it until the follow-on is ready to go in.
>
> I indeed don't like code that isn't exercised.  Otherwise looks reasonable.

Thanks.

Richard

> Thanks,
> Richard.
>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>> OK to install when the time comes?
>>
>> Richard
>>
>>
>> 2017-10-13  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-loop-manip.c: Include gimple-fold.h.
>> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
>> niters_maybe_zero parameters.  Handle other cases besides a step of 
>> 1.
>> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
>> Add a path that uses a step of VF instead of 1, but disable it
>> for now.
>> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
>> and niters_no_overflow parameters.  Update calls to
>> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
>> Create a new SSA name if the latter choses to use a ste other
>> than zero, and return it via niters_vector_mult_vf_var.
>> * tree-vect-loop.c (vect_transform_loop): Update calls to
>> vect_do_peeling, vect_gen_vector_loop_niters and
>> slpeel_make_loop_iterate_ntimes.
>> * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, 
>> vect_do_peeling)
>> (vect_gen_vector_loop_niters): Update declarations after above
> changes.
>>
>> Index: gcc/tree-vect-loop-manip.c
>> ===
>> --- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
>> +++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
>> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>>  #include "tree-scalar-evolution.h"
>>  #include "tree-vectorizer.h"
>>  #include "tree-ssa-loop-ivopts.h"
>> +#include "gimple-fold.h"
>>
>>  /*
>>Simple Loop Peeling Utilities
>> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
>> gimple_bb (update_phi));
>>  }
>>
>> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
>> -   that starts at zero, increases by one and its limit is NITERS.
>> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
>> +   where NITERS is known to be outside the range [1, STEP - 1].
>> +   This is equivalent to making the loop execute NITERS / STEP
>> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
>> +   where M is the precision of NITERS.
>> +
>> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
>> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
>> +
>> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
>> +   N * STEP on exit from the loop.
>>
>> Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
>>
>>  void
>> -slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
>> +slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
>> +tree final_iv, bool niters_maybe_zero)
>>  {
>>tree indx_before_incr, indx_after_incr;
>>gcond *cond_stmt;
>>gcond *orig_cond;
>> +  

Re: [patch] implement generic debug() functions for wide_int's

2017-10-18 Thread Pedro Alves
On 10/18/2017 06:08 PM, Aldy Hernandez wrote:

> Also, do we have a blessed way of specifying overloaded functions in
> ChangeLog's?  I couldn't find anything in our GCC coding guidelines or
> in the GNU coding guidelines.  For lack of direction, I'm doing the
> following:
> 
> * wide-int.cc (debug) [const wide_int &]: New.
> (debug) [const wide_int *]: New.
> (debug) [const widest_int &]: New.
> (debug) [const widest_int *]: New.
> 
> It seems appropriate, as that is the GNU way of changelogs for a
> conditional change to a function ???.

Doesn't seem that appropriate to me.  Square brackets are used for
conditional compilation (#ifdef etc.), but overloads are not that.

I'd suggest looking in libstdc++'s ChangeLog for precedents.  It's where
I looked at when I had the same question for GDB, FWIW.  E.g., a very
recent libstdc++ commit from Jon had:

* include/bits/stl_map.h (map::insert(value_type&&))
(map::insert(const_iterator, value_type&&)): Add overload for rvalues.

I.e., simply use the full prototype as function name, since it's
really what it is in C++.

Thanks,
Pedro Alves



Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread FX
> Could you test using .PHONY: install-headers instead?
> That target *is* phony, so telling make that seems sensible.

I’ve tried adding it to the existing .PHONY list:

Index: libstdc++-v3/include/Makefile.in
===
--- libstdc++-v3/include/Makefile.in(revision 253855)
+++ libstdc++-v3/include/Makefile.in(working copy)
@@ -1449,6 +1449,7 @@ uninstall-am:
distclean-libtool dvi dvi-am html html-am info info-am install \
install-am install-data install-data-am install-data-local \
install-dvi install-dvi-am install-exec install-exec-am \
+   install-freestanding-headers install-headers \
install-html install-html-am install-info install-info-am \
install-man install-pdf install-pdf-am install-ps \
install-ps-am install-strip installcheck installcheck-am \


But that doesn’t work:

In file included from 
/Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/bits/exception_ptr.h:39:0,
 from 
/Users/fx/devel/gcc/trunk/libstdc++-v3/libsupc++/exception:143,
 from 
/Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/ios:39,
 from 
/Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/istream:38,
 from 
/Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/sstream:38,
 from 
/Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/complex:45,
 from 
/Users/fx/devel/gcc/ibin/x86_64-apple-darwin17.0.0/i386/libstdc++-v3/include/ccomplex:39,
 from 
/Users/fx/devel/gcc/trunk/libstdc++-v3/include/precompiled/stdc++.h:52:
/Users/fx/devel/gcc/trunk/libstdc++-v3/libsupc++/typeinfo:36:10: fatal error: 
bits/hash_bytes.h: No such file or directory
 #include 
  ^~~



Re: [patch, fortran] Fix PR 82567

2017-10-18 Thread Thomas Koenig

Hi Jerry and Steve,


Well I know 42 is the answer to the ultimate question of the universe so this
must be OK.  I just don't know what the question is.

OK and thanks,

Jerry

+#define CONSTR_LEN_MAX 42

Actually, I was wondering about the choice myself.  With
most common hardware having fairly robust L1 and L2 cache
sizes, a double precision array constructor with 42
elements only occupies 336 bytes.  Seems small.


Well, the answer is that I didn't know how to chose a reasonable
constant.  I now actually ran some benchmarks using rdtsc, and
these seem to indicate that the optimum value for CONST_LEN_MAX
is actually quite short, 3 or 4, otherwise I just got a slowdown
or a break even.

So, I committed (r253872) with a length of 4 as a limit.  If anybody
comes up with a better number, we can always change this.

So, thanks for the review and the comments.

Regards

Thomas

If somebody wants to check, here is the test case:

main.f90:

module tick
  interface
 function rdtsc()
   integer(kind=8) :: rdtsc
 end function rdtsc
  end interface
end module tick

program main
  use tick
  use tst
  implicit none
  integer(8) :: t1, t2
  t1 = rdtsc()
  call sub1(2.0)
  t2 = rdtsc()
  !  print *,"sub1 : ", t2-t1

  t1 = rdtsc()
  do i=1,1
 call sub1(2.0)
  end do
  t2 = rdtsc()
  print *,"sub1 : ", t2-t1

  t1 = rdtsc()
  do i=1,1
 call sub2(2.0)
  end do
  t2 = rdtsc()
  print *,"sub2 : ", t2-t1

end program main

tst.f90:
module tst
  integer, parameter :: n=4
  real, dimension(n) :: x
  real, dimension(n), parameter :: s = [(i,i=1,n)]
contains
  subroutine sub1(a)
real, intent(in) :: a
x(1) = a * 1.0
x(2) = a * 2.0
x(3) = a * 3.0
x(4) = a * 3.0

  end subroutine sub1
  subroutine sub2(a)
x(:) = a * s(:)
  end subroutine sub2
end module tst

rdtsc.s:
.file   "rdtsc.s"
.text
.globl  rdtsc_
.type   rdtsc_, @function
rdtsc_:
.LFB0:
.cfi_startproc
rdtsc
shl $32, %rdx
or  %rdx, %rax
ret
.cfi_endproc
.LFE0:
.size   rdtsc_, .-rdtsc_
.section.note.GNU-stack,"",@progbits


[PATCH] PR libgcc/59714 complex division is surprising on aarch64

2017-10-18 Thread vladimir . mezentsev
From: Vladimir Mezentsev 

FMA (floating-point multiply-add) instructions are supported on aarch64.
These instructions can produce different result if two operations executed 
separately.
-ffp-contract=off doesn't allow the FMA instructions.

Tested on two platforms:
  aarch64-unknown-linux-gnu: No regression. Two failed tests now passed.
  sparc64-unknown-linux-gnu: No regression.

ChangeLog:
2017-10-18  Vladimir Mezentsev  

PR libgcc/59714
* libgcc/Makefile.in: Add -ffp-contract=off
---
 libgcc/Makefile.in |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index a1a392d..b771875 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -237,12 +237,16 @@ else
 DECNUMINC =
 endif
 
+# Disable floating-point expression contraction
+LIBGCC2_FFP_CONTRAST_CFLAGS = -ffp-contract=off
+
 # Options to use when compiling libgcc2.a.
 #
 LIBGCC2_DEBUG_CFLAGS = -g
 LIBGCC2_CFLAGS = -O2 $(LIBGCC2_INCLUDES) $(GCC_CFLAGS) $(HOST_LIBGCC2_CFLAGS) \
 $(LIBGCC2_DEBUG_CFLAGS) -DIN_LIBGCC2 \
 -fbuilding-libgcc -fno-stack-protector \
+$(LIBGCC2_FFP_CONTRAST_CFLAGS) \
 $(INHIBIT_LIBC_CFLAGS)
 
 # Additional options to use when compiling libgcc2.a.
-- 
1.7.1



Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread Jonathan Wakely

On 18/10/17 16:51 +0200, FX wrote:

Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
macOS 10.13 often fail (failure rate for “make -j2” to “make -j8” is about 60% 
from my own builds and results reported by others): 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797
This is reproducible with several versions of GNU make.

Changing libstdc++’s makefile to mark install-headers with .NOTPARALLEL fixes 
the issue. We've carried that patch in Homebrew (https://brew.sh) for a few 
months now, and have had no report of build issues since then.

Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
platforms). OK to commit?


Could you test using .PHONY: install-headers instead?

That target *is* phony, so telling make that seems sensible.

Presumably the same change is needed for install-freestanding-headers,
since it's very similar.




Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread FX
> From supplied info not follow that problem is on gcc build side.
> I can suspect that APFS has problem with direntry intensive
> modification, for example.

As part of Homebrew, we have built 4000+ open source codes on this new 
filesystem, with parallel compilation. Some of them pretty intensive (clang, 
llvm, rust, ghc, and a ton of compilers, libraries, databases, etc.). Many of 
them several times (for testing, etc.). Macports has done the same, probably 
other projects as well.

We have not seen any evidence of a generic issue related to this filesystem. 
Apple is not aware of such an issue either, apparently. Yet, libstdc++ parallel 
builds have a very high failure rate. Other GCC libraries build fine, too.

I do not know how to debug parallel makefiles, otherwise I would do it. I have 
asked help on the GNU Make mailing-list 
(http://lists.gnu.org/archive/html/bug-make/2017-08/msg00034.html), but didn’t 
get any. So I cannot prove it (and fix it), but there is overwhelming evidence 
that the problem is in libstdc++.

FX

Re: [libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread Petr Ovtchenkov
On Wed, 18 Oct 2017 16:51:37 +0200
FX  wrote:

> Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
> macOS 10.13 often fail
> (failure rate for “make -j2” to “make -j8” is about 60% from my own builds 
> and results reported
> by others): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797 This is 
> reproducible with several
> versions of GNU make.
> 
> Changing libstdc++’s makefile to mark install-headers with .NOTPARALLEL fixes 
> the issue. We've
> carried that patch in Homebrew (https://brew.sh) for a few months now, and 
> have had no report of
> build issues since then.
> 
> Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
> platforms). OK to commit?
> 
> FX
> 
> 

From supplied info not follow that problem is on gcc build side.
I can suspect that APFS has problem with direntry intensive
modification, for example.

--

  - ptr


Re: [PATCH] v2: C/C++: more stdlib header hints (PR c/81404)

2017-10-18 Thread Joseph Myers
On Wed, 18 Oct 2017, David Malcolm wrote:

> +{"WINT_MAX", {"", NULL} },
> +{"WINT_MIN", {"", NULL} }

These are in  / , not .

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING] [C++ Patch] PR 82307

2017-10-18 Thread Mukesh Kapoor
The bug happens only for enum types with a fixed underlying type. The 
existing code tries to create another type based on it's precision by 
calling c_common_type_for_size(). For the precision value of an unsigned 
long long type, the call to c_common_type_for_size() returns an unsigned 
long type and this causes compilation errors later on. The fix is to 
simply use the fixed underlying type of the enum instead of creating a 
new one.


Mukesh

On 10/18/2017 1:10 PM, Nathan Sidwell wrote:

On 10/18/2017 12:17 PM, Mukesh Kapoor wrote:

On 10/9/2017 12:20 PM, Mukesh Kapoor wrote:

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82307.
For an unscoped enum with a fixed underlying type, the function 
type_promotes_to() does not always return the same type as the 
underlying type. The fix is to use the underlying type of the enum 
instead of creating a new one by calling c_common_type_for_size().



The diff looks wrong.  Just before the changed piece it (attempts to) 
deal with enum types.  Why is that failing?


nathan





[PATCH] v2: C/C++: more stdlib header hints (PR c/81404)

2017-10-18 Thread David Malcolm
On Tue, 2017-10-17 at 20:05 +, Joseph Myers wrote:
> On Tue, 17 Oct 2017, David Malcolm wrote:
> 
> > It also adds generalizes some of the code for this (and for the
> > "std::"
> > namespace hints in the C++ frontend), moving it to a new
> > c-family/known-headers.cc and .h, and introducing a class
> > known_headers.
> > This currently just works by scanning a hardcoded array of known
> > name/header associations, but perhaps in the future could be turned
> > into some kind of symbol database so that the compiler could record
> > API
> > uses and use that to offer suggestions e.g.
> > 
> > foo.cc: error: 'myapi::foo' was not declared in this scope
> > foo.cc: note: 'myapi::foo" was declared in header 'myapi/private.h'
> > (included via 'myapi/public.h') when compiling 'bar.cc'; did you
> > forget to
> > '#include "myapi/public.h"'?
> > 
> > or somesuch.
> > 
> > In any case, moving this to a class gives an easier way to locate
> > the
> > hardcoded knowledge about the stdlib.
> > 
> > The patch also adds similar code to the C++ frontend covering
> > unqualified names in the standard library, so that rather than just
> 
> I'd tend to expect hardcoded standard library knowledge, where it
> relates 
> to symbols present for both C and C++, to be in a common c-family
> file 
> (e.g. listing both C and C++ headers for each symbol, with the
> possibility 
> of some symbols only having a header listed for one C and C++; most
> C 
> symbols would have  and  listed, but some might be
> different, 
> e.g. wchar_t being a keyword in C++ or clog being completely
> different in 
> the two libraries).  That reduces the chance of a symbol being 
> gratuitously listed for one language only when such hints make sense
> for 
> it in both languages.


Here's an updated version of the patch, which moves the data to
c-family/known-headers.cc and unifies the C and C++ data into one array.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk once the prereqs are in place?

gcc/ChangeLog:
PR c/81404
* Makefile.in (C_COMMON_OBJS): Add c-family/known-headers.o.

gcc/c-family/ChangeLog:
PR c/81404
* known-headers.cc: New file, based on material from c/c-decl.c.
(suggest_missing_header): Copied as-is.
(get_stdlib_header_for_name): New, based on get_c_name_hint but
heavily edited to add C++ support.  Add some knowledge about
, , and .
* known-headers.h: Likewise.

gcc/c/ChangeLog:
PR c/81404
* c-decl.c: Include "c-family/known-headers.h".
(get_c_name_hint): Rename to get_stdlib_header_for_name and move
to known-headers.cc.
(class suggest_missing_header): Move to known-header.h.
(lookup_name_fuzzy): Call get_c_stdlib_header_for_name rather
than get_c_name_hint.

gcc/cp/ChangeLog:
PR c/81404
* name-lookup.c: Include "c-family/known-headers.h"
(lookup_name_fuzzy): Call get_cp_stdlib_header_for_name and
potentially return a new suggest_missing_header hint.

gcc/testsuite/ChangeLog:
PR c/81404
* g++.dg/spellcheck-stdlib.C: New.
* gcc.dg/spellcheck-stdlib.c (test_INT_MAX): New.
---
 gcc/Makefile.in  |   2 +-
 gcc/c-family/known-headers.cc| 167 +++
 gcc/c-family/known-headers.h |  41 
 gcc/c/c-decl.c   |  82 +--
 gcc/cp/name-lookup.c |  11 ++
 gcc/testsuite/g++.dg/spellcheck-stdlib.C |  84 
 gcc/testsuite/gcc.dg/spellcheck-stdlib.c |   9 ++
 7 files changed, 317 insertions(+), 79 deletions(-)
 create mode 100644 gcc/c-family/known-headers.cc
 create mode 100644 gcc/c-family/known-headers.h
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-stdlib.C

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2809619..9855919 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1190,7 +1190,7 @@ C_COMMON_OBJS = c-family/c-common.o 
c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/c-semantics.o c-family/c-ada-spec.o \
   c-family/c-cilkplus.o \
   c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o \
-  c-family/c-attribs.o c-family/c-warn.o
+  c-family/c-attribs.o c-family/c-warn.o c-family/known-headers.o
 
 # Language-independent object files.
 # We put the *-match.o and insn-*.o files first so that a parallel make
diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
new file mode 100644
index 000..1ac42f1
--- /dev/null
+++ b/gcc/c-family/known-headers.cc
@@ -0,0 +1,167 @@
+/* Support for suggestions about missing #include directives.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later

[PATCH, i386]: Fix PR82580, Optimize double-word comparisons

2017-10-18 Thread Uros Bizjak
Hello!

Attached patch emulates double-word comparisons with a double-word
subtraction. Note that only comparisons that test Carry, Sign and
Overflow flags are valid, so we have to avoid comparisons that test
Zero flag.

2017-10-18  Uros Bizjak  

PR target/82580
* config/i386/i386-modes.def (CCGZ): New CC mode.
* config/i386/i386.md (sub3_carry_ccgz): New insn pattern.
* config/i386/predicates.md (ix86_comparison_operator):
Handle CCGZmode.
* config/i386/i386.c (ix86_expand_branch) :
Emulate LE, LEU, GT, GTU, LT, LTU, GE and GEU double-word comparisons
with double-word subtraction.
(put_condition_code): Handle CCGZmode.

testsuite/ChangeLog:

2017-10-18  Uros Bizjak  
Jakub Jelinek  

PR target/82580
* gcc.target/i386/pr82580.c: New test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386-modes.def
===
--- config/i386/i386-modes.def  (revision 253855)
+++ config/i386/i386-modes.def  (working copy)
@@ -39,19 +39,22 @@ ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ?
For the i386, we need separate modes when floating-point
equality comparisons are being done.
 
-   Add CCNO to indicate comparisons against zero that requires
+   Add CCNO to indicate comparisons against zero that require
Overflow flag to be unset.  Sign bit test is used instead and
thus can be used to form "a>0" type of tests.
 
-   Add CCGC to indicate comparisons against zero that allows
+   Add CCGC to indicate comparisons against zero that allow
unspecified garbage in the Carry flag.  This mode is used
by inc/dec instructions.
 
-   Add CCGOC to indicate comparisons against zero that allows
+   Add CCGOC to indicate comparisons against zero that allow
unspecified garbage in the Carry and Overflow flag. This
mode is used to simulate comparisons of (a-b) and (a+b)
against zero using sub/cmp/add operations.
 
+   Add CCGZ to indicate comparisons that allow unspecified garbage
+   in the Zero flag.  This mode is used in double-word comparisons.
+
Add CCA to indicate that only the Above flag is valid.
Add CCC to indicate that only the Carry flag is valid.
Add CCO to indicate that only the Overflow flag is valid.
@@ -62,6 +65,7 @@ ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ?
 CC_MODE (CCGC);
 CC_MODE (CCGOC);
 CC_MODE (CCNO);
+CC_MODE (CCGZ);
 CC_MODE (CCA);
 CC_MODE (CCC);
 CC_MODE (CCO);
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 253855)
+++ config/i386/i386.c  (working copy)
@@ -16732,6 +16732,7 @@ put_condition_code (enum rtx_code code, machine_mo
   switch (code)
 {
 case EQ:
+  gcc_assert (mode != CCGZmode);
   switch (mode)
{
case E_CCAmode:
@@ -16755,6 +16756,7 @@ put_condition_code (enum rtx_code code, machine_mo
}
   break;
 case NE:
+  gcc_assert (mode != CCGZmode);
   switch (mode)
{
case E_CCAmode:
@@ -16799,6 +16801,7 @@ put_condition_code (enum rtx_code code, machine_mo
 
case E_CCmode:
case E_CCGCmode:
+   case E_CCGZmode:
  suffix = "l";
  break;
 
@@ -16807,7 +16810,7 @@ put_condition_code (enum rtx_code code, machine_mo
}
   break;
 case LTU:
-  if (mode == CCmode)
+  if (mode == CCmode || mode == CCGZmode)
suffix = "b";
   else if (mode == CCCmode)
suffix = fp ? "b" : "c";
@@ -16824,6 +16827,7 @@ put_condition_code (enum rtx_code code, machine_mo
 
case E_CCmode:
case E_CCGCmode:
+   case E_CCGZmode:
  suffix = "ge";
  break;
 
@@ -16832,7 +16836,7 @@ put_condition_code (enum rtx_code code, machine_mo
}
   break;
 case GEU:
-  if (mode == CCmode)
+  if (mode == CCmode || mode == CCGZmode)
suffix = "nb";
   else if (mode == CCCmode)
suffix = fp ? "nb" : "nc";
@@ -21469,6 +21473,8 @@ ix86_match_ccmode (rtx insn, machine_mode req_mode
 case E_CCZmode:
   break;
 
+case E_CCGZmode:
+
 case E_CCAmode:
 case E_CCCmode:
 case E_CCOmode:
@@ -22177,6 +22183,52 @@ ix86_expand_branch (enum rtx_code code, rtx op0, r
  break;
}
 
+   /* Emulate comparisons that do not depend on Zero flag with
+  double-word subtraction.  Note that only Overflow, Sign
+  and Carry flags are valid, so swap arguments and condition
+  of comparisons that would otherwise test Zero flag.  */
+
+   switch (code)
+ {
+ case LE: case LEU: case GT: case GTU:
+   std::swap (lo[0], lo[1]);
+   std::swap (hi[0], hi[1]);
+   code = swap_condition (code);
+   /* FALLTHRU */
+
+ case LT: case LTU: case GE: case GEU:
+ 

Re: [PING] [C++ Patch] PR 82307

2017-10-18 Thread Nathan Sidwell

On 10/18/2017 12:17 PM, Mukesh Kapoor wrote:

On 10/9/2017 12:20 PM, Mukesh Kapoor wrote:

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82307.
For an unscoped enum with a fixed underlying type, the function 
type_promotes_to() does not always return the same type as the 
underlying type. The fix is to use the underlying type of the enum 
instead of creating a new one by calling c_common_type_for_size().



The diff looks wrong.  Just before the changed piece it (attempts to) 
deal with enum types.  Why is that failing?


nathan

--
Nathan Sidwell


Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Vladimir Makarov



On 10/18/2017 06:48 AM, Segher Boessenkool wrote:

A volatile asm statement can not be moved (relative to other volatile
asm, etc.), but IRA could do it nevertheless.  This patch fixes it.

Testing on powerpc64-linux {-m32,-m64}; okay if it succeeds?  Also
for backports?

Although I am not an author of this code, the patch looks ok for me.  
ASM_OPERANDS with volatile memory creates a barrier.  I guess still a 
few such cases are missed for rtx_moveable_p.  The reference for such 
cases should be function sched_deps.c::sched_analyze.


You can commit it to the trunk and backport.  It should be safe.

Segher, thank you for working on the PR.



2017-10-18  Segher Boessenkool  

PR rtl-optimization/82602
* ira.c (rtx_moveable_p): Return false for volatile asm.

---
  gcc/ira.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/gcc/ira.c b/gcc/ira.c
index 046ce3b..8c93d3d 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -4400,6 +4400,12 @@ rtx_moveable_p (rtx *loc, enum op_type type)
 for a reason.  */
return false;
  
+case ASM_OPERANDS:

+  /* The same is true for volatile asm: it has unknown side effects, it
+ cannot be moved at will.  */
+  if (MEM_VOLATILE_P (x))
+   return false;
+
  default:
break;
  }




Re: [patch] implement generic debug() functions for wide_int's

2017-10-18 Thread Mike Stump
On Oct 18, 2017, at 10:08 AM, Aldy Hernandez  wrote:
> 
> It looks like the generic debug() function for wide_int's is missing.

> OK for trunk?

Ok.



Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-18 Thread Sam van Kampen via gcc-patches
On Wed, Oct 18, 2017 at 09:46:08AM -0600, Martin Sebor wrote:
> > Fair enough, I didn't know whether to change the way it currently was
> > triggered. Do you think it should fall under -Wextra (I don't think it
> > falls under -Wall, since it isn't "easy to avoid or modify to prevent
> > the warning" because it may be valid and wanted behavior), or should it
> > be enabled by no other flag?
> 
> I think it depends on the implementation of the warning.  With
> the current (fairly restrictive) behavior I'd say it should be
> disabled by default.  But if it were to be changed to more closely
> match the Clang behavior and only warn for bit-field declarations
> that cannot represent all enumerators of the enumerated type, then
> including it in -Wall would seem helpful to me.
> 
> I.e., Clang doesn't warn on this and IIUC that's what the reporter
> of the bug also expects:
> 
>   enum E: unsigned { E3 = 15 };
> 
>   struct S { E i: 4; };
> 
> (There is value in warning on this as well, but I think most users
> will not be interested in it, so making the warning a two-level one
> where level 1 warns same as Clang and level 2 same as GCC does now
> might give us the best of both worlds).

I see what you mean - that is the behavior I wanted to implement in the
first place, but Jonathan Wakely rightly pointed out that when an
enumeration is scoped, all values of its underlying type are valid
enumeration values, and so the bit-field you declare in 'S' _is_ too
small to hold all values of 'enum E'.

Here's the corresponding text from draft N4659 of the C++17 standard,
§10.2/8 [dcl.enum]

For an enumeration whose underlying type is fixed, the values of the
enumeration are the values of the underlying type. [...] It is possible
to define an enumeration that has values not defined by any of its
enumerators.

Still, warning when a bit-field can't hold all enumerators instead of
all values may be a good idea. I've looked into it, and it does require
recalculating the maximum and minimum enumerator value, since the bounds
of the underlying type are saved in TYPE_MIN_VALUE and TYPE_MAX_VALUE
when the enumeration is scoped, instead of the min/max enumerator value.

Would adding that separate warning level be part of a separate patch, or
should I add it to this one?


Re: [PATCH] Simplify floating point comparisons

2017-10-18 Thread Joseph Myers
On Wed, 18 Oct 2017, Richard Biener wrote:

> When overflow/underflow can be disregarded is there any reason remaining to
> make this guarded by flag_unsafe_math_optimizations?  Are there any cases
> where rounding issues can flip the comparison result?

Yes.  E.g. (in round-to-nearest) 3.0f * 0xfep0f == 3.0f * 0xfdp0f 
(generically, multiplication / division by any non-power-of-2 constant 
should be expected sometimes to map two different values to the same value 
even when overflow and underflow cannot occur).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, i386] Avoid 512-bit mode MOV for prefer-avx256 option in Intel AVX512 configuration

2017-10-18 Thread Kirill Yukhin
Hello Sergey,
On 06 Oct 14:20, Shalnov, Sergey wrote:
> Jakub,
> I completely agree with you. I fixed the patch.
> Currently, TARGET_PREFER256 will work on architectures with 512VL. It will 
> not work otherwise.
> 
> I will try to find better solution for this. I think I need to look into 
> register classes to configure
> available registers for 512F and 512VL in case of TARGET_PREFER_AVX256.
Probably I am missing the point, but IMHO think register classes are loosely 
connected to
preferred modes of operations.

> I would propose to merge this patch as temporal solution.
Why not to implement generic solution right now?
I don't think're in hurry here to push temporal solution unless
we have some reasoning.

I see only few mentions of TARGET_PREFER_AVX256 in i386.[c|md] and nothing
looks suspicious to me.

> 
> Sergey

--
Thanks, K


[patch] implement generic debug() functions for wide_int's

2017-10-18 Thread Aldy Hernandez
It looks like the generic debug() function for wide_int's is missing.
Instead, we have a wi->dump() method.  I have implemented debug() for
generic wide_int and for widest_int, which should cover the common
cases.  Anything else, can continue using the ->dump() method
templated methods.

Also, do we have a blessed way of specifying overloaded functions in
ChangeLog's?  I couldn't find anything in our GCC coding guidelines or
in the GNU coding guidelines.  For lack of direction, I'm doing the
following:

* wide-int.cc (debug) [const wide_int &]: New.
(debug) [const wide_int *]: New.
(debug) [const widest_int &]: New.
(debug) [const widest_int *]: New.

It seems appropriate, as that is the GNU way of changelogs for a
conditional change to a function ???.

OK for trunk?
gcc/

	* wide-int.cc (debug) [const wide_int &]: New.
	(debug) [const wide_int *]: New.
	(debug) [const widest_int &]: New.
	(debug) [const widest_int *]: New.

diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 71e24ec22af..1a1a68c1943 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -2146,6 +2146,39 @@ template void generic_wide_int  >::dump () const;
 template void offset_int::dump () const;
 template void widest_int::dump () const;
 
+/* We could add all the above ::dump variants here, but wide_int and
+   widest_int should handle the common cases.  Besides, you can always
+   call the dump method directly.  */
+
+DEBUG_FUNCTION void
+debug (const wide_int )
+{
+  ref.dump ();
+}
+
+DEBUG_FUNCTION void
+debug (const wide_int *ptr)
+{
+  if (ptr)
+debug (*ptr);
+  else
+fprintf (stderr, "\n");
+}
+
+DEBUG_FUNCTION void
+debug (const widest_int )
+{
+  ref.dump ();
+}
+
+DEBUG_FUNCTION void
+debug (const widest_int *ptr)
+{
+  if (ptr)
+debug (*ptr);
+  else
+fprintf (stderr, "\n");
+}
 
 #if CHECKING_P
 


[PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-18 Thread Egeyar Bagcioglu

Hello,

Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder 
the instructions and cause the value of a variable to be checked before 
its first assignment. The following patch is moving the
break point to the end of the function. Therefore, it ensures that the 
break point is reached after the assignment instruction is executed.


Please review the patch and apply if legitimate.

Egeyar

>From a11fe0b1fcf1867a0fa8c4627e347bda07a4c61b Mon Sep 17 00:00:00 2001
From: Egeyar Bagcioglu 
Date: Thu, 12 Oct 2017 06:16:30 -0700
Subject: [PATCH] Fix nrv-1.c false failure on aarch64.

Reordering instructions due to optimization flags is causing nrv-1.c
to fail on aarch64. The value of the variable is checked before the
assignment instruction is reached. This patch is moving the
breakpoint to the end of the function to prevent such false failures.

	* gcc.dg/guality/nrv-1.c: Move breakpoint from line 20 to 22.
---
 gcc/testsuite/gcc.dg/guality/nrv-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/nrv-1.c b/gcc/testsuite/gcc.dg/guality/nrv-1.c
index 2f4e654..77ad462 100644
--- a/gcc/testsuite/gcc.dg/guality/nrv-1.c
+++ b/gcc/testsuite/gcc.dg/guality/nrv-1.c
@@ -17,9 +17,9 @@ f ()
   a2.i[0] = 42;
   if (a3.i[0] != 0)
 abort ();
-  a2.i[4] = 7;	/* { dg-final { gdb-test 20 "a2.i\[0\]" "42" } } */
+  a2.i[4] = 7;
   return a2;
-}
+}	/* { dg-final { gdb-test 22 "a2.i\[0\]" "42" } } */
 
 int
 main ()
-- 
1.9.1



patch to fix PR82556

2017-10-18 Thread Vladimir Makarov

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82556

The patch was bootstrapped and tested on x86-64.

Committed to trunk as rev. 253862.

Committed to gcc-7-branch as rev. 253863.

Index: ChangeLog
===
--- ChangeLog	(revision 253862)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2017-10-18  Vladimir Makarov  
+
+	PR middle-end/82556
+	* lra-constraints.c (curr_insn_transform): Use non-input operand
+	instead of output one for matched reload.
+
 2017-10-17  Jakub Jelinek  
 
 	PR tree-optimization/82549
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 253862)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2017-10-18  Vladimir Makarov  
+
+	PR middle-end/82556
+	* gcc.target/i386/pr82556.c: New.
+
 2017-10-17  Eric Botcazou  
 
 	* gcc.dg/attr-alloc_size-11.c: UnXFAIL for visium-*-*.
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 253815)
+++ lra-constraints.c	(working copy)
@@ -4284,7 +4284,13 @@ curr_insn_transform (bool check_only_p)
 	}
   else if (curr_static_id->operand[i].type == OP_IN
 	   && (curr_static_id->operand[goal_alt_matched[i][0]].type
-		   == OP_OUT))
+		   == OP_OUT
+		   || (curr_static_id->operand[goal_alt_matched[i][0]].type
+		   == OP_INOUT
+		   && (operands_match_p
+			   (*curr_id->operand_loc[i],
+			*curr_id->operand_loc[goal_alt_matched[i][0]],
+			-1)
 	{
 	  /* generate reloads for input and matched outputs.  */
 	  match_inputs[0] = i;
@@ -4295,9 +4301,14 @@ curr_insn_transform (bool check_only_p)
 			[goal_alt_number * n_operands + goal_alt_matched[i][0]]
 			.earlyclobber);
 	}
-  else if (curr_static_id->operand[i].type == OP_OUT
+  else if ((curr_static_id->operand[i].type == OP_OUT
+		|| (curr_static_id->operand[i].type == OP_INOUT
+		&& (operands_match_p
+			(*curr_id->operand_loc[i],
+			 *curr_id->operand_loc[goal_alt_matched[i][0]],
+			 -1
 	   && (curr_static_id->operand[goal_alt_matched[i][0]].type
-		   == OP_IN))
+		== OP_IN))
 	/* Generate reloads for output and matched inputs.  */
 	match_reload (i, goal_alt_matched[i], outputs, goal_alt[i], ,
 		  , curr_static_id->operand_alternative
Index: testsuite/gcc.target/i386/pr82556.c
===
--- testsuite/gcc.target/i386/pr82556.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr82556.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-strict-aliasing -fwrapv -fexcess-precision=standard" } */
+extern int foo();
+typedef struct {
+  char id;
+  unsigned char fork_flags;
+  short data_length;
+} Header;
+int a;
+void X() {
+  do {
+char* b;
+Header c;
+if (a)
+  c.fork_flags |= 1;
+__builtin_memcpy(b, , __builtin_offsetof(Header, data_length));
+b += foo();
+  } while (1);
+}


Re: [PING] [C++ Patch] PR 82307

2017-10-18 Thread Mukesh Kapoor

On 10/9/2017 12:20 PM, Mukesh Kapoor wrote:

Hi,

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82307.
For an unscoped enum with a fixed underlying type, the function 
type_promotes_to() does not always return the same type as the 
underlying type. The fix is to use the underlying type of the enum 
instead of creating a new one by calling c_common_type_for_size().


Bootstrapped and tested with 'make check' on x86_64-linux. New test 
case added.


Mukesh





Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Segher Boessenkool
On Wed, Oct 18, 2017 at 05:17:19PM +0200, Michael Matz wrote:
> On Wed, 18 Oct 2017, Segher Boessenkool wrote:
> 
> > Certainly.  And to work around the bug, it should work to mention some
> > hard register as asm input.  Ideally something that is live anyway;
> > perhaps the stack pointer :-)  Like so:
> > 
> >   __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: "sp");
> > 
> > (I tested this, it does work around the bug).
> > 
> > Or hey, why not the program counter, that should make it even clearer
> > something is funny here:
> > 
> >   __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: "pc");
> > 
> > (also tested, works fine).
> 
> Both of these are not asm inputs but clobbers, though :)

Yeah I typed (and tested) the code after writing that first sentence.
Clobbering was easier :-)

>  And clobbering 
> "pc" could have funny effects if any of our allocators ever would produce 
> save/restore code around such asms for such clobbers.

Yeah -- but sp (and pc, if it exists) is a fixed register, so we don't
do such things :-)


Segher


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-18 Thread Martin Sebor

Fair enough, I didn't know whether to change the way it currently was
triggered. Do you think it should fall under -Wextra (I don't think it
falls under -Wall, since it isn't "easy to avoid or modify to prevent
the warning" because it may be valid and wanted behavior), or should it
be enabled by no other flag?


I think it depends on the implementation of the warning.  With
the current (fairly restrictive) behavior I'd say it should be
disabled by default.  But if it were to be changed to more closely
match the Clang behavior and only warn for bit-field declarations
that cannot represent all enumerators of the enumerated type, then
including it in -Wall would seem helpful to me.

I.e., Clang doesn't warn on this and IIUC that's what the reporter
of the bug also expects:

  enum E: unsigned { E3 = 15 };

  struct S { E i: 4; };

(There is value in warning on this as well, but I think most users
will not be interested in it, so making the warning a two-level one
where level 1 warns same as Clang and level 2 same as GCC does now
might give us the best of both worlds).


The warning points to the bit-field ('field:2' is too small to hold...),
but I've added a note in an updated patch that tells the user what the
precision of the enum's underlying type is.


That will be useful, thanks!


I've changed the description in both c.opt and invoke.texi to be more
specific, it now notes that it warns when
1. An enum in a bit-field is a scoped enum...
2. ...which has an underlying type which has a higher precision
   than the width of the bit-field.


Sounds good.

Martin


Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Michael Matz
Hi,

On Wed, 18 Oct 2017, Segher Boessenkool wrote:

> Certainly.  And to work around the bug, it should work to mention some
> hard register as asm input.  Ideally something that is live anyway;
> perhaps the stack pointer :-)  Like so:
> 
>   __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: "sp");
> 
> (I tested this, it does work around the bug).
> 
> Or hey, why not the program counter, that should make it even clearer
> something is funny here:
> 
>   __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: "pc");
> 
> (also tested, works fine).

Both of these are not asm inputs but clobbers, though :)  And clobbering 
"pc" could have funny effects if any of our allocators ever would produce 
save/restore code around such asms for such clobbers.


Ciao,
Michael.


Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-18 Thread Jan Hubicka
> > Those instructions seems similarly expensive in Intel implementation.
> > http://users.atw.hu/instlatx64/GenuineIntel0050654_SkylakeXeon9_InstLatX64.txt
> > lists latencies ranging from 18 to 32 cycles.
> > 
> > Of course it may also be the case that the utility is measuring gathers 
> > incorrectly.
> > according to Agner's table Skylake has optimized gathers, they used to be
> > 12 to 34 uops on haswell and are no 4 to 5.
> > > 
> > > > > Note the most major source of impreciseness in the cost model
> > > > > is from vec_perm because we lack the information of the
> > > > > permutation mask which means we can't distinguish between
> > > > > cross-lane and intra-lane permutes.
> > > > 
> > > > Besides that we lack information about what operation we do (addition
> > > > or division?) which may be useful to pass down, especially because we do
> > > > have relevant information handy in the x86_cost tables.  So I am 
> > > > thinking
> > > > of adding extra parameter to the hook telling the operation.
> > > 
> > > Not sure.  The costs are all supposed to be relative to scalar cost
> > > and I fear we get nearer to a GIGO syndrome when adding more information
> > > here ;)
> > 
> > Yep, however there is setup cost (like loads/stores) which comes into game
> > as well.  I will see how far i can get by making x86 costs more "realistic"
> 
> I think it should be always counting the cost of n scalar loads plus
> an overhead depending on the microarchitecture.  As you say we're
> not getting rid of any memory latencies (in the worst case).  From
> Agner I read Skylake optimized gathers down to the actual memory
> access cost, the overhead is basically well hidden.

Where did you find it? It does not seem to quite match the instruction latency 
table
above.

Honza
> 
> Richard.
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


[libstdc++, patch] Fix build on APFS file system

2017-10-18 Thread FX
Parallel builds of libstdc++ on APFS filesystem (with 1 ns granularity) on 
macOS 10.13 often fail (failure rate for “make -j2” to “make -j8” is about 60% 
from my own builds and results reported by others): 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797
This is reproducible with several versions of GNU make.

Changing libstdc++’s makefile to mark install-headers with .NOTPARALLEL fixes 
the issue. We've carried that patch in Homebrew (https://brew.sh) for a few 
months now, and have had no report of build issues since then.

Bootstrapped and regtested on x86_64-apple-darwin17 (as well as other 
platforms). OK to commit?

FX




parallel.ChangeLog
Description: Binary data


parallel.diff
Description: Binary data


[PATCH][GRAPHITE] Limit AST code generation, PR82591

2017-10-18 Thread Richard Biener

The following limits ISL operations done during optimized AST generation
as the PR shows it can take quite a bit of time.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-10-18  Richard Biener  

PR tree-optimization/82591
* graphite.c (graphite_transform_loops): Move code gen message
printing ...
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl):
Here.  Handle scop_to_isl_ast failing.
(scop_to_isl_ast): Limit the number of ISL operations.

Index: gcc/graphite.c
===
--- gcc/graphite.c  (revision 253848)
+++ gcc/graphite.c  (working copy)
@@ -378,16 +380,14 @@ graphite_transform_loops (void)
if (!apply_poly_transforms (scop))
  continue;
 
-   location_t loc = find_loop_location
- (scops[i]->scop_info->region.entry->dest->loop_father);
-
changed = true;
-   if (!graphite_regenerate_ast_isl (scop))
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-  "loop nest not optimized, code generation error\n");
-   else
- dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-  "loop nest optimized\n");
+   if (graphite_regenerate_ast_isl (scop))
+ {
+   location_t loc = find_loop_location
+ (scops[i]->scop_info->region.entry->dest->loop_father);
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+"loop nest optimized\n");
+ }
   }
 
   if (changed)
Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253848)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.
 #include "cfganal.h"
 #include "value-prof.h"
 #include "tree-ssa.h"
+#include "tree-vectorizer.h"
 #include "graphite.h"
 
 struct ast_build_info
@@ -1378,6 +1341,13 @@ ast_build_before_for (__isl_keep isl_ast
 __isl_give isl_ast_node *translate_isl_ast_to_gimple::
 scop_to_isl_ast (scop_p scop)
 {
+  int old_err = isl_options_get_on_error (scop->isl_context);
+  int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
+  int max_operations = PARAM_VALUE (PARAM_MAX_ISL_OPERATIONS);
+  if (max_operations)
+isl_ctx_set_max_operations (scop->isl_context, max_operations);
+  isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
+
   gcc_assert (scop->transformed_schedule);
 
   /* Set the separate option to reduce control flow overhead.  */
@@ -1396,6 +1366,27 @@ scop_to_isl_ast (scop_p scop)
   isl_ast_node *ast_isl = isl_ast_build_node_from_schedule
 (context_isl, schedule);
   isl_ast_build_free (context_isl);
+
+  isl_options_set_on_error (scop->isl_context, old_err);
+  isl_ctx_reset_operations (scop->isl_context);
+  isl_ctx_set_max_operations (scop->isl_context, old_max_operations);
+  if (isl_ctx_last_error (scop->isl_context) != isl_error_none)
+{
+  location_t loc = find_loop_location
+   (scop->scop_info->region.entry->dest->loop_father);
+  if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+"loop nest not optimized, AST generation timed out "
+"after %d operations [--param max-isl-operations]\n",
+max_operations);
+  else
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+"loop nest not optimized, ISL AST generation "
+"signalled an error\n");
+  isl_ast_node_free (ast_isl);
+  return NULL;
+}
+
   return ast_isl;
 }
 
@@ -1444,6 +1435,12 @@ graphite_regenerate_ast_isl (scop_p scop
   timevar_push (TV_GRAPHITE_CODE_GEN);
   t.add_parameters_to_ivs_params (scop, ip);
   root_node = t.scop_to_isl_ast (scop);
+  if (! root_node)
+{
+  ivs_params_clear (ip);
+  timevar_pop (TV_GRAPHITE_CODE_GEN);
+  return false;
+}
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
@@ -1484,10 +1481,10 @@ graphite_regenerate_ast_isl (scop_p scop
 
   if (t.codegen_error_p ())
 {
-  if (dump_file)
-   fprintf (dump_file, "codegen error: "
-"reverting back to the original code.\n");
-  set_ifsese_condition (if_region, integer_zero_node);
+  location_t loc = find_loop_location
+   (scop->scop_info->region.entry->dest->loop_father);
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "loop nest not optimized, code generation error\n");
 
   /* Remove the unreachable region.  */
   remove_edge_and_dominated_blocks (if_region->true_region->region.entry);


Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Segher Boessenkool
On Wed, Oct 18, 2017 at 01:50:31PM +, Bernd Edlinger wrote:
> Yes, the volatile is still necessary.

Certainly.  And to work around the bug, it should work to mention some
hard register as asm input.  Ideally something that is live anyway;
perhaps the stack pointer :-)  Like so:

  __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: "sp");

(I tested this, it does work around the bug).

Or hey, why not the program counter, that should make it even clearer
something is funny here:

  __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: "pc");

(also tested, works fine).


Segher


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-18 Thread Eric Botcazou
> Are we in agreement that I should revert the patch?

I think that you can leave it for now in order to have some feedback (see for 
example PR rtl-optimization/82597) but ideally it should be rewritten so as to 
reuse the existing infrastructure of the pass.

-- 
Eric Botcazou


Re: [PATCH PR/82546] tree node size

2017-10-18 Thread Eric Botcazou
> I'd think so. LANG_TYPE is treated specially in several
> places and Ada debug types are pretty sensitive so this would
> require caution but I don't see/know-of obvious reasons why this
> couldn't be done.

LANG_TYPE is only used in Ada to trigger the specific treatment in 
gen_type_die_with_usage for DW_TAG_unspecified_type, so very minor.

-- 
Eric Botcazou


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-18 Thread Sam van Kampen via gcc-patches
On Mon, Oct 16, 2017 at 11:31:10PM +, Joseph Myers wrote:
> On Mon, 16 Oct 2017, Sam van Kampen via gcc-patches wrote:
> 
> > +Wbitfield-enum-conversion
> > +C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
> > +Warn about struct bit-fields being too small to hold enumerated types.
> 
> Any option supported for C++ should also be supported for ObjC++ unless 
> there is a clear reason it cannot work for ObjC++.
> [...]
> The documentation of the option also needs to indicate that it's for 
> C++/ObjC++ only, similar to other such options.
I've added the ObjC++ flag to the warning declaration above and I've
expanded the documentation to note that it only applies to C++/ObjC++.

> The patch also needs to add a testcase to the testsuite that verifies the 
> warnings issues in appropriate cases (and that warnings are not issued 
> when the bit-field is large enough).
I've added such a testcase as well. I'll send the updated version of the
patch, I'm just waiting to hear back from Martin on whether the flag
should fall under -Wextra or fall under no flag and be disabled unless
explicitly specified.
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Thanks for the feedback,
Sam


Re: [patch] Fix PR debug/82509

2017-10-18 Thread Eric Botcazou
> Hmm.  It makes tracking DIE builds difficult now that not all allocations go
> through new_die anymore.

I wouldn't have created such a precedent though, IOW there is nothing new.

> Can you instead split out a new_die_raw
> function with just the allocation and the die_tag initialization?  Or make
> !parent_die && !t this mode, thus add
> 
>   if (parent_die != NULL)
> add_child_die (parent_die, die);
>   else if (! t)
> return die;
>   else
>  {
> 
> ?  Otherwise the patch looks sensible.

Here's a revision version which makes sure that there is a single call to

  ggc_cleared_alloc ()

in the entire file.  Tested on x86_64-suse-linux.


PR debug/82509
* dwarf2out.c (new_die_raw): New static inline function.
(new_die): Use it to create the DIE.
(add_AT_external_die_ref): Likewise.
(clone_die): Likewise.
(clone_as_declaration): Likewise.
(dwarf2out_vms_debug_main_pointer): Likewise.
(base_type_die): Likewise.  Remove early return for corner cases.
Do not call add_pubtype on the DIE here.
(is_base_type): Remove ERROR_MARK and return 0 for VOID_TYPE.
(modified_type_die): Adjust the lookup for reverse order DIEs.  Skip
typedefs for base types with DW_AT_endianity.  Make sure a DIE with
native order exists for base types, attach the DIE manually and call
add_pubtype on it.  Do not equate a reverse order DIE to the type.

-- 
Eric BotcazouIndex: dwarf2out.c
===
--- dwarf2out.c	(revision 253848)
+++ dwarf2out.c	(working copy)
@@ -5364,6 +5364,16 @@ splice_child_die (dw_die_ref parent, dw_
   reparent_child (child, parent);
 }
 
+/* Create and return a new die with TAG_VALUE as tag.  */
+ 
+static inline dw_die_ref
+new_die_raw (enum dwarf_tag tag_value)
+{
+  dw_die_ref die = ggc_cleared_alloc ();
+  die->die_tag = tag_value;
+  return die;
+}
+
 /* Create and return a new die with a parent of PARENT_DIE.  If
PARENT_DIE is NULL, the new DIE is placed in limbo and an
associated tree T must be supplied to determine parenthood
@@ -5372,9 +5382,7 @@ splice_child_die (dw_die_ref parent, dw_
 static inline dw_die_ref
 new_die (enum dwarf_tag tag_value, dw_die_ref parent_die, tree t)
 {
-  dw_die_ref die = ggc_cleared_alloc ();
-
-  die->die_tag = tag_value;
+  dw_die_ref die = new_die_raw (tag_value);
 
   if (parent_die != NULL)
 add_child_die (parent_die, die);
@@ -5568,8 +5576,7 @@ add_AT_external_die_ref (dw_die_ref die,
 {
   /* Create a fake DIE that contains the reference.  Don't use
  new_die because we don't want to end up in the limbo list.  */
-  dw_die_ref ref = ggc_cleared_alloc ();
-  ref->die_tag = die->die_tag;
+  dw_die_ref ref = new_die_raw (die->die_tag);
   ref->die_id.die_symbol = IDENTIFIER_POINTER (get_identifier (symbol));
   ref->die_offset = offset;
   ref->with_offset = 1;
@@ -7712,13 +7719,10 @@ should_move_die_to_comdat (dw_die_ref di
 static dw_die_ref
 clone_die (dw_die_ref die)
 {
-  dw_die_ref clone;
+  dw_die_ref clone = new_die_raw (die->die_tag);
   dw_attr_node *a;
   unsigned ix;
 
-  clone = ggc_cleared_alloc ();
-  clone->die_tag = die->die_tag;
-
   FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
 add_dwarf_attr (clone, a);
 
@@ -7762,8 +7766,7 @@ clone_as_declaration (dw_die_ref die)
   return clone;
 }
 
-  clone = ggc_cleared_alloc ();
-  clone->die_tag = die->die_tag;
+  clone = new_die_raw (die->die_tag);
 
   FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
 {
@@ -12090,9 +12093,6 @@ base_type_die (tree type, bool reverse)
   struct fixed_point_type_info fpt_info;
   tree type_bias = NULL_TREE;
 
-  if (TREE_CODE (type) == ERROR_MARK || TREE_CODE (type) == VOID_TYPE)
-return 0;
-
   /* If this is a subtype that should not be emitted as a subrange type,
  use the base type.  See subrange_type_for_debug_p.  */
   if (TREE_CODE (type) == INTEGER_TYPE && TREE_TYPE (type) != NULL_TREE)
@@ -12185,7 +12185,7 @@ base_type_die (tree type, bool reverse)
   gcc_unreachable ();
 }
 
-  base_type_result = new_die (DW_TAG_base_type, comp_unit_die (), type);
+  base_type_result = new_die_raw (DW_TAG_base_type);
 
   add_AT_unsigned (base_type_result, DW_AT_byte_size,
 		   int_size_in_bytes (type));
@@ -12241,8 +12241,6 @@ base_type_die (tree type, bool reverse)
 		 | dw_scalar_form_reference,
 		 NULL);
 
-  add_pubtype (type, base_type_result);
-
   return base_type_result;
 }
 
@@ -12270,8 +12268,6 @@ is_base_type (tree type)
 {
   switch (TREE_CODE (type))
 {
-case ERROR_MARK:
-case VOID_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
 case FIXED_POINT_TYPE:
@@ -12280,6 +12276,7 @@ is_base_type (tree type)
 case POINTER_BOUNDS_TYPE:
   return 1;
 
+case VOID_TYPE:
 case ARRAY_TYPE:
 case RECORD_TYPE:
 case UNION_TYPE:
@@ -12485,6 +12482,8 @@ modified_type_die (tree type, int cv_qua
   /* Only these 

Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-18 Thread Sam van Kampen via gcc-patches
On Mon, Oct 16, 2017 at 08:56:05PM -0600, Martin Sebor wrote:
> On 10/16/2017 06:37 AM, Sam van Kampen via gcc-patches wrote:
> > ..I just realised that the clang flag is -Wbitfield-enum-conversion, not
> > -Wenum-bitfield-conversion. Please apply the patch below instead, which
> > has replaced the two words to remove the inconsistency.
> > 
> > 2017-10-16  Sam van Kampen  
> > 
> > * c-family/c.opt: Add a warning flag for struct bit-fields
> > being too small to hold enumerated types.
> > * cp/class.c: Likewise.
> > * doc/invoke.texi: Add documentation for said warning flag.
> > 
> > Index: gcc/c-family/c.opt
> > ===
> > --- gcc/c-family/c.opt  (revision 253784)
> > +++ gcc/c-family/c.opt  (working copy)
> > @@ -346,6 +346,10 @@ Wframe-address
> >  C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++ 
> > ObjC++,Wall)
> >  Warn when __builtin_frame_address or __builtin_return_address is used 
> > unsafely.
> > 
> > +Wbitfield-enum-conversion
> > +C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
> > +Warn about struct bit-fields being too small to hold enumerated types.
> 
> Warning by default is usually reserved for problems that are
> most likely indicative of bugs.  Does this rise to that level?
> (It seems that it should be in the same class as -Wconversion).
> 
Fair enough, I didn't know whether to change the way it currently was
triggered. Do you think it should fall under -Wextra (I don't think it
falls under -Wall, since it isn't "easy to avoid or modify to prevent
the warning" because it may be valid and wanted behavior), or should it
be enabled by no other flag?
> > +
> >  Wbuiltin-declaration-mismatch
> >  C ObjC C++ ObjC++ Var(warn_builtin_declaraion_mismatch) Init(1) Warning
> >  Warn when a built-in function is declared with the wrong signature.
> > Index: gcc/cp/class.c
> > ===
> > --- gcc/cp/class.c  (revision 253784)
> > +++ gcc/cp/class.c  (working copy)
> > @@ -3278,10 +3278,11 @@ check_bitfield_decl (tree field)
> >&& tree_int_cst_lt (TYPE_SIZE (type), w)))
> > warning_at (DECL_SOURCE_LOCATION (field), 0,
> > "width of %qD exceeds its type", field);
> > -  else if (TREE_CODE (type) == ENUMERAL_TYPE
> > +  else if (warn_bitfield_enum_conversion
> > +  && TREE_CODE (type) == ENUMERAL_TYPE
> >&& (0 > (compare_tree_int
> > (w, TYPE_PRECISION (ENUM_UNDERLYING_TYPE (type))
> > -   warning_at (DECL_SOURCE_LOCATION (field), 0,
> > +   warning_at (DECL_SOURCE_LOCATION (field), OPT_Wbitfield_enum_conversion,
> > "%qD is too small to hold all values of %q#T",
> > field, type);
> 
> I would suggest to include a bit more detail in the message, such
> as the smallest number of bits needed to represent every value of
> the enumeration.  It's also often helpful to include a note
> pointing to the relevant declaration (in this case it could be
> the bit-field).
>
The warning points to the bit-field ('field:2' is too small to hold...),
but I've added a note in an updated patch that tells the user what the
precision of the enum's underlying type is.
> >  }
> > Index: gcc/doc/invoke.texi
> > ===
> > --- gcc/doc/invoke.texi (revision 253784)
> > +++ gcc/doc/invoke.texi (working copy)
> > @@ -264,7 +264,7 @@ Objective-C and Objective-C++ Dialects}.
> >  -Walloca  -Walloca-larger-than=@var{n} @gol
> >  -Wno-aggressive-loop-optimizations  -Warray-bounds  -Warray-bounds=@var{n} 
> > @gol
> >  -Wno-attributes  -Wbool-compare  -Wbool-operation @gol
> > --Wno-builtin-declaration-mismatch @gol
> > +-Wbitfield-enum-conversion -Wno-builtin-declaration-mismatch @gol
> >  -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
> >  -Wc++-compat  -Wc++11-compat  -Wc++14-compat  @gol
> >  -Wcast-align  -Wcast-align=strict  -Wcast-qual  @gol
> > @@ -5431,6 +5431,12 @@ Incrementing a boolean is invalid in C++17, and de
> > 
> >  This warning is enabled by @option{-Wall}.
> > 
> > +@item -Wbitfield-enum-conversion
> > +@opindex Wbitfield-enum-conversion
> > +@opindex Wno-bitfield-enum-conversion
> > +Warn about a bit-field potentially being too small to hold all values
> > +of an enumerated type. This warning is enabled by default.
> > +
> 
> The brief description makes me wonder what this really means.  What
> is the meaning of "potentially?"  What (what expressions) triggers
> the warning?  Presumably it's initialization and assignment.  Does
> explicit or implicit conversion also trigger is?  I would suggest
> to expand the documentation to obviate these questions, and perhaps
> also include an example.
>
I've changed the description in both c.opt and invoke.texi to be more
specific, it now notes that 

Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Bernd Edlinger
On 10/18/17 15:46, Segher Boessenkool wrote:
> On Wed, Oct 18, 2017 at 06:30:23AM -0500, Segher Boessenkool wrote:
>> On Wed, Oct 18, 2017 at 11:10:48AM +, Bernd Edlinger wrote:
>>> A memory clobber would also make rtx_moveable_p return false,
>>> thru the following case:
>>>
>>>  case MEM:
>>>if (type == OP_IN && MEM_READONLY_P (x))
>>>  return rtx_moveable_p ( (x, 0), OP_IN);
>>>return false;
>>>
>>> ...
>>>
>>>  case CLOBBER:
>>>return rtx_moveable_p (_DEST (x), OP_OUT);
>>>
>>>
>>> because that memory clobber is in a parallel statement
>>> together with the ASM_OUTPUT.
>>>
>>> Right?
>>
>> Yes, that looks correct.  And a memory clobber of course _should_ make
>> the insn non-moveable as well -- it's an unknown side effect all by
>> itself :-)
> 
> Actually, it is not a side effect...  Just kind of behaves like it.
> But it does *not* prevent the asm from being deleted, for example.
> 
> 

Yes, the volatile is still necessary.


Bernd.


Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Segher Boessenkool
On Wed, Oct 18, 2017 at 06:30:23AM -0500, Segher Boessenkool wrote:
> On Wed, Oct 18, 2017 at 11:10:48AM +, Bernd Edlinger wrote:
> > A memory clobber would also make rtx_moveable_p return false,
> > thru the following case:
> > 
> > case MEM:
> >   if (type == OP_IN && MEM_READONLY_P (x))
> > return rtx_moveable_p ( (x, 0), OP_IN);
> >   return false;
> > 
> > ...
> > 
> > case CLOBBER:
> >   return rtx_moveable_p (_DEST (x), OP_OUT);
> > 
> > 
> > because that memory clobber is in a parallel statement
> > together with the ASM_OUTPUT.
> > 
> > Right?
> 
> Yes, that looks correct.  And a memory clobber of course _should_ make
> the insn non-moveable as well -- it's an unknown side effect all by
> itself :-)

Actually, it is not a side effect...  Just kind of behaves like it.
But it does *not* prevent the asm from being deleted, for example.


Segher


[PATCH][GRAPHITE] More TLC

2017-10-18 Thread Richard Biener

And using range-info to constain parameters.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-10-18  Richard Biener  

* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::set_rename): Simplify.
(translate_isl_ast_to_gimple::set_rename_for_each_def): Inline...
(graphite_copy_stmts_from_block): ... here.
(copy_bb_and_scalar_dependences): Simplify.
(add_parameters_to_ivs_params): Canonicalize.
(generate_entry_out_of_ssa_copies): Simplify.
* graphite-sese-to-poly.c (extract_affine_name): Simplify
by passing in ISL dimension.
(parameter_index_in_region_1): Rename to ...
(parameter_index_in_region): ... this.
(extract_affine): Adjust assert, pass down parameter index.
(add_param_constraints): Use range-info when available.
(build_scop_context): Adjust.
* sese.c (new_sese_info): Adjust.
(free_sese_info): Likewise.
* sese.h (bb_map_t, rename_map_t, phi_rename, init_back_edge_pair_t):
Remove unused typedefs.
(struct sese_info_t): Simplify rename_map, remove incomplete_phis.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253848)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -195,7 +195,6 @@ class translate_isl_ast_to_gimple
   edge copy_bb_and_scalar_dependences (basic_block bb, edge next_e,
   vec iv_map);
   void set_rename (tree old_name, tree expr);
-  void set_rename_for_each_def (gimple *stmt);
   void gsi_insert_earliest (gimple_seq seq);
   bool codegen_error_p () const { return codegen_error; }
 
@@ -932,25 +931,12 @@ set_rename (tree old_name, tree expr)
 {
   fprintf (dump_file, "[codegen] setting rename: old_name = ");
   print_generic_expr (dump_file, old_name);
-  fprintf (dump_file, ", new_name = ");
+  fprintf (dump_file, ", new decl = ");
   print_generic_expr (dump_file, expr);
   fprintf (dump_file, "\n");
 }
-
-  if (old_name == expr)
-return;
-
-  vec  *renames = region->rename_map->get (old_name);
-
-  if (renames)
-renames->safe_push (expr);
-  else
-{
-  vec r;
-  r.create (2);
-  r.safe_push (expr);
-  region->rename_map->put (old_name, r);
-}
+  bool res = region->rename_map->put (old_name, expr);
+  gcc_assert (! res);
 }
 
 /* Return an iterator to the instructions comes last in the execution order.
@@ -1132,21 +1118,6 @@ should_copy_to_new_region (gimple *stmt,
   return true;
 }
 
-/* Create new names for all the definitions created by COPY and add replacement
-   mappings for each new name.  */
-
-void translate_isl_ast_to_gimple::
-set_rename_for_each_def (gimple *stmt)
-{
-  def_operand_p def_p;
-  ssa_op_iter op_iter;
-  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
-{
-  tree old_name = DEF_FROM_PTR (def_p);
-  create_new_def_for (old_name, stmt, def_p);
-}
-}
-
 /* Duplicates the statements of basic block BB into basic block NEW_BB
and compute the new induction variables according to the IV_MAP.  */
 
@@ -1192,7 +1163,13 @@ graphite_copy_stmts_from_block (basic_bl
   gimple_duplicate_stmt_histograms (cfun, copy, cfun, stmt);
 
   /* Crete new names for each def in the copied stmt.  */
-  set_rename_for_each_def (copy);
+  def_operand_p def_p;
+  ssa_op_iter op_iter;
+  FOR_EACH_SSA_DEF_OPERAND (def_p, copy, op_iter, SSA_OP_ALL_DEFS)
+   {
+ tree old_name = DEF_FROM_PTR (def_p);
+ create_new_def_for (old_name, copy, def_p);
+   }
 
   if (codegen_error_p ())
return false;
@@ -1244,17 +1221,14 @@ copy_bb_and_scalar_dependences (basic_bl
continue;
 
   tree new_phi_def;
-  vec  *renames = region->rename_map->get (res);
-  if (! renames || renames->is_empty ())
+  tree *rename = region->rename_map->get (res);
+  if (! rename)
{
  new_phi_def = create_tmp_reg (TREE_TYPE (res));
  set_rename (res, new_phi_def);
}
   else
-   {
- gcc_assert (renames->length () == 1);
- new_phi_def = (*renames)[0];
-   }
+   new_phi_def = *rename;
 
   gassign *ass = gimple_build_assign (NULL_TREE, new_phi_def);
   create_new_def_for (res, ass, NULL);
@@ -1291,17 +1265,14 @@ copy_bb_and_scalar_dependences (basic_bl
continue;
 
  tree new_phi_def;
- vec  *renames = region->rename_map->get (res);
- if (! renames || renames->is_empty ())
+ tree *rename = region->rename_map->get (res);
+ if (! rename)
{
  new_phi_def = create_tmp_reg (TREE_TYPE (res));
  set_rename (res, new_phi_def);
}
  else
-   {
- 

Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Bernd Edlinger
On 10/18/17 13:30, Segher Boessenkool wrote:
> On Wed, Oct 18, 2017 at 11:10:48AM +, Bernd Edlinger wrote:
>> A memory clobber would also make rtx_moveable_p return false,
>> thru the following case:
>>
>>  case MEM:
>>if (type == OP_IN && MEM_READONLY_P (x))
>>  return rtx_moveable_p ( (x, 0), OP_IN);
>>return false;
>>
>> ...
>>
>>  case CLOBBER:
>>return rtx_moveable_p (_DEST (x), OP_OUT);
>>
>>
>> because that memory clobber is in a parallel statement
>> together with the ASM_OUTPUT.
>>
>> Right?
> 
> Yes, that looks correct.  And a memory clobber of course _should_ make
> the insn non-moveable as well -- it's an unknown side effect all by
> itself :-)
> 
> 

Yeah, that's probably not what they wanted to hear,
but supposed this is the only place where the volatility
of ASM_OUTPUT is ignored, then adding a memory clobber
to the asm stmt is the way to go if they want to play
safe, especially when implementing spin-locks.


Bernd.


Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-10-18 Thread Martin Liška
On 10/18/2017 02:52 PM, Marek Polacek wrote:
> On Wed, Oct 18, 2017 at 02:46:23PM +0200, Martin Liška wrote:
>> On 10/12/2017 10:48 AM, Jakub Jelinek wrote:
>>> On Thu, Oct 12, 2017 at 10:40:42AM +0200, Martin Liška wrote:
 --- a/gcc/cp/constexpr.c
 +++ b/gcc/cp/constexpr.c
 @@ -1175,7 +1175,12 @@ cxx_eval_builtin_function_call (const constexpr_ctx 
 *ctx, tree t, tree fun,
{
  new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
   CALL_EXPR_FN (t), nargs, args);
 -error ("%q+E is not a constant expression", new_call);
 +
 +/* Do not allow__builtin_unreachable in constexpr function.  */
 +if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE)
>>>
>>> As I said earlier, I think it would be better to differentiate between
>>> explicit __builtin_unreachable and the implicitly added one from the patch.
>>> So this could be done as
>>> if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE
>>> && EXPR_LOCATION (t) == BUILTINS_LOCATION)
>>>
 +  location_t loc = DECL_SOURCE_LOCATION (fndecl);
 +  if (sanitize_flags_p (SANITIZE_RETURN, fndecl))
 +t = ubsan_instrument_return (loc);
 +  else
 +t = build_call_expr_loc (loc, builtin_decl_explicit 
 (BUILT_IN_UNREACHABLE),
>>>
>>> and here use BUILTINS_LOCATION instead of loc.
>>> The code might be more readable by doing:
>>> {
>>>   tree fndecl = builtin_decl_explicit (BUILT_IN_UNREACHABLE);
>>>   t = build_call_expr_loc (BUILTINS_LOCATION, fndecl, 0);
>>> }
>>>
 +   0);
 +
>>>
>>> Jakub
>>>
>>
>> Hi.
>>
>> I'm sending updated version of the patch that should address it.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
> 
>> From 36f3f45d9fa42344261faf60bb3cfbe22ed262ac Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Thu, 12 Oct 2017 10:14:59 +0200
>> Subject: [PATCH 1/3] Instrument function exit with __builtin_unreachable in
>>  C++
>>
>> gcc/c-family/ChangeLog:
>>
>> 2017-10-12  Martin Liska  
>>
>>  PR middle-end/82404
>>  * c-opts.c (c_common_post_options): Set -Wreturn-type for C++
>>  FE.
>>  * c.opt: Set default value of warn_return_type.
>>
>> gcc/cp/ChangeLog:
>>
>> 2017-10-12  Martin Liska  
>>
>>  PR middle-end/82404
>>  * constexpr.c (cxx_eval_builtin_function_call): Handle
>>  __builtin_unreachable call.
>>  * cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
>>  ...
>>  (cp_maybe_instrument_return): ... this.
>>  (cp_genericize): Call the function unconditionally.
>>
>> gcc/fortran/ChangeLog:
>>
>> 2017-10-12  Martin Liska  
>>
>>  PR middle-end/82404
>>  * options.c (gfc_post_options): Set default value of
>>  -Wreturn-type to false.
>> ---
>>  gcc/c-family/c-opts.c |  3 +++
>>  gcc/c-family/c.opt|  2 +-
>>  gcc/cp/constexpr.c|  8 +++-
>>  gcc/cp/cp-gimplify.c  | 20 ++--
>>  gcc/fortran/options.c |  3 +++
>>  5 files changed, 28 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
>> index 6bd535532d3..682d7a83ec5 100644
>> --- a/gcc/c-family/c-opts.c
>> +++ b/gcc/c-family/c-opts.c
>> @@ -978,6 +978,9 @@ c_common_post_options (const char **pfilename)
>>  flag_extern_tls_init = 1;
>>  }
>>  
>> +  if (warn_return_type == -1)
>> +warn_return_type = c_dialect_cxx () ? 1 : 0;
> 
> Here you can simply
> 
>   warn_return_type = c_dialect_cxx ();
> 
> no?
> 
>   Marek
> 

Yes, thanks for the nit.

Martin
>From 8bbb75392d13430ce43cc1c0572ec5506d8a4353 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 12 Oct 2017 10:14:59 +0200
Subject: [PATCH 1/3] Instrument function exit with __builtin_unreachable in
 C++

gcc/c-family/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* c-opts.c (c_common_post_options): Set -Wreturn-type for C++
	FE.
	* c.opt: Set default value of warn_return_type.

gcc/cp/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* constexpr.c (cxx_eval_builtin_function_call): Handle
	__builtin_unreachable call.
	* cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
	...
	(cp_maybe_instrument_return): ... this.
	(cp_genericize): Call the function unconditionally.

gcc/fortran/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* options.c (gfc_post_options): Set default value of
	-Wreturn-type to false.
---
 gcc/c-family/c-opts.c |  3 +++
 gcc/c-family/c.opt|  2 +-
 gcc/cp/constexpr.c|  8 +++-
 gcc/cp/cp-gimplify.c  | 20 ++--
 gcc/fortran/options.c |  3 +++
 5 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 6bd535532d3..2b94128e941 100644
--- a/gcc/c-family/c-opts.c
+++ 

Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-18 Thread Richard Biener
On Wed, 18 Oct 2017, Jan Hubicka wrote:

> > > According to Agner's tables, gathers range from 12 ops (vgatherdpd)
> > > to 66 ops (vpgatherdd).  I assume that CPU needs to do following:
> > > 
> > > 1) transfer the offsets sse->ALU unit for address generation (3 cycles
> > >each, 2 ops)
> > > 2) do the address calcualtion (2 ops, probably 4 ops because it does not 
> > > map naturally
> > >  to AGU)
> > > 2) do the load (7 cycles each, 2 ops)
> > > 3) merge results (1 ops)
> > > 
> > > so I get 7 ops, not sure what remaining 5 do.
> > > 
> > > Agner does not account time, but According to
> > > http://users.atw.hu/instlatx64/AuthenticAMD0800F11_K17_Zen_InstLatX64.txt 
> > > the
> > > gather time ranges from 14 cycles (vgatherpd) to 20 cycles.  Here I guess 
> > > it is
> > > 3+1+7+1=12 so it seems to work.
> > > 
> > > If you implement gather by hand, you save the SSE->address caluclation 
> > > path and
> > > thus you can get faster.
> > 
> > I see.  It looks to me Zen should disable gather/scatter then completely
> > and we should implement manual gather/scatter code-generation in the
> > vectorizer (or lower it in vector lowering).  It sounds like they
> > only implemented it to have "complete" AVX2 support (ISTR scatter
> > is only in AVX512f).
> 
> Those instructions seems similarly expensive in Intel implementation.
> http://users.atw.hu/instlatx64/GenuineIntel0050654_SkylakeXeon9_InstLatX64.txt
> lists latencies ranging from 18 to 32 cycles.
> 
> Of course it may also be the case that the utility is measuring gathers 
> incorrectly.
> according to Agner's table Skylake has optimized gathers, they used to be
> 12 to 34 uops on haswell and are no 4 to 5.
> > 
> > > > Note the most major source of impreciseness in the cost model
> > > > is from vec_perm because we lack the information of the
> > > > permutation mask which means we can't distinguish between
> > > > cross-lane and intra-lane permutes.
> > > 
> > > Besides that we lack information about what operation we do (addition
> > > or division?) which may be useful to pass down, especially because we do
> > > have relevant information handy in the x86_cost tables.  So I am thinking
> > > of adding extra parameter to the hook telling the operation.
> > 
> > Not sure.  The costs are all supposed to be relative to scalar cost
> > and I fear we get nearer to a GIGO syndrome when adding more information
> > here ;)
> 
> Yep, however there is setup cost (like loads/stores) which comes into game
> as well.  I will see how far i can get by making x86 costs more "realistic"

I think it should be always counting the cost of n scalar loads plus
an overhead depending on the microarchitecture.  As you say we're
not getting rid of any memory latencies (in the worst case).  From
Agner I read Skylake optimized gathers down to the actual memory
access cost, the overhead is basically well hidden.

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-10-18 Thread Marek Polacek
On Wed, Oct 18, 2017 at 02:46:23PM +0200, Martin Liška wrote:
> On 10/12/2017 10:48 AM, Jakub Jelinek wrote:
> > On Thu, Oct 12, 2017 at 10:40:42AM +0200, Martin Liška wrote:
> >> --- a/gcc/cp/constexpr.c
> >> +++ b/gcc/cp/constexpr.c
> >> @@ -1175,7 +1175,12 @@ cxx_eval_builtin_function_call (const constexpr_ctx 
> >> *ctx, tree t, tree fun,
> >>{
> >>  new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
> >>   CALL_EXPR_FN (t), nargs, args);
> >> -error ("%q+E is not a constant expression", new_call);
> >> +
> >> +/* Do not allow__builtin_unreachable in constexpr function.  */
> >> +if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE)
> > 
> > As I said earlier, I think it would be better to differentiate between
> > explicit __builtin_unreachable and the implicitly added one from the patch.
> > So this could be done as
> > if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE
> > && EXPR_LOCATION (t) == BUILTINS_LOCATION)
> > 
> >> +  location_t loc = DECL_SOURCE_LOCATION (fndecl);
> >> +  if (sanitize_flags_p (SANITIZE_RETURN, fndecl))
> >> +t = ubsan_instrument_return (loc);
> >> +  else
> >> +t = build_call_expr_loc (loc, builtin_decl_explicit 
> >> (BUILT_IN_UNREACHABLE),
> > 
> > and here use BUILTINS_LOCATION instead of loc.
> > The code might be more readable by doing:
> > {
> >   tree fndecl = builtin_decl_explicit (BUILT_IN_UNREACHABLE);
> >   t = build_call_expr_loc (BUILTINS_LOCATION, fndecl, 0);
> > }
> > 
> >> +   0);
> >> +
> > 
> > Jakub
> > 
> 
> Hi.
> 
> I'm sending updated version of the patch that should address it.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> From 36f3f45d9fa42344261faf60bb3cfbe22ed262ac Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 12 Oct 2017 10:14:59 +0200
> Subject: [PATCH 1/3] Instrument function exit with __builtin_unreachable in
>  C++
> 
> gcc/c-family/ChangeLog:
> 
> 2017-10-12  Martin Liska  
> 
>   PR middle-end/82404
>   * c-opts.c (c_common_post_options): Set -Wreturn-type for C++
>   FE.
>   * c.opt: Set default value of warn_return_type.
> 
> gcc/cp/ChangeLog:
> 
> 2017-10-12  Martin Liska  
> 
>   PR middle-end/82404
>   * constexpr.c (cxx_eval_builtin_function_call): Handle
>   __builtin_unreachable call.
>   * cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
>   ...
>   (cp_maybe_instrument_return): ... this.
>   (cp_genericize): Call the function unconditionally.
> 
> gcc/fortran/ChangeLog:
> 
> 2017-10-12  Martin Liska  
> 
>   PR middle-end/82404
>   * options.c (gfc_post_options): Set default value of
>   -Wreturn-type to false.
> ---
>  gcc/c-family/c-opts.c |  3 +++
>  gcc/c-family/c.opt|  2 +-
>  gcc/cp/constexpr.c|  8 +++-
>  gcc/cp/cp-gimplify.c  | 20 ++--
>  gcc/fortran/options.c |  3 +++
>  5 files changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
> index 6bd535532d3..682d7a83ec5 100644
> --- a/gcc/c-family/c-opts.c
> +++ b/gcc/c-family/c-opts.c
> @@ -978,6 +978,9 @@ c_common_post_options (const char **pfilename)
>   flag_extern_tls_init = 1;
>  }
>  
> +  if (warn_return_type == -1)
> +warn_return_type = c_dialect_cxx () ? 1 : 0;

Here you can simply

  warn_return_type = c_dialect_cxx ();

no?

Marek


[PATCH] Fix all tests that fail with -sanitize=return.

2017-10-18 Thread Martin Liška
Hello.

This is first patch that addresses test-suite fallout. All these tests fail in 
runtime.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From f945460bccf6d54e790bf7c4cacac7cb5b915a28 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 3 Oct 2017 17:28:43 +0200
Subject: [PATCH 2/3] Fix all tests that fail with -sanitize=return.

gcc/testsuite/ChangeLog:

2017-10-18  Martin Liska  

	* c-c++-common/dfp/call-by-value.c (foo32): Return a default
	value of change return type to void.
	(foo64): Likewise.
	(foo128): Likewise.
	* g++.dg/bprob/g++-bprob-1.C: Likewise.
	* g++.dg/cpp0x/lambda/lambda-template.C (f): Likewise.
	* g++.dg/cpp0x/range-for6.C (foo): Likewise.
	* g++.dg/cpp0x/udlit-template.C: Likewise.
	* g++.dg/cpp1z/eval-order3.C (struct A): Likewise.
	(operator>>): Likewise.
	* g++.dg/expr/cond12.C (struct X): Likewise.
	(X::operator=): Likewise.
	* g++.dg/gcov/gcov-1.C: Likewise.
	* g++.dg/gcov/gcov-threads-1.C (ContentionNoDeadlock_thread): Likewise.
	* g++.dg/ipa/devirt-21.C: Likewise.
	* g++.dg/ipa/devirt-23.C: Likewise.
	* g++.dg/ipa/devirt-34.C (t): Likewise.
	* g++.dg/missing-return.C: New test. Likewise.
	* g++.dg/opt/20050511-1.C (bar): Likewise.
	* g++.dg/opt/const3.C (A::foo1): Likewise.
	(A::foo2): Likewise.
	* g++.dg/opt/pr23299.C (E::c): Likewise.
	* g++.dg/other/copy2.C (A::operator=): Likewise.
	* g++.dg/overload/addr1.C: Likewise.
	* g++.dg/pr48484.C: Likewise.
	* g++.dg/tls/thread_local3.C (thread_main): Likewise.
	* g++.dg/tls/thread_local3g.C (thread_main): Likewise.
	* g++.dg/tls/thread_local5.C (thread_main): Likewise.
	* g++.dg/tls/thread_local5g.C (thread_main): Likewise.
	* g++.dg/tls/thread_local6.C (thread_main): Likewise.
	* g++.dg/tls/thread_local6g.C (thread_main): Likewise.
	* g++.dg/torture/pr34850.C (OctetString::operator^=): Likewise.
	* g++.dg/tree-prof/pr79259.C (fn2): Likewise.
	* g++.dg/tree-ssa/pr33604.C (struct Value): Likewise.
	* g++.dg/tree-ssa/pr81408.C (struct p): Likewise.
	(av): Likewise.
	* g++.dg/warn/string1.C (test): Likewise.
---
 gcc/testsuite/c-c++-common/dfp/call-by-value.c  |  6 +++---
 gcc/testsuite/g++.dg/bprob/g++-bprob-1.C|  2 +-
 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-template.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/range-for6.C |  2 ++
 gcc/testsuite/g++.dg/cpp0x/udlit-template.C |  2 +-
 gcc/testsuite/g++.dg/cpp1z/eval-order3.C|  4 ++--
 gcc/testsuite/g++.dg/expr/cond12.C  |  8 +++-
 gcc/testsuite/g++.dg/gcov/gcov-1.C  |  2 +-
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C  |  2 ++
 gcc/testsuite/g++.dg/ipa/devirt-21.C|  2 +-
 gcc/testsuite/g++.dg/ipa/devirt-23.C|  2 +-
 gcc/testsuite/g++.dg/ipa/devirt-34.C|  2 ++
 gcc/testsuite/g++.dg/missing-return.C   |  8 
 gcc/testsuite/g++.dg/opt/20050511-1.C   |  2 ++
 gcc/testsuite/g++.dg/opt/const3.C   |  4 ++--
 gcc/testsuite/g++.dg/opt/pr23299.C  |  2 ++
 gcc/testsuite/g++.dg/other/copy2.C  | 10 +++---
 gcc/testsuite/g++.dg/overload/addr1.C   |  2 +-
 gcc/testsuite/g++.dg/pr48484.C  |  3 +++
 gcc/testsuite/g++.dg/tls/thread_local3.C|  1 +
 gcc/testsuite/g++.dg/tls/thread_local3g.C   |  1 +
 gcc/testsuite/g++.dg/tls/thread_local5.C|  1 +
 gcc/testsuite/g++.dg/tls/thread_local5g.C   |  1 +
 gcc/testsuite/g++.dg/tls/thread_local6.C|  1 +
 gcc/testsuite/g++.dg/tls/thread_local6g.C   |  1 +
 gcc/testsuite/g++.dg/torture/pr34850.C  |  6 ++
 gcc/testsuite/g++.dg/tree-prof/pr79259.C|  2 ++
 gcc/testsuite/g++.dg/tree-ssa/pr33604.C |  2 +-
 gcc/testsuite/g++.dg/tree-ssa/pr81408.C | 12 ++--
 gcc/testsuite/g++.dg/warn/string1.C |  2 ++
 30 files changed, 72 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/missing-return.C

diff --git a/gcc/testsuite/c-c++-common/dfp/call-by-value.c b/gcc/testsuite/c-c++-common/dfp/call-by-value.c
index 74aec53aefb..e7aea3076cf 100644
--- a/gcc/testsuite/c-c++-common/dfp/call-by-value.c
+++ b/gcc/testsuite/c-c++-common/dfp/call-by-value.c
@@ -5,17 +5,17 @@
 
 #include "dfp-dbg.h"
 
-int foo32 (_Decimal32 z)
+void foo32 (_Decimal32 z)
 {
   z = z + 1.0df;
 }
 
-int foo64 (_Decimal64 z)
+void foo64 (_Decimal64 z)
 {
   z = z + 1.0dd;
 }
 
-int foo128 (_Decimal128 z)
+void foo128 (_Decimal128 z)
 {
   z = z + 1.0dl;
 }
diff --git a/gcc/testsuite/g++.dg/bprob/g++-bprob-1.C b/gcc/testsuite/g++.dg/bprob/g++-bprob-1.C
index b1a1de77e98..3aafc06d51d 100644
--- a/gcc/testsuite/g++.dg/bprob/g++-bprob-1.C
+++ b/gcc/testsuite/g++.dg/bprob/g++-bprob-1.C
@@ -35,7 +35,7 @@ test_for2 (int m, int n, int o)
   return for_temp;			/* count(6) */
 }
 
-int
+void
 call_for ()
 {
   for_val1 += 

Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-10-18 Thread Martin Liška
On 10/12/2017 10:48 AM, Jakub Jelinek wrote:
> On Thu, Oct 12, 2017 at 10:40:42AM +0200, Martin Liška wrote:
>> --- a/gcc/cp/constexpr.c
>> +++ b/gcc/cp/constexpr.c
>> @@ -1175,7 +1175,12 @@ cxx_eval_builtin_function_call (const constexpr_ctx 
>> *ctx, tree t, tree fun,
>>  {
>>new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
>> CALL_EXPR_FN (t), nargs, args);
>> -  error ("%q+E is not a constant expression", new_call);
>> +
>> +  /* Do not allow__builtin_unreachable in constexpr function.  */
>> +  if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE)
> 
> As I said earlier, I think it would be better to differentiate between
> explicit __builtin_unreachable and the implicitly added one from the patch.
> So this could be done as
> if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE
> && EXPR_LOCATION (t) == BUILTINS_LOCATION)
> 
>> +  location_t loc = DECL_SOURCE_LOCATION (fndecl);
>> +  if (sanitize_flags_p (SANITIZE_RETURN, fndecl))
>> +t = ubsan_instrument_return (loc);
>> +  else
>> +t = build_call_expr_loc (loc, builtin_decl_explicit 
>> (BUILT_IN_UNREACHABLE),
> 
> and here use BUILTINS_LOCATION instead of loc.
> The code might be more readable by doing:
> {
>   tree fndecl = builtin_decl_explicit (BUILT_IN_UNREACHABLE);
>   t = build_call_expr_loc (BUILTINS_LOCATION, fndecl, 0);
> }
> 
>> + 0);
>> +
> 
>   Jakub
> 

Hi.

I'm sending updated version of the patch that should address it.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 36f3f45d9fa42344261faf60bb3cfbe22ed262ac Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 12 Oct 2017 10:14:59 +0200
Subject: [PATCH 1/3] Instrument function exit with __builtin_unreachable in
 C++

gcc/c-family/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* c-opts.c (c_common_post_options): Set -Wreturn-type for C++
	FE.
	* c.opt: Set default value of warn_return_type.

gcc/cp/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* constexpr.c (cxx_eval_builtin_function_call): Handle
	__builtin_unreachable call.
	* cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
	...
	(cp_maybe_instrument_return): ... this.
	(cp_genericize): Call the function unconditionally.

gcc/fortran/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* options.c (gfc_post_options): Set default value of
	-Wreturn-type to false.
---
 gcc/c-family/c-opts.c |  3 +++
 gcc/c-family/c.opt|  2 +-
 gcc/cp/constexpr.c|  8 +++-
 gcc/cp/cp-gimplify.c  | 20 ++--
 gcc/fortran/options.c |  3 +++
 5 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 6bd535532d3..682d7a83ec5 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -978,6 +978,9 @@ c_common_post_options (const char **pfilename)
 	flag_extern_tls_init = 1;
 }
 
+  if (warn_return_type == -1)
+warn_return_type = c_dialect_cxx () ? 1 : 0;
+
   if (num_in_fnames > 1)
 error ("too many filenames given.  Type %s --help for usage",
 	   progname);
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 13d2a59b8a5..e26fba734c0 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -960,7 +960,7 @@ C++ ObjC++ Var(warn_reorder) Warning LangEnabledBy(C++ ObjC++,Wall)
 Warn when the compiler reorders code.
 
 Wreturn-type
-C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
+C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Init(-1)
 Warn whenever a function's return type defaults to \"int\" (C), or about inconsistent return types (C++).
 
 Wscalar-storage-order
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 59192829d71..15253ffad9d 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1182,7 +1182,13 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, tree t, tree fun,
 	{
 	  new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
 	   CALL_EXPR_FN (t), nargs, args);
-	  error ("%q+E is not a constant expression", new_call);
+
+	  /* Do not allow__builtin_unreachable in constexpr function.  */
+	  if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE
+	  && EXPR_LOCATION (t) == BUILTINS_LOCATION)
+	error ("constexpr call flows off the end of the function");
+	  else
+	error ("%q+E is not a constant expression", new_call);
 	}
   *non_constant_p = true;
   return t;
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 262485a5c1f..014c1ee7231 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1556,10 +1556,11 @@ cp_genericize_tree (tree* t_p, bool handle_invisiref_parm_p)
 
 /* If a function that should end with a return in non-void
function doesn't obviously end with 

Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-18 Thread Aldy Hernandez
On Tue, Oct 17, 2017 at 6:05 PM, Richard Sandiford
 wrote:
> Andrew MacLeod  writes:
>> On 10/17/2017 08:18 AM, Richard Sandiford wrote:
>>> Aldy Hernandez  writes:
 Hi folks!

 Calling print_hex() on a widest_int with the most significant bit turned
 on can lead to a leading zero being printed (0x0). This produces
 confusing dumps to say the least, especially when you incorrectly assume
 an integer is NOT signed :).
>>> That's the intended behaviour though.  wide_int-based types only use as
>>> many HWIs as they need to store their current value, with any other bits
>>> in the value being a sign extension of the top bit.  So if the most
>>> significant HWI in a widest_int is zero, that HWI is there to say that
>>> the previous HWI should be zero- rather than sign-extended.
>>>
>>> So:
>>>
>>> 0x0  -> (1 << 32) - 1 to infinite precision
>>> (i.e. a positive value)
>>> 0x   -> -1
>>>
>>> Thanks,
>>> Richard
>>
>> I for one find this very confusing.  If I have a 128 bit value, I don't
>> expect to see a 132 bits.  And there are enough 0's its not obvious when
>> I look.
>
> But Aldy was talking about widest_int, which is wider than 128 bits.
> It's an approximation of infinite precision.

IMO, we should document this leading zero in print_hex, as it's not
inherently obvious.

But yes, I was talking about widest_int.  I should explain what I am
trying to accomplish, since perhaps there is a better way.

I am printing a a wide_int (bounds[i] below), but I really don't want
to print the sign extension nonsense, since it's a detail of the
underlying representation.  Currently I'm doing this to chop off the
unnecessary bits:

/* Wide ints may be sign extended to the full extent of the
   underlying HWI storage, even if the precision we care about
   is smaller.  Chop off the excess bits for prettier output.  */
signop sign = TYPE_UNSIGNED (type) ? UNSIGNED : SIGNED;
widest_int val = widest_int::from (bounds[i], sign);
val &= wi::mask (bounds[i].get_precision (), false);

if (val > 0x)
  print_hex (val, pp_buffer (buffer)->digit_buffer);
else
  print_dec (val, pp_buffer (buffer)->digit_buffer, sign);

Since I am calling print_hex() on the widest_int, I get the leading 0.

Do you recommend another way of accomplishing this?

>
> wide_int is the type to use if you want an N-bit number (for some N).
>
>> I don't think a leading 0 should be printed if "precision" bits have
>> already been printed.
>
> Does 0 get printed in that case though?  Aldy's patch skips an upper

No.

Thanks for your input Richard.
Aldy


Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-18 Thread Jan Hubicka
> > According to Agner's tables, gathers range from 12 ops (vgatherdpd)
> > to 66 ops (vpgatherdd).  I assume that CPU needs to do following:
> > 
> > 1) transfer the offsets sse->ALU unit for address generation (3 cycles
> >each, 2 ops)
> > 2) do the address calcualtion (2 ops, probably 4 ops because it does not 
> > map naturally
> >to AGU)
> > 2) do the load (7 cycles each, 2 ops)
> > 3) merge results (1 ops)
> > 
> > so I get 7 ops, not sure what remaining 5 do.
> > 
> > Agner does not account time, but According to
> > http://users.atw.hu/instlatx64/AuthenticAMD0800F11_K17_Zen_InstLatX64.txt 
> > the
> > gather time ranges from 14 cycles (vgatherpd) to 20 cycles.  Here I guess 
> > it is
> > 3+1+7+1=12 so it seems to work.
> > 
> > If you implement gather by hand, you save the SSE->address caluclation path 
> > and
> > thus you can get faster.
> 
> I see.  It looks to me Zen should disable gather/scatter then completely
> and we should implement manual gather/scatter code-generation in the
> vectorizer (or lower it in vector lowering).  It sounds like they
> only implemented it to have "complete" AVX2 support (ISTR scatter
> is only in AVX512f).

Those instructions seems similarly expensive in Intel implementation.
http://users.atw.hu/instlatx64/GenuineIntel0050654_SkylakeXeon9_InstLatX64.txt
lists latencies ranging from 18 to 32 cycles.

Of course it may also be the case that the utility is measuring gathers 
incorrectly.
according to Agner's table Skylake has optimized gathers, they used to be
12 to 34 uops on haswell and are no 4 to 5.
> 
> > > Note the most major source of impreciseness in the cost model
> > > is from vec_perm because we lack the information of the
> > > permutation mask which means we can't distinguish between
> > > cross-lane and intra-lane permutes.
> > 
> > Besides that we lack information about what operation we do (addition
> > or division?) which may be useful to pass down, especially because we do
> > have relevant information handy in the x86_cost tables.  So I am thinking
> > of adding extra parameter to the hook telling the operation.
> 
> Not sure.  The costs are all supposed to be relative to scalar cost
> and I fear we get nearer to a GIGO syndrome when adding more information
> here ;)

Yep, however there is setup cost (like loads/stores) which comes into game
as well.  I will see how far i can get by making x86 costs more "realistic"

Honza


Re: [PATCH] Fix failing test-case

2017-10-18 Thread Jan Hubicka
> Hi.
> 
> This fixes the test-case. Reason is that switch statement is not yet expanded 
> as decision tree,
> which also contains a BB with count == 2000.
> 
> Ready for trunk?
OK,
thanks

Honza
> Thanks,
> Martin
> 
> 
> 
> 

> >From 9845016dbcb4cd3160d49280c8d74b4851477496 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 18 Oct 2017 13:56:44 +0200
> Subject: [PATCH] Fix failing test-case
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-10-18  Martin Liska  
> 
>   * gcc.dg/tree-prof/switch-case-2.c: Scan IPA profile dump
>   file instead of expand. Reason is that switch statement is
>   not yet expanded as decision tree, which also contains a BB
>   with count == 2000.
> ---
>  gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c 
> b/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
> index cfedc9c6b76..9b0dfc2dbb5 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-options "-O2 -fdump-rtl-expand-all" } */
> +/* { dg-options "-O2 -fdump-ipa-profile-all" } */
>  int g;
>  
>  __attribute__((noinline)) void foo (int  n)
> @@ -36,5 +36,5 @@ int main ()
>   return 0;
>  }
>  /* autofdo cannot do that precise execution numbers: */
> -/* { dg-final-use-not-autofdo { scan-rtl-dump-times ";; basic 
> block\[^\\n\]*count 4000" 2 "expand"} } */
> -/* { dg-final-use-not-autofdo { scan-rtl-dump-times ";; basic 
> block\[^\\n\]*count 2000" 1 "expand"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump-times ";;   basic 
> block\[^\\n\]*count 4000" 2 "profile"} } */
> +/* { dg-final-use-not-autofdo { scan-ipa-dump-times ";;   basic 
> block\[^\\n\]*count 2000" 1 "profile"} } */
> -- 
> 2.14.2
> 



[PATCH] Fix failing test-case

2017-10-18 Thread Martin Liška
Hi.

This fixes the test-case. Reason is that switch statement is not yet expanded 
as decision tree,
which also contains a BB with count == 2000.

Ready for trunk?
Thanks,
Martin




>From 9845016dbcb4cd3160d49280c8d74b4851477496 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 18 Oct 2017 13:56:44 +0200
Subject: [PATCH] Fix failing test-case

gcc/testsuite/ChangeLog:

2017-10-18  Martin Liska  

	* gcc.dg/tree-prof/switch-case-2.c: Scan IPA profile dump
	file instead of expand. Reason is that switch statement is
	not yet expanded as decision tree, which also contains a BB
	with count == 2000.
---
 gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c b/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
index cfedc9c6b76..9b0dfc2dbb5 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-rtl-expand-all" } */
+/* { dg-options "-O2 -fdump-ipa-profile-all" } */
 int g;
 
 __attribute__((noinline)) void foo (int  n)
@@ -36,5 +36,5 @@ int main ()
  return 0;
 }
 /* autofdo cannot do that precise execution numbers: */
-/* { dg-final-use-not-autofdo { scan-rtl-dump-times ";; basic block\[^\\n\]*count 4000" 2 "expand"} } */
-/* { dg-final-use-not-autofdo { scan-rtl-dump-times ";; basic block\[^\\n\]*count 2000" 1 "expand"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times ";;   basic block\[^\\n\]*count 4000" 2 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times ";;   basic block\[^\\n\]*count 2000" 1 "profile"} } */
-- 
2.14.2



Re: [PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598, take 2)

2017-10-18 Thread Ian Lance Taylor
Jakub Jelinek  writes:

> 2017-10-18  Jakub Jelinek  
>
>   PR lto/82598
>   * simple-object.c (handle_lto_debug_sections): Copy over also
>   .note.GNU-stack section with unchanged name.
>   * simple-object-elf.c (SHF_EXECINSTR): Define.
>   (simple_object_elf_copy_lto_debug_section): Drop SHF_EXECINSTR bit
>   on .note.GNU-stack section.

This is OK.

Thanks.

Ian


Re: [openacc, testsuite, committed] Enable libgomp.oacc-*/declare-*.{c,f90} for non-nvidia devices

2017-10-18 Thread Tom de Vries

On 10/17/2017 06:33 PM, Mike Stump wrote:

On Oct 17, 2017, at 8:34 AM, Tom de Vries  wrote:



OK, if full testing is ok?

I believe this was fully intentional and the presence/absence of
explicit dg-do run can then be used to decide if it should loop through
options or not.


I don't see an explicit mention of ignoring dg-do-what-default in the original 
submission


So, my guidance would be for fortran to behave generally speaking, like other 
languages and for this type of code to be predictable across languages and 
library components.  That's as far I can go, so, I'll have to punt the decision 
to domain experts to decide which direction they _want_ to go in.  If they 
don't know (or don't care or can't agree and want someone to break a tie), my 
inclination is to approve it.



Hi Mike,

the problem with the patch in the current form, is that it changes 
behaviour for f.i. libgomp.fortran/condinc1.f.


The test-case contains no dg-do directive, so:
- without the patch, it uses -O
- with the patch, it uses the torture options

Specifying dg-options "-O" fixes things in the sense -O will override 
the -O options specified by the torture options, but we'll still 
cycle over the options (and also report them as -O0, -O1, etc).


This updated patch introduces a directive dg-no-torture-options, and 
uses that (51 times) to keep the same behaviour in f.i. 
libgomp.fortran/condinc1.f. [ And I've got a follow-up patch removing 
the now superfluous 'dg-do run' 282 times. ]


[ If a new directive is not desirable, another way to fix this could be 
to support '{ dg-do run-no-torture }'. ]


Furthermore, we could f.i. add an argument, and rename the new directive 
into dg-override-torture-options or some such, and allow 
dg-override-torture-options "-O2", which we could use instead of 
dg-options "-O2", and which would fix the problem related to dg-options 
described above.


Tested on x86_64.

WDYT?

Thanks,
- Tom
Add dg-no-torture-options

files=$(find libgomp/testsuite/*fortran -type f \
	   | egrep -v '\.exp' \
	   | xargs grep -L dg-do)

for f in $files; do
sed -i '1 i\! { dg-no-torture-options }' $f
done

2017-10-18  Tom de Vries  

	* lib/gfortran-dg.exp (gfortran-dg-runtest): Handle
	dg-no-torture-options.  Handle dg-do-what-default.

	* testsuite/libgomp.fortran/condinc1.f: Add dg-no-torture-options.
	* testsuite/libgomp.fortran/condinc1.inc: Same.
	* testsuite/libgomp.fortran/condinc2.f: Same.
	* testsuite/libgomp.fortran/condinc3.f90: Same.
	* testsuite/libgomp.fortran/condinc4.f90: Same.
	* testsuite/libgomp.fortran/lastprivate1.f90: Same.
	* testsuite/libgomp.fortran/lastprivate2.f90: Same.
	* testsuite/libgomp.fortran/nestedfn4.f90: Same.
	* testsuite/libgomp.fortran/omp_cond1.f: Same.
	* testsuite/libgomp.fortran/omp_cond2.f: Same.
	* testsuite/libgomp.fortran/omp_cond3.F90: Same.
	* testsuite/libgomp.fortran/omp_cond4.F90: Same.
	* testsuite/libgomp.fortran/omp_hello.f: Same.
	* testsuite/libgomp.fortran/omp_orphan.f: Same.
	* testsuite/libgomp.fortran/omp_reduction.f: Same.
	* testsuite/libgomp.fortran/omp_workshare1.f: Same.
	* testsuite/libgomp.fortran/omp_workshare2.f: Same.
	* testsuite/libgomp.fortran/pr25219.f90: Same.
	* testsuite/libgomp.fortran/pr28390.f: Same.
	* testsuite/libgomp.fortran/pr35130.f90: Same.
	* testsuite/libgomp.fortran/strassen.f90: Same.
	* testsuite/libgomp.fortran/tabs1.f90: Same.
	* testsuite/libgomp.fortran/tabs2.f: Same.
	* testsuite/libgomp.fortran/task2.f90: Same.
	* testsuite/libgomp.fortran/taskgroup1.f90: Same.
	* testsuite/libgomp.fortran/taskloop1.f90: Same.
	* testsuite/libgomp.fortran/workshare1.f90: Same.
	* testsuite/libgomp.fortran/workshare2.f90: Same.
	* testsuite/libgomp.oacc-fortran/abort-1.f90: Same.
	* testsuite/libgomp.oacc-fortran/abort-2.f90: Same.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Same.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Same.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-1.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-2.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-3.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-4.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-5.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-6.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-7.f: Same.
	* testsuite/libgomp.oacc-fortran/data-already-8.f: Same.
	* testsuite/libgomp.oacc-fortran/lib-1.f90: Same.
	* testsuite/libgomp.oacc-fortran/lib-2.f: Same.
	* testsuite/libgomp.oacc-fortran/lib-3.f: Same.
	* testsuite/libgomp.oacc-fortran/map-1.f90: Same.
	* testsuite/libgomp.oacc-fortran/pointer-align-1.f90: Same.
	* testsuite/libgomp.oacc-fortran/pr68813.f90: Same.
	* testsuite/libgomp.oacc-fortran/pr70289.f90: Same.
	* testsuite/libgomp.oacc-fortran/pr70643.f90: Same.
	* testsuite/libgomp.oacc-fortran/subarrays-1.f90: Same.
	* 

Re: [patch, fortran] Fix PR 82567

2017-10-18 Thread Mikael Morin

Le 18/10/2017 à 04:05, Steve Kargl a écrit :

On Tue, Oct 17, 2017 at 06:14:16PM -0700, Jerry DeLisle wrote:

On 10/17/2017 03:36 PM, Thomas Koenig wrote:

Hello world,

this patch fixes a regression with long compile times,
which came about due to our handling of array constructors
at compile time.  This, togeteher with a simplification in
front end optimization, led to long compile times and large
code.

Regression-tested. OK for trunk and the other affected branches?



Well I know 42 is the answer to the ultimate question of the universe so this
must be OK.  I just don't know what the question is.

OK and thanks,

Jerry

+#define CONSTR_LEN_MAX 42


Actually, I was wondering about the choice myself.  With
most common hardware having fairly robust L1 and L2 cache
sizes, a double precision array constructor with 42
elements only occupies 336 bytes.  Seems small.


There is a -fmax-array-constructor=n option. Can’t we use it for the limit?


Re: [PATCH] Canonicalize constant multiplies in division

2017-10-18 Thread Richard Biener
On Tue, Oct 17, 2017 at 6:32 PM, Wilco Dijkstra  wrote:
> This patch implements some of the optimizations discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Canonicalize x / (C1 * y) into (x * C2) / y.
>
> This moves constant multiplies out of the RHS of a division in order
> to allow further simplifications (such as (C1 * x) / (C2 * y) ->
> (C3 * x) / y) and to enable more reciprocal CSEs.
>
> OK for commit?
>
> ChangeLog
> 2017-10-17  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Canonicalize constant multiplies in division.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/cse_recip.c: New test.
> --
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> ade851f78fb9ac6ce03b752f63e03f3b5a19cda9..532fabf51ce8a45d54147a3ae0b3917e22b1a4d0
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -342,10 +342,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(negate @0)))
>
>  (if (flag_reciprocal_math)
> - /* Convert (A/B)/C to A/(B*C)  */
> + /* Convert (A/B)/C to A/(B*C). */
>   (simplify
>(rdiv (rdiv:s @0 @1) @2)
> -   (rdiv @0 (mult @1 @2)))
> +  (rdiv @0 (mult @1 @2)))
> +
> + /* Canonicalize x / (C1 * y) to (x * C2) / y.  */
> + (if (optimize)

why if (optimize) here?  The pattern you removed has no
such check.  As discussed this may undo CSE of C1 * y
so please check for a single-use on the mult with :s

> +  (simplify
> +   (rdiv @0 (mult @1 REAL_CST@2))
> +   (if (!real_zerop (@1))

why this check?  The pattern below didn't have it.

Richard.

> +(with
> + { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @2); }
> + (if (tem)
> +  (rdiv (mult @0 { tem; } ) @1))
>
>   /* Convert A/(B/C) to (A/B)*C  */
>   (simplify
> @@ -628,15 +638,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (tem)
>   (rdiv { tem; } @1)
>
> -/* Convert C1/(X*C2) into (C1/C2)/X  */
> -(simplify
> - (rdiv REAL_CST@0 (mult @1 REAL_CST@2))
> -  (if (flag_reciprocal_math)
> -   (with
> -{ tree tem = const_binop (RDIV_EXPR, type, @0, @2); }
> -(if (tem)
> - (rdiv { tem; } @1)
> -
>  /* Simplify ~X & X as zero.  */
>  (simplify
>   (bit_and:c (convert? @0) (convert? (bit_not @0)))
> diff --git a/gcc/testsuite/gcc.dg/cse_recip.c 
> b/gcc/testsuite/gcc.dg/cse_recip.c
> new file mode 100644
> index 
> ..20ed529c33ebecc911fb540a8b2b597bba0023e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/cse_recip.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fdump-tree-optimized" } */
> +
> +void
> +cse_recip (float x, float y, float *a)
> +{
> +  a[0] = y / (5 * x);
> +  a[1] = y / (3 * x);
> +  a[2] = y / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " / " 1 "optimized" } } */
>


Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Segher Boessenkool
On Wed, Oct 18, 2017 at 11:10:48AM +, Bernd Edlinger wrote:
> A memory clobber would also make rtx_moveable_p return false,
> thru the following case:
> 
> case MEM:
>   if (type == OP_IN && MEM_READONLY_P (x))
> return rtx_moveable_p ( (x, 0), OP_IN);
>   return false;
> 
> ...
> 
> case CLOBBER:
>   return rtx_moveable_p (_DEST (x), OP_OUT);
> 
> 
> because that memory clobber is in a parallel statement
> together with the ASM_OUTPUT.
> 
> Right?

Yes, that looks correct.  And a memory clobber of course _should_ make
the insn non-moveable as well -- it's an unknown side effect all by
itself :-)


Segher


Re: [PATCH] Canonicalize negates in division

2017-10-18 Thread Richard Biener
On Tue, Oct 17, 2017 at 6:30 PM, Wilco Dijkstra  wrote:
> This patch implements some of the optimizations discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Canonicalize x / (- y) into (-x) / y.
>
> This moves negates out of the RHS of a division in order to
> allow further simplifications and potentially more reciprocal CSEs.
>
> OK for commit?

Ok.

Richard.

> ChangeLog
> 2017-10-17  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Canonicalize negate in division.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/div_neg: New test.
> --
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> cb48f079b4a310272e49cc319a1b3b0ff2023ba4..ade851f78fb9ac6ce03b752f63e03f3b5a19cda9
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -352,6 +352,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(rdiv @0 (rdiv:s @1 @2))
> (mult (rdiv @0 @1) @2)))
>
> +/* Simplify x / (- y) to -x / y.  */
> +(simplify
> + (rdiv @0 (negate @1))
> + (rdiv (negate @0) @1))
> +
>  (if (flag_unsafe_math_optimizations)
>/* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
>(for op (lt le gt ge)
> diff --git a/gcc/testsuite/gcc.dg/div_neg.c b/gcc/testsuite/gcc.dg/div_neg.c
> new file mode 100644
> index 
> ..da499cda2fba6c943ec99c55cae2ea389f9e1cca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/div_neg.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +float
> +div_neg (float x, float y)
> +{
> +  return (-x / y) * (x / -y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " / " 1 "optimized" } } */


Re: [PATCH PR82574]Check that datref must be executed exactly once per iteration against outermost loop in nest

2017-10-18 Thread Richard Biener
On Tue, Oct 17, 2017 at 4:50 PM, Bin Cheng  wrote:
> Hi,
> The patch fixes ICE reported in PR82574.  In order to distribute builtin 
> partition, we need
> to check that data reference must be executed exactly once per iteration.  In 
> distribution
> for loop nest, this has to be checked against each loop in the nest.  One 
> optimization can
> be done is we only need to check against the outermost loop for perfect nest.
> Bootstrap and test on x86_64.  Is it OK?

+  /* Data reference must be executed exactly once per iteration of each
+ loop in the loop nest.  We only need to check dominant information

dominance

Ok with that fixed.

Richard.

> Thanks,
> bin
> 2017-10-17  Bin Cheng  
>
> PR tree-optimization/82574
> * tree-loop-distribution.c (find_single_drs): New parameter.  Check
> that data reference must be executed exactly once per iteration
> against the outermost loop in nest.
> (classify_partition): Update call to above function.
>
> gcc/testsuite
> 2017-10-17  Bin Cheng  
>
> PR tree-optimization/82574
> * gcc.dg/tree-ssa/pr82574.c: New test.


Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions

2017-10-18 Thread Richard Biener
On Tue, Oct 17, 2017 at 4:28 PM, Bin.Cheng  wrote:
> On Mon, Oct 16, 2017 at 5:27 PM, Bin.Cheng  wrote:
>> On Mon, Oct 16, 2017 at 5:00 PM, Bin.Cheng  wrote:
>>> On Mon, Oct 16, 2017 at 2:56 PM, Bin.Cheng  wrote:
 On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener
  wrote:
> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
>> Hi,
>> This patch merges adjacent memset builtin partitions if possible.  It is
>> a useful special case optimization transforming below code:
>>
>> #define M (256)
>> #define N (512)
>>
>> struct st
>> {
>>   int a[M][N];
>>   int c[M];
>>   int b[M][N];
>> };
>>
>> void
>> foo (struct st *p)
>> {
>>   for (unsigned i = 0; i < M; ++i)
>> {
>>   p->c[i] = 0;
>>   for (unsigned j = N; j > 0; --j)
>> {
>>   p->a[i][j - 1] = 0;
>>   p->b[i][j - 1] = 0;
>> }
>> }
>>
>> into a single memset function call, rather than three calls initializing
>> the structure field by field.
>>
>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>
> +  /* Insertion sort is good enough for the small sub-array.  */
> +  for (k = i + 1; k < j; ++k)
> +   {
> + part1 = (*partitions)[k];
> + for (l = k; l > i; --l)
> +   {
> + part2 = (*partitions)[l - 1];
> + if (part1->builtin->dst_base_offset
> +   >= part2->builtin->dst_base_offset)
> +   break;
> +
> + (*partitions)[l] = part2;
> +   }
> + (*partitions)[l] = part1;
> +   }
>
> so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
> to write a qsort helper for this but I can't wrap my head around the above
> in 5 minutes so ... please!
 Hmm, I thought twice about this and now I believe stable sorting (thus
 insertion sort)
 is required here.  Please see below for explanation.

>
> You don't seem to check the sizes of the memsets given that they
> obviously cannot overlap(!?)
 Yes, given it's quite special case transformation, I did add code
 checking overlap cases.
>
> Also why handle memset and not memcpy/memmove or combinations of
> them (for sorting)?
>
>   for ()
>{
>   p->a[i] = 0;
>   p->c[i] = q->c[i];
>   p->b[i] = 0;
>}
>
> with a and b adjacent.  I suppose p->c could be computed by a
> non-builtin partition as well.
 Yes, the two memset builtin partitions can be merged in this case, but...
> So don't we want to see if dependences allow sorting all builtin
> partitions next to each other
> as much as possible?  (maybe we do that already?)
 The answer for this, above partition merging and use of qsort is no.
 I think all the three are the same question here.  For now we only do
 topological sort for partitions.  To maximize parallelism (either by 
 merging
 normal parallel partitions or merging builtin partitions) requires 
 fine-tuned
 sorting between partitions that doesn't dependence on each other.
 In order to sort all memset/memcpy/memmove, we need check dependence
 between all data references between different partitions.  For example, I
 created new test ldist-36.c illustrating sorting memcpy along with memset
 would generate wrong code because dependence is broken.  It's the same
 for qsort.  In extreme case, if the same array is set twice with different 
 rhs
 value, the order between the two sets needs to be preserved.  
 Unfortunately,
 qsort is unstable and could reorder different sets.  This would break 
 output
 dependence.
 At the point of this function, dependence graph is destroyed, we can't do
 much in addition to special case handling for memset.  Full solution would
 require a customized topological sorting process.

 So, this updated patch keeps insertion sort with additional comment 
 explaining
 why.  Also two test cases added showing when memset partitions should be
 merged (we can't for now) and when memset partitions should not be merged.
>>> Hmm, I can factor out the sorting loop into a function, that might
>>> make it easier
>>> to read.
>> Okay, I will use std::stable_sort in this case, that's exactly what we
>> want for this case.
> Hi Richard,
> This is the 3rd version of patch, it uses std::stable_sort which gives
> stable sort and code simplicity.
> Bootstrap and test.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-10-17  Bin Cheng  
>
> * tree-loop-distribution.c (INCLUDE_ALGORITHM): New header file.
> 

Re: Add an alternative vector loop iv mechanism

2017-10-18 Thread Richard Biener
On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
 wrote:
> Normally we adjust the vector loop so that it iterates:
>
>(original number of scalar iterations - number of peels) / VF
>
> times, enforcing this using an IV that starts at zero and increments
> by one each iteration.  However, dividing by VF would be expensive
> for variable VF, so this patch adds an alternative in which the IV
> increments by VF each iteration instead.  We then need to take care
> to handle possible overflow in the IV.

Hmm, why do you need to handle possible overflow?  Doesn't the
original loop have a natural IV that evolves like this?  After all we
can compute an expression for niters of the scalar loop.

> The new mechanism isn't used yet; a later patch replaces the
> "if (1)" with a check for variable VF.  If the patch is OK, I'll
> hold off applying it until the follow-on is ready to go in.

I indeed don't like code that isn't exercised.  Otherwise looks reasonable.

Thanks,
Richard.

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install when the time comes?
>
> Richard
>
>
> 2017-10-13  Richard Sandiford  
>
> gcc/
> * tree-vect-loop-manip.c: Include gimple-fold.h.
> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
> niters_maybe_zero parameters.  Handle other cases besides a step of 1.
> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
> Add a path that uses a step of VF instead of 1, but disable it
> for now.
> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
> and niters_no_overflow parameters.  Update calls to
> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
> Create a new SSA name if the latter choses to use a ste other
> than zero, and return it via niters_vector_mult_vf_var.
> * tree-vect-loop.c (vect_transform_loop): Update calls to
> vect_do_peeling, vect_gen_vector_loop_niters and
> slpeel_make_loop_iterate_ntimes.
> * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
> (vect_gen_vector_loop_niters): Update declarations after above 
> changes.
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
> +++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
>  #include "tree-ssa-loop-ivopts.h"
> +#include "gimple-fold.h"
>
>  /*
>Simple Loop Peeling Utilities
> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
> gimple_bb (update_phi));
>  }
>
> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
> -   that starts at zero, increases by one and its limit is NITERS.
> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
> +   where NITERS is known to be outside the range [1, STEP - 1].
> +   This is equivalent to making the loop execute NITERS / STEP
> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
> +   where M is the precision of NITERS.
> +
> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
> +
> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
> +   N * STEP on exit from the loop.
>
> Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
>
>  void
> -slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
> +slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
> +tree final_iv, bool niters_maybe_zero)
>  {
>tree indx_before_incr, indx_after_incr;
>gcond *cond_stmt;
>gcond *orig_cond;
> +  edge pe = loop_preheader_edge (loop);
>edge exit_edge = single_exit (loop);
>gimple_stmt_iterator loop_cond_gsi;
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
> -  tree init = build_int_cst (TREE_TYPE (niters), 0);
> -  tree step = build_int_cst (TREE_TYPE (niters), 1);
>source_location loop_loc;
>enum tree_code code;
> +  tree niters_type = TREE_TYPE (niters);
>
>orig_cond = get_loop_exit_condition (loop);
>gcc_assert (orig_cond);
>loop_cond_gsi = gsi_for_stmt (orig_cond);
>
> +  tree init, limit;
> +  if (!niters_maybe_zero && integer_onep (step))
> +{
> +  /* In this case we can use a simple 0-based IV:
> +
> +A:
> +  x = 0;
> +  do
> +{
> +  ...
> +  x += 1;
> +}
> +  while (x < NITERS);  */
> +  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
> +  

Re: [patch] Fix PR debug/82509

2017-10-18 Thread Richard Biener
On Thu, Oct 12, 2017 at 10:51 PM, Eric Botcazou  wrote:
> Hi,
>
> this PR reports a couple of problems with the support of the DW_AT_endianity
> attribute associated with the scalar_storage_order source attribute: it does
> not persist through typedefs and it can contaminate native order DIEs.
>
> The attached patch revamps it by associating native order DIEs and reverse
> order DIEs into adjacent pairs for base types, as well as looking through
> typedefs for base types with reverse order.  This makes it possible to have a
> single reverse order DIE for each base type and look it up efficiently.
>
> Tested on x86_64-suse-linux, OK for the mainline?  What about the 7 branch?

Hmm.  It makes tracking DIE builds difficult now that not all allocations go
through new_die anymore.  Can you instead split out a new_die_raw
function with just the allocation and the die_tag initialization?  Or make
!parent_die && !t this mode, thus add

  if (parent_die != NULL)
add_child_die (parent_die, die);
  else if (! t)
return die;
  else
 {

?  Otherwise the patch looks sensible.

Thanks,
Richard.

>
> 2017-10-12  Eric Botcazou  
>
> PR debug/82509
> * dwarf2out.c (base_type_die): Remove early return for corner cases.
> Allocate the new DIE manually and do not call add_pubtype on it.
> (is_base_type): Remove ERROR_MARK and return 0 for VOID_TYPE.
> (modified_type_die): Adjust the lookup for reverse order DIEs.  Skip
> typedefs for base types with DW_AT_endianity.  Make sure a DIE with
> native order exists for base types, attach the DIE manually and call
> add_pubtype on it.  Do not equate a reverse order DIE to the type.
>
>
> 2017-10-12  Eric Botcazou  
>
> * gcc.dg/debug/dwarf2/sso.c: Rename into...
> * gcc.dg/debug/dwarf2/sso-1.c: ...this.
> * gcc.dg/debug/dwarf2/sso-2.c: New test.
> * gcc.dg/debug/dwarf2/sso-3.c: Likewise.
>
> --
> Eric Botcazou


[PATCH][OBVIOUS] Fix -Wimplicit-fallthrough in combine.c

2017-10-18 Thread Martin Liška
Hi.

I'm going to install the patch that suppresses -O0 warning:

../../gcc/combine.c: In function ‘rtx_code simplify_compare_const(rtx_code, 
machine_mode, rtx, rtx_def**)’:
../../gcc/combine.c:11789:7: warning: this statement may fall through 
[-Wimplicit-fallthrough=]
   if (const_op > 0)
   ^~
../../gcc/combine.c:11808:5: note: here
 case LEU:
 ^~~~
../../gcc/combine.c:11826:7: warning: this statement may fall through 
[-Wimplicit-fallthrough=]
   if (const_op > 1)
   ^~
../../gcc/combine.c:11846:5: note: here
 case GTU:
 ^~~~

Martin

gcc/ChangeLog:

2017-10-18  Martin Liska  

* combine.c (simplify_compare_const): Add gcc_fallthrough.
---
 gcc/combine.c | 2 ++
 1 file changed, 2 insertions(+)


diff --git a/gcc/combine.c b/gcc/combine.c
index 3b96d86bdb3..757ae6fc93e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11791,6 +11791,7 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
 	  const_op -= 1;
 	  code = LEU;
 	  /* ... fall through ...  */
+	  gcc_fallthrough ();
 	}
   /* (unsigned) < 0x8000 is equivalent to >= 0.  */
   else if (is_a  (mode, _mode)
@@ -11828,6 +11829,7 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
 	  const_op -= 1;
 	  code = GTU;
 	  /* ... fall through ...  */
+	  gcc_fallthrough ();
 	}
 
   /* (unsigned) >= 0x8000 is equivalent to < 0.  */



Re: [PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Bernd Edlinger
Hi Segher,

the patch looks ok for me.
Just for my understanding:
A memory clobber would also make rtx_moveable_p return false,
thru the following case:

case MEM:
  if (type == OP_IN && MEM_READONLY_P (x))
return rtx_moveable_p ( (x, 0), OP_IN);
  return false;

...

case CLOBBER:
  return rtx_moveable_p (_DEST (x), OP_OUT);


because that memory clobber is in a parallel statement
together with the ASM_OUTPUT.

Right?


Bernd.

[C++ Patch] PR 80991 ("ICE with __is_trivially_constructible in template")

2017-10-18 Thread Paolo Carlini

Hi,

a rather straightforward issue that we didn't notice so far only because 
in all our uses of __is_trivially_constructible either neither type is 
dependent or both are (eg, in library uses). The below handles in the 
obvious way a possible TREE_LIST - built node by node by 
cp_parser_trait_expr in the variadic case - as TRAIT_EXPR_TYPE2. Tested 
x86_64-linux. Seems suitable for the branches too.


Thanks, Paolo.

///

/cp
2017-10-18  Paolo Carlini  

PR c++/80991
* pt.c (value_dependent_expression_p, [TRAIT_EXPR]): Handle
a TREE_LIST as TRAIT_EXPR_TYPE2.

/testsuite
2017-10-18  Paolo Carlini  

PR c++/80991
* g++.dg/ext/is_trivially_constructible5.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 253842)
+++ cp/pt.c (working copy)
@@ -24019,8 +24019,21 @@ value_dependent_expression_p (tree expression)
 case TRAIT_EXPR:
   {
tree type2 = TRAIT_EXPR_TYPE2 (expression);
-   return (dependent_type_p (TRAIT_EXPR_TYPE1 (expression))
-   || (type2 ? dependent_type_p (type2) : false));
+
+   if (dependent_type_p (TRAIT_EXPR_TYPE1 (expression)))
+ return true;
+
+   if (!type2)
+ return false;
+
+   if (TREE_CODE (type2) != TREE_LIST)
+ return dependent_type_p (type2);
+
+   for (; type2; type2 = TREE_CHAIN (type2))
+ if (dependent_type_p (TREE_VALUE (type2)))
+   return true;
+
+   return false;
   }
 
 case MODOP_EXPR:
Index: testsuite/g++.dg/ext/is_trivially_constructible5.C
===
--- testsuite/g++.dg/ext/is_trivially_constructible5.C  (nonexistent)
+++ testsuite/g++.dg/ext/is_trivially_constructible5.C  (working copy)
@@ -0,0 +1,12 @@
+// PR c++/80991
+// { dg-do compile { target c++11 } }
+
+template void foo()
+{
+  static_assert(__is_trivially_constructible(int, int), "");
+}
+
+void bar()
+{
+  foo();
+}


Re: [PATCH] Simplify floating point comparisons

2017-10-18 Thread Richard Biener
On Tue, Oct 17, 2017 at 6:28 PM, Wilco Dijkstra  wrote:
> This patch implements some of the optimizations discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Simplify (C / x > 0.0) into x > 0.0.
>
> If C is negative the comparison is reversed.
>
> Simplify (x * C1) > C2 into x > (C2 / C1).
>
> Again, if C1 is negative the comparison is reversed.
> Both transformations are only done with -funsafe-math-optimizations,
> the constant is non-zero, and not a NaN.
>
> OK for commit?
>
> ChangeLog
> 2017-10-17  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Simplify floating point comparisons.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/associate_comparison_1.c: New test.
> --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> e58a65af59b44a6b82ed8705f62966c5e6f251ac..cb48f079b4a310272e49cc319a1b3b0ff2023ba4
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -352,6 +352,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(rdiv @0 (rdiv:s @1 @2))
> (mult (rdiv @0 @1) @2)))
>
> +(if (flag_unsafe_math_optimizations)
> +  /* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
> +  (for op (lt le gt ge)
> +   neg_op (gt ge lt le)
> +(simplify
> +  (op (rdiv REAL_CST@0 @1) real_zerop@2)
> +  (switch
> +   (if (real_less (, TREE_REAL_CST_PTR (@0)))

Note that real_less (0., +Inf) so I think you either need to check C is 'normal'
or ! HONOR_INFINITIES.

There's also the underflow issue I guess this is what -funsafe-math-optimization
is for.  I think ignoring underflows is dangerous though.

> +   (op @1 @2))
> +   /* For C < 0, use the inverted operator.  */
> +   (if (real_less (TREE_REAL_CST_PTR (@0), ))
> +   (neg_op @1 @2))
> +
>  /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
>  (for div (trunc_div ceil_div floor_div round_div exact_div)
>   (simplify
> @@ -3546,6 +3559,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (rdiv @2 @1))
> (rdiv (op @0 @2) @1)))
>
> + (for cmp (lt le gt ge)
> +  neg_cmp (gt ge lt le)
> +  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
> +  (simplify
> +   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
> +   (with
> +{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
> +(if (tem)
> + (switch
> +  (if (real_less (, TREE_REAL_CST_PTR (@1)))
> +   (cmp @0 { tem; }))
> +  (if (real_less (TREE_REAL_CST_PTR (@1), ))
> +   (neg_cmp @0 { tem; })))
> +

Drops possible overflow/underflow in x * C1 and may create underflow
or overflow with C2/C1
which you should detect here at least.

Existing overflows may be guarded against with a HONOR_INFINITIES check.

When overflow/underflow can be disregarded is there any reason remaining to
make this guarded by flag_unsafe_math_optimizations?  Are there any cases
where rounding issues can flip the comparison result?

Richard.

>   /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
>   (for root (SQRT CBRT)
>(simplify
> diff --git a/gcc/testsuite/gcc.dg/associate_comparison_1.c 
> b/gcc/testsuite/gcc.dg/associate_comparison_1.c
> new file mode 100644
> index 
> ..d051f052e13812c91cbd2d559bf2af8fae128ee1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/associate_comparison_1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized" } */
> +
> +int
> +cmp_mul_1 (float x)
> +{
> +  return x * 3 <= 100;
> +}
> +
> +int
> +cmp_mul_2 (float x)
> +{
> +  return x * -5 > 100;
> +}
> +
> +int
> +div_cmp_1 (float x, float y)
> +{
> +  return x / 3 <= y;
> +}
> +
> +int
> +div_cmp_2 (float x, float y)
> +{
> +  return x / 3 <= 1;
> +}
> +
> +int
> +inv_cmp (float x)
> +{
> +  return 5 / x >= 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */
>


[PATCH] ira: volatile asm's are not moveable (PR82602)

2017-10-18 Thread Segher Boessenkool
A volatile asm statement can not be moved (relative to other volatile
asm, etc.), but IRA could do it nevertheless.  This patch fixes it.

Testing on powerpc64-linux {-m32,-m64}; okay if it succeeds?  Also
for backports?


Segher


2017-10-18  Segher Boessenkool  

PR rtl-optimization/82602
* ira.c (rtx_moveable_p): Return false for volatile asm.

---
 gcc/ira.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/ira.c b/gcc/ira.c
index 046ce3b..8c93d3d 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -4400,6 +4400,12 @@ rtx_moveable_p (rtx *loc, enum op_type type)
 for a reason.  */
   return false;
 
+case ASM_OPERANDS:
+  /* The same is true for volatile asm: it has unknown side effects, it
+ cannot be moved at will.  */
+  if (MEM_VOLATILE_P (x))
+   return false;
+
 default:
   break;
 }
-- 
1.8.3.1



Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-10-18 Thread Richard Biener
On Wed, Oct 18, 2017 at 5:34 AM, Martin Sebor  wrote:
> While testing my latest -Wrestrict changes I noticed a number of
> opportunities to improve the -Warray-bounds warning.  Attached
> is a patch that implements a solution for the following subset
> of these:
>
> PR tree-optimization/82596 - missing -Warray-bounds on an out-of
>   bounds index into string literal
> PR tree-optimization/82588 - missing -Warray-bounds on an excessively
>   large index
> PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
>   inner indices
>
> The patch also adds more detail to the -Warray-bounds diagnostics
> to make it easier to see the cause of the problem.
>
> Richard, since the changes touch tree-vrp.c, I look in particular
> for your comments.

+  /* Accesses to trailing arrays via pointers may access storage
+beyond the types array bounds.  For such arrays, or for flexible
+array members as well as for other arrays of an unknown size,
+replace the upper bound with a more permissive one that assumes
+the size of the largest object is SSIZE_MAX.  */

I believe handling those cases are somewhat academic, but ...

+  tree eltype = TREE_TYPE (ref);
+  tree eltsize = TYPE_SIZE_UNIT (eltype);

needs to use array_ref_element_size.  Note that this size can be
non-constant in which case you now ICE so you have to check
this is an INTEGER_CST.

+  tree maxbound = TYPE_MAX_VALUE (ssizetype);

please don't use ssizetype - sizetype can be of different precision
than pointers (and thus objects).  size_type_node would come
close (maps to size_t), eventually ptrdiff_type_node is also
defined by all frontends.

+  up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize);
+

int_const_binop if you insist on using trees...

+  tree arg = TREE_OPERAND (ref, 0);
+  tree_code code = TREE_CODE (arg);
+  if (code == COMPONENT_REF)
+   {
+ HOST_WIDE_INT off;
+ if (tree base = get_addr_base_and_unit_offset (ref, ))
+   up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
+  TYPE_SIZE_UNIT (TREE_TYPE (base)));
+ else
+   return;

so this gives up on a.b[i].c.d[k] (ok, array_at_struct_end_p will be false).
simply not subtracting anyhing instead of returning would be conservatively
correct, no?  Likewise subtracting the offset of the array for all "previous"
variably indexed components with assuming the lowest value for the index.
But as above I think compensating for the offset of the array within the object
is academic ... ;)

+  else if (code == STRING_CST)
+   up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));

that one is more interesting -- why's the TYPE_DOMAIN of the STRING_CST lacking
a max value?  Iff we use build_string_literal it should have the proper type.

Thanks,
Richard.

>
> Thanks
> Martin


Re: [RFC] Sanitizers difference in between GCC and LLVM

2017-10-18 Thread Martin Liška
On 10/18/2017 11:16 AM, Jakub Jelinek wrote:
> On Wed, Oct 18, 2017 at 11:00:30AM +0200, Martin Liška wrote:
>> Hi.
>>
>> I would like to use this thread to slightly describe differences in GCC and 
>> LLVM.
>> I compared options support by both and:
>>
>> UBSAN:
>>
>> 1)
>> gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-arg’
>> gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-assign’
>> gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-return’
> 
> I believe those are for diagnostic of some Objective-C addition, some
> _Nullable and _Nonnull keywords on (pointer) types.  Seems LLVM supports it 
> as an
> extension even for C/C++.

Ok.

> 
>> I guess it's covered by -fsanitize=nonnull-attribute and 
>> -fsanitize=returns-nonnull-attribute.
>> One can't have in GCC a local variable with non-null attribute 
>> (nullability-assign), right?
>>
>> 2) unsigned-integer-overflow
>>
>> As documented, not being a real UBSAN. Do we want that or seen as not useful?
> 
> This one is not implemented on purpose, it doesn't make any sense to me.

Works for me.

> 
>> 3) function
>>
>> Indirect function pointer comparison using RTTI in C++. Would it be useful? 
>> Ideas?
> 
> Dunno what this one is about.

$ cat indir.cpp 
void foo(void) { }

struct A
{
  static int bar(void) { return 0; }
};

void call (int (*f) (void))
{
  f();
}

int main()
{
  A a;
  call ((int (*) (void)));
}

$ ~/bin/llvm/bin/clang++ -fsanitize=function /tmp/indir.cpp && ./a.out
/tmp/indir.cpp:10:3: runtime error: call to function foo() through pointer to 
incorrect function type 'int (*)()'
(/tmp/a.out+0x42b9b0): note: foo() defined here

> 
> 4) builtin
> 
> Which I've posted yesterday patch for.

I noticed, thanks!

> 
>> ASAN:
>>
>> For ASAN, there's quite up-to-date page: 
>> https://github.com/google/sanitizers/wiki/AddressSanitizerClangVsGCC-(5.0-vs-7.1)
>>
>> The page is quite up-to-date. Currently we should cover all what LLVM 
>> supports. Am I right? Or is there any interesting
>> feature we miss?
> 
> As I said on IRC, we probably should be redirecting at sanopt or asan pass
> __builtin_memcpy etc. calls to __asan_memcpy etc.

I'll take a look.

Martin

> 
>   Jakub
> 



Re: [PATCH] Simplify floating point comparisons

2017-10-18 Thread Prathamesh Kulkarni
On 17 October 2017 at 21:58, Wilco Dijkstra  wrote:
> This patch implements some of the optimizations discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Simplify (C / x > 0.0) into x > 0.0.
>
> If C is negative the comparison is reversed.
>
> Simplify (x * C1) > C2 into x > (C2 / C1).
>
> Again, if C1 is negative the comparison is reversed.
> Both transformations are only done with -funsafe-math-optimizations,
> the constant is non-zero, and not a NaN.
>
> OK for commit?
>
> ChangeLog
> 2017-10-17  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Simplify floating point comparisons.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/associate_comparison_1.c: New test.
> --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> e58a65af59b44a6b82ed8705f62966c5e6f251ac..cb48f079b4a310272e49cc319a1b3b0ff2023ba4
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -352,6 +352,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(rdiv @0 (rdiv:s @1 @2))
> (mult (rdiv @0 @1) @2)))
>
> +(if (flag_unsafe_math_optimizations)
> +  /* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
> +  (for op (lt le gt ge)
> +   neg_op (gt ge lt le)
> +(simplify
> +  (op (rdiv REAL_CST@0 @1) real_zerop@2)
> +  (switch
> +   (if (real_less (, TREE_REAL_CST_PTR (@0)))
> +   (op @1 @2))
> +   /* For C < 0, use the inverted operator.  */
> +   (if (real_less (TREE_REAL_CST_PTR (@0), ))
> +   (neg_op @1 @2))
> +
>  /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
>  (for div (trunc_div ceil_div floor_div round_div exact_div)
>   (simplify
> @@ -3546,6 +3559,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (rdiv @2 @1))
> (rdiv (op @0 @2) @1)))
>
> + (for cmp (lt le gt ge)
> +  neg_cmp (gt ge lt le)
> +  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
> +  (simplify
> +   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
> +   (with
> +{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
> +(if (tem)
> + (switch
> +  (if (real_less (, TREE_REAL_CST_PTR (@1)))
> +   (cmp @0 { tem; }))
> +  (if (real_less (TREE_REAL_CST_PTR (@1), ))
> +   (neg_cmp @0 { tem; })))
> +
>   /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
>   (for root (SQRT CBRT)
>(simplify
> diff --git a/gcc/testsuite/gcc.dg/associate_comparison_1.c 
> b/gcc/testsuite/gcc.dg/associate_comparison_1.c
> new file mode 100644
> index 
> ..d051f052e13812c91cbd2d559bf2af8fae128ee1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/associate_comparison_1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized" } */
> +
> +int
> +cmp_mul_1 (float x)
> +{
> +  return x * 3 <= 100;
> +}
> +
> +int
> +cmp_mul_2 (float x)
> +{
> +  return x * -5 > 100;
> +}
> +
> +int
> +div_cmp_1 (float x, float y)
> +{
> +  return x / 3 <= y;
> +}
> +
> +int
> +div_cmp_2 (float x, float y)
> +{
> +  return x / 3 <= 1;
> +}
> +
> +int
> +inv_cmp (float x)
> +{
> +  return 5 / x >= 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */
Hi Wilco,
Since the test checks for div operator, I wonder if it would be better
to scan for rdiv_expr in raw optimized dump instead ?

Thanks,
Prathamesh
>


Re: [PATCH PR/82546] tree node size

2017-10-18 Thread Olivier Hainque

> On Oct 17, 2017, at 16:10 , Nathan Sidwell  wrote:
> 
> On 10/17/2017 05:26 AM, Richard Biener wrote:
> 
>> Sorry for not looking at the patch before replying.  The patch looks ok
>> but shouldn't LANG_TYPE be also handled by the FE?  LANG_TYPE itself
>> is an odd beast if I may say that - it's only used by the C++ and Ada FEs
>> and the Ada FE does only
> 
> I agree.  I think LANG_TYPE may be from when there were no FE-specific nodes. 
>  It should probably be killed and resurrected as appropriate FE nodes.
> 
> Olivier, as an ADA person, is that something that could be done?

I'd think so. LANG_TYPE is treated specially in several
places and Ada debug types are pretty sensitive so this would
require caution but I don't see/know-of obvious reasons why this
couldn't be done.

Eric and Pierre-Marie (cc'ed) might have more precise insights.

Olivier



Re: [PATCH, testsuite] Add dg-require-stack-size

2017-10-18 Thread Tom de Vries

On 10/17/2017 06:14 PM, Mike Stump wrote:

On Oct 16, 2017, at 3:16 AM, Tom de Vries  wrote:


I noticed gcc.dg/tree-ssa/ldist-27.c failing for nvptx due to a too large stack 
size.



OK for trunk?


Hum.  There is an existing mechanism (find-grep STACK_SIZE) in the tree to 
handle the same issue.  Did you consider using it?


Yes (As I mentioned, I considered using dg-add-options stack_size, which 
sets STACK_SIZE).




I think I like the, trim the test case down solution better and STACK_SIZE can 
drive that.  Maybe a tree-saa person to weigh in on the test case.



I see. I dropped the test-case from this version of the patch.


But, if the optimization pass likes large things, certainly the patch is Ok.



I've used the new directive on existing test-cases that use STACK_SIZE 
but do no trimming, replacing f.i.:

...
#define STACK_REQUIREMENT (4 * 4 + 256)
#if defined (STACK_SIZE) && STACK_SIZE < STACK_REQUIREMENT
main () { exit (0); }
#else
/* program */
#endif
...
with:
...
/* { dg-require-stack-size "4 * 4 + 256" } */

/* program */
...

I'll commit after testing, unless there are further objections.

Thanks,
- Tom
Add dg-require-stack-size

2017-10-16  Tom de Vries  

	* lib/target-supports-dg.exp (dg-require-stack-size): New proc.
	* gcc.c-torture/execute/20030209-1.c: Use dg-require-stack-size.
	* gcc.c-torture/execute/20040805-1.c: Same.
	* gcc.c-torture/execute/920410-1.c: Same.
	* gcc.c-torture/execute/921113-1.c: Same.
	* gcc.c-torture/execute/921208-2.c: Same.
	* gcc.c-torture/execute/comp-goto-1.c: Same.
	* gcc.c-torture/execute/pr20621-1.c: Same.
	* gcc.c-torture/execute/pr28982b.c: Same.
	* gcc.dg/tree-prof/comp-goto-1.c: Same.

	* doc/sourcebuild.texi (Test Directives, Variants of
	dg-require-support): Add dg-require-stack-size.

---
 gcc/doc/sourcebuild.texi  |  3 +++
 gcc/testsuite/gcc.c-torture/execute/20030209-1.c  | 16 +---
 gcc/testsuite/gcc.c-torture/execute/20040805-1.c  |  4 ++--
 gcc/testsuite/gcc.c-torture/execute/920410-1.c|  8 ++--
 gcc/testsuite/gcc.c-torture/execute/921113-1.c|  8 +---
 gcc/testsuite/gcc.c-torture/execute/921208-2.c|  9 +
 gcc/testsuite/gcc.c-torture/execute/comp-goto-1.c |  4 ++--
 gcc/testsuite/gcc.c-torture/execute/pr20621-1.c   |  7 ++-
 gcc/testsuite/gcc.c-torture/execute/pr28982b.c|  6 +-
 gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c  |  4 ++--
 gcc/testsuite/lib/target-supports-dg.exp  | 15 +++
 11 files changed, 32 insertions(+), 52 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index a2f0429..7d6d4a3 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2358,6 +2358,9 @@ Skip the test if the target does not support the @code{-fstack-check}
 option.  If @var{check} is @code{""}, support for @code{-fstack-check}
 is checked, for @code{-fstack-check=("@var{check}")} otherwise.
 
+@item dg-require-stack-size @var{size}
+Skip the test if the target does not support a stack size of @var{size}.
+
 @item dg-require-visibility @var{vis}
 Skip the test if the target does not support the @code{visibility} attribute.
 If @var{vis} is @code{""}, support for @code{visibility("hidden")} is
diff --git a/gcc/testsuite/gcc.c-torture/execute/20030209-1.c b/gcc/testsuite/gcc.c-torture/execute/20030209-1.c
index 8f076ec..52f71ec 100644
--- a/gcc/testsuite/gcc.c-torture/execute/20030209-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/20030209-1.c
@@ -1,12 +1,5 @@
-/* { dg-add-options stack_size } */
+/* { dg-require-stack-size "8*100*100" } */
 
-#ifdef STACK_SIZE
-#if STACK_SIZE < 8*100*100
-#define SKIP
-#endif
-#endif
-
-#ifndef SKIP
 double x[100][100];
 int main ()
 {
@@ -18,10 +11,3 @@ int main ()
 abort ();
   exit (0);
 }
-#else
-int
-main ()
-{
-  exit (0);
-}
-#endif
diff --git a/gcc/testsuite/gcc.c-torture/execute/20040805-1.c b/gcc/testsuite/gcc.c-torture/execute/20040805-1.c
index d3208d6..f311092 100644
--- a/gcc/testsuite/gcc.c-torture/execute/20040805-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/20040805-1.c
@@ -1,6 +1,6 @@
-/* { dg-add-options stack_size } */
+/* { dg-require-stack-size "0x12000" } */
 
-#if __INT_MAX__ < 32768 || (defined(STACK_SIZE) && STACK_SIZE < 0x12000)
+#if __INT_MAX__ < 32768
 int main () { exit (0); }
 #else
 int a[2] = { 2, 3 };
diff --git a/gcc/testsuite/gcc.c-torture/execute/920410-1.c b/gcc/testsuite/gcc.c-torture/execute/920410-1.c
index 44a72bd..daeff5e 100644
--- a/gcc/testsuite/gcc.c-torture/execute/920410-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/920410-1.c
@@ -1,8 +1,4 @@
-/* { dg-add-options stack_size } */
+/* { dg-require-stack-size "4 * 4 + 256" } */
 
-#define STACK_REQUIREMENT (4 * 4 + 256)
-#if defined (STACK_SIZE) && STACK_SIZE < STACK_REQUIREMENT
-main () { exit (0); }
-#else
 main(){int d[4];d[0]=0;exit(0);}
-#endif
+
diff --git a/gcc/testsuite/gcc.c-torture/execute/921113-1.c 

Re: [PATCH 2/2] S/390: Do not end groups after fallthru edge

2017-10-18 Thread Robin Dapp
> Preserving the sched state across basic blocks for your case works only if 
> the BBs are traversed
> with the fall through edges coming first. Is that the case? We probably 
> should have a description
> for s390_last_sched_state stating this.

Committed as attached with an additional comment and a check for >= z13.

Regards
 Robin
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index c1a144e..6f1e793 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -83,6 +83,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "symbol-summary.h"
 #include "ipa-prop.h"
 #include "ipa-fnsummary.h"
+#include "sched-int.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -14346,6 +14347,28 @@ s390_z10_prevent_earlyload_conflicts (rtx_insn **ready, int *nready_p)
   ready[0] = tmp;
 }
 
+/* Returns TRUE if BB is entered via a fallthru edge and all other
+   incoming edges are less than unlikely.  */
+static bool
+s390_bb_fallthru_entry_likely (basic_block bb)
+{
+  edge e, fallthru_edge;
+  edge_iterator ei;
+
+  if (!bb)
+return false;
+
+  fallthru_edge = find_fallthru_edge (bb->preds);
+  if (!fallthru_edge)
+return false;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+if (e != fallthru_edge
+	&& e->probability >= profile_probability::unlikely ())
+  return false;
+
+  return true;
+}
 
 /* The s390_sched_state variable tracks the state of the current or
the last instruction group.
@@ -14354,7 +14377,7 @@ s390_z10_prevent_earlyload_conflicts (rtx_insn **ready, int *nready_p)
3 the last group is complete - normal insns
4 the last group was a cracked/expanded insn */
 
-static int s390_sched_state;
+static int s390_sched_state = 0;
 
 #define S390_SCHED_STATE_NORMAL  3
 #define S390_SCHED_STATE_CRACKED 4
@@ -14764,7 +14787,21 @@ s390_sched_init (FILE *file ATTRIBUTE_UNUSED,
 {
   last_scheduled_insn = NULL;
   memset (last_scheduled_unit_distance, 0, MAX_SCHED_UNITS * sizeof (int));
-  s390_sched_state = 0;
+
+  /* If the next basic block is most likely entered via a fallthru edge
+ we keep the last sched state.  Otherwise we start a new group.
+ The scheduler traverses basic blocks in "instruction stream" ordering
+ so if we see a fallthru edge here, s390_sched_state will be of its
+ source block.
+
+ current_sched_info->prev_head is the insn before the first insn of the
+ block of insns to be scheduled.
+ */
+  rtx_insn *insn = current_sched_info->prev_head
+? NEXT_INSN (current_sched_info->prev_head) : NULL;
+  basic_block bb = insn ? BLOCK_FOR_INSN (insn) : NULL;
+  if (s390_tune < PROCESSOR_2964_Z13 || !s390_bb_fallthru_entry_likely (bb))
+s390_sched_state = 0;
 }
 
 /* This target hook implementation for TARGET_LOOP_UNROLL_ADJUST calculates


Re: [RFC] Sanitizers difference in between GCC and LLVM

2017-10-18 Thread Jakub Jelinek
On Wed, Oct 18, 2017 at 11:00:30AM +0200, Martin Liška wrote:
> Hi.
> 
> I would like to use this thread to slightly describe differences in GCC and 
> LLVM.
> I compared options support by both and:
> 
> UBSAN:
> 
> 1)
> gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-arg’
> gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-assign’
> gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-return’

I believe those are for diagnostic of some Objective-C addition, some
_Nullable and _Nonnull keywords on (pointer) types.  Seems LLVM supports it as 
an
extension even for C/C++.

> I guess it's covered by -fsanitize=nonnull-attribute and 
> -fsanitize=returns-nonnull-attribute.
> One can't have in GCC a local variable with non-null attribute 
> (nullability-assign), right?
> 
> 2) unsigned-integer-overflow
> 
> As documented, not being a real UBSAN. Do we want that or seen as not useful?

This one is not implemented on purpose, it doesn't make any sense to me.

> 3) function
> 
> Indirect function pointer comparison using RTTI in C++. Would it be useful? 
> Ideas?

Dunno what this one is about.

4) builtin

Which I've posted yesterday patch for.

> ASAN:
> 
> For ASAN, there's quite up-to-date page: 
> https://github.com/google/sanitizers/wiki/AddressSanitizerClangVsGCC-(5.0-vs-7.1)
> 
> The page is quite up-to-date. Currently we should cover all what LLVM 
> supports. Am I right? Or is there any interesting
> feature we miss?

As I said on IRC, we probably should be redirecting at sanopt or asan pass
__builtin_memcpy etc. calls to __asan_memcpy etc.

Jakub


[RFC] Sanitizers difference in between GCC and LLVM

2017-10-18 Thread Martin Liška
Hi.

I would like to use this thread to slightly describe differences in GCC and 
LLVM.
I compared options support by both and:

UBSAN:

1)
gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-arg’
gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-assign’
gcc: error: unrecognized argument to -fsanitize= option: ‘nullability-return’

I guess it's covered by -fsanitize=nonnull-attribute and 
-fsanitize=returns-nonnull-attribute.
One can't have in GCC a local variable with non-null attribute 
(nullability-assign), right?

2) unsigned-integer-overflow

As documented, not being a real UBSAN. Do we want that or seen as not useful?

3) function

Indirect function pointer comparison using RTTI in C++. Would it be useful? 
Ideas?

ASAN:

For ASAN, there's quite up-to-date page: 
https://github.com/google/sanitizers/wiki/AddressSanitizerClangVsGCC-(5.0-vs-7.1)

The page is quite up-to-date. Currently we should cover all what LLVM supports. 
Am I right? Or is there any interesting
feature we miss?

Thanks for ideas,
Martin




Re: [Patch, fortran] PR82550 - program using submodules fails to link

2017-10-18 Thread Paul Richard Thomas
Dear Jerry,

Thanks. I Committed as revision 253848 a still simpler patch that
achieves the same end. Please see the attached and the ChangeLogs
below.

Regards

Paul

2017-10-18  Paul Thomas  

PR fortran/82550
* trans_decl.c (gfc_get_symbol_decl): Procedure symbols that
have the 'used_in_submodule' attribute should be processed by
'gfc_get_extern_function_decl'.

2017-10-18  Paul Thomas  

PR fortran/82550
* gfortran.dg/submodule_30.f08 : New test.

On 18 October 2017 at 01:52, Jerry DeLisle  wrote:
> On 10/17/2017 11:33 AM, Paul Richard Thomas wrote:
>> The attached patch has a comment that explains what is going on.
>>
>> Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 7-branch?
>>
>
> Yes, looks OK for both. Thanks.
>
> Jerry



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
Index: /home/pault/svn/trunk/gcc/fortran/trans-decl.c
===
*** /home/pault/svn/trunk/gcc/fortran/trans-decl.c  (revision 253748)
--- /home/pault/svn/trunk/gcc/fortran/trans-decl.c  (working copy)
*** gfc_get_symbol_decl (gfc_symbol * sym)
*** 1670,1676 
  {
/* Catch functions. Only used for actual parameters,
 procedure pointers and procptr initialization targets.  */
!   if (sym->attr.use_assoc || sym->attr.intrinsic
  || sym->attr.if_source != IFSRC_DECL)
{
  decl = gfc_get_extern_function_decl (sym);
--- 1670,1678 
  {
/* Catch functions. Only used for actual parameters,
 procedure pointers and procptr initialization targets.  */
!   if (sym->attr.use_assoc
! || sym->attr.used_in_submodule
! || sym->attr.intrinsic
  || sym->attr.if_source != IFSRC_DECL)
{
  decl = gfc_get_extern_function_decl (sym);
Index: /home/pault/svn/trunk/gcc/testsuite/gfortran.dg/submodule_30.f08
===
*** /home/pault/svn/trunk/gcc/testsuite/gfortran.dg/submodule_30.f08
(nonexistent)
--- /home/pault/svn/trunk/gcc/testsuite/gfortran.dg/submodule_30.f08
(working copy)
***
*** 0 
--- 1,42 
+ ! { dg-do run }
+ !
+ ! Test the fix for PR82550 in which the reference to 'p' in 'foo'
+ ! was not being correctly handled.
+ !
+ ! Contributed by Reinhold Bader  
+ !
+ module m_subm_18_pos
+   implicit none
+   integer :: i = 0
+   interface
+ module subroutine foo(fun_ptr)
+   procedure(p), pointer, intent(out) :: fun_ptr
+ end subroutine
+   end interface
+ contains
+   subroutine p()
+ i = 1
+   end subroutine p
+ end module m_subm_18_pos
+ submodule (m_subm_18_pos) subm_18_pos
+ implicit none
+ contains
+ module subroutine foo(fun_ptr)
+   procedure(p), pointer, intent(out) :: fun_ptr
+   fun_ptr => p
+ end subroutine
+ end submodule
+ program p_18_pos
+   use m_subm_18_pos
+   implicit none
+   procedure(), pointer :: x
+   call foo(x)
+   call x()
+   if (i == 1) then
+  write(*,*) 'OK'
+   else
+  write(*,*) 'FAIL'
+  call abort
+   end if
+ end program p_18_pos
+ 


Re: [PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598, take 2)

2017-10-18 Thread Jakub Jelinek
On Wed, Oct 18, 2017 at 10:17:49AM +0200, Richard Biener wrote:
> Works for me but as said, why's the linker even caring about
> .note.GNU-stack in objects that do not contain executable code?

The linker simply collects the notes from all *.o files.  One case is
when none of the objects have the notes, then it is considered to be
the legacy state, another when then there is at least one object with that
note, then any object without the note or with X note forces the result
to be RWE, otherwise if all objects have non-X note the result is RW.
The linker doesn't have information what all kinds of sections could contain
trampolines.

If I were to design that stuff now, I'd probably do it differently, but
it behaves this way for 14 years already...

> When we link an object with just .rodata and no proper note will
> the final executable also have an executable stack?

If any other object in the link contains the note, yes.

Jakub


[Committed] S/390: Fix vec-cmp-2 testcase

2017-10-18 Thread Andreas Krebbel
The functions all call foo and therefore need a stack frame what makes
them subject to shrink wrapping.  Also all the additional instructions
in the function body makes it fragile wrt instruction scheduling.  Just
set a global variable instead to circumvent this.

Committed to mainline.

gcc/testsuite/ChangeLog:

2017-10-18  Andreas Krebbel  

* gcc.target/s390/zvector/vec-cmp-2.c
(all_eq_double, all_ne_double, all_gt_double)
(all_lt_double, all_ge_double, all_le_double)
(any_eq_double, any_ne_double, any_gt_double)
(any_lt_double, any_ge_double, any_le_double)
(all_eq_int, all_ne_int, all_gt_int)
(all_lt_int, all_ge_int, all_le_int)
(any_eq_int, any_ne_int, any_gt_int)
(any_lt_int, any_ge_int, any_le_int): Set global variable instead
of calling foo().  Fix return type.
---
 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c | 98 +++
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c 
b/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c
index 0711f9c..1e63def 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c
@@ -7,197 +7,197 @@
 
 #include 
 
-extern void foo (void);
+int g = 1;
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 all_eq_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_all_eq (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
all_eq_double:\n\tvfcedbs\t%v\[0-9\]*,%v24,%v26\n\tjne 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 all_ne_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_all_ne (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
all_ne_double:\n\tvfcedbs\t%v\[0-9\]*,%v24,%v26\n\tjle 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 all_gt_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_all_gt (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
all_gt_double:\n\tvfchdbs\t%v\[0-9\]*,%v24,%v26\n\tjne 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 all_lt_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_all_lt (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
all_lt_double:\n\tvfchdbs\t%v\[0-9\]*,%v26,%v24\n\tjne 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 all_ge_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_all_ge (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
all_ge_double:\n\tvfchedbs\t%v\[0-9\]*,%v24,%v26\n\tjne 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 all_le_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_all_le (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
all_le_double:\n\tvfchedbs\t%v\[0-9\]*,%v26,%v24\n\tjne 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 any_eq_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_any_eq (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
any_eq_double:\n\tvfcedbs\t%v\[0-9\]*,%v24,%v26\n\tjnle 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 any_ne_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_any_ne (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
any_ne_double:\n\tvfcedbs\t%v\[0-9\]*,%v24,%v26\n\tje 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 any_gt_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_any_gt (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
any_gt_double:\n\tvfchdbs\t%v\[0-9\]*,%v24,%v26\n\tjnle 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 any_lt_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_any_lt (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
any_lt_double:\n\tvfchdbs\t%v\[0-9\]*,%v26,%v24\n\tjnle 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 any_ge_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_any_ge (a, b), 1))
-foo ();
+g = 2;
 }
 /* { dg-final { scan-assembler-times 
any_ge_double:\n\tvfchedbs\t%v\[0-9\]*,%v24,%v26\n\tjnle 1 } } */
 
-int __attribute__((noinline,noclone))
+void __attribute__((noinline,noclone))
 any_le_double (vector double a, vector double b)
 {
   if (__builtin_expect (vec_any_le (a, b), 1))
-foo ();

Re: [PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598)

2017-10-18 Thread Jakub Jelinek
On Wed, Oct 18, 2017 at 10:15:21AM +0200, Richard Biener wrote:
> On Wed, 18 Oct 2017, Jakub Jelinek wrote:
> > When creating lto debugobj, we copy over just the debug sections,
> > and from the lack of .note.GNU-stack section then on various targets
> > the linker implies RWE PT_GNU_STACK segment header.
> 
> Uh.  But those objects don't even have a .text section...
> 
> > Fixed by copying over also the .note.GNU-stack section if present.
> > It is not 100% perfect solution if .note.GNU-stack in the original
> > indicates executable stack, in the debugobj we really don't need
> > it, so could get away with clearing the SHF_EXECINSTR bit, but in reality
> > it shouldn't be that bad, if the source had trampolines, then likely the
> > LTO objects will have them too.
> > 
> > Tested on x86_64-linux, ok for trunk?
> 
> Can't we simply always append a .note.GNU-stack to indicate a

No.  .note.GNU-stack is only added on some targets, on other targets it
isn't, so if the linker will see the note on some target where it isn't
normally seen, it will force different behavior than it used to have
normally upon all the other objects.

> non-executable stack on debug objects?  Or as you say clear
> SHF_EXECINSTR (whatever that means)?

See the second patch I've posted which does that.

Jakub


Re: [PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598, take 2)

2017-10-18 Thread Richard Biener
On Wed, 18 Oct 2017, Jakub Jelinek wrote:

> On Wed, Oct 18, 2017 at 09:24:04AM +0200, Jakub Jelinek wrote:
> > When creating lto debugobj, we copy over just the debug sections,
> > and from the lack of .note.GNU-stack section then on various targets
> > the linker implies RWE PT_GNU_STACK segment header.
> > 
> > Fixed by copying over also the .note.GNU-stack section if present.
> > It is not 100% perfect solution if .note.GNU-stack in the original
> > indicates executable stack, in the debugobj we really don't need
> > it, so could get away with clearing the SHF_EXECINSTR bit, but in reality
> > it shouldn't be that bad, if the source had trampolines, then likely the
> > LTO objects will have them too.
> 
> This updated version does that clearing.
> 
> Tested with
> cat pr82598-1.c
> extern void bar (void (*) (void));
> 
> int
> main ()
> {
>   volatile int i = 0;
>   auto void foo (void) { i++; }
>   bar (foo);
>   return 0;
> }
> cat pr82598-2.c
> void
> bar (void (*fn) (void))
> {
>   fn ();
>   fn ();
> }
> ./xgcc -B ./ -g -O3 -flto -ffat-lto-objects -fuse-linker-plugin -o 
> pr82598{-1,-1.c,-2.c} -save-temps -v 2>&1 | less
> readelf -Wl pr82598-1 | grep STACK
> and readelf -WS on the debugobj objects mentioned in the -v output.
> While LTO inlines it, it keeps the trampoline in, so the second
> part of the patch doesn't change anything right now, but guess it would
> at least if the function with the trampoline is optimized away.

Works for me but as said, why's the linker even caring about
.note.GNU-stack in objects that do not contain executable code?
When we link an object with just .rodata and no proper note will
the final executable also have an executable stack?

Richard.

> 2017-10-18  Jakub Jelinek  
> 
>   PR lto/82598
>   * simple-object.c (handle_lto_debug_sections): Copy over also
>   .note.GNU-stack section with unchanged name.
>   * simple-object-elf.c (SHF_EXECINSTR): Define.
>   (simple_object_elf_copy_lto_debug_section): Drop SHF_EXECINSTR bit
>   on .note.GNU-stack section.
> 
> --- libiberty/simple-object.c.jj  2017-09-01 09:27:00.0 +0200
> +++ libiberty/simple-object.c 2017-10-18 09:15:51.088756028 +0200
> @@ -273,6 +273,9 @@ handle_lto_debug_sections (const char **
>*name = *name + sizeof (".gnu.lto_") - 1;
>return 1;
>  }
> +  /* Copy over .note.GNU-stack section under the same name if present.  */
> +  else if (strcmp (*name, ".note.GNU-stack") == 0)
> +return 1;
>return 0;
>  }
>  
> --- libiberty/simple-object-elf.c.jj  2017-09-15 18:09:31.0 +0200
> +++ libiberty/simple-object-elf.c 2017-10-18 09:45:47.421383452 +0200
> @@ -196,6 +196,7 @@ typedef struct {
>  
>  /* Values for sh_flags field.  */
>  
> +#define SHF_EXECINSTR0x0004  /* Executable section.  */
>  #define SHF_EXCLUDE  0x8000  /* Link editor is to exclude this
>  section from executable and
>  shared library that it builds
> @@ -1403,7 +1404,14 @@ simple_object_elf_copy_lto_debug_section
>flags = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
>  shdr, sh_flags, Elf_Addr);
>if (ret == 0)
> - flags &= ~SHF_EXCLUDE;
> + {
> +   /* The debugobj doesn't contain any code, thus no trampolines.
> +  Even when the original object needs trampolines, debugobj
> +  doesn't.  */
> +   if (strcmp (name, ".note.GNU-stack") == 0)
> + flags &= ~SHF_EXECINSTR;
> +   flags &= ~SHF_EXCLUDE;
> + }
>else if (ret == -1)
>   flags = SHF_EXCLUDE;
>ELF_SET_FIELD (type_functions, ei_class, Shdr,
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598)

2017-10-18 Thread Richard Biener
On Wed, 18 Oct 2017, Jakub Jelinek wrote:

> Hi!
> 
> When creating lto debugobj, we copy over just the debug sections,
> and from the lack of .note.GNU-stack section then on various targets
> the linker implies RWE PT_GNU_STACK segment header.

Uh.  But those objects don't even have a .text section...

> Fixed by copying over also the .note.GNU-stack section if present.
> It is not 100% perfect solution if .note.GNU-stack in the original
> indicates executable stack, in the debugobj we really don't need
> it, so could get away with clearing the SHF_EXECINSTR bit, but in reality
> it shouldn't be that bad, if the source had trampolines, then likely the
> LTO objects will have them too.
> 
> Tested on x86_64-linux, ok for trunk?

Can't we simply always append a .note.GNU-stack to indicate a
non-executable stack on debug objects?  Or as you say clear
SHF_EXECINSTR (whatever that means)?

That would also be more portable to non-GNU targets?

After all the note doesn't really tell anything about the debug part
but about the fat part (or the pointless .text section we always emit
in slim objects).

Richard.

> 2017-10-18  Jakub Jelinek  
> 
>   PR lto/82598
>   * simple-object.c (handle_lto_debug_sections): Copy over also
>   .note.GNU-stack section with unchanged name.
> 
> --- libiberty/simple-object.c.jj  2017-09-01 09:27:00.0 +0200
> +++ libiberty/simple-object.c 2017-10-18 09:15:51.088756028 +0200
> @@ -273,6 +273,9 @@ handle_lto_debug_sections (const char **
>*name = *name + sizeof (".gnu.lto_") - 1;
>return 1;
>  }
> +  /* Copy over .note.GNU-stack section under the same name if present.  */
> +  else if (strcmp (*name, ".note.GNU-stack") == 0)
> +return 1;
>return 0;
>  }
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598, take 2)

2017-10-18 Thread Jakub Jelinek
On Wed, Oct 18, 2017 at 09:24:04AM +0200, Jakub Jelinek wrote:
> When creating lto debugobj, we copy over just the debug sections,
> and from the lack of .note.GNU-stack section then on various targets
> the linker implies RWE PT_GNU_STACK segment header.
> 
> Fixed by copying over also the .note.GNU-stack section if present.
> It is not 100% perfect solution if .note.GNU-stack in the original
> indicates executable stack, in the debugobj we really don't need
> it, so could get away with clearing the SHF_EXECINSTR bit, but in reality
> it shouldn't be that bad, if the source had trampolines, then likely the
> LTO objects will have them too.

This updated version does that clearing.

Tested with
cat pr82598-1.c
extern void bar (void (*) (void));

int
main ()
{
  volatile int i = 0;
  auto void foo (void) { i++; }
  bar (foo);
  return 0;
}
cat pr82598-2.c
void
bar (void (*fn) (void))
{
  fn ();
  fn ();
}
./xgcc -B ./ -g -O3 -flto -ffat-lto-objects -fuse-linker-plugin -o 
pr82598{-1,-1.c,-2.c} -save-temps -v 2>&1 | less
readelf -Wl pr82598-1 | grep STACK
and readelf -WS on the debugobj objects mentioned in the -v output.
While LTO inlines it, it keeps the trampoline in, so the second
part of the patch doesn't change anything right now, but guess it would
at least if the function with the trampoline is optimized away.

2017-10-18  Jakub Jelinek  

PR lto/82598
* simple-object.c (handle_lto_debug_sections): Copy over also
.note.GNU-stack section with unchanged name.
* simple-object-elf.c (SHF_EXECINSTR): Define.
(simple_object_elf_copy_lto_debug_section): Drop SHF_EXECINSTR bit
on .note.GNU-stack section.

--- libiberty/simple-object.c.jj2017-09-01 09:27:00.0 +0200
+++ libiberty/simple-object.c   2017-10-18 09:15:51.088756028 +0200
@@ -273,6 +273,9 @@ handle_lto_debug_sections (const char **
   *name = *name + sizeof (".gnu.lto_") - 1;
   return 1;
 }
+  /* Copy over .note.GNU-stack section under the same name if present.  */
+  else if (strcmp (*name, ".note.GNU-stack") == 0)
+return 1;
   return 0;
 }
 
--- libiberty/simple-object-elf.c.jj2017-09-15 18:09:31.0 +0200
+++ libiberty/simple-object-elf.c   2017-10-18 09:45:47.421383452 +0200
@@ -196,6 +196,7 @@ typedef struct {
 
 /* Values for sh_flags field.  */
 
+#define SHF_EXECINSTR  0x0004  /* Executable section.  */
 #define SHF_EXCLUDE0x8000  /* Link editor is to exclude this
   section from executable and
   shared library that it builds
@@ -1403,7 +1404,14 @@ simple_object_elf_copy_lto_debug_section
   flags = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
   shdr, sh_flags, Elf_Addr);
   if (ret == 0)
-   flags &= ~SHF_EXCLUDE;
+   {
+ /* The debugobj doesn't contain any code, thus no trampolines.
+Even when the original object needs trampolines, debugobj
+doesn't.  */
+ if (strcmp (name, ".note.GNU-stack") == 0)
+   flags &= ~SHF_EXECINSTR;
+ flags &= ~SHF_EXCLUDE;
+   }
   else if (ret == -1)
flags = SHF_EXCLUDE;
   ELF_SET_FIELD (type_functions, ei_class, Shdr,


Jakub


Re: [PATCH] Update -ffunction/data-sections documentation

2017-10-18 Thread Sebastian Huber

On 17/10/17 21:39, Sandra Loosemore wrote:


I think the patch is OK with those nits fixed. 


Thanks, for the review. Committed as:

https://gcc.gnu.org/viewcvs/gcc?view=revision=253842

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-18 Thread Richard Biener
On Tue, 17 Oct 2017, Jan Hubicka wrote:

> > On Tue, 17 Oct 2017, Jan Hubicka wrote:
> > 
> > > Hi,
> > > gether/scatter loads tends to be expensive (at least for x86) while we 
> > > now account them
> > > as vector loads/stores which are cheap.  This patch adds vectorizer cost 
> > > entry for these
> > > so this can be modelled more realistically.
> > > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > Ok.  gather and load is somewhat redundant, likewise
> > scatter and store.  So you might want to change it to just
> > vector_gather and vector_scatter.  Even vector_ is redundant...
> 
> Hehe, comming from outside of vectorizer world, I did not know what
> scatter/gather is and thus I wanted to keep load/store and vec in so it will 
> be
> easier to google for those who will need to fill in the numbers in future :)
> > 
> > Best available implementations manage to hide the vector build
> > cost and just expose the latency of the load(s).  I wonder what
> > Zen does here ;)
> 
> According to Agner's tables, gathers range from 12 ops (vgatherdpd)
> to 66 ops (vpgatherdd).  I assume that CPU needs to do following:
> 
> 1) transfer the offsets sse->ALU unit for address generation (3 cycles
>each, 2 ops)
> 2) do the address calcualtion (2 ops, probably 4 ops because it does not map 
> naturally
>  to AGU)
> 2) do the load (7 cycles each, 2 ops)
> 3) merge results (1 ops)
> 
> so I get 7 ops, not sure what remaining 5 do.
> 
> Agner does not account time, but According to
> http://users.atw.hu/instlatx64/AuthenticAMD0800F11_K17_Zen_InstLatX64.txt the
> gather time ranges from 14 cycles (vgatherpd) to 20 cycles.  Here I guess it 
> is
> 3+1+7+1=12 so it seems to work.
> 
> If you implement gather by hand, you save the SSE->address caluclation path 
> and
> thus you can get faster.

I see.  It looks to me Zen should disable gather/scatter then completely
and we should implement manual gather/scatter code-generation in the
vectorizer (or lower it in vector lowering).  It sounds like they
only implemented it to have "complete" AVX2 support (ISTR scatter
is only in AVX512f).

> > Note the most major source of impreciseness in the cost model
> > is from vec_perm because we lack the information of the
> > permutation mask which means we can't distinguish between
> > cross-lane and intra-lane permutes.
> 
> Besides that we lack information about what operation we do (addition
> or division?) which may be useful to pass down, especially because we do
> have relevant information handy in the x86_cost tables.  So I am thinking
> of adding extra parameter to the hook telling the operation.

Not sure.  The costs are all supposed to be relative to scalar cost
and I fear we get nearer to a GIGO syndrome when adding more information
here ;)

> What info we need to pass for permutations?

The full constant permutation vector ...

Note that I think this particular cost hook isn't the best one.  We've
added TARGET_VECTORIZE_ADD_STMT_COST and friends to be the more
"powerful" ones but unfortunately the vectorizer itself doesn't
always use that and it's somewhat too powerful and at the same time
the vectorizer doesn't provide all information in a convenient way
through the passed stmt_info but the hook would have to reverse-engineer
things (like a permutation mask).  At least the operation code is
readily available here.

I think it only needs some minor refactoring to make the vectorizer
always use the "proper" hook though.

Richard.

> Honza
> > 
> > Richard.
> > 
> > > Honza
> > > 
> > > 2017-10-17  Jan Hubicka  
> > > 
> > >   * target.h (enum vect_cost_for_stmt): Add vec_gather_load and
> > >   vec_scatter_store
> > >   * tree-vect-stmts.c (record_stmt_cost): Make difference between normal
> > >   and scatter/gather ops.
> > > 
> > >   * aarch64/aarch64.c (aarch64_builtin_vectorization_cost): Add
> > >   vec_gather_load and vec_scatter_store.
> > >   * arm/arm.c (arm_builtin_vectorization_cost): Likewise.
> > >   * powerpcspe/powerpcspe.c (rs6000_builtin_vectorization_cost): Likewise.
> > >   * rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Likewise.
> > >   * s390/s390.c (s390_builtin_vectorization_cost): Likewise.
> > >   * spu/spu.c (spu_builtin_vectorization_cost): Likewise.
> > > 
> > > Index: config/aarch64/aarch64.c
> > > ===
> > > --- config/aarch64/aarch64.c  (revision 253789)
> > > +++ config/aarch64/aarch64.c  (working copy)
> > > @@ -8547,9 +8547,10 @@ aarch64_builtin_vectorization_cost (enum
> > >   return fp ? costs->vec_fp_stmt_cost : costs->vec_int_stmt_cost;
> > >  
> > >case vector_load:
> > > +  case vector_gather_load:
> > >   return costs->vec_align_load_cost;
> > >  
> > > -  case vector_store:
> > > +  case vector_scatter_store:
> > >   return costs->vec_store_cost;
> > >  
> > >case vec_to_scalar:
> > > Index: config/arm/arm.c
> > > 

Re: [PATCH v3 1/14] D: The front-end (DMD) language implementation and license.

2017-10-18 Thread Iain Buclaw
On 6 October 2017 at 14:51, Ian Lance Taylor  wrote:
> On Fri, Oct 6, 2017 at 1:34 AM, Iain Buclaw  wrote:
>>
>> Out of curiosity, I did have a look at some of the tops of gofrontend
>> sources this morning.  They are all copyright the Go Authors, and are
>> licensed as BSD.  So I'm not sure if having copyright FSF and
>> distributing under GPL is strictly required.  And from a maintenance
>> point of view, it would be easier to merge in upstream changes as-is
>> without some diff/merging tool.
>
> The GCC steering committee accepted the gofrontend code under a
> non-GPL license with the understanding that the master code would live
> in a separate repository that would be mirrored into the GCC repo (the
> master repository for gofrontend is currently at
> https://go.googlesource.com/gofrontend/).  Personally I don't see a
> problem with doing the same for the D frontend.
>
> Ian

Should I request that maybe Donald from FSF chime in here?  I'd rather
avoid another stalemate on this.

Regards
Iain.


[PATCH] Fix PT_GNU_STACK on LTO compiled binaries with debug info (PR lto/82598)

2017-10-18 Thread Jakub Jelinek
Hi!

When creating lto debugobj, we copy over just the debug sections,
and from the lack of .note.GNU-stack section then on various targets
the linker implies RWE PT_GNU_STACK segment header.

Fixed by copying over also the .note.GNU-stack section if present.
It is not 100% perfect solution if .note.GNU-stack in the original
indicates executable stack, in the debugobj we really don't need
it, so could get away with clearing the SHF_EXECINSTR bit, but in reality
it shouldn't be that bad, if the source had trampolines, then likely the
LTO objects will have them too.

Tested on x86_64-linux, ok for trunk?

2017-10-18  Jakub Jelinek  

PR lto/82598
* simple-object.c (handle_lto_debug_sections): Copy over also
.note.GNU-stack section with unchanged name.

--- libiberty/simple-object.c.jj2017-09-01 09:27:00.0 +0200
+++ libiberty/simple-object.c   2017-10-18 09:15:51.088756028 +0200
@@ -273,6 +273,9 @@ handle_lto_debug_sections (const char **
   *name = *name + sizeof (".gnu.lto_") - 1;
   return 1;
 }
+  /* Copy over .note.GNU-stack section under the same name if present.  */
+  else if (strcmp (*name, ".note.GNU-stack") == 0)
+return 1;
   return 0;
 }
 

Jakub