PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-07-25 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in so this enhancement
can now be applied on top of it (with only minor adjustments).

On 07/19/2018 02:08 PM, Martin Sebor wrote:

In the discussion of my patch for pr86532 Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).
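
A minimal sketch of the kind of code this targets (illustrative only,
not taken from the patch):

  const char a[4] = "abcd";   /* no room for the terminating nul */

  unsigned long
  f (void)
  {
    return __builtin_strlen (a);   /* reads past the end of 'a' */
  }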

The attached patch is a first step in detecting these kinds
of bugs in strlen calls by issuing -Wstringop-overflow.
The next step is to modify all other handlers of built-in
functions to detect the same problem (not part of this patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

  const char a[] = { 'a', 'b', 'c' };

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch so there
is code in the context that does not appear in the latest
version of the other diff).

Martin





Re: [PATCH] Add initial version of C++17 <memory_resource> header

2018-07-25 Thread Jonathan Wakely

On 25/07/18 21:23 +0100, Jonathan Wakely wrote:

On 25/07/18 12:01 +0100, Jonathan Wakely wrote:

On 24/07/18 22:12 +0100, Jonathan Wakely wrote:

This is missing the synchronized_pool_resource and
unsynchronized_pool_resource classes but is otherwise complete.

This is a new implementation, not based on the existing code in
<experimental/memory_resource>, but memory_resource and
polymorphic_allocator ended up looking almost the same anyway.

The constant_init kluge in src/c++17/memory_resource.cc is apparently
due to Richard Smith and ensures that the objects are constructed during
constant initialization phase and not destroyed (because the
constant_init destructor doesn't destroy the union member and the
storage is not reused).
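
The idiom looks roughly like this (illustrative names; see the actual
code in src/c++17/memory_resource.cc):

  template<typename T>
    struct constant_init
    {
      union { unsigned char unused; T obj; };
      constexpr constant_init() : obj() { }
      ~constant_init() { /* intentionally leaves obj alive */ }
    };

  // Constant-initialized at compile time, never destroyed at exit:
  constant_init<newdel_res_t> newdel_res{};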

* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* include/Makefile.am: Add new <memory_resource> header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include <memory_resource> for C++17.
* include/std/memory_resource: New header.
(memory_resource, polymorphic_allocator, new_delete_resource)
(null_memory_resource, set_default_resource, get_default_resource)
(pool_options, monotonic_buffer_resource): Define.
* src/Makefile.am: Add c++17 directory.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.am: Fix comment.
* src/c++17/Makefile.am: Add makefile for new sub-directory.
* src/c++17/Makefile.in: Generate.
* src/c++17/memory_resource.cc: New.
(newdel_res_t, null_res_t, constant_init, newdel_res, null_res)
(default_res, new_delete_resource, null_memory_resource)
(set_default_resource, get_default_resource): Define.
* testsuite/20_util/memory_resource/1.cc: New test.
* testsuite/20_util/memory_resource/2.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/1.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/release.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
New test.
* testsuite/20_util/polymorphic_allocator/1.cc: New test.
* testsuite/20_util/polymorphic_allocator/resource.cc: New test.
* testsuite/20_util/polymorphic_allocator/select.cc: New test.
* testsuite/util/testsuite_allocator.h (__gnu_test::memory_resource):
Define concrete memory resource for testing.
(__gnu_test::default_resource_mgr): Define RAII helper for changing
default resource.

Tested powerpc64le-linux, committed to trunk.


I missed a change to acinclude.m4 that should have gone with this
patch. Now also committed to trunk.


One of the tests also needs this fix.


And another correction to the same test.

Tested powerpc-ibm-aix7.2.0.0, committed to trunk.


commit 0dbdb55154fee2af4c02e45a373f5a2dc2985856
Author: Jonathan Wakely 
Date:   Thu Jul 26 00:35:57 2018 +0100

PR libstdc++/86676 another alignment fix for test

PR libstdc++/86676
* testsuite/20_util/monotonic_buffer_resource/release.cc: Request
same alignment for post-release allocation.

diff --git a/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc b/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc
index ac70385961d..8aab4692d52 100644
--- a/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc
+++ b/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc
@@ -127,7 +127,7 @@ test04()
   VERIFY( mbr.upstream_resource() == &r );
   VERIFY( r.number_of_active_allocations() == 0 );
   // initial buffer should be used again now:
-  p = mbr.allocate(1000);
+  p = mbr.allocate(1000, 16);
   VERIFY( p == p_in_buffer );
   VERIFY( r.allocate_calls == 1 );
 }


[PATCH] RFC: Prototype of "rich vectorization hints" idea

2018-07-25 Thread David Malcolm
This patch is a rough prototype of how GCC could offer what I'm calling
"rich optimization hints" to the user, focussing on vectorization.

The idea is to provide *actionable* information to the user on how GCC
is optimizing their code, and how they could modify their source code
(or command-line options) to help GCC generate better assembler.

Rich optimization hints can contain a mixture of:
* text
* diagrams
* highlighted source locations/ranges,
* proposed patches
etc.

They can be printed to stderr, or saved as part of the JSON optimization
record, so that they can be prioritized by code hotness, browsed in an IDE
etc.  The diagrams are printed as ASCII art when printed to stderr, or
serialized in a form from which HTML/SVG can be generated (this last
part is a work-in-progress).

Rich optimization hints are considerably more verbose than diagnostics,
by design.  I anticipate the primary UI being via a report or IDE that
only shows them for the hottest loops in the user's codebase (via the
JSON serialization).

For example, given the following code:

void
my_example (int n, int *a, int *b, int *c)
{
  int i;

  for (i=0; i<n; i++)
    a[i] = b[i] + c[i];
}

[...]

+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "pretty-print.h"
+#include "diagram.h"
+#include "selftest.h"
+#include "selftest-diagram.h"
+
+/* class text_buffer.  */
+
+/* text_buffer's ctor.  The buffer is initialized to spaces.  */
+
+text_buffer::text_buffer (size wh)
+  : m_wh (wh),
+    m_area (wh.get_area ()),
+    m_buffer (new cell[m_area])
+{
+  for (int i = 0; i < m_area; i++)
+    m_buffer[i] = cell (' ');
+}
+
+/* Print the content of this text_buffer to PP, eliding trailing
+   whitespace on each line.  */
+
+void
+text_buffer::print (pretty_printer *pp) const
+{
+  for (int y = 0; y < m_wh.h; y++)
+    {
+      int x;
+      for (x = m_wh.w - 1; x >= 0; x--)
+        if (get_cell (point (x, y)).ch != ' ')
+          break;
+      int rightmost_non_ws = x;
+      for (x = 0; x <= rightmost_non_ws; x++)
+        pp_character (pp, get_cell (point (x,y)).ch);
+      pp_newline (pp);
+    }
+}
+
+/* Get the content of XY.  */
+
+cell
+text_buffer::get_cell (point xy) const
+{
+  return m_buffer[xy_to_index (xy)];
+}
+
+/* Set the content of XY to CH.  */
+
+void
+text_buffer::set_cell (point xy, char ch)
+{
+  m_buffer[xy_to_index (xy)] = cell (ch);
+}
+
+/* Write STR as horizontal left-aligned text at XY, so that the first
+   character of STR is at XY.  The buffer must be large enough.  */
+
+void
+text_buffer::left_aligned_text_at (point xy, const char *str)
+{
+  while (char ch = *(str++))
+    {
+      set_cell (xy, ch);
+      ++xy.x;
+    }
+}
+
+/* Write STR as horizontal right-aligned text at XY, so that the final
+   character of STR is at XY.  The buffer must be large enough.  */
+
+void
+text_buffer::right_aligned_text_at (point xy, const char *str)
+{
+  size_t len = strlen (str);
+  left_aligned_text_at (point (xy.x + 1 - len, xy.y), str);
+}
+
+/* Convert XY to an index within the buffer.  */
+
+int
+text_buffer::xy_to_index (point xy) const
+{
+  gcc_assert (xy.x >= 0);
+  gcc_assert (xy.y >= 0);
+  gcc_assert (xy.x < m_wh.w);
+  gcc_assert (xy.y < m_wh.h);
+  return (xy.y * m_wh.w) + xy.x;
+}
+
+/* class element.  */
+
+/* Print this element to a TEXT_BUFFER that's exactly big enough to hold it,
+   and print the result to PP, eliding trailing whitespace on each line.  */
+
+void
+element::print_to_pp (pretty_printer *pp)
+{
+  size wh = get_requisition ();
+  set_allocation (wh);
+
+  text_buffer buf (wh);
+  print (&buf, point (0,0));
+  buf.print (pp);
+}
+
+/* class text_element : public element.  */
+
+/* text_element's ctor.  */
+
+text_element::text_element (const char *str)
+: m_str (xstrdup (str)),
+  m_len (strlen (str))
+{}
+
+/* text_element's dtor.  */
+
+text_element::~text_element ()
+{
+  free (m_str);
+}
+
+/* Implementation of element::print for text_element.  */
+
+void
+text_element::print (text_buffer *buf, point xy)
+{
+  buf->left_aligned_text_at (xy, m_str);
+}
+
+/* Implementation of element::get_requisition for text_element.  */
+
+size
+text_element::get_requisition ()
+{
+  return size (m_len, 1);
+}
+
+/* Implementation of element::handle_allocation for text_element.  */
+
+void
+text_element::handle_allocation ()
+{
+  /* Empty.  */
+}
+
+/* Implementation of 

committed: removed directives ignored by DejaGnu

2018-07-25 Thread Martin Sebor

Aldy pointed out that the runtime test I added in r261705 to
exercise the new strnlen() built-in makes use of directives
that are ignored by the test harness.

I have removed the directives via r262981.

Martin


Re: [PATCH] Add initial version of C++17 <memory_resource> header

2018-07-25 Thread Jonathan Wakely

On 25/07/18 12:01 +0100, Jonathan Wakely wrote:

On 24/07/18 22:12 +0100, Jonathan Wakely wrote:

This is missing the synchronized_pool_resource and
unsynchronized_pool_resource classes but is otherwise complete.

This is a new implementation, not based on the existing code in
<experimental/memory_resource>, but memory_resource and
polymorphic_allocator ended up looking almost the same anyway.

The constant_init kluge in src/c++17/memory_resource.cc is apparently
due to Richard Smith and ensures that the objects are constructed during
constant initialization phase and not destroyed (because the
constant_init destructor doesn't destroy the union member and the
storage is not reused).

* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* include/Makefile.am: Add new <memory_resource> header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include <memory_resource> for C++17.
* include/std/memory_resource: New header.
(memory_resource, polymorphic_allocator, new_delete_resource)
(null_memory_resource, set_default_resource, get_default_resource)
(pool_options, monotonic_buffer_resource): Define.
* src/Makefile.am: Add c++17 directory.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.am: Fix comment.
* src/c++17/Makefile.am: Add makefile for new sub-directory.
* src/c++17/Makefile.in: Generate.
* src/c++17/memory_resource.cc: New.
(newdel_res_t, null_res_t, constant_init, newdel_res, null_res)
(default_res, new_delete_resource, null_memory_resource)
(set_default_resource, get_default_resource): Define.
* testsuite/20_util/memory_resource/1.cc: New test.
* testsuite/20_util/memory_resource/2.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/1.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/release.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
New test.
* testsuite/20_util/polymorphic_allocator/1.cc: New test.
* testsuite/20_util/polymorphic_allocator/resource.cc: New test.
* testsuite/20_util/polymorphic_allocator/select.cc: New test.
* testsuite/util/testsuite_allocator.h (__gnu_test::memory_resource):
Define concrete memory resource for testing.
(__gnu_test::default_resource_mgr): Define RAII helper for changing
default resource.

Tested powerpc64le-linux, committed to trunk.


I missed a change to acinclude.m4 that should have gone with this
patch. Now also committed to trunk.


One of the tests also needs this fix.

Committed to trunk.


commit 4875590bba5e5b77878264870671071016ab7ca2
Author: Jonathan Wakely 
Date:   Wed Jul 25 21:22:19 2018 +0100

PR libstdc++/86676 Do not assume stack buffer is aligned

PR libstdc++/86676
* testsuite/20_util/monotonic_buffer_resource/release.cc: Allow for
buffer being misaligned and so returned pointer not being at start.

diff --git a/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc b/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc
index 0c7f31789e6..ac70385961d 100644
--- a/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc
+++ b/libstdc++-v3/testsuite/20_util/monotonic_buffer_resource/release.cc
@@ -115,10 +115,12 @@ test04()
   unsigned char buffer[1024];
   std::pmr::monotonic_buffer_resource mbr(buffer, sizeof(buffer), &r);
   void* p = mbr.allocate(800, 16);
-  VERIFY( p == buffer );
+  VERIFY( p >= buffer && p < (buffer + 16) );
+  void* const p_in_buffer = p;
   VERIFY( r.allocate_calls == 0 );
   p = mbr.allocate(300, 1);
   VERIFY( p != buffer );
+  VERIFY( p != buffer );
   VERIFY( r.allocate_calls == 1 );
   mbr.release();
   VERIFY( r.deallocate_calls == 1 );
@@ -126,7 +128,7 @@ test04()
   VERIFY( r.number_of_active_allocations() == 0 );
   // initial buffer should be used again now:
   p = mbr.allocate(1000);
-  VERIFY( p == buffer );
+  VERIFY( p == p_in_buffer );
   VERIFY( r.allocate_calls == 1 );
 }
 


Re: Share ebo helper throughout lib

2018-07-25 Thread Jonathan Wakely

On 25/07/18 21:53 +0200, Marc Glisse wrote:

On Wed, 25 Jul 2018, François Dumont wrote:

    It has already been noticed that there are 2 ebo helpers in the 
lib. Here is a patch to use 1.



    * include/bits/ebo_helper.h: New.
    * include/Makefile.am: Add latter.
    * include/Makefile.in: Regenerate.
    * include/bits/hashtable_policy.h: Adapt.
    * include/bits/shared_ptr_base.h: Adapt.

Tested under linux x86_64.

Ok to commit ?


I don't think we support [[no_unique_address]] yet, but assuming we 
soon will and we enable it also for C++03 (at least with the 
__attribute__ syntax and/or in system headers), do you know if some 


Yes, I hope we'll have that soon.

similar helper will still be necessary, with a simpler implementation, 
or if the attribute will magically get rid of it?


We'll be able to replace some uses of EBO with the attribute
(specifically, in std::tuple). In some places we'll want to only apply
the attribute under the same conditions as we currently use the EBO,
because otherwise we'd change the layout ("compressing" the member
using the attribute where we previously didn't compress it).

In some cases that will be OK because it's an internal implementation
detail, or because we can replace e.g. _Sp_counted_deleter with
_Sp_counted_deleter_v2. In other cases we must avoid any layout change
(e.g. std::tuple).

Concretely, we probably don't want to change the layout of the
hashtable types. We could change the layout for shared_ptr
_Sp_counted_xxx types (gaining some additional space savings for final
types that currently can't be EBO'd) as long as we rename them to
avoid the linker trying to combine incompatible definitions. So on
that basis, maybe we don't want to bother changing the _Sp_ebo_helper
for now.
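
For illustration, the difference boils down to something like this
(hypothetical types, assuming compiler support for the attribute):

  struct empty_deleter
  {
    template<typename T> void operator()(T*) const { }
  };

  // Today: compress the empty member via a private base (EBO).
  struct owner_ebo : private empty_deleter { int* p; };

  // With the attribute: same size on our ABI, no helper base needed.
  struct owner_attr { [[no_unique_address]] empty_deleter d; int* p; };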




Re: Share ebo helper throughout lib

2018-07-25 Thread Jonathan Wakely

On 25/07/18 21:42 +0200, François Dumont wrote:

Hi

    It has already been noticed that there are 2 ebo helpers in the 
lib. Here is a patch to use 1.



    * include/bits/ebo_helper.h: New.
    * include/Makefile.am: Add latter.
    * include/Makefile.in: Regenerate.
    * include/bits/hashtable_policy.h: Adapt.
    * include/bits/shared_ptr_base.h: Adapt.


I think we want an extra template parameter which is used for a tag
type, to guarantee that the two uses (in hash tables and in shared
ptr) can never conflict and produce ambiguous bases.

i.e.

 template<int _Nm, typename _Tp, typename _Tag>
   struct _Ebo_helper;

and:

using _Sp_ebo_helper = __detail::_Ebo_helper<_Nm, _Tp, _Sp_counted_base<>>;

(for example).
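
The ambiguity the tag avoids can be seen in a reduced form like this
(a hypothetical tag-less helper, not the patch's code):

  template<int N, typename T> struct ebo_no_tag : T { };

  struct Empty { };
  struct FromHashtable : ebo_no_tag<0, Empty> { };
  struct FromSharedPtr : ebo_no_tag<0, Empty> { };
  struct Both : FromHashtable, FromSharedPtr { };
  // Converting Both& to ebo_no_tag<0, Empty>& is now ambiguous, and the
  // repeated empty base also inhibits the EBO itself; a distinct tag per
  // user keeps the base types unique.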




Re: Share ebo helper throughout lib

2018-07-25 Thread Marc Glisse

On Wed, 25 Jul 2018, François Dumont wrote:

    It has already been noticed that there are 2 ebo helpers in the lib. Here 
is a patch to use 1.



    * include/bits/ebo_helper.h: New.
    * include/Makefile.am: Add latter.
    * include/Makefile.in: Regenerate.
    * include/bits/hashtable_policy.h: Adapt.
    * include/bits/shared_ptr_base.h: Adapt.

Tested under linux x86_64.

Ok to commit ?


I don't think we support [[no_unique_address]] yet, but assuming we soon 
will and we enable it also for C++03 (at least with the __attribute__ 
syntax and/or in system headers), do you know if some similar helper will 
still be necessary, with a simpler implementation, or if the attribute 
will magically get rid of it?


(I haven't looked at it at all, the answer may be obvious)

--
Marc Glisse


Share ebo helper throughout lib

2018-07-25 Thread François Dumont

Hi

    It has already been noticed that there are 2 ebo helpers in the 
lib. Here is a patch to use 1.



    * include/bits/ebo_helper.h: New.
    * include/Makefile.am: Add latter.
    * include/Makefile.in: Regenerate.
    * include/bits/hashtable_policy.h: Adapt.
    * include/bits/shared_ptr_base.h: Adapt.

Tested under linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 70db3cb..98c1a6c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -105,6 +105,7 @@ bits_headers = \
 	${bits_srcdir}/concept_check.h \
 	${bits_srcdir}/cpp_type_traits.h \
 	${bits_srcdir}/deque.tcc \
+	${bits_srcdir}/ebo_helper.h \
 	${bits_srcdir}/enable_special_members.h \
 	${bits_srcdir}/forward_list.h \
 	${bits_srcdir}/forward_list.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 0e1cbe4..35093bc 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -398,6 +398,7 @@ bits_headers = \
 	${bits_srcdir}/concept_check.h \
 	${bits_srcdir}/cpp_type_traits.h \
 	${bits_srcdir}/deque.tcc \
+	${bits_srcdir}/ebo_helper.h \
 	${bits_srcdir}/enable_special_members.h \
 	${bits_srcdir}/forward_list.h \
 	${bits_srcdir}/forward_list.tcc \
diff --git a/libstdc++-v3/include/bits/ebo_helper.h b/libstdc++-v3/include/bits/ebo_helper.h
new file mode 100644
index 000..5b9073a
--- /dev/null
+++ b/libstdc++-v3/include/bits/ebo_helper.h
@@ -0,0 +1,114 @@
+// Ebo helper header -*- C++ -*-
+
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/ebo_helper.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly.
+ *  @headername{unordered_map,unordered_set,memory}
+ */
+
+#ifndef _EBO_HELPER_H
+#define _EBO_HELPER_H 1
+
+#include <bits/move.h>
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+namespace __detail
+{
+  /**
+   *  Primary class template _Ebo_helper.
+   *
+   *  Helper class using EBO when it is not forbidden (the type is not
+   *  final) and when it is worth it (the type is empty.)
+   */
+  template<int _Nm, typename _Tp,
+           bool __use_ebo = !__is_final(_Tp) && __is_empty(_Tp)>
+    struct _Ebo_helper;
+
+  /// Specialization using EBO.
+  template<int _Nm, typename _Tp>
+    struct _Ebo_helper<_Nm, _Tp, true>
+: private _Tp
+{
+  _Ebo_helper() = default;
+  _Ebo_helper(const _Tp& __tp)
+  : _Tp(__tp)
+  { }
+
+  _Ebo_helper(_Tp&& __tp)
+  : _Tp(std::move(__tp))
+  { }
+
+  template<typename _OtherTp>
+	_Ebo_helper(_OtherTp&& __tp)
+	: _Tp(std::forward<_OtherTp>(__tp))
+	{ }
+
+  static const _Tp&
+  _S_cget(const _Ebo_helper& __eboh)
+  { return __eboh; }
+
+  static _Tp&
+  _S_get(_Ebo_helper& __eboh)
+  { return __eboh; }
+};
+
+  /// Specialization not using EBO.
+  template<int _Nm, typename _Tp>
+    struct _Ebo_helper<_Nm, _Tp, false>
+{
+  _Ebo_helper() = default;
+  _Ebo_helper(const _Tp& __tp)
+  : _M_tp(__tp)
+  { }
+
+  _Ebo_helper(_Tp&& __tp)
+  : _M_tp(std::move(__tp))
+  { }
+
+  template<typename _OtherTp>
+	_Ebo_helper(_OtherTp&& __tp)
+	: _M_tp(std::forward<_OtherTp>(__tp))
+	{ }
+
+  static const _Tp&
+  _S_cget(const _Ebo_helper& __eboh)
+  { return __eboh._M_tp; }
+
+  static _Tp&
+  _S_get(_Ebo_helper& __eboh)
+  { return __eboh._M_tp; }
+
+private:
+  _Tp _M_tp;
+};
+}
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+
+#endif
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 3ff6b14..62f6fd1 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -34,6 +34,7 @@
 #include <tuple>		// for std::tuple, std::forward_as_tuple
 #include <cstdint>		// for std::uint_fast64_t
 #include <bits/stl_algobase.h>	// for std::min.
+#include <bits/ebo_helper.h>
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -1088,59 +1089,8 @@ namespace __detail
   }
 };
 
-  /**
-   *  Primary class template _Hashtable_ebo_helper.
-   

Re: [PATCH] Make strlen range computations more conservative

2018-07-25 Thread Martin Sebor

BUT - for the string_constant and c_strlen functions we are,
in all cases we return something interesting, able to look
at an initializer which then determines that type.  Hopefully.
I think the strlen() folding code when it sets SSA ranges
now looks at types ...?

Consider

struct X { int i; char c[4]; int j;};
struct Y { char c[16]; };

void foo (struct X *p, struct Y *q)
{
  memcpy (p, q, sizeof (struct Y));
  if (strlen ((char *)(struct Y *)p + 4) < 7)
abort ();
}

here the GIMPLE IL looks like

  const char * _1;

  <bb 2> [local count: 1073741825]:
  _5 = MEM[(char * {ref-all})q_4(D)];
  MEM[(char * {ref-all})p_6(D)] = _5;
  _1 = p_6(D) + 4;
  _2 = __builtin_strlen (_1);

and I guess Martin would argue that since p is of type struct X
+ 4 gets you to c[4] and thus strlen of that cannot be larger
than 3.  But of course the middle-end doesn't work like that
and luckily we do not try to draw such conclusions or we
are somehow lucky that for the testcase as written above we do not
(I'm not sure whether Martins changes in this area would derive
such conclusions in principle).


Only if the strlen argument were p->c.


NOTE - we do not know the dynamic type here since we do not know
the dynamic type of the memory pointed-to by q!  We can only
derive that at q+4 there must be some object that we can
validly call strlen on (where Martin again thinks strlen
imposes constrains that memchr does not - sth I do not agree
with from a QOI perspective)


The dynamic type is a murky area.  As you said, above we don't
know whether *p is an allocated object or not.  Strictly speaking,
we would need to treat it as such.  It would basically mean
throwing out all type information and treating objects simply
as blobs of bytes.  But that's not what GCC or other compilers do
either.  For instance, in the modified foo below, GCC eliminates
the test because it assumes that *p and *q don't overlap.  It
does that because they are members of structs of unrelated types
access to which cannot alias.  I.e., not just the type of
the access matters (here int and char) but so does the type of
the enclosing object.  If it were otherwise and only the type
of the access mattered then eliminating the test below wouldn't
be valid (objects can have their stored value accessed by either
an lvalue of a compatible type or char).

  void foo (struct X *p, struct Y *q)
  {
int j = p->j;
q->c[__builtin_offsetof (struct X, j)] = 0;
if (j != p->j)
  __builtin_abort ();
}

Clarifying (and adjusting if necessary) this area is among
the goals of the C object model proposal and the ongoing study
group.  We have been talking about some of these cases there
and trying to come up with ways to let code do what it needs
to do without compromising existing language rules, which was
the consensus position within WG14 when the study group was
formed: i.e., to clarify or reaffirm existing rules and, in
cases of ambiguity or where the standard is unintentionally
overly permissive, favor tighter rules over looser ones.

Martin



[PATCH] [AArch64, Falkor] Adjust Falkor's sign extend reg+reg address cost

2018-07-25 Thread Luis Machado
Adjust Falkor's register_sextend cost from 4 to 3.  This fixes a testsuite
failure in gcc.target/aarch64/extend.c:ldr_sxtw where GCC was generating
a sbfiz instruction rather than a load with sign extension.

No performance changes.
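
The affected test is essentially of this shape (an assumed reduction,
not the exact testsuite source):

  long long
  ldr_sxtw (int *arr, int i)
  {
    return arr[i];  /* want a single load with a sxtw-extended index,
                       not a separate sbfiz plus load */
  }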

gcc/ChangeLog:

2018-07-25  Luis Machado  

* config/aarch64/aarch64.c (qdf24xx_addrcost_table)
<register_sextend>: Set to 3.
---
 gcc/config/aarch64/aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..ea39272 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -329,7 +329,7 @@ static const struct cpu_addrcost_table qdf24xx_addrcost_table =
   1, /* pre_modify  */
   1, /* post_modify  */
   3, /* register_offset  */
-  4, /* register_sextend  */
+  3, /* register_sextend  */
   3, /* register_zextend  */
   2, /* imm_offset  */
 };
-- 
2.7.4



Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-25 Thread Steve Ellcey
Here is version 3 of my patch to implement the SIMD ABI on Aarch64.
I am having a problem with how to handle a SIMD function calling a
non-SIMD function.  When this happens the SIMD function needs to save
V8 to V23 because it cannot count on the non-SIMD function to save
all 128 bits of these registers.
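
To make the problem concrete, a sketch of the situation (hypothetical
function names; the attribute is the one added by this patch):

  extern void bar (void);  /* normal PCS: preserves only the low 64 bits */

  __attribute__ ((aarch64_vector_pcs)) void
  foo (void)
  {
    bar ();  /* foo must save/restore all 128 bits of V8-V23 around this */
  }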

I thought I had this working in the last patch but as I write test
cases, it appears that it is not working and I am not sure how to
implement it.  I tried adding clobbers in aarch64_expand_call but
that is not working (see code in this patch in aarch64_expand_call).
If I add them to 'call' which is a parallel insn, they are ignored.
If I find the underlying call instruction that is part of the parallel
then the clobbers get added to the instruction but then the call itself
is not recognized with the extra clobbers in place.  I don't think we
want to add new call instructions in aarch64.md to handle the vector
register saves and restores.  Am I trying to add the clobbers in the
wrong place?  Where and when should extra clobbers be added to a call
that is going to clobber more registers than what is indicated by
CALL_USED_REGISTERS?

I suppose I could use TARGET_HARD_REGNO_CALL_PART_CLOBBERED but I would
have to extend it to include the call instruction as an argument so the
the code could determine if the call being made was to a simd or non-simd
function.

Steve Ellcey
sell...@cavium.com


2018-07-25  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_components_for_bb): Check for simd function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (REG_ALLOC_ORDER): New define.
(HONOR_REG_ALLOC_ORDER): Ditto.
(FP_SIMD_SAVED_REGNUM_P): Ditto.
* config/aarch64/aarch64.md (V23_REGNUM): New constant.
(loadwb_pair_): New instruction.
("storewb_pair_): Ditto.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..cc642f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1405,6 +1414,26 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))));
+}
+
+static machine_mode
+aarch64_reg_save_mode (tree fndecl, unsigned regno)
+{
+  return GP_REGNUM_P (regno)
+	   ? E_DImode
+	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
@@ -1499,6 +1528,13 @@ aarch64_is_noplt_call_p (rtx sym)
   return false;
 }
 
+static bool
+aarch64_is_simd_call_p (rtx sym)
+{
+  tree decl = SYMBOL_REF_DECL (sym);
+  return  decl && aarch64_simd_decl_p (decl);
+}
+
 /* Return true if the offsets to a zero/sign-extract operation
represent an expression that matches an extend operation.  The
operands represent the paramters from
@@ -3269,10 +3305,11 @@ aarch64_split_sve_subreg_move (rtx dest, rtx ptrue, rtx src)
 }
 
 static bool
-aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
- tree exp ATTRIBUTE_UNUSED)
+aarch64_function_ok_for_sibcall (tree decl, tree exp 

[PATCH] [AArch64, Falkor] Switch to using Falkor-specific vector costs

2018-07-25 Thread Luis Machado
The adjusted vector costs give Falkor a reasonable boost in performance for FP
benchmarks (both CPU2017 and CPU2006) and doesn't change INT benchmarks that
much. About 0.7% for CPU2017 FP and 1.54% for CPU2006 FP.

OK for trunk?

gcc/ChangeLog:

2018-07-25  Luis Machado  

* config/aarch64/aarch64.c (qdf24xx_vector_cost): New.
(qdf24xx_tunings) <vector_cost>: Set to qdf24xx_vector_cost.
---
 gcc/config/aarch64/aarch64.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..d443aee 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -430,6 +430,26 @@ static const struct cpu_vector_cost generic_vector_cost =
   1 /* cond_not_taken_branch_cost  */
 };
 
+/* Qualcomm QDF24xx costs for vector insn classes.  */
+static const struct cpu_vector_cost qdf24xx_vector_cost =
+{
+  1, /* scalar_int_stmt_cost  */
+  1, /* scalar_fp_stmt_cost  */
+  1, /* scalar_load_cost  */
+  1, /* scalar_store_cost  */
+  1, /* vec_int_stmt_cost  */
+  3, /* vec_fp_stmt_cost  */
+  2, /* vec_permute_cost  */
+  1, /* vec_to_scalar_cost  */
+  1, /* scalar_to_vec_cost  */
+  1, /* vec_align_load_cost  */
+  1, /* vec_unalign_load_cost  */
+  1, /* vec_unalign_store_cost  */
+  1, /* vec_store_cost  */
+  3, /* cond_taken_branch_cost  */
+  1  /* cond_not_taken_branch_cost  */
+};
+
 /* ThunderX costs for vector insn classes.  */
 static const struct cpu_vector_cost thunderx_vector_cost =
 {
@@ -890,7 +910,7 @@ static const struct tune_params qdf24xx_tunings =
   &qdf24xx_extra_costs,
   &qdf24xx_addrcost_table,
   &qdf24xx_regmove_cost,
-  &generic_vector_cost,
+  &qdf24xx_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
   4, /* memmov_cost  */
-- 
2.7.4



Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-25 Thread Sudakshina Das

Hi Sam

On 25/07/18 14:08, Sam Tebbs wrote:

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi<mode>):
    Rename to...
    (*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi<mode>"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email still 
apply.




Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also since the different versions of the instruction generate
same instructions (like foo_16qi and foo_8qi both give out the same
instruction), I would suggest using a -fdump-rtl-final (or any relevant
rtl dump) with the dg-options and using a scan-rtl-dump to scan the
pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" 
"final" } } */


Thanks
Sudi



   (vec_select:<VEL>
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")])))]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]






Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-25 Thread Richard Earnshaw (lists)
On 24/07/18 18:26, Richard Biener wrote:
> So, please make resolve_overloaded_builtin return a no-op on such targets
> which means you can remove the above warning.  Maybe such targets
> shouldn't advertise / initialize the builtins at all?

So I tried to make resolve_overloaded_builtin collapse the builtin
entirely if it's not needed by the machine, transforming

  x = __b_s_s_v (y);

into

  x = y;


but I can't see how to make any side-effects on the optional second
argument hang around.  It's somewhat obscure, but if the user does write

 x = __b_s_s_v (y, z++);

then z++ does still need to be performed.

The problem seems to be that the callers of resolve_overloaded_builtin
expect just a simple value result - they can't, for example, deal with a
statement list and just calling save_expr on the argument isn't enough;
so I can't see an obvious way to force the z++ expression back into the
token stream at this point.

Any ideas?  The alternative seems to be that we must keep the call until
such time as the builtins are lowered during expansion, which pretty
much loses all the benefits you were looking for.

R.


Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-25 Thread Joseph Myers
On Fri, 6 Jul 2018, Segher Boessenkool wrote:

> Version checks are terrible.  This is nothing new.

The key principle behind --with-glibc-version is that you can pass that 
option *when building the static-only inhibit_libc bootstrap compiler 
without having built glibc yet* and it will result in the compiler being 
correctly configured for the specified glibc version, and thus able to 
build glibc binaries identical to those you get from a longer alternating 
sequence of compiler and glibc (headers) builds.

At that point in a bootstrap of a cross toolchain you don't have any 
target glibc headers available (you might have target kernel headers) and 
so have no other way in which the compiler can possibly tell what glibc 
version is in use.
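
In other words, the option is meant for configure lines of roughly this
shape (an illustrative first-stage cross build, not a recommended recipe):

  .../gcc/configure --target=x86_64-linux-gnu --without-headers \
      --disable-shared --enable-languages=c --with-glibc-version=2.27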

> For cross builds you can just assume it works.  That should work fine here.

We definitely support building a new GCC using a sysroot of an old glibc 
(so that the new GCC can then be used to build binaries that will run on 
old distributions, for example).  (This certainly works and is useful for 
x86_64; I don't assert whether or not it works, or makes sense, for 
powerpc64le.)

Of course in that case the build can examine target headers to determine 
versions and features, but a new GCC release is still likely to be able to 
build a few past glibc release branches if anyone backported the required 
build fixes to those branches, and so the bootstrap case is still useful 
with somewhat older glibc versions.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Make strlen range computations more conservative

2018-07-25 Thread Martin Sebor

One other example I have found in one of the test cases:

char c;

if (strlen(&c) != 0) abort();

this is now completely elided, but why?


Because the only string that can be stored in an array of one
element is the empty string.  Expanding that call to strlen()
is in all likelihood going to result in zero.  The only two
cases when it doesn't are invalid: either the character is
uninitialized (GCC may not see it so it may not warn about
it), or it is initialized to a non-zero value (which makes
it not a string -- I have submitted an enhancement to detect
a subset of these cases).  The cases where the user expects
to be able to read past the end of the character and what
follows are both exceedingly unlikely and also undefined.
So in my view, it is safer to fold the call into zero than
not.

  Is there a code base where

that is used?  I doubt, but why do we care to eliminate something
stupid like that?  If we would emit a warning for that I'm fine with it,
But if we silently remove code like that I don't think that it
will improve anything.  So I ask, where is the code base which
gets an improvement from that optimization?


Jonathan suggested issuing a warning in this case.  That
sounds reasonable to me, but not everyone is in favor of
issuing warnings out of the folder.  (I'm guilty of having
done that in a few important cases despite it.)  I am fully
supportive of enhancing warnings to detect more problems,
but I am opposed to gratuitously removing solutions that
have been put in after a great deal of thought, without as
much as bring them up for discussion.


This work concentrates mostly on avoiding to interfere with code that
actually deserves warnings, but which is not being warned about.


Then help by adding the missing warnings.  It will help drive
improvements to user code and will ultimately lead to greater
efficiency.  Dumbing down the analyses and accommodating
undefined code is not a good way forward.  It will only lead
to a kludgy compiler with hacks for this or that bad practice
and compromise our ability to implement new optimizations (and
detect more bugs).

Martin


Re: [PATCH] Make __resource_adaptor_imp usable with C++17 memory_resource

2018-07-25 Thread Jonathan Wakely

On 24/07/18 14:06 +0100, Jonathan Wakely wrote:

By making the memory_resource base class a template parameter the
__resource_adaptor_imp can be used to adapt an allocator into a
std::pmr::memory_resource instead of experimental::pmr::memory_resource.

No uses for this in the library but somebody might want to do it, and
it costs us nothing to support.
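
A hypothetical use, assuming the defaulted second parameter described
in the ChangeLog below:

  // Adapt std::allocator<char> onto the C++17 base class instead of
  // the experimental one:
  using adapted
    = std::experimental::pmr::__resource_adaptor_imp<
        std::allocator<char>, std::pmr::memory_resource>;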

* include/experimental/memory_resource: Adjust comments and
whitespace.
(__resource_adaptor_imp): Add second template parameter for type of
memory resource base class.
(memory_resource): Define default constructor, destructor, copy
constructor and copy assignment operator as defaulted.

Tested powerpc64le-linux, committed to trunk.




commit ce04fa1c00b40a938cc25a264836a2e30149056e
Author: Jonathan Wakely 
Date:   Tue Jul 24 12:24:53 2018 +0100

   Make __resource_adaptor_imp usable with C++17 memory_resource

   By making the memory_resource base class a template parameter the
   __resource_adaptor_imp can be used to adapt an allocator into a
   std::pmr::memory_resource instead of experimental::pmr::memory_resource.

   * include/experimental/memory_resource: Adjust comments and
   whitespace.
   (__resource_adaptor_imp): Add second template parameter for type of
   memory resource base class.
   (memory_resource): Define default constructor, destructor, copy
   constructor and copy assignment operator as defaulted.

diff --git a/libstdc++-v3/include/experimental/memory_resource 
b/libstdc++-v3/include/experimental/memory_resource
index 83379d1367a..7ce64457a11 100644
--- a/libstdc++-v3/include/experimental/memory_resource
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -29,12 +29,12 @@
#ifndef _GLIBCXX_EXPERIMENTAL_MEMORY_RESOURCE
#define _GLIBCXX_EXPERIMENTAL_MEMORY_RESOURCE 1

-#include 
+#include <memory>		// align, uses_allocator, __uses_alloc
+#include <experimental/utility>	// pair, experimental::erased_type
+#include <atomic>		// atomic
#include 
-#include 
-#include 
#include 
-#include 
+#include 


I should not have removed <cstddef> here, it's needed for
std::max_align_t.

Committed to trunk as obvious.


commit 5258907ab3e829b75a70872e9d6f627461c84176
Author: Jonathan Wakely 
Date:   Wed Jul 25 18:20:29 2018 +0100

Add missing header for std::max_align_t

* include/experimental/memory_resource: Include <cstddef> header.

diff --git a/libstdc++-v3/include/experimental/memory_resource b/libstdc++-v3/include/experimental/memory_resource
index 7ce64457a11..61c9ce0a14a 100644
--- a/libstdc++-v3/include/experimental/memory_resource
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -32,7 +32,8 @@
 #include <memory>			// align, uses_allocator, __uses_alloc
 #include <experimental/utility>	// pair, experimental::erased_type
 #include <atomic>			// atomic
-#include 
+#include <new>				// placement new
+#include <cstddef>			// max_align_t
 #include 
 #include 
 


[PATCH][Middle-end] disable strcmp/strncmp inlining with O2 below and Os

2018-07-25 Thread Qing Zhao
Hi,

As Wilco suggested, the newly added strcmp/strncmp inlining should only
be enabled at -O2 and above.

this is the simple patch for this change.

tested on both X86 and aarch64.

Okay for trunk?

Qing

gcc/ChangeLog:

+2018-07-25  Qing Zhao  
+
+   * builtins.c (inline_expand_builtin_string_cmp): Disable inlining
+   when optimization level is lower than 2 or optimize for size.
+   

gcc/testsuite/ChangeLog:

+2018-07-25  Qing Zhao  
+
+   * gcc.dg/strcmpopt_5.c: Change to O2 to enable the transformation.
+   * gcc.dg/strcmpopt_6.c: Likewise.
+



78809_O2.patch
Description: Binary data


Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-25 Thread Umesh Kalappa
Hi,

Any more suggestions or comments on the patch ?

Thank you
~Umesh

On Tue, Jul 24, 2018, 2:08 PM Umesh Kalappa 
wrote:

> Thank you All for the suggestions  and we tried runing the GCC
> testsuite and found that no regression with the fix and also ran the
> our regressions base for conformance with no regress.
>
> Is ok for commit with below  Changelog ?
> +++ libgcc/ChangeLog(working copy)
> @@ -1,3 +1,9 @@
> +2018-07-18  Umesh Kalappa 
> +
> +   PR libgcc/86512
> +   * config/arm/ieee754-df.S: Don't normalise the denormal result.
> +   * config/arm/ieee754-sf.S: Likewise.
> +
> +
> +++ gcc/testsuite/ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2018-07-18  Umesh Kalappa 
> +
> +   PR libgcc/86512
> +   * gcc.target/arm/pr86512.c: New test.
> +
>
> On Mon, Jul 23, 2018 at 5:24 PM, Wilco Dijkstra 
> wrote:
> > Umesh Kalappa wrote:
> >
> >> We tested on the SP and yes the problem persist on the SP too and
> >> attached patch will fix the both SP and DP issues for the  denormal
> >> resultant.
> >
> > The patch now looks correct to me (but I can't approve).
> >
> >> We bootstrapped the compiler ,look ok to us with minimal testing ,
> >>
> >> Any floating point test-suite to test for the attached patch ? any
> >> recommendations or inputs  ?
> >
> > Running the GCC regression tests would be required since a bootstrap
> isn't
> > useful for this kind of change. Assuming you use Linux, building and
> running
> > GLIBC with the changed GCC would give additional test coverage as it
> tests
> > all the math library functions.
> >
> > I don't know of any IEEE conformance testsuites in the GNU world, which
> is
> > why I'm suggesting running some targeted and randomized tests. You could
> > use the generic soft-float code in libgcc/soft-fp/adddf3.c to compare
> the outputs.
> >
> >
>  Index: libgcc/config/arm/ieee754-df.S
>  ===
>  --- libgcc/config/arm/ieee754-df.S   (revision 262850)
>  +++ libgcc/config/arm/ieee754-df.S   (working copy)
>  @@ -203,6 +203,7 @@
>   #endif
> 
>   @ Determine how to normalize the result.
>  +@ if result is denormal i.e (exp)=0,then don't normalise the
> result,
> >
> > Use a standard sentence here, eg. like:
> >
> > If exp is zero and the mantissa unnormalized, return a denormal.
> >
> > Wilco
> >
>


Re: [PATCH 00/11] [nvptx] Initial vector length changes

2018-07-25 Thread Cesar Philippidis
On 07/24/2018 01:47 PM, ce...@codesourcery.com wrote:
> From: Cesar Philippidis 
> 
> This patch series contains various cleanups and structural
> reorganizations to the NVPTX BE in preparation for the forthcoming
> variable length vector length enhancements. Tom, in order to make
> these changes easier for you to review, I broke these patches into
> logical components. If approved for trunk, would you like to see these
> patches committed individually, or all together in a single huge
> commit?
> 
> One notable change in this patch set is the partial inclusion of the
> PTX_DEFAULT_RUNTIME_DIM change that I previously placed with the
> libgomp default geometry update patch that I posted a couple of weeks
> ago. I don't want to block this patch series so I included the nvptx
> changes in patch 01.
> 
> Is this OK for trunk? I regtested both standalone and offloading
> compilers. I'm seeing some inconsistencies in the standalone compiler
> results, so I might rerun those just to be safe. But the results using
> nvptx as an offloading compiler came back clean.

On further inspection, the inconsistencies turned out to be isolated in
the c++ tests. The c tests results are clean.

Cesar


Re: [C++ PATCH] Further get_identifier ("string literal") C++ FE caching

2018-07-25 Thread Nathan Sidwell

On 07/18/2018 06:43 PM, Jakub Jelinek wrote:

On Wed, Jul 18, 2018 at 06:00:20PM -0400, Nathan Sidwell wrote:

So cool! Thanks.


Ok for both patches or just this one?


both are ok, sorry for delay (vacation), thanks for doing that!

nathan

--
Nathan Sidwell


[gomp5] Add host teams construct support

2018-07-25 Thread Jakub Jelinek
Hi!

OpenMP 5.0 allows the teams construct, which was previously required to be
strictly nested inside of a target construct (i.e. without any OpenMP
construct and even without any user code in between), to also appear as a
host construct that is not nested in any OpenMP construct at all.  The
primary goal is for use
in NUMA setups, where the construct will create a league of teams (threads)
where each binds to one NUMA node and let those teams run pretty much
independently (no synchronization between them, except for the final
reduction processing if required), only wait for the work of all the teams
at the end of the construct.
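
A minimal sketch of the new host usage (illustrative only):

  void
  work (void)
  {
    /* No enclosing 'target' construct: with OpenMP 5.0 this is a
       host teams region.  */
    #pragma omp teams num_teams(4)
    {
      /* Each team runs independently; the only join is at the end
         of the region.  */
    }
  }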

This patch implements the compiler side of this, and adds a simple (if
ignoring mixing of user POSIX threads and OpenMP like calling #pragma omp
teams from multiple POSIX threads (which is outside of the standard anyway)
even conforming) implementation to the library for now so that it can be
tested.  The implementation just runs the different teams in the same thread
sequentially.

When I get some agreement on the defaults (I believe the default if
num_threads is not specified should be number of NUMA nodes the CPUs in the
allowed set of CPUs belong to (1 if not on a NUMA system); if users specify
something smaller or larger, there are various options and we need to figure
out what is best for users), I'll change it to the final implementation.

Tested on x86_64-linux, committed to gomp-5_0-branch.

2018-07-25  Jakub Jelinek  

* gimple.h (enum gf_mask): Add GF_OMP_TEAMS_HOST.
(struct gimple_statement_omp_taskreg): Add GIMPLE_OMP_TEAMS to
comments.
(struct gimple_statement_omp_single_layout): And remove here.
(struct gomp_teams): Inherit from gimple_statement_omp_taskreg rather
than gimple_statement_omp_single_layout.
(is_a_helper <gimple_statement_omp_taskreg *>::test): Allow
GIMPLE_OMP_TEAMS.
(is_a_helper <gomp_teams *>::test): Likewise.
(gimple_omp_subcode): Formatting fix.
(gimple_omp_teams_child_fn, gimple_omp_teams_child_fn_ptr,
gimple_omp_teams_set_child_fn, gimple_omp_teams_data_arg,
gimple_omp_teams_data_arg_ptr, gimple_omp_teams_set_data_arg,
gimple_omp_teams_host, gimple_omp_teams_set_host): New inline
functions.
* gimple.def (GIMPLE_OMP_TEAMS): Use GSS_OMP_PARALLEL_LAYOUT instead
of GSS_OMP_SINGLE_LAYOUT, adjust comments.
* gimplify.c (enum omp_region_type): Reserve bits 1 and 2 for
auxiliary flags, renumber values of most of ORT_* enumerators,
add ORT_HOST_TEAMS and ORT_COMBINED_HOST_TEAMS enumerators.
(maybe_fold_stmt): Don't fold even in host teams regions.
(gimplify_scan_omp_clauses, gimplify_omp_for): Adjust tests for
ORT_COMBINED_TEAMS.
(gimplify_omp_workshare): Set ort to ORT_HOST_TEAMS or
ORT_COMBINED_HOST_TEAMS if not inside of target construct.  If
host teams, use gimplify_and_return_first etc. for body like
for target or target data constructs, and at the end call
gimple_omp_teams_set_host on the GIMPLE_OMP_TEAMS object.
* omp-builtins.def (BUILT_IN_GOMP_TEAMS_REG): New builtin.
* omp-low.c (is_host_teams_ctx): New function.
(is_taskreg_ctx): Return true also if is_host_teams_ctx.
(scan_sharing_clauses): Don't ignore shared clauses in
is_host_teams_ctx contexts.
(finish_taskreg_scan): Handle GIMPLE_OMP_TEAMS like
GIMPLE_OMP_PARALLEL.
(scan_omp_teams): Handle host teams constructs.
(check_omp_nesting_restrictions): Allow teams with no outer
OpenMP context.  Adjust diagnostics for teams strictly nested into
some explicit OpenMP construct other than target.
(scan_omp_1_stmt) : Temporarily bump
taskreg_nesting_level while scanning host teams construct.
(lower_rec_input_clauses): Don't ignore shared clauses in
is_host_teams_ctx contexts.
(lower_omp_1): Use lower_omp_taskreg instead of lower_omp_teams
for host teams constructs.
* omp-expand.c (expand_teams_call): New function.
(expand_omp_taskreg): Allow GIMPLE_OMP_TEAMS and call
expand_teams_call for it.  Formatting fix.
(expand_omp_synch): For host teams call expand_omp_taskreg.
c/
* c-parser.c (c_parser_omp_teams): Force a BIND_EXPR with BLOCK
around teams body.  Use SET_EXPR_LOCATION.
(c_parser_omp_target): Use SET_EXPR_LOCATION.
cp/
* cp-tree.h (finish_omp_atomic): Add LOC argument.
* parser.c (cp_parser_omp_atomic): Pass pragma_tok->location as
LOC to finish_omp_atomic.
(cp_parser_omp_single): Use SET_EXPR_LOCATION.
(cp_parser_omp_teams): Force a BIND_EXPR with BLOCK around teams
body.
* semantics.c (finish_omp_atomic): Add LOC argument, pass it through
to c_finish_omp_atomic and set it as location of OMP_ATOMIC* trees.
* pt.c (tsubst_expr): Force a BIND_EXPR with BLOCK around teams 

Re: [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-07-25 Thread Martin Sebor

On 07/25/2018 08:57 AM, Jakub Jelinek wrote:

On Wed, Jul 25, 2018 at 08:54:13AM -0600, Martin Sebor wrote:

I don't mean for the special value to be used except internally
for the defaults.  Otherwise, users wanting to override the default
will choose a value other than it.  I'm happy to document it in
the .opt file for internal users though.

-1 has the documented effect of disabling the warnings altogether
(-1 is SIZE_MAX) so while I agree that -1 looks better it doesn't
work.  (It would need more significant changes.)


The variable is signed, so -1 is not SIZE_MAX.  Even if -1 disables it, you
could use e.g. -2 or other negative value for the other special case.


The -Wxxx-larger-than=N distinguish three ranges of argument
values (treated as unsigned):

  1.  [0, HOST_WIDE_INT_MAX)
  2.  HOST_WIDE_INT_MAX
  3.  [HOST_WIDE_INT_MAX + 1, Infinity)

(1) implies warnings for allocations in excess of the size.  For
the alloca/VLA warnings it also means warnings for allocations
that may be unbounded.  (This feels like a bit of a wart.)

(2) implies warnings for allocations in excess of PTRDIFF_MAX
only.  For the alloca/VLA warnings it also disables warnings
for allocations that may be unbounded (also a bit of a wart)

(3) isn't treated consistently by all options (yet) but for
the alloca/VLA warnings it means no warnings.  Since
the argument value is stored in signed HOST_WIDE_INT this
range is strictly negative.

Any value from (3) could in theory be made special and used
instead of -1 or HOST_WIDE_INT_MAX as a placeholder for
PTRDIFF_MAX.  But no matter what the choice is, it removes
the value from the usable set in (3) (i.e., it doesn't have
the expected effect of disabling the warning).
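
For concreteness, with a 64-bit HOST_WIDE_INT the three ranges
correspond to invocations like these (illustrative values):

  gcc -Walloca-larger-than=65536 ...                 (range 1)
  gcc -Walloca-larger-than=9223372036854775807 ...   (range 2)
  gcc -Walloca-larger-than=18446744073709551615 ...  (range 3)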

I don't see the advantage of picking -2 over any other negative
number.  As inelegant as the current choice of HOST_WIDE_INT_MAX
may be, it seems less arbitrary and less intrusive than picking
a random value from the negative range.

Martin

PS The handling of these ranges isn't consistent across all
the options because they were each developed independently
and without necessarily aiming for it.  I think making them
more consistent would be nice as a followup patch.  I would
expect consistency to be achievable more easily if baking
special cases into the design is kept to a minimum.  It
would also help to remove some existing special cases.
E.g., by introducing a distinct option for the special case
of diagnosing unbounded alloca/VLA allocations and removing
it from -W{alloca,vla}-larger-than=.


Re: [PATCH 1/3] Correct the reported line number in fortran combined OpenACC directives

2018-07-25 Thread Cesar Philippidis
On 07/25/2018 08:32 AM, Marek Polacek wrote:
> On Wed, Jul 25, 2018 at 08:29:17AM -0700, Cesar Philippidis wrote:
>> The fortran FE incorrectly records the line locations of combined acc
>> loop directives when it lowers the construct to gimple. Usually this
>> isn't a problem because the fortran FE is able to report problems with
>> acc loops itself. However, there will be inaccuracies if the ME tries
>> to use those locations.
>>
>> Note that test cases are inconspicuously absent in this patch.
>> However, without this bug fix, -fopt-info-note-omp will report bogus
>> line numbers. This code path will be tested in a later patch in
>> this series.
>>
>> Is this OK for trunk? I bootstrapped and regtested it on x86_64 with
>> nvptx offloading.
>>
>> Thanks,
>> Cesar
>>
>> 2018-XX-YY  Cesar Philippidis  
>>
>>  gcc/fortran/
>>  * trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
>>  location of combined acc loops.
>>
>> (cherry picked from gomp-4_0-branch r245653)
>>
>> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
>> index f038f4c..e7707d0 100644
>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
>> @@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>>gfc_omp_clauses construct_clauses, loop_clauses;
>>tree stmt, oacc_clauses = NULL_TREE;
>>enum tree_code construct_code;
>> +  location_t loc = input_location;
>>  
>>switch (code->op)
>>  {
>> @@ -3930,12 +3931,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>>else
>>  pushlevel ();
>>stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
>> +
>> +  if (CAN_HAVE_LOCATION_P (stmt))
>> +SET_EXPR_LOCATION (stmt, loc);
> 
> This is protected_set_expr_location.

Neat, thanks! This patch includes that correction. Is it ok for trunk
after bootstrapping and regression testing?

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
	location of combined acc loops.

(cherry picked from gomp-4_0-branch r245653)
---
 gcc/fortran/trans-openmp.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c5bf8..b549c682533 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   gfc_omp_clauses construct_clauses, loop_clauses;
   tree stmt, oacc_clauses = NULL_TREE;
   enum tree_code construct_code;
+  location_t loc = input_location;
 
   switch (code->op)
 {
@@ -3929,13 +3930,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
 pblock = &block;
   else
 pushlevel ();
+
   stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  protected_set_expr_location (stmt, loc);
+
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   else
 poplevel (0, 0);
-  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
-		 oacc_clauses);
+
+  stmt = build2_loc (loc, construct_code, void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block ();
 }
-- 
2.17.1



Re: [PATCH 1/3] Correct the reported line number in fortran combined OpenACC directives

2018-07-25 Thread Marek Polacek
On Wed, Jul 25, 2018 at 08:29:17AM -0700, Cesar Philippidis wrote:
> The fortran FE incorrectly records the line locations of combined acc
> loop directives when it lowers the construct to gimple. Usually this
> isn't a problem because the fortran FE is able to report problems with
> acc loops itself. However, there will be inaccuracies if the ME tries
> to use those locations.
> 
> Note that test cases are inconspicuously absent in this patch.
> However, without this bug fix, -fopt-info-note-omp will report bogus
> line numbers. This code path will be tested in a later patch in
> this series.
> 
> Is this OK for trunk? I bootstrapped and regtested it on x86_64 with
> nvptx offloading.
> 
> Thanks,
> Cesar
> 
> 2018-XX-YY  Cesar Philippidis  
> 
>   gcc/fortran/
>   * trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
>   location of combined acc loops.
> 
> (cherry picked from gomp-4_0-branch r245653)
> 
> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> index f038f4c..e7707d0 100644
> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>gfc_omp_clauses construct_clauses, loop_clauses;
>tree stmt, oacc_clauses = NULL_TREE;
>enum tree_code construct_code;
> +  location_t loc = input_location;
>  
>switch (code->op)
>  {
> @@ -3930,12 +3931,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>else
>  pushlevel ();
>stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
> +
> +  if (CAN_HAVE_LOCATION_P (stmt))
> +SET_EXPR_LOCATION (stmt, loc);

This is protected_set_expr_location.

Marek


[PATCH 3/3] Add user-friendly OpenACC diagnostics regarding detected parallelism.

2018-07-25 Thread Cesar Philippidis
This patch teaches GCC to inform the user how it assigned parallelism
to each OpenACC loop at compile time using the -fopt-info-note-omp
flag. For instance, given the acc parallel loop nest:

  #pragma acc parallel loop
  for (...)
#pragma acc loop vector
for (...)

GCC will report something like

  foo.c:4:0: note: Detected parallelism <gang worker>
  foo.c:6:0: note: Detected parallelism <vector>

Note how only the inner loop specifies vector parallelism. In this
example, GCC automatically assigned gang and worker parallelism to the
outermost loop. Perhaps, going forward, it would be useful to
distinguish which parallelism was specified by the user and which was
assigned by the compiler. But that can be added in a follow up patch.

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
with nvptx offloading.

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

gcc/
* omp-offload.c (inform_oacc_loop): New function.
(execute_oacc_device_lower): Use it to display loop parallelism.

gcc/testsuite/
* c-c++-common/goacc/note-parallelism.c: New test.
* gfortran.dg/goacc/note-parallelism.f90: New test.

(cherry picked from gomp-4_0-branch r245683, and gcc/testsuite/ parts of
r245770)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 0abf028..66b99bb 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -866,6 +866,31 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " seq" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+? " gang" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+? " worker" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+? " vector" : "";
+  dump_location_t loc = dump_location_t::from_location_t (loop->loc);
+
+  dump_printf_loc (MSG_NOTE, loc,
+  "Detected parallelism \n", seq, gang,
+  worker, vector);
+
+  if (loop->child)
+inform_oacc_loop (loop->child);
+  if (loop->sibling)
+inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
structures as we go.  By construction these loops are properly
nested.  */
@@ -1533,6 +1558,8 @@ execute_oacc_device_lower ()
   dump_oacc_loop (dump_file, loops, 0);
   fprintf (dump_file, "\n");
 }
+  if (dump_enabled_p () && loops->child)
+inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
  dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
new file mode 100644
index 000..3ec794c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -0,0 +1,61 @@
+/* Test the output of -fopt-info-note-omp.  */
+
+/* { dg-additional-options "-fopt-info-note-omp" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop worker /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang worker /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop worker vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang worker vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop /* { dg-message "note: Detected parallelism " } */
+for (y = 0; y < 10; y++)
+  ;
+
+#pragma acc parallel loop gang /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker /* { dg-message "note: Detected parallelism " } */
+for (y = 0; y < 10; y++)
+#pragma acc loop vector /* { dg-message "note: Detected parallelism " } */
+  for (z = 0; z < 10; z++)
+   ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90 b/gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90
new file mode 100644
index 000..a0c78c5
--- 

[PATCH 2/3] Correct the reported line number in c++ combined OpenACC directives

2018-07-25 Thread Cesar Philippidis
Like the fortran FE, the C++ FE doesn't set the expr_location of the
split acc loop in combined acc parallel/kernels loop directives. This
only happens with combined directives; otherwise
cp_parser_omp_construct would be responsible for setting the
location. After fixing this bug, I was able to resolve a couple of
long standing diagnostics discrepancies between the c/c++ FEs in the
test suite.

Is this patch OK for trunk? I bootstrapped and regtested using x86_64
with nvptx offloading.

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

gcc/cp/
* parser.c (cp_parser_oacc_kernels_parallel): Adjust EXPR_LOCATION
on the combined acc loop.

gcc/testsuite/
* c-c++-common/goacc/combined-directives-3.c: New test.
* c-c++-common/goacc/loop-2-kernels.c (void K): Adjust test.
* c-c++-common/goacc/loop-2-parallel.c (void P): Adjust test.
* c-c++-common/goacc/loop-3.c (void p2): Adjust test.

(cherry picked from gomp-4_0-branch r245673)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 90d5d00..52e61fc 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -37183,8 +37183,9 @@ cp_parser_oacc_kernels_parallel (cp_parser *parser, 
cp_token *pragma_tok,
  cp_lexer_consume_token (parser->lexer);
  tree block = begin_omp_parallel ();
  tree clauses;
- cp_parser_oacc_loop (parser, pragma_tok, p_name, mask, &clauses,
-  if_p);
+ tree stmt = cp_parser_oacc_loop (parser, pragma_tok, p_name, mask,
+  &clauses, if_p);
+ protected_set_expr_location (stmt, pragma_tok->location);
  return finish_omp_construct (code, block, clauses);
}
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives-3.c b/gcc/testsuite/c-c++-common/goacc/combined-directives-3.c
new file mode 100644
index 000..77d4182
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives-3.c
@@ -0,0 +1,24 @@
+/* Verify the accuracy of the line number associated with combined
+   constructs.  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq auto /* { dg-error "'seq' overrides other OpenACC loop specifiers" } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+for (y = 0; y < 10; y++)
+  ;
+
+#pragma acc parallel loop gang auto /* { dg-error "'auto' conflicts with other OpenACC loop specifiers" } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker auto /* { dg-error "'auto' conflicts with other OpenACC loop specifiers" } */
+for (y = 0; y < 10; y++)
+#pragma acc loop vector
+  for (z = 0; z < 10; z++)
+   ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
index 01ad32d..3a11ef5f 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -145,8 +145,8 @@ void K(void)
 #pragma acc kernels loop worker(num:5)
   for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" }
+  for (i = 0; i < 10; i++)
 { }
 #pragma acc kernels loop gang worker
   for (i = 0; i < 10; i++)
@@ -161,8 +161,8 @@ void K(void)
 #pragma acc kernels loop vector(length:5)
   for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" }
+  for (i = 0; i < 10; i++)
 { }
 #pragma acc kernels loop gang vector
   for (i = 0; i < 10; i++)
@@ -174,16 +174,16 @@ void K(void)
 #pragma acc kernels loop auto
   for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" }
+  for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+#pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" }
+  for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop worker auto // { dg-error "'auto' conflicts" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+#pragma acc kernels loop worker auto // { dg-error "'auto' conflicts" }
+  for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+#pragma acc kernels loop vector auto // { dg-error "'auto' 

[PATCH 1/3] Correct the reported line number in fortran combined OpenACC directives

2018-07-25 Thread Cesar Philippidis
The fortran FE incorrectly records the line locations of combined acc
loop directives when it lowers the construct to gimple. Usually this
isn't a problem because the fortran FE is able to report problems with
acc loops itself. However, there will be inaccuracies if the ME tries
to use those locations.

Note that test cases are inconspicuously absent in this patch.
However, without this bug fix, -fopt-info-note-omp will report bogus
line numbers. This code path will be tested in a later patch in
this series.

Is this OK for trunk? I bootstrapped and regtested it on x86_64 with
nvptx offloading.

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

gcc/fortran/
* trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
location of combined acc loops.

(cherry picked from gomp-4_0-branch r245653)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c..e7707d0 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   gfc_omp_clauses construct_clauses, loop_clauses;
   tree stmt, oacc_clauses = NULL_TREE;
   enum tree_code construct_code;
+  location_t loc = input_location;
 
   switch (code->op)
 {
@@ -3930,12 +3931,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   else
 pushlevel ();
   stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+
+  if (CAN_HAVE_LOCATION_P (stmt))
+SET_EXPR_LOCATION (stmt, loc);
+
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   else
 poplevel (0, 0);
-  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
-		 oacc_clauses);
+
+  stmt = build2_loc (loc, construct_code, void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block ();
 }
-- 
2.7.4



[PATCH 0/3] Add OpenACC diagnostics to -fopt-info-note-omp

2018-07-25 Thread Cesar Philippidis
This patch series extends -fopt-info-note-omp to include OpenACC loop
diagnostics when it is used in conjunction with -fopenacc. At present,
the diagnostics are limited to reporting how OpenACC loops are
partitioned, e.g., seq, gang, worker or vector. The major advantage of
this diagnostics is that it informs the user how GCC automatically
partitions independent loops, i.e., acc loops without any parallelism
clauses inside acc parallel regions. This information provides the
user with insights on how to select num_gangs, num_workers and
vector_length for their application.
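
For instance (an illustrative snippet, not from these patches), once
the notes show how GCC partitioned a loop, the user might pin the
geometry explicitly:

  void
  vec_add (int n, float *a, float *b, float *c)
  {
  #pragma acc parallel loop num_gangs(32) num_workers(4) vector_length(128)
    for (int i = 0; i < n; i++)
      a[i] = b[i] + c[i];
  }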

All three patches in this series are independent from one
another. Patches 1 and 2 fix diagnostics bugs involving incorrect line
numbers. Patch 3 is responsible for generating the actual diagnostics.

Cesar


RE: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-07-25 Thread Tamar Christina
Hi Thomas,

Thanks for the review!

> >
> > I don't believe the TARGET_FP16 guard to be needed, because the
> > pattern doesn't actually generate code and requires another pattern
> > for that, and a reg to reg move should always be possible anyway. So
> > allowing the force to register here is safe and it allows the compiler
> > to generate a correct error instead of ICEing in an infinite loop.
> 
> How about subreg to subreg move? Doesn't that expand to more insns
> (subreg to reg and reg to subreg)? Couldn't you improve the logic to check
> that there is actually a mode change so that if there isn't (like moving from
> one subreg to another) just expand to a single move?
> 

Yes, but that is not a new issue. My patch is simply removing the TARGET_FP16 
restrictions and
merging two patterns that should be one using an iterator and nothing more.

The redundant mov is already there and a different issue than the ICE I'm 
trying to fix.

None of the code inside the expander is needed at all; the code really
only has an effect on subreg to subreg moves, as `force_reg` doesn't do
anything when its argument is already a reg.

The comment in the expander (which was already there) is wrong. The *reason* 
the ICE is fixed isn't
because of the `force_reg`. It's because of the mere presence of the expander 
itself. The expander matches the
standard mov$a optab and so this prevents emit_move_insn_1 from doing the move 
by subwords as it finds a pattern
that's able to do the move.

The expander however always falls through and doesn’t stop RTL generation. You 
could remove all the code in there and have
it properly match the *neon_mov instructions which will do the right thing 
later at code generation time and avoid the redundant
moves.  My guess is the original `force_reg` was copied from the other patterns 
like `movti` and the existing `mov`. There it makes
sense because the operands can be MEM or anything general_operand.
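
To illustrate the point, here is a simplified sketch of the dispatch
in emit_move_insn_1 (not the actual expr.c code):

  /* Sketch: the mere existence of a mov<mode> expander keeps
     emit_move_insn_1 from splitting the move into subword moves.  */
  rtx_insn *
  emit_move_sketch (machine_mode mode, rtx x, rtx y)
  {
    enum insn_code code = optab_handler (mov_optab, mode);
    if (code != CODE_FOR_nothing)
      /* A pattern matches the mov optab: use it directly.  */
      return emit_insn (GEN_FCN (code) (x, y));
    /* Otherwise the move is emitted word by word, which is where
       the subreg ICE showed up.  */
    return emit_move_multi_word (mode, x, y);
  }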

However the redundant moves are a different problem than what I'm trying to 
solve here. So I think that's another patch which requires further
testing.

Regards,
Tamar

> Best regards,
> 
> Thomas
> 
> >
> > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without
> > introducing any regressions while fixing
> >
> > gcc.dg/vect/vect-nop-move.c execution test
> > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test
> > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test
> > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test
> >
> > Regtested on armeb-none-eabi and no regressions.
> > Bootstrapped on arm-none-linux-gnueabihf and no issues.
> >
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-07-23  Tamar Christina  
> >
> > PR target/84711
> > * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.
> > * config/arm/neon.md (movv4hf, movv8hf): Refactored to..
> > (mov<mode>): ..this and enable unconditionally.
> >
> > --


RE: [PATCH 1/4] [ARC] Add more additional register names

2018-07-25 Thread Claudiu Zissulescu
Pushed. Thank you for your review,
Claudiu

From: Andrew Burgess [andrew.burg...@embecosm.com]
Sent: Wednesday, July 25, 2018 3:49 PM
To: Claudiu Zissulescu
Cc: gcc-patches@gcc.gnu.org; francois.bed...@synopsys.com; claziss
Subject: Re: [PATCH 1/4] [ARC] Add more additional register names

All the patches in this series look fine.

Thanks,
Andrew


* Claudiu Zissulescu  [2018-07-16 15:29:42 +0300]:

> From: claziss 
>
> gcc/
> 2017-06-14  Claudiu Zissulescu  
>
>   * config/arc/arc.h (ADDITIONAL_REGISTER_NAMES): Add additional
>   register names.
> ---
>  gcc/config/arc/arc.h | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 1780034aabe..3648314eaca 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -1215,7 +1215,15 @@ extern char rname56[], rname57[], rname58[], rname59[];
>  {\
>{"ilink",  29},\
>{"r29",29},\
> -  {"r30",30} \
> +  {"r30",30},\
> +  {"r40",40},\
> +  {"r41",41},\
> +  {"r42",42},\
> +  {"r43",43},\
> +  {"r56",56},\
> +  {"r57",57},\
> +  {"r58",58},\
> +  {"r59",59} \
>  }
>
>  /* Entry to the insn conditionalizer.  */
> --
> 2.17.1
>


Re: [PATCH] -fsave-optimization-record: add contrib/optrecord.py

2018-07-25 Thread David Malcolm
On Tue, 2018-07-24 at 16:11 +0200, Richard Biener wrote:
> On Mon, Jul 23, 2018 at 9:20 PM David Malcolm 
> wrote:
> > 
> > On Mon, 2018-07-23 at 11:46 +0200, Richard Biener wrote:
> > > On Fri, Jul 20, 2018 at 6:27 PM David Malcolm  > > m>
> > > wrote:
> > > > 
> > > > This patch adds a Python 3 module to "contrib" for reading the
> > > > output of
> > > > -fsave-optimization-record.
> > > > 
> > > > It can be imported from other Python code, or run standalone as
> > > > a
> > > > script,
> > > > in which case it prints the saved messages in a form resembling
> > > > GCC
> > > > diagnostics.
> > > > 
> > > > OK for trunk?
> > > 
> > > OK, but shouldn't there maybe a user-visible (and thus installed)
> > > tool for
> > > this kind of stuff?  Which would mean to place it somewhere else.
> > 
> > As well as this support code, I've got code that uses it to
> > generate
> > HTML reports.  I'm thinking that all this Python code might be
> > better
> > to maintain in an entirely separate repository, as a third-party
> > project (maintained by me, under some suitable Free Software
> > license,
> > accessible via PyPI), since I suspect that the release cycle ought
> > to
> > be different from that of gcc itself.
> > 
> > Would that be a better approach?
> 
> Possibly.
> 
> Richard.

[CCing Rainer and Mike]

A related matter that may affect this: currently there's not much test
coverage for -fsave-optimization-record in trunk (sorry).

"trunk" currently has:

(a) selftest::test_building_json_from_dump_calls, which captures the
results of some dump calls, and does very minimal textual verification
of the JSON that would be emitted by -fsave-optimization-record.

(b) gcc.c-torture/compile/pr86636.c, which merely verifies that we
don't ICE with a particular usage of -fsave-optimization-record.

Ideally we'd have some test coverage of the file written out by -fsave-
optimization-record: that it's valid JSON, that it conforms to the
expected internal structure, and that the expected data is correct and
complete (relative to some known dump calls; I have a plugin for
testing this if need be, in gcc.dg/plugins).

I don't know if Tcl has any JSON support, but in Python, JSON support
is built-in to the standard library, so I wonder if there's a case for
having a DejaGnu directive to (optionally) call out to a Python script
to check the JSON file that's been written, using this optrecord.py
module to handle loading the JSON.  Doing so would implicitly check
that the emitted JSON adheres to the expected internal structure, and
the script could add additional testcase-specific verifications.

The directive would have to check for the presence of Python, and emit
an UNSUPPORTED if unavailable.

If that sounds sane (and I'm willing to try implementing it), then that
suggests that optrecord.py should live in the gcc source tree, and be
installed somewhere (though I'm not sure where).

Alternatively, this could be done as a selftest, by adding some
functions to json.h/cc for inspecting and traversing the in-memory JSON
tree.  Though that wouldn't be as effective as an "end-to-
end"/integration test.

Thoughts?

Dave


> > Dave
> > 
> > > Richard.
> > > 
> > > > contrib/ChangeLog:
> > > > * optrecord.py: New file.
> > > > ---
> > > >  contrib/optrecord.py | 295
> > > > +++
> > > >  1 file changed, 295 insertions(+)
> > > >  create mode 100755 contrib/optrecord.py
> > > > 
> > > > diff --git a/contrib/optrecord.py b/contrib/optrecord.py
> > > > new file mode 100755
> > > > index 000..b07488e
> > > > --- /dev/null
> > > > +++ b/contrib/optrecord.py
> > > > @@ -0,0 +1,295 @@
> > > > +#!/usr/bin/env python3
> > > > +#
> > > > +# Python module for working with the result of -fsave-
> > > > optimization-record
> > > > +# Contributed by David Malcolm .
> > > > +#
> > > > +# Copyright (C) 2018 Free Software Foundation, Inc.
> > > > +# This file is part of GCC.
> > > > +#
> > > > +# GCC is free software; you can redistribute it and/or modify
> > > > it
> > > > under
> > > > +# the terms of the GNU General Public License as published by
> > > > the
> > > > Free
> > > > +# Software Foundation; either version 3, or (at your option)
> > > > any
> > > > later
> > > > +# version.
> > > > +#
> > > > +# GCC is distributed in the hope that it will be useful, but
> > > > WITHOUT ANY
> > > > +# WARRANTY; without even the implied warranty of
> > > > MERCHANTABILITY
> > > > or
> > > > +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
> > > > Public
> > > > License
> > > > +# for more details.
> > > > +#
> > > > +# You should have received a copy of the GNU General Public
> > > > License
> > > > +# along with GCC; see the file COPYING3.  If not see
> > > > +# .  */
> > > > +
> > > > +import argparse
> > > > +import json
> > > > +import os
> > > > +import sys
> > > > +
> > > > +class TranslationUnit:
> > > > +"""Top-level class for containing 

Re: [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-07-25 Thread Jakub Jelinek
On Wed, Jul 25, 2018 at 08:54:13AM -0600, Martin Sebor wrote:
> I don't mean for the special value to be used except internally
> for the defaults.  Otherwise, users wanting to override the default
> will choose a value other than it.  I'm happy to document it in
> the .opt file for internal users though.
> 
> -1 has the documented effect of disabling the warnings altogether
> (-1 is SIZE_MAX) so while I agree that -1 looks better it doesn't
> work.  (It would need more significant changes.)

The variable is signed, so -1 is not SIZE_MAX.  Even if -1 disables it, you
could use e.g. -2 or other negative value for the other special case.

Jakub


Re: [2/5] C-SKY port: Backend implementation

2018-07-25 Thread Sandra Loosemore

On 07/25/2018 07:16 AM, Paul Koning wrote:


Non-executable stacks are a very good thing.

That said, I also looked at the target hook documentation and was
left without any clue whatsoever.  It sure isn't clear what powers of
two have to do with descriptors, or what descriptors have to do with
support for nested functions.

Can you suggest places to look to get an understanding of this
feature?  It sounds like the only other option is "Use the source,
Luke".  Any specific targets that make a good reference
implementation for this?


FYI, so far I have found PR ada/67205 and the original patch posting
here, but it looks like "Use the source" is indeed where we are on
this.  :-(


https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02016.html

-Sandra


Re: [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-07-25 Thread Martin Sebor

On 07/25/2018 02:34 AM, Richard Biener wrote:

On Wed, Jul 25, 2018 at 4:07 AM Martin Sebor  wrote:


The very large option argument enhancement committed last week
inadvertently introduced an assumption about the LP64 data model
that makes the -Wxxx-larger-than options have a different effect
at their default documented setting of PTRDIFF_MAX between ILP32
and LP64.  As a result, the options are treated as suppressed in
ILP32 while triggering the expected warnings for allocations or
sizes in excess of the limit in ILP64.

To remove this I considered making it possible to use symbolic
constants like PTRDIFF_MAX as the option arguments so that
then defaults in the .opt files could be set to that.  Sadly,
options in GCC are processed before the values of constants
like PTRDIFF_MAX for the target are known, and deferring
the handling of just the -Wxxx-larger-than= options until
the constants have been computed would be too involved to
make it worth the effort.

To keep things simple I decided to have the code that handles
each of the affected options treat the HOST_WIDE_INT_MAX default
as a special request to substitute the value of PTRDIFF_MAX at
the time.

The attached patch implements this.
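
In other words, something along these lines (an illustrative sketch
with simplified names, not the patch itself):

  /* Substitute PTRDIFF_MAX for the internal HOST_WIDE_INT_MAX
     placeholder once the target's types are known.  */
  static HOST_WIDE_INT
  effective_limit (HOST_WIDE_INT opt_value)
  {
    if (opt_value == HOST_WIDE_INT_MAX)
      return tree_to_shwi (TYPE_MAX_VALUE (ptrdiff_type_node));
    return opt_value;
  }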


I wonder if documenting a special value of -1 would be easier for
users to use.  Your patch doesn't come with adjustments to
invoke.texi so I wonder how people could know of this special
handling?


I don't mean for the special value to be used except internally
for the defaults.  Otherwise, users wanting to override the default
will choose a value other than it.  I'm happy to document it in
the .opt file for internal users though.

-1 has the documented effect of disabling the warnings altogether
(-1 is SIZE_MAX) so while I agree that -1 looks better it doesn't
work.  (It would need more significant changes.)

I don't consider this the most elegant design but it's the best
I could think of short of complicating things even more.

Martin


[committed] optinfo-emit-json.cc: fix trivial memory leak

2018-07-25 Thread David Malcolm
There's a small leak in class optrecord_json_writer, which shows
up as a new leak in "make selftest-valgrind" as:

==50133== 40 bytes in 1 blocks are definitely lost in loss record 27 of 672
==50133==at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==50133==by 0x1EF5CAF: xrealloc (xmalloc.c:177)
==50133==by 0xEDDA56: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:288)
==50133==by 0xEDD74D: vec::reserve(unsigned int, bool) (vec.h:1621)
==50133==by 0xEDD515: vec::safe_push(json::array* const&) (vec.h:1730)
==50133==by 0xEDB060: optrecord_json_writer::optrecord_json_writer() (optinfo-emit-json.cc:124)
==50133==by 0xEDD141: selftest::test_building_json_from_dump_calls() (optinfo-emit-json.cc:935)
==50133==by 0xEDD3AF: selftest::optinfo_emit_json_cc_tests() (optinfo-emit-json.cc:955)
==50133==by 0x1DEB3AA: selftest::run_tests() (selftest-run-tests.c:78)
==50133==by 0x1065264: toplev::run_self_tests() (toplev.c:2225)
==50133==by 0x1065486: toplev::main(int, char**) (toplev.c:2303)
==50133==by 0x1E4A092: main (main.c:39)

The fix is trivial.
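
For reference, the contrast the fix relies on (an illustrative
snippet, not part of the patch):

  /* A plain vec must be released by hand; an auto_vec owns its
     storage and releases it in its destructor.  */
  void
  leak_demo ()
  {
    vec<int> v = vNULL;
    v.safe_push (1);
    v.release ();     /* Forget this and the heap block leaks.  */

    auto_vec<int> a;
    a.safe_push (1);  /* Freed automatically when A goes out of scope.  */
  }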

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
manually checked "make selftest-valgrind" (which with this fix is now
back from four leaks down to three).

Committed to trunk as r262967 (under the "obvious" rule).

gcc/ChangeLog:
* optinfo-emit-json.cc (class optrecord_json_writer): Convert
field "m_scopes" from vec to auto_vec.
---
 gcc/optinfo-emit-json.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
index 6460a81..2199d52 100644
--- a/gcc/optinfo-emit-json.cc
+++ b/gcc/optinfo-emit-json.cc
@@ -75,7 +75,7 @@ private:
   json::array *m_root_tuple;
 
   /* The currently open scopes, for expressing nested optimization records.  */
-  vec<json::array *> m_scopes;
+  auto_vec<json::array *> m_scopes;
 };
 
 /* optrecord_json_writer's ctor.  Populate the top-level parts of the
-- 
1.8.5.3



Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-25 Thread Segher Boessenkool
On Wed, Jul 25, 2018 at 09:47:31AM -0400, David Malcolm wrote:
> > +/* Return whether X is just a single set, with the source
> > +   a general_operand.  */
> > +static bool
> > +is_just_move (rtx x)
> > +{
> > +  if (INSN_P (x))
> > +x = PATTERN (x);
> > +
> > +  return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
> > +}
> 
> If I'm reading it right, the patch only calls this function on i2 and
> i3, which are known to be rtx_insn *, rather than just rtx.

I used to also have

  is_just_move (XVECEXP (newpat, 0, 0))

etc.; during most of combine you do not have instructions, just patterns.


Segher


[PATCH] Alias -Warray-bounds to -Warray-bounds=1

2018-07-25 Thread Franz Sirl

Hi,

as discussed with Martin, this patch consolidates -Warray-bounds into an 
alias of -Warray-bounds=1.


Bootstrapped on x86_64-linux, no regressions.

Please apply if it's OK.

Franz.



gcc/ChangeLog:

2018-07-25  Franz Sirl  

* common.opt: Alias -Warray-bounds to -Warray-bounds=1.
* builtins.c (c_strlen): Use OPT_Warray_bounds_.
* gimple-ssa-warn-restrict.c (maybe_diag_offset_bounds): Likewise.
* tree-vrp.c (vrp_prop::check_array_ref, vrp_prop::check_mem_ref,
vrp_prop::search_for_addr_array): Likewise.


gcc/c-family/ChangeLog:

2018-07-25  Franz Sirl  

* c.opt: Remove -Warray-bounds.
* c-common.c (fold_offsetof, convert_vector_to_array_for_subscript):
Use OPT_Warray_bounds_.



Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 262966)
+++ gcc/builtins.c  (working copy)
@@ -675,7 +675,7 @@ c_strlen (tree src, int only_value)
   if (only_value != 2
  && !TREE_NO_WARNING (src))
 {
- warning_at (loc, OPT_Warray_bounds,
+ warning_at (loc, OPT_Warray_bounds_,
  "offset %qwi outside bounds of constant string",
  eltoff);
   TREE_NO_WARNING (src) = 1;
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 262966)
+++ gcc/c-family/c-common.c (working copy)
@@ -6257,7 +6257,7 @@ fold_offsetof (tree expr, tree type, enum tree_cod
 definition thereof.  */
  if (TREE_CODE (v) == ARRAY_REF
  || TREE_CODE (v) == COMPONENT_REF)
-   warning (OPT_Warray_bounds,
+   warning (OPT_Warray_bounds_,
 "index %E denotes an offset "
 "greater than size of %qT",
 t, TREE_TYPE (TREE_OPERAND (expr, 0)));
@@ -7662,7 +7662,7 @@ convert_vector_to_array_for_subscript (location_t
   if (TREE_CODE (index) == INTEGER_CST)
 if (!tree_fits_uhwi_p (index)
|| maybe_ge (tree_to_uhwi (index), TYPE_VECTOR_SUBPARTS (type)))
-  warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
+  warning_at (loc, OPT_Warray_bounds_, "index value is out of bound");
 
   /* We are building an ARRAY_REF so mark the vector as addressable
  to not run into the gimplifiers premature setting of DECL_GIMPLE_REG_P
Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 262966)
+++ gcc/c-family/c.opt  (working copy)
@@ -326,10 +326,6 @@ Wno-alloca-larger-than
 C ObjC C++ LTO ObjC++ Alias(Walloca-larger-than=,18446744073709551615EiB,none) Warning
 -Wno-alloca-larger-than Disable Walloca-larger-than= warning.  Equivalent to Walloca-larger-than= or larger.
 
-Warray-bounds
-LangEnabledBy(C ObjC C++ LTO ObjC++)
-; in common.opt
-
 Warray-bounds=
 LangEnabledBy(C ObjC C++ LTO ObjC++,Wall,1,0)
 ; in common.opt
Index: gcc/common.opt
===
--- gcc/common.opt  (revision 262966)
+++ gcc/common.opt  (working copy)
@@ -539,8 +539,7 @@ Common Var(warn_aggressive_loop_optimizations) Ini
 Warn if a loop with constant number of iterations triggers undefined behavior.
 
 Warray-bounds
-Common Var(warn_array_bounds) Warning
-Warn if an array is accessed out of bounds.
+Common Warning Alias(Warray-bounds=,1,0)
 
 Warray-bounds=
Common Joined RejectNegative UInteger Var(warn_array_bounds) Warning IntegerRange(0, 2)
Index: gcc/gimple-ssa-warn-restrict.c
===
--- gcc/gimple-ssa-warn-restrict.c  (revision 262966)
+++ gcc/gimple-ssa-warn-restrict.c  (working copy)
@@ -1619,7 +1619,7 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
   if (DECL_P (ref.base)
  && TREE_CODE (type = TREE_TYPE (ref.base)) == ARRAY_TYPE)
{
- if (warning_at (loc, OPT_Warray_bounds,
+ if (warning_at (loc, OPT_Warray_bounds_,
  "%G%qD pointer overflow between offset %s "
  "and size %s accessing array %qD with type %qT",
  call, func, rangestr[0], rangestr[1], ref.base, type))
@@ -1629,13 +1629,13 @@ maybe_diag_offset_bounds (location_t loc, gcall *c
  warned = true;
}
  else
-   warned = warning_at (loc, OPT_Warray_bounds,
+   warned = warning_at (loc, OPT_Warray_bounds_,
 "%G%qD pointer overflow between offset %s "
 "and size %s",
 call, func, rangestr[0], rangestr[1]);
}
   else
-   warned = warning_at (loc, OPT_Warray_bounds,
+   warned = warning_at (loc, OPT_Warray_bounds_,
   

Re: [PATCH] Fix target clones (PR gcov-profile/85370).

2018-07-25 Thread Richard Biener
On Wed, Jul 25, 2018 at 3:38 PM Martin Liška  wrote:
>
> Hi.
>
> Target clones have DECL_ARTIFICIAL set to 1, but we want to
> provide --coverage for that. With patched GCC one can see:
>
> -:0:Source:pr85370.c
> -:0:Graph:pr85370.gcno
> -:0:Data:pr85370.gcda
> -:0:Runs:1
> -:0:Programs:1
> -:1:__attribute__((target_clones("arch=slm","default")))
> 1:2:int foo1 (int a, int b) { // executed  wrongly
> 1:3:  return a + b;
> -:4:}
> --
> foo1.arch_slm.0:
> 0:2:int foo1 (int a, int b) { // executed  wrongly
> 0:3:  return a + b;
> -:4:}
> --
> foo1.default.1:
> 1:2:int foo1 (int a, int b) { // executed  wrongly
> 1:3:  return a + b;
> -:4:}
> --
> -:5:
> 1:6:int foo2 (int a, int b) {
> 1:7:  return a + b;
> -:8:}
> -:9:
> 1:   10:int main() {
> 1:   11:  foo1(1, 1);
> 1:   12:  foo2(1, 1);
> 1:   13:  return 1;
> -:   14:}
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> Will install in couple of days if no objection.

I wonder if representing the clones as artificial but have their body be
marked as inline instance of the original function works for gcov?  I think
it should for debuggers.  A similar case is probably the
static_constructors_and_destructors
function which has all ctors/dtors of static objects inlined into but itself is
of course artificial.  Is that handled correctly?

Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2018-07-25  Martin Liska  
>
> PR gcov-profile/85370
> * coverage.c (coverage_begin_function): Do not mark target
> clones as artificial functions.
> ---
>  gcc/coverage.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
>


Re: [PATCH 1/4] [ARC] Add more additional register names

2018-07-25 Thread Andrew Burgess
All the patches in this series look fine.

Thanks,
Andrew


* Claudiu Zissulescu  [2018-07-16 15:29:42 +0300]:

> From: claziss 
> 
> gcc/
> 2017-06-14  Claudiu Zissulescu  
> 
>   * config/arc/arc.h (ADDITIONAL_REGISTER_NAMES): Add additional
>   register names.
> ---
>  gcc/config/arc/arc.h | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 1780034aabe..3648314eaca 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -1215,7 +1215,15 @@ extern char rname56[], rname57[], rname58[], rname59[];
>  {\
>{"ilink",  29},\
>{"r29",29},\
> -  {"r30",30} \
> +  {"r30",30},\
> +  {"r40",40},\
> +  {"r41",41},\
> +  {"r42",42},\
> +  {"r43",43},\
> +  {"r56",56},\
> +  {"r57",57},\
> +  {"r58",58},\
> +  {"r59",59} \
>  }
>  
>  /* Entry to the insn conditionalizer.  */
> -- 
> 2.17.1
> 


Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-25 Thread David Malcolm
On Tue, 2018-07-24 at 17:18 +, Segher Boessenkool wrote:
> This patch allows combine to combine two insns into two.  This helps
> in many cases, by reducing instruction path length, and also allowing
> further combinations to happen.  PR85160 is a typical example of code
> that it can improve.
> 
> This patch does not allow such combinations if either of the original
> instructions was a simple move instruction.  In those cases combining
> the two instructions increases register pressure without improving
> the code.  With this move test, register pressure no longer increases
> noticeably as far as I can tell.
> 
> (At first I also didn't allow either of the resulting insns to be a
> move instruction.  But that is actually a very good thing to have, as
> should have been obvious).
> 
> Tested for many months; tested on about 30 targets.
> 
> I'll commit this later this week if there are no objections.
> 
> 
> Segher
> 
> 
> 2018-07-24  Segher Boessenkool  
> 
>   PR rtl-optimization/85160
>   * combine.c (is_just_move): New function.
>   (try_combine): Allow combining two instructions into two if
> neither of
>   the original instructions was a move.
> 
> ---
>  gcc/combine.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index cfe0f19..d64e84d 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -2604,6 +2604,17 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
>return true;
>  }
>  
> +/* Return whether X is just a single set, with the source
> +   a general_operand.  */
> +static bool
> +is_just_move (rtx x)
> +{
> +  if (INSN_P (x))
> +x = PATTERN (x);
> +
> +  return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
> +}

If I'm reading it right, the patch only calls this function on i2 and
i3, which are known to be rtx_insn *, rather than just rtx.

Hence the only way in which GET_CODE (x) can be SET is if the INSN_P
pattern test sets x to PATTERN (x) immediately above: it can't be a SET
otherwise - but this isn't obvious from the code.

Can this function take an rtx_insn * instead?  Maybe something like:

/* Return whether INSN's pattern is just a single set, with the source
   a general_operand.  */
static bool
is_just_move_p (rtx_insn *insn)
{
  if (!INSN_P (insn))
return false;

  rtx x = PATTERN (insn);
  return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
}

or similar?

[...snip...]

Thanks; I hope this is constructive.
Dave


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-25 Thread Richard Biener
On Wed, Jul 25, 2018 at 2:41 PM Richard Earnshaw (lists)
 wrote:
>
> On 25/07/18 11:36, Richard Biener wrote:
> > On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 24/07/18 18:26, Richard Biener wrote:
> >>> On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
> >>>  wrote:
> 
> 
>  This patch defines a new intrinsic function
>  __builtin_speculation_safe_value.  A generic default implementation is
>  defined which will attempt to use the backend pattern
>  "speculation_safe_barrier".  If this pattern is not defined, or if it
>  is not available, then the compiler will emit a warning, but
>  compilation will continue.
> 
>  Note that the test spec-barrier-1.c will currently fail on all
>  targets.  This is deliberate, the failure will go away when
>  appropriate action is taken for each target backend.
> >>>
> >>> So given this series is supposed to be backported I question
> >>>
> >>> +rtx
> >>> +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
> >>> +   rtx result, rtx val,
> >>> +   rtx failval ATTRIBUTE_UNUSED)
> >>> +{
> >>> +  emit_move_insn (result, val);
> >>> +#ifdef HAVE_speculation_barrier
> >>> +  /* Assume the target knows what it is doing: if it defines a
> >>> + speculation barrier, but it is not enabled, then assume that one
> >>> + isn't needed.  */
> >>> +  if (HAVE_speculation_barrier)
> >>> +emit_insn (gen_speculation_barrier ());
> >>> +
> >>> +#else
> >>> +  warning_at (input_location, 0,
> >>> + "this target does not define a speculation barrier; "
> >>> + "your program will still execute correctly, but speculation 
> >>> "
> >>> + "will not be inhibited");
> >>> +#endif
> >>> +  return result;
> >>>
> >>> which makes all but aarch64 archs warn on __bultin_speculation_safe_value
> >>> uses, even those that do not suffer from Spectre like all those embedded 
> >>> targets
> >>> where implementations usually do not speculate at all.
> >>>
> >>> In fact for those targets the builtin stays in the way of optimization on 
> >>> GIMPLE
> >>> as well so we should fold it away early if neither the target hook is
> >>> implemented
> >>> nor there is a speculation_barrier insn.
> >>>
> >>> So, please make resolve_overloaded_builtin return a no-op on such targets
> >>> which means you can remove the above warning.  Maybe such targets
> >>> shouldn't advertise / initialize the builtins at all?
> >>
> >> I disagree with your approach here.  Why would users not want to know
> >> when the compiler is failing to implement a security feature when it
> >> should?  As for targets that don't need something, they can easily
> >> define the hook as described to suppress the warning.
> >>
> >> Or are you just suggesting moving the warning to resolve overloaded 
> >> builtin.
> >
> > Well.  You could argue I say we shouldn't even support
> > __builtin_sepeculation_safe_value
> > for archs that do not need it or have it not implemented.  That way users 
> > can
> > decide:
> >
> > #if __HAVE_SPECULATION_SAFE_VALUE
> >  
> > #else
> > #warning oops // or nothing
> > #endif
> >
>
> So how about removing the predefine of __HAVE_S_S_V when the builtin is
> a nop, but then leaving the warning in if people try to use it anyway?

Little bit inconsistent but I guess I could live with that.  It still leaves
the question open for how to declare you do not need speculation
barriers at all then.
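
For reference, a typical guarded use would look like this (purely
illustrative, assuming the builtin as proposed, with the failval
argument left to its default of 0):

  int
  load_element (int *array, unsigned idx, unsigned len)
  {
    if (idx < len)
      /* On targets with a barrier, IDX is forced to the failval on
         mis-speculated paths before the dereference.  */
      return array[__builtin_speculation_safe_value (idx)];
    return 0;
  }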

> >> Other ports will need to take action, but in general, it can be as
> >> simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
> >> simpler still if nothing is needed for that architecture.
> >
> > Then that should be the default.  You might argue we'll only see
> > __builtin_speculation_safe_value uses for things like Firefox which
> > is unlikely built for AVR (just to make an example).  But people
> > are going to test build just on x86 and if they build with -Werror
> > this will break builds on all targets that didn't even get the chance
> > to implement this feature.
> >
> >> There is a test which is intended to fail to targets that have not yet
> >> been patched - I thought that was better than hard-failing the build,
> >> especially given that we want to back-port.
> >>
> >> Port maintainers DO need to decide what to do about speculation, even if
> >> it is explicitly that no mitigation is needed.
> >
> > Agreed.  But I didn't yet see a request for maintainers to decide that?
> >
>
> consider it made, then :-)

I suspect that drew their attention ;)

So a different idea would be to produce patches implementing the hook for
each target "empty", CC the target maintainers and hope they quickly
ack if the target doesn't have a speculation problem.  Others then would
get no patch (from you) and thus raise a warning?

Maybe at least do that for all primary and secondary targets given we do
not want to regress 

[PATCH] GCOV: add cache for streamed locations.

2018-07-25 Thread Martin Liška
Hi.

Last patch from GCOV series is about not streaming of redundant lines
for a basic-block. It helps to fix few issues.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
Will install in couple of days if no objection.

Martin

gcc/ChangeLog:

2018-07-25  Martin Liska  

PR gcov-profile/85338
PR gcov-profile/85350
PR gcov-profile/85372
* profile.c (struct location_triplet): New.
(struct location_triplet_hash): Likewise.
(output_location): Do not output a BB that
is already recorded for a line.
(branch_prob): Use streamed_locations.

gcc/testsuite/ChangeLog:

2018-07-25  Martin Liska  

PR gcov-profile/85338
PR gcov-profile/85350
PR gcov-profile/85372
* gcc.misc-tests/gcov-pr85338.c: New test.
* gcc.misc-tests/gcov-pr85350.c: New test.
* gcc.misc-tests/gcov-pr85372.c: New test.
---
 gcc/profile.c   | 91 +++--
 gcc/testsuite/gcc.misc-tests/gcov-pr85338.c | 21 +
 gcc/testsuite/gcc.misc-tests/gcov-pr85350.c | 21 +
 gcc/testsuite/gcc.misc-tests/gcov-pr85372.c | 28 +++
 4 files changed, 153 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr85338.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr85350.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr85372.c


diff --git a/gcc/profile.c b/gcc/profile.c
index 00f37b657a4..cb51e0d4c51 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -919,17 +919,90 @@ compute_value_histograms (histogram_values values, unsigned cfg_checksum,
 free (histogram_counts[t]);
 }
 
+/* Location triplet which records a location.  */
+struct location_triplet
+{
+  const char *filename;
+  int lineno;
+  int bb_index;
+};
+
+/* Traits class for streamed_locations hash set below.  */
+
+struct location_triplet_hash : typed_noop_remove <location_triplet>
+{
+  typedef location_triplet value_type;
+  typedef location_triplet compare_type;
+
+  static hashval_t
+  hash (const location_triplet &ref)
+  {
+inchash::hash hstate (0);
+if (ref.filename)
+  hstate.add_int (strlen (ref.filename));
+hstate.add_int (ref.lineno);
+hstate.add_int (ref.bb_index);
+return hstate.end ();
+  }
+
+  static bool
+  equal (const location_triplet &ref1, const location_triplet &ref2)
+  {
+return ref1.lineno == ref2.lineno
+  && ref1.bb_index == ref2.bb_index
+  && ref1.filename != NULL
+  && ref2.filename != NULL
+  && strcmp (ref1.filename, ref2.filename) == 0;
+  }
+
+  static void
+  mark_deleted (location_triplet &ref)
+  {
+ref.lineno = -1;
+  }
+
+  static void
+  mark_empty (location_triplet &ref)
+  {
+ref.lineno = -2;
+  }
+
+  static bool
+  is_deleted (const location_triplet &ref)
+  {
+return ref.lineno == -1;
+  }
+
+  static bool
+  is_empty (const location_triplet &ref)
+  {
+return ref.lineno == -2;
+  }
+};
+
+
+
+
 /* When passed NULL as file_name, initialize.
When passed something else, output the necessary commands to change
line to LINE and offset to FILE_NAME.  */
 static void
-output_location (char const *file_name, int line,
+output_location (hash_set<location_triplet_hash> *streamed_locations,
+		 char const *file_name, int line,
 		 gcov_position_t *offset, basic_block bb)
 {
   static char const *prev_file_name;
   static int prev_line;
   bool name_differs, line_differs;
 
+  location_triplet triplet;
+  triplet.filename = file_name;
+  triplet.lineno = line;
+  triplet.bb_index = bb ? bb->index : 0;
+
+  if (streamed_locations->add (triplet))
+return;
+
   if (!file_name)
 {
   prev_file_name = NULL;
@@ -1018,6 +1091,8 @@ branch_prob (void)
   flow_call_edges_add (NULL);
   add_noreturn_fake_exit_edges ();
 
+  hash_set <location_triplet_hash> streamed_locations;
+
   /* We can't handle cyclic regions constructed using abnormal edges.
  To avoid these we replace every source of abnormal edge by a fake
  edge from entry node and every destination by fake edge to exit.
@@ -1254,7 +1329,7 @@ branch_prob (void)
 
   /* Line numbers.  */
   /* Initialize the output.  */
-  output_location (NULL, 0, NULL, NULL);
+  output_location (&streamed_locations, NULL, 0, NULL, NULL);
 
   hash_set<int_hash <location_t, 0, 2> > seen_locations;
 
@@ -1268,8 +1343,8 @@ branch_prob (void)
 	  location_t loc = DECL_SOURCE_LOCATION (current_function_decl);
 	  seen_locations.add (loc);
 	  expanded_location curr_location = expand_location (loc);
-	  output_location (curr_location.file, curr_location.line,
-			   &offset, bb);
+	  output_location (&streamed_locations, curr_location.file,
+			   curr_location.line, &offset, bb);
 	}
 
 	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
@@ -1279,8 +1354,8 @@ branch_prob (void)
 	  if (!RESERVED_LOCATION_P (loc))
 		{
 		  seen_locations.add (loc);
-		  output_location (gimple_filename (stmt), gimple_lineno (stmt),
-				   &offset, bb);
+		  output_location (&streamed_locations, gimple_filename (stmt),
+   gimple_lineno 

[PATCH] Fix GCOV CFG related issues.

2018-07-25 Thread Martin Liška
Hi.

It fixes a couple of very similar issues.  We record the goto_locus of
GOTO expressions eliminated while constructing the CFG, but it is not
easy to distinguish these locations in the GIMPLE middle end.  The
patch skips a goto_locus that was already streamed for a statement.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
Will install in couple of days if no objection.

Martin

gcc/ChangeLog:

2018-07-25  Martin Liska  

PR gcov-profile/83813
PR gcov-profile/84758
PR gcov-profile/85217
PR gcov-profile/85332
* profile.c (branch_prob): Do not record GOTO expressions
for GIMPLE statements which locations are already streamed.

gcc/testsuite/ChangeLog:

2018-07-25  Martin Liska  

PR gcov-profile/83813
PR gcov-profile/84758
PR gcov-profile/85217
PR gcov-profile/85332
* gcc.misc-tests/gcov-pr83813.c: New test.
* gcc.misc-tests/gcov-pr84758.c: New test.
* gcc.misc-tests/gcov-pr85217.c: New test.
* gcc.misc-tests/gcov-pr85332.c: New test.
---
 gcc/profile.c   | 29 ++---
 gcc/testsuite/gcc.misc-tests/gcov-pr83813.c | 23 
 gcc/testsuite/gcc.misc-tests/gcov-pr84758.c | 28 
 gcc/testsuite/gcc.misc-tests/gcov-pr85217.c | 20 ++
 gcc/testsuite/gcc.misc-tests/gcov-pr85332.c | 26 ++
 5 files changed, 117 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr83813.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr84758.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr85217.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr85332.c


diff --git a/gcc/profile.c b/gcc/profile.c
index 0cd0270b4fb..00f37b657a4 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -1256,6 +1256,8 @@ branch_prob (void)
   /* Initialize the output.  */
   output_location (NULL, 0, NULL, NULL);
 
+  hash_set<int_hash <location_t, 0, 2> > seen_locations;
+
   FOR_EACH_BB_FN (bb, cfun)
 	{
 	  gimple_stmt_iterator gsi;
@@ -1263,8 +1265,9 @@ branch_prob (void)
 
 	  if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
 	{
-	  expanded_location curr_location =
-		expand_location (DECL_SOURCE_LOCATION (current_function_decl));
+	  location_t loc = DECL_SOURCE_LOCATION (current_function_decl);
+	  seen_locations.add (loc);
+	  expanded_location curr_location = expand_location (loc);
 	  output_location (curr_location.file, curr_location.line,
 			   &offset, bb);
 	}
@@ -1272,17 +1275,25 @@ branch_prob (void)
 	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
 	{
 	  gimple *stmt = gsi_stmt (gsi);
-	  if (!RESERVED_LOCATION_P (gimple_location (stmt)))
-		output_location (gimple_filename (stmt), gimple_lineno (stmt),
-				 &offset, bb);
+	  location_t loc = gimple_location (stmt);
+	  if (!RESERVED_LOCATION_P (loc))
+		{
+		  seen_locations.add (loc);
+		  output_location (gimple_filename (stmt), gimple_lineno (stmt),
+				   &offset, bb);
+		}
 	}
 
-	  /* Notice GOTO expressions eliminated while constructing the CFG.  */
+	  /* Notice GOTO expressions eliminated while constructing the CFG.
+	 It's hard to distinguish such expression, but goto_locus should
+	 not be any of already seen location.  */
+	  location_t loc;
 	  if (single_succ_p (bb)
-	  && !RESERVED_LOCATION_P (single_succ_edge (bb)->goto_locus))
+	  && (loc = single_succ_edge (bb)->goto_locus)
+	  && !RESERVED_LOCATION_P (loc)
+	  && !seen_locations.contains (loc))
 	{
-	  expanded_location curr_location
-		= expand_location (single_succ_edge (bb)->goto_locus);
+	  expanded_location curr_location = expand_location (loc);
 	  output_location (curr_location.file, curr_location.line,
			   &offset, bb);
 	}
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-pr83813.c b/gcc/testsuite/gcc.misc-tests/gcov-pr83813.c
new file mode 100644
index 000..ac935b969f8
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-pr83813.c
@@ -0,0 +1,23 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage" } */
+/* { dg-do run { target native } } */
+
+union U
+{
+int f0;
+unsigned char f1;
+};
+
+int main()
+{
+int i = 0;
+union U u = {0};  /* count(1) */
+for (u.f1 = 0; u.f1 != -2; ++u.f1) {
+i ^= u.f1;  /* count(1) */
+if (i < 1)  /* count(1) */
+return 0;  /* count(1) */
+}
+
+return 1;
+}
+
+/* { dg-final { run-gcov gcov-pr83813.c } } */
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-pr84758.c b/gcc/testsuite/gcc.misc-tests/gcov-pr84758.c
new file mode 100644
index 000..2ae6900375f
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-pr84758.c
@@ -0,0 +1,28 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage" } */
+/* { dg-do run { target native } } */
+
+int x, y;
+
+static void
+foo (int a, int b)
+{
+  {
+if (a == 1 || a == 2)  /* count(1) */
+  {
+	x = 4;  /* count(1) */
+	if (b == 3)  /* count(1) */
+	  x = 6;  /* count(1) */
+  }
+else
+  x = 15;  /* count(#) 

[PATCH] Fix target clones (PR gcov-profile/85370).

2018-07-25 Thread Martin Liška
Hi.

Target clones have DECL_ARTIFICIAL set to 1, but we want to
provide --coverage for that. With patched GCC one can see:

-:0:Source:pr85370.c
-:0:Graph:pr85370.gcno
-:0:Data:pr85370.gcda
-:0:Runs:1
-:0:Programs:1
-:1:__attribute__((target_clones("arch=slm","default")))
1:2:int foo1 (int a, int b) { // executed  wrongly
1:3:  return a + b;
-:4:}
--
foo1.arch_slm.0:
0:2:int foo1 (int a, int b) { // executed  wrongly
0:3:  return a + b;
-:4:}
--
foo1.default.1:
1:2:int foo1 (int a, int b) { // executed  wrongly
1:3:  return a + b;
-:4:}
--
-:5:
1:6:int foo2 (int a, int b) {
1:7:  return a + b;
-:8:}
-:9:
1:   10:int main() {
1:   11:  foo1(1, 1);
1:   12:  foo2(1, 1);
1:   13:  return 1;
-:   14:}

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.
I will install it in a couple of days if there are no objections.

Martin

gcc/ChangeLog:

2018-07-25  Martin Liska  

PR gcov-profile/85370
* coverage.c (coverage_begin_function): Do not mark target
clones as artificial functions.
---
 gcc/coverage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/gcc/coverage.c b/gcc/coverage.c
index da171c84d3c..bae6f5cafac 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -656,7 +656,8 @@ coverage_begin_function (unsigned lineno_checksum, unsigned cfg_checksum)
   gcov_write_unsigned (cfg_checksum);
   gcov_write_string (IDENTIFIER_POINTER
 		 (DECL_ASSEMBLER_NAME (current_function_decl)));
-  gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl));
+  gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl)
+		   && !DECL_FUNCTION_VERSIONED (current_function_decl));
   gcov_write_filename (xloc.file);
   gcov_write_unsigned (xloc.line);
   gcov_write_unsigned (xloc.column);



Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

2018-07-25 Thread Thomas Preudhomme
Hi Kyrill,

Using memory_operand worked, the issues I encountered when using it in
earlier versions of the patch must have been due to the missing test
on address_operand in the preparation statements which I added later.
Please find an updated patch in attachment. ChangeLog entry is as
follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  

* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively.  Insert in the stream of insns if
possible.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to new require_pic_register prototype.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(stack_protect_combined_set): New insn_and_split pattern.
(stack_protect_set): New insn pattern.
(stack_protect_combined_test): New insn_and_split pattern.
(stack_protect_test): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  

* gcc.target/arm/pr85434.c: New test.
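
For background, a hedged sketch (mine, not the PR testcase; the helper
name is made up) of the class of function affected: a stack-protected
frame loads the guard in the prologue and re-checks it in the epilogue,
and PR85434 is about the address used for those guard accesses being
spilled to the attacker-writable stack in between:

/* Compile with -fstack-protector-all.  The combined set/test patterns
   keep the computation of the guard's address and the guard access in
   a single insn, so the address can no longer be spilled and corrupted
   between the two guard accesses.  */
void use (char *);

void
f (void)
{
  char buf[64];
  use (buf);
}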

Bootstrapped again for Arm and Thumb-2 and regtested with and without
-fstack-protector-all without any regression.

Best regards,

Thomas
On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme
 wrote:
>
> [Dropping Jeff Law from the list since he already commented on the
> middle end parts]
>
> Hi Kyrill,
>
> On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov
>  wrote:
> >
> > Hi Thomas,
> >
> > On 17/07/18 12:02, Thomas Preudhomme wrote:
> > > Fixed in attached patch. ChangeLog entries are unchanged:
> > >
> > > *** gcc/ChangeLog ***
> > >
> > > 2018-07-05  Thomas Preud'homme 
> > >
> > > PR target/85434
> > > * target-insns.def (stack_protect_combined_set): Define new standard
> > > pattern name.
> > > (stack_protect_combined_test): Likewise.
> > > * cfgexpand.c (stack_protect_prologue): Try new
> > > stack_protect_combined_set pattern first.
> > > * function.c (stack_protect_epilogue): Try new
> > > stack_protect_combined_test pattern first.
> > > * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > parameters to control which register to use as PIC register and force
> > > reloading PIC register respectively.
> > > (legitimize_pic_address): Expose above new parameters in prototype and
> > > adapt recursive calls accordingly.
> > > (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > prototype.
> > > (thumb_legitimize_address): Likewise.
> > > (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > > * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > change.
> > > * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > prototype change.
> > > (stack_protect_combined_set): New insn_and_split pattern.
> > > (stack_protect_set): New insn pattern.
> > > (stack_protect_combined_test): New insn_and_split pattern.
> > > (stack_protect_test): New insn pattern.
> > > * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > (UNSPEC_SP_TEST): Likewise.
> > > * doc/md.texi (stack_protect_combined_set): Document new standard
> > > pattern name.
> > > (stack_protect_set): Clarify that the operand for guard's address is
> > > legal.
> > > (stack_protect_combined_test): Document new standard pattern name.
> > > (stack_protect_test): Clarify that the operand for guard's address is
> > > legal.
> > >
> > > *** gcc/testsuite/ChangeLog ***
> > >
> > > 2018-07-05  Thomas Preud'homme 
> > >
> > > PR target/85434
> > > * gcc.target/arm/pr85434.c: New test.
> > >
> >
> > Sorry for the delay. Some comments inline.
> >
> > Kyrill
> >
> > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> > index d6e3c382085..d1a893ac56e 100644
> > --- a/gcc/cfgexpand.c
> > +++ b/gcc/cfgexpand.c
> > @@ 

Re: optimize std::vector move assignment

2018-07-25 Thread Marc Glisse

On Wed, 25 Jul 2018, Jonathan Wakely wrote:

_M_copy_data is not really needed, we could add a defaulted assignment 
operator, or remove the move constructor (and call a new _M_clear() from 
the 2 users), etc. However, it seemed the least intrusive, the least likely 
to have weird consequences.


Yes, the alternative would be:

--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -100,14 +100,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  : _M_start(__x._M_start), _M_finish(__x._M_finish),
_M_end_of_storage(__x._M_end_of_storage)
  { __x._M_start = __x._M_finish = __x._M_end_of_storage = pointer(); }
+
+   _Vector_impl_data(const _Vector_impl_data&) = default;
+   _Vector_impl_data& operator=(const _Vector_impl_data&) = default;
#endif

  void
  _M_swap_data(_Vector_impl_data& __x) _GLIBCXX_NOEXCEPT
  {
- std::swap(_M_start, __x._M_start);
- std::swap(_M_finish, __x._M_finish);
- std::swap(_M_end_of_storage, __x._M_end_of_storage);
+ _Vector_impl_data __tmp = *this;
+ *this = __x;
+ __x = __tmp;


Or just std::swap(*this, __x).


  }
 };

Your _M_copy_data seems fine. It avoids unintentional copies, because
the copy constructor and copy assignment operator remain deleted (at
least in C++11).

I didn't add a testcase because I don't see any dg-final scan-tree-dump in 
the libstdc++ testsuite. The closest would be g++.dg/pr83239.C, 
g++.dg/vect/pr64410.cc, g++.dg/vect/pr33426-ivdep-4.cc that include 
, but from previous experience (already with vector), adding 
libstdc++ testcases to the C++ testsuite is not very popular.


Yes, C++ tests using std::vector are sometimes a bit fragile.

I don't see any reason we can't use scan-tree-dump in the libstdc++
testsuite, if you wanted to add one. We do have other dg-final tests.


The others only test for the presence of some name in assembly. But I may 
try it later.


--
Marc Glisse


Re: [2/5] C-SKY port: Backend implementation

2018-07-25 Thread Paul Koning



> On Jul 25, 2018, at 12:50 AM, Jeff Law  wrote:
> 
 ...
>>> It did.  See TARGET_CUSTOM_FUNCTION_DESCRIPTORS and the (relatively few)
>>> ports that define it.
>> 
>> Hmmm, I completely failed to make that connection from the docs -- the
>> whole description of that hook is pretty gibberishy and I thought it was
>> something for targets where the ABI already specifies some "standard
>> calling sequence" using descriptors (C-SKY doesn't), rather than a
>> generic alternative to executable trampolines.  Putting on my doc
>> maintainer hat briefly, I can see this needs a lot of work.  :-(
> Most likely :-)  So many things to do, so little time.
> 
> 
>> 
>> Anyway, is this required for new ports nowadays?  If so, I at least know
>> what to search for now.  At this point I couldn't say whether this would
>> do anything to fix the situation on ck801 targets where there simply
>> aren't enough spare registers available to the trampoline to both hold
>> the static link and do an indirect jump.
> It's not required, but preferred, particularly if the part has an MMU
> that can provide no-execute protections on pages in memory.  If the
> target doesn't have an mmu, then it's of marginal value.
> 
> The key advantage it has over the old trampoline implementation is that
> stacks can remain non-executable, even for Ada and nested functions.
> That's a big win from a security standpoint.

Non-executable stacks are a very good thing.

That said, I also looked at the target hook documentation and was left without 
any clue whatsoever.  It sure isn't clear what powers of two have to do with 
descriptors, or what descriptors have to do with support for nested functions.
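
For what it's worth, a minimal GNU C sketch (mine, hedged) of what forces
a trampoline or, with TARGET_CUSTOM_FUNCTION_DESCRIPTORS, a descriptor:
taking the address of a nested function that needs the static chain.  As
I understand it, the power of two is the code alignment the target
guarantees, whose spare low pointer bits are used to tell descriptor
pointers apart from ordinary function addresses.

/* Classic trampolines materialize executable code on the stack here;
   descriptors instead allocate a small data object (entry point plus
   static chain) and tag the pointer's low bit(s).  */
int
call_it (int base)
{
  int nested (int x) { return x + base; }	/* needs static chain */
  int (*fp) (int) = nested;			/* address taken */
  return fp (1);
}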

Can you suggest places to look to get an understanding of this feature?  It 
sounds like the only other option is "Use the source, Luke".  Any specific 
targets that make a good reference implementation for this?

paul




Re: optimize std::vector move assignment

2018-07-25 Thread Jonathan Wakely

On 25/07/18 14:38 +0200, Marc Glisse wrote:

Hello,

I talked about this last year 
(https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01063.html and thread), 
this tweaks std::vector move assignment to help gcc generate better 
code for it.


Ah yes, thank for revisiting it.


For this code

#include <vector>
#include <utility>
typedef std::vector<int> V;
void f(V&a,V&b){a=std::move(b);}

with -O2 -fdump-tree-optimized on powerpc64le-unknown-linux-gnu, we 
currently have


 _5 = MEM[(int * &)a_3(D)];
 MEM[(int * &)a_3(D)] = 0B;
 MEM[(int * &)a_3(D) + 8] = 0B;
 MEM[(int * &)a_3(D) + 16] = 0B;
 _6 = MEM[(int * &)b_2(D)];
 MEM[(int * &)a_3(D)] = _6;
 MEM[(int * &)b_2(D)] = 0B;
 _7 = MEM[(int * &)a_3(D) + 8];
 _8 = MEM[(int * &)b_2(D) + 8];
 MEM[(int * &)a_3(D) + 8] = _8;
 MEM[(int * &)b_2(D) + 8] = _7;
 _9 = MEM[(int * &)a_3(D) + 16];
 _10 = MEM[(int * &)b_2(D) + 16];
 MEM[(int * &)a_3(D) + 16] = _10;
 MEM[(int * &)b_2(D) + 16] = _9;
 if (_5 != 0B)

with the patch, we go down to

 _3 = MEM[(const struct _Vector_impl_data &)a_4(D)]._M_start;
 _5 = MEM[(const struct _Vector_impl_data &)b_2(D)]._M_start;
 MEM[(struct _Vector_impl_data *)a_4(D)]._M_start = _5;
 _6 = MEM[(const struct _Vector_impl_data &)b_2(D)]._M_finish;
 MEM[(struct _Vector_impl_data *)a_4(D)]._M_finish = _6;
 _7 = MEM[(const struct _Vector_impl_data &)b_2(D)]._M_end_of_storage;
 MEM[(struct _Vector_impl_data *)a_4(D)]._M_end_of_storage = _7;
 MEM[(struct _Vector_impl_data *)b_2(D)]._M_start = 0B;
 MEM[(struct _Vector_impl_data *)b_2(D)]._M_finish = 0B;
 MEM[(struct _Vector_impl_data *)b_2(D)]._M_end_of_storage = 0B;
 if (_3 != 0B)

removing 2 reads and 3 writes. At -O3 we also get more vectorization.

The main idea is to give the compiler more type information. 
Currently, the only type information the compiler cares about is the 
type used for a memory read/write. Using std::swap as before, the 
reads/writes are done on int&. Doing it directly, they are done on 
_Vector_impl_data::_M_start, a more precise information. Maybe some 
day the compiler will get stricter and be able to deduce the same 
information, but not yet.


The second point is to rotate { new, old, tmp } in an order that's 
simpler for the compiler. I was going to remove the swaps and use 
_M_copy_data directly, but since doing the swaps in a different order 
works...


_M_copy_data is not really needed, we could add a defaulted assignment 
operator, or remove the move constructor (and call a new _M_clear() 
from the 2 users), etc. However, it seemed the least intrusive, the 
least likely to have weird consequences.


Yes, the alternative would be:

--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -100,14 +100,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   : _M_start(__x._M_start), _M_finish(__x._M_finish),
 _M_end_of_storage(__x._M_end_of_storage)
   { __x._M_start = __x._M_finish = __x._M_end_of_storage = pointer(); }
+
+   _Vector_impl_data(const _Vector_impl_data&) = default;
+   _Vector_impl_data& operator=(const _Vector_impl_data&) = default;
#endif

   void
   _M_swap_data(_Vector_impl_data& __x) _GLIBCXX_NOEXCEPT
   {
- std::swap(_M_start, __x._M_start);
- std::swap(_M_finish, __x._M_finish);
- std::swap(_M_end_of_storage, __x._M_end_of_storage);
+ _Vector_impl_data __tmp = *this;
+ *this = __x;
+ __x = __tmp;
   }
  };

Your _M_copy_data seems fine. It avoids unintentional copies, because
the copy constructor and copy assignment operator remain deleted (at
least in C++11).

I didn't add a testcase because I don't see any dg-final 
scan-tree-dump in the libstdc++ testsuite. The closest would be 
g++.dg/pr83239.C, g++.dg/vect/pr64410.cc, 
g++.dg/vect/pr33426-ivdep-4.cc that include , but from 
previous experience (already with vector), adding libstdc++ testcases 
to the C++ testsuite is not very popular.


Yes, C++ tests using std::vector are sometimes a bit fragile.

I don't see any reason we can't use scan-tree-dump in the libstdc++
testsuite, if you wanted to add one. We do have other dg-final tests.



Index: libstdc++-v3/include/bits/stl_vector.h
===
--- libstdc++-v3/include/bits/stl_vector.h  (revision 262948)
+++ libstdc++-v3/include/bits/stl_vector.h  (working copy)
@@ -96,25 +96,36 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
{ }

#if __cplusplus >= 201103L
_Vector_impl_data(_Vector_impl_data&& __x) noexcept
: _M_start(__x._M_start), _M_finish(__x._M_finish),
  _M_end_of_storage(__x._M_end_of_storage)
{ __x._M_start = __x._M_finish = __x._M_end_of_storage = pointer(); }
#endif

void
+   _M_copy_data(_Vector_impl_data const& __x) _GLIBCXX_NOEXCEPT
+   {
+ _M_start = __x._M_start;
+ _M_finish = __x._M_finish;
+ _M_end_of_storage = __x._M_end_of_storage;
+   }
+
+   void
_M_swap_data(_Vector_impl_data& 

Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-25 Thread Sam Tebbs

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction
definition to also cover DI mode. This prevents a redundant AND
instruction from being generated due to the pattern failing to be
matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf
with no regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi<mode>): Rename to...
    (*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi<mode>"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email still 
apply.




   (vec_select:<VEL>
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
+    operands[2] = aarch64_endian_lane_rtx (<MODE>mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.<Vetype>[%2]";
   }
   [(set_attr "type" "neon_to_gp")]


diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f1784d72e55c412d076de43f2f7aad4632d55ecb..e92a3b49c65e84d2a16a2a480c359a0b4d8fa3e3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3033,15 +3033,16 @@
   [(set_attr "type" "neon_to_gp")]
 )
 
-(define_insn "*aarch64_get_lane_zero_extendsi<mode>"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-	(zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(zero_extend:GPI
 	  (vec_select:<VEL>
 	    (match_operand:VDQQH 1 "register_operand" "w")
 	    (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
+    operands[2] = aarch64_endian_lane_rtx (<MODE>mode,
+					   INTVAL (operands[2]));
     return "umov\\t%w0, %1.<Vetype>[%2]";
   }
   [(set_attr "type" "neon_to_gp")]
diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
new file mode 100644
index ..deb613cd23150a83dfd36ae84504415993b97be3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Tests *aarch64_get_lane_zero_extenddiv16qi.  */
+typedef unsigned char v16qi __attribute__ ((vector_size (16)));
+/* Tests *aarch64_get_lane_zero_extenddiv8qi.  */
+typedef unsigned char v8qi __attribute__ ((vector_size (8)));
+
+/* Tests *aarch64_get_lane_zero_extendsiv8hi.  */
+typedef unsigned short v16hi __attribute__ ((vector_size (16)));
+/* Tests *aarch64_get_lane_zero_extendsiv4hi.  */
+typedef unsigned short v8hi __attribute__ ((vector_size (8)));
+
+unsigned long long
+foo_16qi (v16qi a)
+{
+  return a[0];
+}
+
+unsigned long long
+foo_8qi (v8qi a)
+{
+  return a[0];
+}
+
+unsigned int
+foo_16hi (v16hi a)
+{
+  return a[0];
+}
+
+unsigned int
+foo_8hi (v8hi a)
+{
+  return a[0];
+}
+
+/* { dg-final { scan-assembler-times "umov\\t" 4 } } */
+/* { dg-final { scan-assembler-not "and\\t" } } */


Re: [PATCH] Make strlen range computations more conservative

2018-07-25 Thread Bernd Edlinger
On 07/24/18 16:50, Richard Biener wrote:
> On Tue, 24 Jul 2018, Bernd Edlinger wrote:
> 
>> Hi!
>>
>> This patch makes strlen range computations more conservative.
>>
>> Firstly if there is a visible type cast from type A to B before passing
>> then value to strlen, don't expect the type layout of B to restrict the
>> possible return value range of strlen.
>>
>> Furthermore use the outermost enclosing array instead of the
>> innermost one, because too aggressive optimization will likely
>> convert harmless errors into security-relevant errors, because
>> as the existing test cases demonstrate, this optimization is actively
>> attacking string length checks in user code, while not giving
>> any warnings.
>>
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> I'd like us to be explicit in what we support, not what we do not
> support, thus
> 
> + /* Avoid arrays of pointers.  */
> + if (TREE_CODE (TREE_TYPE (arg)) == POINTER_TYPE)
> +   return false;
> 
> should become
> 
> /* We handle arrays of integer types.  */
>    if (TREE_CODE (TREE_TYPE (arg)) != INTEGER_TYPE)
>  return false;
> 

Yes.  I think I can also check the TYPE_MODE/PRECISION.

> + tree base = arg;
> + while (TREE_CODE (base) == ARRAY_REF
> +|| TREE_CODE (base) == ARRAY_RANGE_REF
> +|| TREE_CODE (base) == COMPONENT_REF)
> +   base = TREE_OPERAND (base, 0);
> +
> + /* If this looks like a type cast don't assume anything.  */
> + if ((TREE_CODE (base) == MEM_REF
> +  && (! integer_zerop (TREE_OPERAND (base, 1))
> +  || TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 0)))
> + != TREE_TYPE (base)))
> + || TREE_CODE (base) == VIEW_CONVERT_EXPR)
>  return false;
> 
> likewise - you miss to handle BIT_FIELD_REF.  So, instead
> 

I did not expect to see BIT_FIELD_REF in the inner tree elements,
but you never know.  The new version handles them, and bails out
if they happen to be there.
Is this handling now how you wanted it to be?

>if (!(DECL_P (base)
>  || TREE_CODE (base) == STRING_CST
>  || (TREE_CODE (base) == MEM_REF
>  && ...> 
> you should look at comparing TYPE_MAIN_VARIANT in your type
> check, aligned/unaligned or const/non-const accesses shouldn't
> be considered a "type cast".  Maybe even use

Good point.  TYPE_MAIN_VARIANT is my friend.

> types_compatible_p.  Not sure why you enforce zero-offset MEMs?
> Do we in the end only handle  bases of MEMs?  Given you
> strip arbitrary COMPONENT_REFs the offset in a MEM isn't
> so much different?
> 

something like:
ma0[1].a5_7[0]
gets transformed into:
(const char *) &(ma0 + 64)->a5_7[0]

ma0 + 64 is a MEM_REF [&ma0, 64].  I don't really think that happens
too often, but other weirdo type casts can look quite similar.
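
To make the hazard concrete, a hedged sketch (mine, not from the patch or
its testcases) of why bounding strlen by the innermost array can defeat a
user-written check:

#include <string.h>

struct S
{
  char inner[4];	/* may lack a terminating nul */
  char tail[12];	/* the string then runs on into here */
};

int
check (struct S *p)
{
  /* If strlen (p->inner) is assumed to be at most 3 because of the
     inner array bound, this guard folds to always-false and is
     removed, turning a harmless overlong string into an unchecked
     one.  */
  if (strlen (p->inner) > 3)
    return -1;
  return 0;
}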


> It looks like the component-ref stripping plus type-check part
> could be factored out into sth like get_base_address?  I don't
> have a good name or suggested semantics for it though.
> 

Yes, done.

While playing with the now more rigorous type checking I noticed
something that is most likely a pre-existent programming error:

@@ -1310,8 +1350,8 @@ get_range_strlen (tree arg, tree length[2], bitmap
  member.  */
   tree idx = TREE_OPERAND (op, 1);

- arg = TREE_OPERAND (op, 0);
- tree optype = TREE_TYPE (arg);
+ op = TREE_OPERAND (op, 0);
+ tree optype = TREE_TYPE (op);
   if (tree dom = TYPE_DOMAIN (optype))
 if (tree bound = TYPE_MAX_VALUE (dom))
   if (TREE_CODE (bound) == INTEGER_CST

I believe this was not meant to change "arg".

This is in a block that is guarded by:

   /* We can end up with &(*iftmp_1)[0] here as well, so handle it.  */
   if (TREE_CODE (arg) == ADDR_EXPR
   && TREE_CODE (TREE_OPERAND (arg, 0)) == ARRAY_REF)

so this is entered with arg = &ma0_3_5_7[0][0][0].a5_7[4]
op = ma0_3_5_7[0][0][0].a5_7[4] at this point,
then arg is accidentally set to TREE_OPERAND (op, 0),
which now makes arg = ma0_3_5_7[0][0][0].a5_7.  This did not pass the
type check in get_inner_char_array_unless_typecast, but more
importantly it is passed to val = c_strlen (arg, 1),
which will likely return the length of the first array initializer
instead of the fifth.

I have also added an else here:

@@ -1400,8 +1432,7 @@ get_range_strlen (tree arg, tree length[2], bitmap
  the array could have zero length.  */
   *minlen = ssize_int (0);
 }
-
- if (VAR_P (arg))
+ else if (VAR_P (arg))
 {
   tree type = TREE_TYPE (arg);
   if (POINTER_TYPE_P (type))

because I noticed that the control flow can reach this if from the previous
if statement, but the 

Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-25 Thread Richard Earnshaw (lists)
On 25/07/18 11:36, Richard Biener wrote:
> On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
>  wrote:
>>
>> On 24/07/18 18:26, Richard Biener wrote:
>>> On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
>>>  wrote:


 This patch defines a new intrinsic function
 __builtin_speculation_safe_value.  A generic default implementation is
 defined which will attempt to use the backend pattern
 "speculation_safe_barrier".  If this pattern is not defined, or if it
 is not available, then the compiler will emit a warning, but
 compilation will continue.

 Note that the test spec-barrier-1.c will currently fail on all
 targets.  This is deliberate, the failure will go away when
 appropriate action is taken for each target backend.
>>>
>>> So given this series is supposed to be backported I question
>>>
>>> +rtx
>>> +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
>>> +   rtx result, rtx val,
>>> +   rtx failval ATTRIBUTE_UNUSED)
>>> +{
>>> +  emit_move_insn (result, val);
>>> +#ifdef HAVE_speculation_barrier
>>> +  /* Assume the target knows what it is doing: if it defines a
>>> + speculation barrier, but it is not enabled, then assume that one
>>> + isn't needed.  */
>>> +  if (HAVE_speculation_barrier)
>>> +emit_insn (gen_speculation_barrier ());
>>> +
>>> +#else
>>> +  warning_at (input_location, 0,
>>> + "this target does not define a speculation barrier; "
>>> + "your program will still execute correctly, but speculation "
>>> + "will not be inhibited");
>>> +#endif
>>> +  return result;
>>>
>>> which makes all but aarch64 archs warn on __builtin_speculation_safe_value
>>> uses, even those that do not suffer from Spectre like all those embedded 
>>> targets
>>> where implementations usually do not speculate at all.
>>>
>>> In fact for those targets the builtin stays in the way of optimization on 
>>> GIMPLE
>>> as well so we should fold it away early if neither the target hook is
>>> implemented
>>> nor there is a speculation_barrier insn.
>>>
>>> So, please make resolve_overloaded_builtin return a no-op on such targets
>>> which means you can remove the above warning.  Maybe such targets
>>> shouldn't advertise / initialize the builtins at all?
>>
>> I disagree with your approach here.  Why would users not want to know
>> when the compiler is failing to implement a security feature when it
>> should?  As for targets that don't need something, they can easily
>> define the hook as described to suppress the warning.
>>
>> Or are you just suggesting moving the warning to resolve overloaded builtin.
> 
> Well.  You could argue I say we shouldn't even support
> __builtin_sepeculation_safe_value
> for archs that do not need it or have it not implemented.  That way users can
> decide:
> 
> #if __HAVE_SPECULATION_SAFE_VALUE
>   ... use __builtin_speculation_safe_value ...
> #else
> #warning oops // or nothing
> #endif
> 

So how about removing the predefine of __HAVE_S_S_V when the builtin is
a nop, but then leaving the warning in if people try to use it anyway?
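
As a concrete sketch of the guarded-usage idiom under discussion (hedged;
it mirrors the bounds-check pattern from the series' documentation, the
table and names are made up):

static int table[256];

int
load (unsigned idx, unsigned limit)
{
  if (idx < limit)
#if __HAVE_SPECULATION_SAFE_VALUE
    /* Force idx to a safe value (0 by default) when executing
       speculatively past a mispredicted bounds check.  */
    return table[__builtin_speculation_safe_value (idx)];
#else
    return table[idx];
#endif
  return 0;
}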

>> Other ports will need to take action, but in general, it can be as
>> simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
>> simpler still if nothing is needed for that architecture.
> 
> Then that should be the default.  You might argue we'll only see
> __builtin_speculation_safe_value uses for things like Firefox which
> is unlikely built for AVR (just to make an example).  But people
> are going to test build just on x86 and if they build with -Werror
> this will break builds on all targets that didn't even get the chance
> to implement this feature.
> 
>> There is a test which is intended to fail to targets that have not yet
>> been patched - I thought that was better than hard-failing the build,
>> especially given that we want to back-port.
>>
>> Port maintainers DO need to decide what to do about speculation, even if
>> it is explicitly that no mitigation is needed.
> 
> Agreed.  But I didn't yet see a request for maintainers to decide that?
> 

consider it made, then :-)

>>>
>>> The builtins also have no attributes which mean they are assumed to be
>>> 1) calling back into the CU via exported functions, 2) possibly throwing
>>> exceptions, 3) affecting memory state.  I think you at least want
>>> to use ATTR_NOTHROW_LEAF_LIST.
>>>
>>> The builtins are not designed to be optimization or memory barriers as
>>> far as I can see and should thus be CONST as well.
>>>
>>
>> I think they should be barriers.  They do need to ensure that they can't
>> be moved past other operations that might depend on the speculation
>> state.  Consider, for example,
> 
> That makes eliding them for targets that do not need mitigation even
> more important.
> 
>>  ...
>>  t = untrusted_value;
>>  ...
>>  if (t + 5 < limit)
>>  {
>>v = mem[__builtin_speculation_safe_value (untrusted_value)];
>>...
>>
>> The 

optimize std::vector move assignment

2018-07-25 Thread Marc Glisse

Hello,

I talked about this last year 
(https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01063.html and thread), 
this tweaks std::vector move assignment to help gcc generate better code 
for it.


For this code

#include <vector>
#include <utility>
typedef std::vector<int> V;
void f(V&a,V&b){a=std::move(b);}

with -O2 -fdump-tree-optimized on powerpc64le-unknown-linux-gnu, we 
currently have


  _5 = MEM[(int * &)a_3(D)];
  MEM[(int * &)a_3(D)] = 0B;
  MEM[(int * &)a_3(D) + 8] = 0B;
  MEM[(int * &)a_3(D) + 16] = 0B;
  _6 = MEM[(int * &)b_2(D)];
  MEM[(int * &)a_3(D)] = _6;
  MEM[(int * &)b_2(D)] = 0B;
  _7 = MEM[(int * &)a_3(D) + 8];
  _8 = MEM[(int * &)b_2(D) + 8];
  MEM[(int * &)a_3(D) + 8] = _8;
  MEM[(int * &)b_2(D) + 8] = _7;
  _9 = MEM[(int * &)a_3(D) + 16];
  _10 = MEM[(int * &)b_2(D) + 16];
  MEM[(int * &)a_3(D) + 16] = _10;
  MEM[(int * &)b_2(D) + 16] = _9;
  if (_5 != 0B)

with the patch, we go down to

  _3 = MEM[(const struct _Vector_impl_data &)a_4(D)]._M_start;
  _5 = MEM[(const struct _Vector_impl_data &)b_2(D)]._M_start;
  MEM[(struct _Vector_impl_data *)a_4(D)]._M_start = _5;
  _6 = MEM[(const struct _Vector_impl_data &)b_2(D)]._M_finish;
  MEM[(struct _Vector_impl_data *)a_4(D)]._M_finish = _6;
  _7 = MEM[(const struct _Vector_impl_data &)b_2(D)]._M_end_of_storage;
  MEM[(struct _Vector_impl_data *)a_4(D)]._M_end_of_storage = _7;
  MEM[(struct _Vector_impl_data *)b_2(D)]._M_start = 0B;
  MEM[(struct _Vector_impl_data *)b_2(D)]._M_finish = 0B;
  MEM[(struct _Vector_impl_data *)b_2(D)]._M_end_of_storage = 0B;
  if (_3 != 0B)

removing 2 reads and 3 writes. At -O3 we also get more vectorization.

The main idea is to give the compiler more type information. Currently, 
the only type information the compiler cares about is the type used for a 
memory read/write. Using std::swap as before, the reads/writes are done on 
int&. Doing it directly, they are done on _Vector_impl_data::_M_start, a 
more precise information. Maybe some day the compiler will get stricter 
and be able to deduce the same information, but not yet.
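
A reduced illustration (mine, hedged) of the difference, outside of
libstdc++:

#include <utility>

struct D { int *start; int *finish; };

void
swap_generic (D &a, D &b)
{
  std::swap (a.start, b.start);		// accesses typed as plain "int *"
  std::swap (a.finish, b.finish);
}

void
swap_members (D &a, D &b)
{
  D tmp = a;	// accesses typed as D::start / D::finish, giving the
  a = b;	// alias machinery per-field access-path information
  b = tmp;
}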


The second point is to rotate { new, old, tmp } in an order that's simpler 
for the compiler. I was going to remove the swaps and use _M_copy_data 
directly, but since doing the swaps in a different order works...


_M_copy_data is not really needed, we could add a defaulted assignment 
operator, or remove the move constructor (and call a new _M_clear() from 
the 2 users), etc. However, it seemed the least intrusive, the least 
likely to have weird consequences.


I didn't add a testcase because I don't see any dg-final scan-tree-dump in 
the libstdc++ testsuite. The closest would be g++.dg/pr83239.C, 
g++.dg/vect/pr64410.cc, g++.dg/vect/pr33426-ivdep-4.cc that include 
, but from previous experience (already with vector), adding 
libstdc++ testcases to the C++ testsuite is not very popular.


Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2018-07-25  Marc Glisse  

* include/bits/stl_vector.h (_Vector_impl_data::_M_copy_data): New.
(_Vector_impl_data::_M_swap_data): Use _M_copy_data.
(vector::_M_move_assign): Reorder the swaps.

--
Marc Glisse

Index: libstdc++-v3/include/bits/stl_vector.h
===
--- libstdc++-v3/include/bits/stl_vector.h	(revision 262948)
+++ libstdc++-v3/include/bits/stl_vector.h	(working copy)
@@ -96,25 +96,36 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	{ }
 
 #if __cplusplus >= 201103L
 	_Vector_impl_data(_Vector_impl_data&& __x) noexcept
 	: _M_start(__x._M_start), _M_finish(__x._M_finish),
 	  _M_end_of_storage(__x._M_end_of_storage)
 	{ __x._M_start = __x._M_finish = __x._M_end_of_storage = pointer(); }
 #endif
 
 	void
+	_M_copy_data(_Vector_impl_data const& __x) _GLIBCXX_NOEXCEPT
+	{
+	  _M_start = __x._M_start;
+	  _M_finish = __x._M_finish;
+	  _M_end_of_storage = __x._M_end_of_storage;
+	}
+
+	void
 	_M_swap_data(_Vector_impl_data& __x) _GLIBCXX_NOEXCEPT
 	{
-	  std::swap(_M_start, __x._M_start);
-	  std::swap(_M_finish, __x._M_finish);
-	  std::swap(_M_end_of_storage, __x._M_end_of_storage);
+	  // Do not use std::swap(_M_start, __x._M_start), etc as it loses
+	  // information used by TBAA.
+	  _Vector_impl_data __tmp;
+	  __tmp._M_copy_data(*this);
+	  _M_copy_data(__x);
+	  __x._M_copy_data(__tmp);
 	}
   };
 
   struct _Vector_impl
 	: public _Tp_alloc_type, public _Vector_impl_data
   {
 	_Vector_impl() _GLIBCXX_NOEXCEPT_IF( noexcept(_Tp_alloc_type()) )
 	: _Tp_alloc_type()
 	{ }
 
@@ -1724,22 +1735,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
 #if __cplusplus >= 201103L
 private:
   // Constant-time move assignment when source object's memory can be
   // moved, either because the source's allocator will move too
   // or because the allocators are equal.
   void
   _M_move_assign(vector&& __x, true_type) noexcept
   {
 	vector __tmp(get_allocator());
-	this->_M_impl._M_swap_data(__tmp._M_impl);
 	this->_M_impl._M_swap_data(__x._M_impl);
+	

Re: [PATCH] Fix PR86654

2018-07-25 Thread Richard Biener
On Tue, 24 Jul 2018, Richard Biener wrote:

> 
> I am testing the following patch to avoid forcing DIEs for a type context
> for method clones late via limbo processing.  Instead hang them off
> comp_unit_die if there is no early DIE for the function.
> 
> One question is whether the comment "If we're a nested function"
> matches up with the decl_function_context (decl) check or whether
> we really wanted to check DECL_CONTEXT (decl) == FUNCTION_DECL
> which would have made this particular case not match (DECL_CONTEXT
> is a RECORD_TYPE which context is a FUNCTION_DECL).
> 
> Another option is of course to make clones not inherit
> DECL_CONTEXT from the cloned function (for an artificial
> instance the "context" is somewhat arbitrary since it isn't
> going to be found by lookup).  In fact I long wanted to
> represent clones as artificial containers for an
> inline instance of the cloned function...
> 
> LTO bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> If that succeeds I'm going to apply it as a stop-gap measure to
> make Firefox build again with LTO but I'm open to revising it.

So there was fallout, namely

FAIL: g++.dg/debug/dwarf2/lambda1.C  -std=gnu++11  scan-assembler-times 
DW_TAG_variable[^.]*.ascii "this.0" 2
FAIL: g++.dg/debug/dwarf2/lambda1.C  -std=gnu++14  scan-assembler-times 
DW_TAG_variable[^.]*.ascii "this.0" 2

see the PR for analysis.  The following revised patch mitigates this
by relying not on !context_die but on an explicit local_scope_p
for fulfilling the last part of the overall comment of this section:

 to have the same parent.  For local class methods, this doesn't
 apply; we just use the old DIE.  */

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, bootstrapped
and tested on x86_64-unknown-linux-gnu.

I have applied the patch for now, as said, I'm happy to revise later.

Richard.

2018-07-24  Richard Biener  

PR debug/86654
* dwarf2out.c (dwarf2out_decl): Do not handle nested functions
special wrt context_die late.
(gen_subprogram_die): Re-use DIEs in local scope.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 262940)
+++ gcc/dwarf2out.c (working copy)
@@ -22766,6 +22766,7 @@ gen_subprogram_die (tree decl, dw_die_re
 */
|| (old_die->die_parent
&& old_die->die_parent->die_tag == DW_TAG_module)
+   || local_scope_p (old_die->die_parent)
|| context_die == NULL)
   && (DECL_ARTIFICIAL (decl)
   || (get_AT_file (old_die, DW_AT_decl_file) == file_index
@@ -26702,8 +26703,11 @@ dwarf2out_decl (tree decl)
 case FUNCTION_DECL:
   /* If we're a nested function, initially use a parent of NULL; if we're
 a plain function, this will be fixed up in decls_for_scope.  If
-we're a method, it will be ignored, since we already have a DIE.  */
-  if (decl_function_context (decl)
+we're a method, it will be ignored, since we already have a DIE.
+Avoid doing this late though since clones of class methods may
+otherwise end up in limbo and create type DIEs late.  */
+  if (early_dwarf
+ && decl_function_context (decl)
  /* But if we're in terse mode, we don't care about scope.  */
  && debug_info_level > DINFO_LEVEL_TERSE)
context_die = NULL;
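
For context, a hedged sketch (mine, not the actual PR86654 reproducer) of
the shape involved: a method of a function-local class, where a clone of
the method inherits a DECL_CONTEXT that is a RECORD_TYPE whose own context
is a FUNCTION_DECL:

int
outer (int v)
{
  struct local
  {
    int method (int x) { return x * 2 + 1; }	/* may be cloned by IPA */
  };
  local l;
  return l.method (v);
}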


[PATCH][OBVIOUS] Fix wrong declaration.

2018-07-25 Thread Martin Liška
Hi.

It's an obvious fix of a return type, matching what's
in gcc/config/rs6000/rs6000.h.

Martin

gcc/ChangeLog:

2018-07-25  Martin Liska  

* config/powerpcspe/powerpcspe-protos.h (rs6000_loop_align): Fix
return type.
---
 gcc/config/powerpcspe/powerpcspe-protos.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/config/powerpcspe/powerpcspe-protos.h b/gcc/config/powerpcspe/powerpcspe-protos.h
index ff03ac0d1f5..8ff9f4b0df3 100644
--- a/gcc/config/powerpcspe/powerpcspe-protos.h
+++ b/gcc/config/powerpcspe/powerpcspe-protos.h
@@ -162,7 +162,7 @@ extern rtx rs6000_machopic_legitimize_pic_address (rtx, machine_mode,
 extern rtx rs6000_address_for_fpconvert (rtx);
 extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (machine_mode, bool, bool);
-extern int rs6000_loop_align (rtx);
+extern align_flags rs6000_loop_align (rtx);
 extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool);
 #endif /* RTX_CODE */
 



Re: [36/46] Add a pattern_stmt_p field to stmt_vec_info

2018-07-25 Thread Richard Biener
On Wed, Jul 25, 2018 at 1:09 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Jul 24, 2018 at 12:07 PM Richard Sandiford
> >  wrote:
> >>
> >> This patch adds a pattern_stmt_p field to stmt_vec_info, so that it's
> >> possible to tell whether the statement is a pattern statement without
> >> referring to other statements.  The new field goes in what was
> >> previously a hole in the structure, so the size is the same as before.
> >
> > Not sure what the advantage is?  is_pattern_stmt_p () looks nicer
> > than ->is_pattern_p
>
> I can keep the function wrapper if you prefer that.  But having a
> statement "know" whether it's a pattern stmt makes things like
> freeing stmt_vec_infos simpler (see later patches in the series).

Ah, ok.

> It should also be cheaper to test, but that's much more minor.

So please keep the wrapper.

I guess at some point we should decide what to do with all
the STMT_VINFO_ macros (and the others, {LOOP,BB}_ stuff
is already used inconsistently).

Richard.

> Thanks,
> Richard
>
> >
> >>
> >> 2018-07-24  Richard Sandiford  
> >>
> >> gcc/
> >> * tree-vectorizer.h (_stmt_vec_info::pattern_stmt_p): New field.
> >> (is_pattern_stmt_p): Delete.
> >> * tree-vect-patterns.c (vect_init_pattern_stmt): Set pattern_stmt_p
> >> on pattern statements.
> >> (vect_split_statement, vect_mark_pattern_stmts): Use the new
> >> pattern_stmt_p field instead of is_pattern_stmt_p.
> >> * tree-vect-data-refs.c (vect_preserves_scalar_order_p): Likewise.
> >> * tree-vect-loop.c (vectorizable_live_operation): Likewise.
> >> * tree-vect-slp.c (vect_build_slp_tree_2): Likewise.
> >> (vect_find_last_scalar_stmt_in_slp, vect_remove_slp_scalar_calls)
> >> (vect_schedule_slp): Likewise.
> >> * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
> >> (vectorizable_call, vectorizable_simd_clone_call, 
> >> vectorizable_shift)
> >> (vectorizable_store, vect_remove_stores): Likewise.
> >>
> >> Index: gcc/tree-vectorizer.h
> >> ===
> >> --- gcc/tree-vectorizer.h   2018-07-24 10:23:56.440544995 +0100
> >> +++ gcc/tree-vectorizer.h   2018-07-24 10:24:02.364492386 +0100
> >> @@ -791,6 +791,12 @@ struct _stmt_vec_info {
> >>/* Stmt is part of some pattern (computation idiom)  */
> >>bool in_pattern_p;
> >>
> >> +  /* True if the statement was created during pattern recognition as
> >> + part of the replacement for RELATED_STMT.  This implies that the
> >> + statement isn't part of any basic block, although for convenience
> >> + its gimple_bb is the same as for RELATED_STMT.  */
> >> +  bool pattern_stmt_p;
> >> +
> >>/* Is this statement vectorizable or should it be skipped in (partial)
> >>   vectorization.  */
> >>bool vectorizable;
> >> @@ -1151,16 +1157,6 @@ get_later_stmt (stmt_vec_info stmt1_info
> >>  return stmt2_info;
> >>  }
> >>
> >> -/* Return TRUE if a statement represented by STMT_INFO is a part of a
> >> -   pattern.  */
> >> -
> >> -static inline bool
> >> -is_pattern_stmt_p (stmt_vec_info stmt_info)
> >> -{
> >> -  stmt_vec_info related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
> >> -  return related_stmt_info && STMT_VINFO_IN_PATTERN_P (related_stmt_info);
> >> -}
> >> -
> >>  /* Return true if BB is a loop header.  */
> >>
> >>  static inline bool
> >> Index: gcc/tree-vect-patterns.c
> >> ===
> >> --- gcc/tree-vect-patterns.c2018-07-24 10:23:59.408518638 +0100
> >> +++ gcc/tree-vect-patterns.c2018-07-24 10:24:02.360492422 +0100
> >> @@ -108,6 +108,7 @@ vect_init_pattern_stmt (gimple *pattern_
> >>  pattern_stmt_info = orig_stmt_info->vinfo->add_stmt (pattern_stmt);
> >>gimple_set_bb (pattern_stmt, gimple_bb (orig_stmt_info->stmt));
> >>
> >> +  pattern_stmt_info->pattern_stmt_p = true;
> >>STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
> >>STMT_VINFO_DEF_TYPE (pattern_stmt_info)
> >>  = STMT_VINFO_DEF_TYPE (orig_stmt_info);
> >> @@ -630,7 +631,7 @@ vect_recog_temp_ssa_var (tree type, gimp
> >>  vect_split_statement (stmt_vec_info stmt2_info, tree new_rhs,
> >>   gimple *stmt1, tree vectype)
> >>  {
> >> -  if (is_pattern_stmt_p (stmt2_info))
> >> +  if (stmt2_info->pattern_stmt_p)
> >>  {
> >>/* STMT2_INFO is part of a pattern.  Get the statement to which
> >>  the pattern is attached.  */
> >> @@ -4726,7 +4727,7 @@ vect_mark_pattern_stmts (stmt_vec_info o
> >>gimple *def_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
> >>
> >>gimple *orig_pattern_stmt = NULL;
> >> -  if (is_pattern_stmt_p (orig_stmt_info))
> >> +  if (orig_stmt_info->pattern_stmt_p)
> >>  {
> >>/* We're replacing a statement in an existing pattern definition
> >>  sequence.  */
> >> Index: 

Re: [PATCH, debug] Add fkeep-vars-live

2018-07-25 Thread Richard Biener
On Wed, Jul 25, 2018 at 1:41 PM Richard Biener  wrote:
>
> On Tue, 24 Jul 2018, Tom de Vries wrote:
>
> > On Tue, Jul 24, 2018 at 02:34:26PM +0200, Tom de Vries wrote:
> > > On 07/24/2018 01:46 PM, Jakub Jelinek wrote:
> > > > On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
> > > >> Another drawback is that the fake uses confuse the uninitialized warning
> > > >> analysis, so that is switched off for -fkeep-vars-live.
> > > >
> > > > Is that really needed?  I.e. can't you for the purpose of uninitialized
> > > > warning analysis ignore the clobber = var uses?
> > > >
> > >
> > > This seems to work on the test-case that failed during testing
> > > (g++.dg/uninit-pred-4.C):
> > > ...
> > > diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
> > > index 77f090bfa80..953db9ed02d 100644
> > > --- a/gcc/tree-ssa-uninit.c
> > > +++ b/gcc/tree-ssa-uninit.c
> > > @@ -132,6 +132,9 @@ warn_uninit (enum opt_code wc, tree t, tree expr,
> > > tree var,
> > >if (is_gimple_assign (context)
> > >&& gimple_assign_rhs_code (context) == COMPLEX_EXPR)
> > >  return;
> > > +  if (gimple_assign_single_p (context)
> > > +  && TREE_CLOBBER_P (gimple_assign_lhs (context)))
> > > +return;
> > >if (!has_undefined_value_p (t))
> > >  return;
> > > ...
> > > But I don't know the pass well enough to know whether this is a
> > > sufficient fix.
> > >
> >
> > Updated and re-tested patch.
> >
> > > +Add artificial use for each local variable at the end of the
> > > declaration scope
> >
> > Is this a better option description?
> >
> >
> > OK for trunk?
> >
> > Thanks,
> > - Tom
> >
> > [debug] Add fkeep-vars-live
> >
> > This patch adds fake uses of user variables at the point where they go
> > out of scope, to keep user variables inspectable throughout the
> > application.
> >
> > This approach will generate sub-optimal code: in some cases, the executable
> > code will go through efforts to keep a var alive, while var-tracking can
> > easily compute the value of the var from something else.
> >
> > Also, the compiler treats the fake use as any other use, so it may keep an
> > expensive resource like a register occupied (if we could mark the use as a
> > cold use or some such, we could tell optimizers that we need the value, but
> > it's ok if getting the value is expensive, so it could be spilled instead of
> > occupying a register).
> >
> > The current implementation is expected to increase register pressure, and
> > therefore spilling, but we'd still expect fewer memory accesses than with -O0.
>
> Few comments inline.
>
> > 2018-07-24  Tom de Vries  
> >
> >   PR debug/78685
> >   * cfgexpand.c (expand_use): New function.
> >   (expand_gimple_stmt_1): Handle TREE_CLOBBER_P as lhs of assignment.
> >   * common.opt (fkeep-vars-live): New option.
> >   * function.c (instantiate_virtual_regs): Instantiate in USEs as well.
> >   * gimplify.c (gimple_build_uses): New function.
> >   (gimplify_bind_expr): Build clobber uses for variables that don't have
> >   to be in memory.
> >   (gimplify_body): Build clobber uses for arguments.
> >   * tree-cfg.c (verify_gimple_assign_single): Handle TREE_CLOBBER_P as 
> > lhs
> >   of assignment.
> >   * tree-sra.c (sra_modify_assign): Same.
> >   * tree-ssa-alias.c (refs_may_alias_p_1): Same.
> >   * tree-ssa-structalias.c (find_func_aliases): Same.
> >   * tree-ssa-uninit.c (warn_uninit): Same.
> >
> >   * gcc.dg/guality/pr54200-2.c: Update.
> >
> > ---
> >  gcc/cfgexpand.c  | 11 
> >  gcc/common.opt   |  4 +++
> >  gcc/function.c   |  5 ++--
> >  gcc/gimplify.c   | 46 
> > +++-
> >  gcc/testsuite/gcc.dg/guality/pr54200-2.c |  3 +--
> >  gcc/tree-cfg.c   |  1 +
> >  gcc/tree-sra.c   |  7 +
> >  gcc/tree-ssa-alias.c |  4 +++
> >  gcc/tree-ssa-structalias.c   |  3 ++-
> >  gcc/tree-ssa-uninit.c|  3 +++
> >  10 files changed, 76 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> > index d6e3c382085..e28e8ceec75 100644
> > --- a/gcc/cfgexpand.c
> > +++ b/gcc/cfgexpand.c
> > @@ -3533,6 +3533,15 @@ expand_clobber (tree lhs)
> >  }
> >  }
> >
> > +/* Expand a use of RHS.  */
> > +
> > +static void
> > +expand_use (tree rhs)
> > +{
> > +  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> > +  emit_use (target);
> > +}
> > +
> >  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
> > STMT that doesn't require special handling for outgoing edges.  That
> > is no tailcalls and no GIMPLE_COND.  */
> > @@ -3632,6 +3641,8 @@ expand_gimple_stmt_1 (gimple *stmt)
> > /* This is a clobber to mark the going out of scope for
> >this LHS.  */
> > 

Re: [PATCH, debug] Add fkeep-vars-live

2018-07-25 Thread Richard Biener
On Tue, 24 Jul 2018, Tom de Vries wrote:

> On Tue, Jul 24, 2018 at 02:34:26PM +0200, Tom de Vries wrote:
> > On 07/24/2018 01:46 PM, Jakub Jelinek wrote:
> > > On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
> > >> Another drawback is that the fake uses confuse the uninitialized warning
> > >> analysis, so that is switched off for -fkeep-vars-live.
> > > 
> > > Is that really needed?  I.e. can't you for the purpose of uninitialized
> > > warning analysis ignore the clobber = var uses?
> > > 
> > 
> > This seems to work on the test-case that failed during testing
> > (g++.dg/uninit-pred-4.C):
> > ...
> > diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
> > index 77f090bfa80..953db9ed02d 100644
> > --- a/gcc/tree-ssa-uninit.c
> > +++ b/gcc/tree-ssa-uninit.c
> > @@ -132,6 +132,9 @@ warn_uninit (enum opt_code wc, tree t, tree expr,
> > tree var,
> >if (is_gimple_assign (context)
> >&& gimple_assign_rhs_code (context) == COMPLEX_EXPR)
> >  return;
> > +  if (gimple_assign_single_p (context)
> > +  && TREE_CLOBBER_P (gimple_assign_lhs (context)))
> > +return;
> >if (!has_undefined_value_p (t))
> >  return;
> > ...
> > But I don't know the pass well enough to know whether this is a
> > sufficient fix.
> > 
> 
> Updated and re-tested patch.
> 
> > +Add artificial use for each local variable at the end of the
> > declaration scope
> 
> Is this a better option description?
> 
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 
> [debug] Add fkeep-vars-live
> 
> This patch adds fake uses of user variables at the point where they go out of
> scope, to keep user variables inspectable throughout the application.
> 
> This approach will generate sub-optimal code: in some cases, the executable
> code will go through efforts to keep a var alive, while var-tracking can
> easily compute the value of the var from something else.
> 
> Also, the compiler treats the fake use as any other use, so it may keep an
> expensive resource like a register occupied (if we could mark the use as a
> cold use or some such, we could tell optimizers that we need the value, but
> it's ok if getting the value is expensive, so it could be spilled instead of
> occupying a register).
> 
> The current implementation is expected to increase register pressure, and
> therefore spilling, but we'd still expect fewer memory accesses than with -O0.
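
A hedged sketch of the user-visible effect (my example, not from the
patch):

/* Compile with -g -O2 -fkeep-vars-live.  */
int
f (int a)
{
  int x = a * 2;
  int r = x + 1;	/* last real use of x */
  /* Without the artificial use added at the end of x's scope, the
     register holding x may be reused after this point and the
     debugger reports x as <optimized out>.  */
  return r;
}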

Few comments inline.

> 2018-07-24  Tom de Vries  
> 
>   PR debug/78685
>   * cfgexpand.c (expand_use): New function.
>   (expand_gimple_stmt_1): Handle TREE_CLOBBER_P as lhs of assignment.
>   * common.opt (fkeep-vars-live): New option.
>   * function.c (instantiate_virtual_regs): Instantiate in USEs as well.
>   * gimplify.c (gimple_build_uses): New function.
>   (gimplify_bind_expr): Build clobber uses for variables that don't have
>   to be in memory.
>   (gimplify_body): Build clobber uses for arguments.
>   * tree-cfg.c (verify_gimple_assign_single): Handle TREE_CLOBBER_P as lhs
>   of assignment.
>   * tree-sra.c (sra_modify_assign): Same.
>   * tree-ssa-alias.c (refs_may_alias_p_1): Same.
>   * tree-ssa-structalias.c (find_func_aliases): Same.
>   * tree-ssa-uninit.c (warn_uninit): Same.
> 
>   * gcc.dg/guality/pr54200-2.c: Update.
> 
> ---
>  gcc/cfgexpand.c  | 11 
>  gcc/common.opt   |  4 +++
>  gcc/function.c   |  5 ++--
>  gcc/gimplify.c   | 46 
> +++-
>  gcc/testsuite/gcc.dg/guality/pr54200-2.c |  3 +--
>  gcc/tree-cfg.c   |  1 +
>  gcc/tree-sra.c   |  7 +
>  gcc/tree-ssa-alias.c |  4 +++
>  gcc/tree-ssa-structalias.c   |  3 ++-
>  gcc/tree-ssa-uninit.c|  3 +++
>  10 files changed, 76 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index d6e3c382085..e28e8ceec75 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -3533,6 +3533,15 @@ expand_clobber (tree lhs)
>  }
>  }
>  
> +/* Expand a use of RHS.  */
> +
> +static void
> +expand_use (tree rhs)
> +{
> +  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +  emit_use (target);
> +}
> +
>  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
> STMT that doesn't require special handling for outgoing edges.  That
> is no tailcalls and no GIMPLE_COND.  */
> @@ -3632,6 +3641,8 @@ expand_gimple_stmt_1 (gimple *stmt)
> /* This is a clobber to mark the going out of scope for
>this LHS.  */
> expand_clobber (lhs);
> + else if (TREE_CLOBBER_P (lhs))
> +   expand_use (rhs);
>   else
> expand_assignment (lhs, rhs,
>gimple_assign_nontemporal_move_p (
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 

Re: [38/46] Pass stmt_vec_infos instead of data_references where relevant

2018-07-25 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
>  wrote:
>>
>> This patch makes various routines (mostly in tree-vect-data-refs.c)
>> take stmt_vec_infos rather than data_references.  The affected routines
>> are really dealing with the way that an access is going to vectorised
>> for a particular stmt_vec_info, rather than with the original scalar
>> access described by the data_reference.
>
> Similar.  Doesn't it make more sense to pass both stmt_info and DR to
> the functions?

Not sure.  If we...

> We currently cannot handle aggregate copies in the to-be-vectorized IL
> but rely on SRA and friends to elide those.  That's the only two-DR
> stmt I can think of for vectorization.  Maybe aggregate by-value / return
> function calls with OMP SIMD if that supports this somehow.

...did this then I don't think a data_refrence would be the natural
way of identifying a DR within a stmt_vec_info.  Presumably the
stmt_vec_info would need multiple STMT_VINFO_DATA_REFS and dr_auxs.
If both of those were vectors then a (stmt_vec_info, index) pair
might make more sense than (stmt_vec_info, data_reference).

Alternatively we could move STMT_VINFO_DATA_REF into dataref_aux,
so that there's a back-pointer to the DR, add a stmt_vec_info
field to dataref_aux too, and then use dataref_aux instead of
stmt_vec_info as the key.

Thanks,
Richard

>
> Richard.
>
>>
>> 2018-07-24  Richard Sandiford  
>>
>> gcc/
>> * tree-vectorizer.h (vect_supportable_dr_alignment): Take
>> a stmt_vec_info rather than a data_reference.
>> * tree-vect-data-refs.c (vect_calculate_target_alignment)
>> (vect_compute_data_ref_alignment, vect_update_misalignment_for_peel)
>> (verify_data_ref_alignment, vector_alignment_reachable_p)
>> (vect_get_data_access_cost, vect_get_peeling_costs_all_drs)
>> (vect_peeling_supportable, vect_analyze_group_access_1)
>> (vect_analyze_group_access, vect_analyze_data_ref_access)
>> (vect_vfa_segment_size, vect_vfa_access_size, vect_small_gap_p)
>> (vectorizable_with_step_bound_p, vect_duplicate_ssa_name_ptr_info)
>> (vect_supportable_dr_alignment): Likewise.  Update calls to other
>> functions for which the same change is being made.
>> (vect_verify_datarefs_alignment, vect_find_same_alignment_drs)
>> (vect_analyze_data_refs_alignment): Update calls accordingly.
>> (vect_slp_analyze_and_verify_node_alignment): Likewise.
>> (vect_analyze_data_ref_accesses): Likewise.
>> (vect_prune_runtime_alias_test_list): Likewise.
>> (vect_create_addr_base_for_vector_ref): Likewise.
>> (vect_create_data_ref_ptr): Likewise.
>> (_vect_peel_info::dr): Replace with...
>> (_vect_peel_info::stmt_info): ...this new field.
>> (vect_peeling_hash_get_most_frequent): Update _vect_peel_info uses
>> accordingly, and update after above interface changes.
>> (vect_peeling_hash_get_lowest_cost): Likewise
>> (vect_peeling_hash_choose_best_peeling): Likewise.
>> (vect_enhance_data_refs_alignment): Likewise.
>> (vect_peeling_hash_insert): Likewise.  Take a stmt_vec_info
>> rather than a data_reference.
>> * tree-vect-stmts.c (vect_get_store_cost, vect_get_load_cost)
>> (get_negative_load_store_type): Update calls to
>> vect_supportable_dr_alignment.
>> (vect_get_data_ptr_increment, ensure_base_align): Take a
>> stmt_vec_info instead of a data_reference.
>> (vectorizable_store, vectorizable_load): Update calls after
>> above interface changes.
>>
>> Index: gcc/tree-vectorizer.h
>> ===
>> --- gcc/tree-vectorizer.h   2018-07-24 10:24:05.744462369 +0100
>> +++ gcc/tree-vectorizer.h   2018-07-24 10:24:08.924434128 +0100
>> @@ -1541,7 +1541,7 @@ extern tree vect_get_mask_type_for_stmt
>>  /* In tree-vect-data-refs.c.  */
>>  extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
>>  extern enum dr_alignment_support vect_supportable_dr_alignment
>> -   (struct data_reference *, bool);
>> +  (stmt_vec_info, bool);
>>  extern tree vect_get_smallest_scalar_type (stmt_vec_info, HOST_WIDE_INT *,
>> HOST_WIDE_INT *);
>> extern bool vect_analyze_data_ref_dependences (loop_vec_info, unsigned
> int *);
>> Index: gcc/tree-vect-data-refs.c
>> ===
>> --- gcc/tree-vect-data-refs.c   2018-07-24 10:24:05.740462405 +0100
>> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:24:08.924434128 +0100
>> @@ -858,19 +858,19 @@ vect_record_base_alignments (vec_info *v
>>  }
>>  }
>>
>> -/* Return the target alignment for the vectorized form of DR.  */
>> +/* Return the target alignment for the vectorized form of the load or store
>> +   in STMT_INFO.  */

Re: [36/46] Add a pattern_stmt_p field to stmt_vec_info

2018-07-25 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jul 24, 2018 at 12:07 PM Richard Sandiford
>  wrote:
>>
>> This patch adds a pattern_stmt_p field to stmt_vec_info, so that it's
>> possible to tell whether the statement is a pattern statement without
>> referring to other statements.  The new field goes in what was
>> previously a hole in the structure, so the size is the same as before.
>
> Not sure what the advantage is?  is_pattern_stmt_p () looks nicer
> than ->is_pattern_p

I can keep the function wrapper if you prefer that.  But having a
statement "know" whether it's a pattern stmt makes things like
freeing stmt_vec_infos simpler (see later patches in the series).
It should also be cheaper to test, but that's much more minor.
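
Concretely, the test goes from the current form (the function removed
below):

  static inline bool
  is_pattern_stmt_p (stmt_vec_info stmt_info)
  {
    stmt_vec_info related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
    return related_stmt_info && STMT_VINFO_IN_PATTERN_P (related_stmt_info);
  }

to a single flag read that stays valid even while related stmt_vec_infos
are being freed:

  if (stmt_info->pattern_stmt_p)
    ...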

Thanks,
Richard

>
>>
>> 2018-07-24  Richard Sandiford  
>>
>> gcc/
>> * tree-vectorizer.h (_stmt_vec_info::pattern_stmt_p): New field.
>> (is_pattern_stmt_p): Delete.
>> * tree-vect-patterns.c (vect_init_pattern_stmt): Set pattern_stmt_p
>> on pattern statements.
>> (vect_split_statement, vect_mark_pattern_stmts): Use the new
>> pattern_stmt_p field instead of is_pattern_stmt_p.
>> * tree-vect-data-refs.c (vect_preserves_scalar_order_p): Likewise.
>> * tree-vect-loop.c (vectorizable_live_operation): Likewise.
>> * tree-vect-slp.c (vect_build_slp_tree_2): Likewise.
>> (vect_find_last_scalar_stmt_in_slp, vect_remove_slp_scalar_calls)
>> (vect_schedule_slp): Likewise.
>> * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
>> (vectorizable_call, vectorizable_simd_clone_call, vectorizable_shift)
>> (vectorizable_store, vect_remove_stores): Likewise.
>>
>> Index: gcc/tree-vectorizer.h
>> ===
>> --- gcc/tree-vectorizer.h   2018-07-24 10:23:56.440544995 +0100
>> +++ gcc/tree-vectorizer.h   2018-07-24 10:24:02.364492386 +0100
>> @@ -791,6 +791,12 @@ struct _stmt_vec_info {
>>/* Stmt is part of some pattern (computation idiom)  */
>>bool in_pattern_p;
>>
>> +  /* True if the statement was created during pattern recognition as
>> + part of the replacement for RELATED_STMT.  This implies that the
>> + statement isn't part of any basic block, although for convenience
>> + its gimple_bb is the same as for RELATED_STMT.  */
>> +  bool pattern_stmt_p;
>> +
>>/* Is this statement vectorizable or should it be skipped in (partial)
>>   vectorization.  */
>>bool vectorizable;
>> @@ -1151,16 +1157,6 @@ get_later_stmt (stmt_vec_info stmt1_info
>>  return stmt2_info;
>>  }
>>
>> -/* Return TRUE if a statement represented by STMT_INFO is a part of a
>> -   pattern.  */
>> -
>> -static inline bool
>> -is_pattern_stmt_p (stmt_vec_info stmt_info)
>> -{
>> -  stmt_vec_info related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
>> -  return related_stmt_info && STMT_VINFO_IN_PATTERN_P (related_stmt_info);
>> -}
>> -
>>  /* Return true if BB is a loop header.  */
>>
>>  static inline bool
>> Index: gcc/tree-vect-patterns.c
>> ===
>> --- gcc/tree-vect-patterns.c2018-07-24 10:23:59.408518638 +0100
>> +++ gcc/tree-vect-patterns.c2018-07-24 10:24:02.360492422 +0100
>> @@ -108,6 +108,7 @@ vect_init_pattern_stmt (gimple *pattern_
>>  pattern_stmt_info = orig_stmt_info->vinfo->add_stmt (pattern_stmt);
>>gimple_set_bb (pattern_stmt, gimple_bb (orig_stmt_info->stmt));
>>
>> +  pattern_stmt_info->pattern_stmt_p = true;
>>STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
>>STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>>  = STMT_VINFO_DEF_TYPE (orig_stmt_info);
>> @@ -630,7 +631,7 @@ vect_recog_temp_ssa_var (tree type, gimp
>>  vect_split_statement (stmt_vec_info stmt2_info, tree new_rhs,
>>   gimple *stmt1, tree vectype)
>>  {
>> -  if (is_pattern_stmt_p (stmt2_info))
>> +  if (stmt2_info->pattern_stmt_p)
>>  {
>>/* STMT2_INFO is part of a pattern.  Get the statement to which
>>  the pattern is attached.  */
>> @@ -4726,7 +4727,7 @@ vect_mark_pattern_stmts (stmt_vec_info o
>>gimple *def_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
>>
>>gimple *orig_pattern_stmt = NULL;
>> -  if (is_pattern_stmt_p (orig_stmt_info))
>> +  if (orig_stmt_info->pattern_stmt_p)
>>  {
>>/* We're replacing a statement in an existing pattern definition
>>  sequence.  */
>> Index: gcc/tree-vect-data-refs.c
>> ===
>> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:53.204573732 +0100
>> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:24:02.356492457 +0100
>> @@ -212,9 +212,9 @@ vect_preserves_scalar_order_p (stmt_vec_
>>   (but could happen later) while reads will happen no later than their
>>   current position (but could happen earlier).  Reordering is therefore
>>   only possible if the first access is a write.  */

RE: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at least 8 bytes when alloca and stack-clash. [Patch (3/6)]

2018-07-25 Thread Tamar Christina
Hi All,

Attached an updated patch which documents what the test cases are expecting as 
requested.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-25  Tamar Christina  

PR target/86486
* config/aarch64/aarch64.h (STACK_CLASH_OUTGOING_ARGS,
STACK_DYNAMIC_OFFSET): New.
* config/aarch64/aarch64.c (aarch64_layout_frame):
Update outgoing args size.
(aarch64_stack_clash_protection_alloca_probe_range,
TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.

gcc/testsuite/
2018-07-25  Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-alloca-1.c: New.
* gcc.target/aarch64/stack-check-alloca-10.c: New.
* gcc.target/aarch64/stack-check-alloca-2.c: New.
* gcc.target/aarch64/stack-check-alloca-3.c: New.
* gcc.target/aarch64/stack-check-alloca-4.c: New.
* gcc.target/aarch64/stack-check-alloca-5.c: New.
* gcc.target/aarch64/stack-check-alloca-6.c: New.
* gcc.target/aarch64/stack-check-alloca-7.c: New.
* gcc.target/aarch64/stack-check-alloca-8.c: New.
* gcc.target/aarch64/stack-check-alloca-9.c: New.
* gcc.target/aarch64/stack-check-alloca.h: New.
* gcc.target/aarch64/stack-check-14.c: New.
* gcc.target/aarch64/stack-check-15.c: New.
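
The property the quoted message below explains can be seen in reduced
form as (a sketch, not one of the exact testcases):

  /* Probing at $sp after the allocation is always safe because the
     outgoing-args area of at least 8 bytes sits below the dynamic
     allocations.  */
  void g (char *);
  void f (__SIZE_TYPE__ n)
  {
    char *p = __builtin_alloca (n);  /* includes the n == 0 corner case */
    g (p);
  }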

> -Original Message-
> From: Tamar Christina
> Sent: Friday, July 13, 2018 17:36
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: RE: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at
> least 8 bytes when alloca and stack-clash. [Patch (3/6)]
> 
> Hi All,
> 
> I'm sending an updated patch which updates a testcase that hits one of our
> corner cases.
> This is an assembler scan only update in a testcase.
> 
> Regards,
> Tamar
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, July 11, 2018 12:21
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; James Greenhalgh ;
> > Richard Earnshaw ; Marcus Shawcroft
> > 
> > Subject: [PATCH][GCC][AArch64] Ensure that outgoing argument size is
> > at least 8 bytes when alloca and stack-clash. [Patch (3/6)]
> >
> > Hi All,
> >
> > This patch adds a requirement that the number of outgoing arguments
> > for a function is at least 8 bytes when using stack-clash protection.
> >
> > By using this condition we can avoid a check in the alloca code and so
> > have smaller and simpler code there.
> >
> > A simplified version of the AArch64 stack frames is:
> >
> >+---+
> >|   |
> >|   |
> >|   |
> >+---+
> >|LR |
> >+---+
> >|FP |
> >+---+
> >|dynamic allocations|   expanding area which will push the outgoing
> >+---+   args down during each allocation.
> >|padding|
> >+---+
> >|outgoing stack args|  safety buffer of 8 bytes (aligned)
> >+---+
> >
> > By always defining an outgoing argument, alloca(0) effectively is safe
> > to probe at $sp due to the reserved buffer being there.  It will never
> > corrupt the stack.
> >
> > This is also safe for alloca(x) where x is 0 or x % page_size == 0.
> > In the former it is the same case as alloca(0) while the latter is
> > safe because any allocation pushes the outgoing stack args down:
> >
> >|FP |
> >+---+
> >|   |
> >|dynamic allocations|   alloca (x)
> >|   |
> >+---+
> >|padding|
> >+---+
> >|outgoing stack args|  safety buffer of 8 bytes (aligned)
> >+---+
> >
> > Which means when you probe for the residual, if it's 0 you'll again
> > just probe in the outgoing stack args range, which we know is non-zero (at
> least 8 bytes).
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > Target was tested with stack clash on and off by default.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-07-11  Tamar Christina  
> >
> > PR target/86486
> > * config/aarch64/aarch64.h (STACK_CLASH_OUTGOING_ARGS,
> > STACK_DYNAMIC_OFFSET): New.
> > * config/aarch64/aarch64.c (aarch64_layout_frame):
> > Update outgoing args size.
> > (aarch64_stack_clash_protection_alloca_probe_range,
> > TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE):
> > New.
> >
> > gcc/testsuite/
> > 2018-07-11  Tamar Christina  
> >
> > PR target/86486
> > * gcc.target/aarch64/stack-check-alloca-1.c: New.
> > * gcc.target/aarch64/stack-check-alloca-10.c: New.
> > * gcc.target/aarch64/stack-check-alloca-2.c: New.
> > * 

RE: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-25 Thread Tamar Christina
HI Alexandre,

Thanks for the review. Attached is the updated patch and new changelog below:

Thanks,
Tamar

gcc/
2018-07-25  Tamar Christina  

PR target/86486
* configure.ac: Add stack-clash-protection-guard-size.
* doc/install.texi: Document it.
* config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
* params.def: Update comment for guard-size.
(PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE,
PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Update description.
* configure: Regenerate.

> -Original Message-
> From: Alexandre Oliva 
> Sent: Tuesday, July 24, 2018 20:24
> To: Tamar Christina 
> Cc: Joseph Myers ; Jeff Law
> ; gcc-patches@gcc.gnu.org; nd ;
> bonz...@gnu.org; d...@redhat.com; nero...@gcc.gnu.org;
> ralf.wildenh...@gmx.de
> Subject: Re: [PATCH][GCC][front-end][build-machinery][opt-framework]
> Allow setting of stack-clash via configure options. [Patch (4/6)]
> 
> Hello, Christina,
> 
> On Jul 24, 2018, Tamar Christina  wrote:
> 
> > gcc/
> > 2018-07-24  Tamar Christina  
> 
> > PR target/86486
> > * configure.ac: Add stack-clash-protection-guard-size.
> > * doc/install.texi: Document it.
> > * config.in (DEFAULT_STK_CLASH_GUARD_SIZE): New.
> > * params.def: Update comment for guard-size.
> > * configure: Regenerate.
> 
> The configury bits look almost good to me.
> 
> I wish the help message, comments and docs expressed somehow that the
> given power of two expresses a size in bytes, rather than in kilobytes, bits 
> or
> any other unit that might be reasonably assumed to express stack sizes.  I'm
> afraid I don't know the best way to accomplish that in a few words.
> 
> > +stk_clash_default=12
> 
> This seems to be left-over from an earlier patch, as it is now unused AFAICT.
> 
> Thanks,
> 
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
diff --git a/gcc/config.in b/gcc/config.in
index 2856e72d627df537a301a6c7ab6b5bbb75f6b43f..32118ef1f94cece06a1d870416c3a1705716d21b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -55,6 +55,13 @@
 #endif
 
 
+/* Define to larger than zero set to the default stack clash protector size as
+   a power of two in bytes. */
+#ifndef USED_FOR_TARGET
+#undef DEFAULT_STK_CLASH_GUARD_SIZE
+#endif
+
+
 /* Define if you want to use __cxa_atexit, rather than atexit, to register C++
destructors for local statics and global objects. This is essential for
fully standards-compliant handling of destructors, but requires
diff --git a/gcc/configure b/gcc/configure
index 60d373982fd38fe51c285e2b02941754d1b833d6..77ed62d199b272dfe39de156771dfb774184dd7d 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -905,6 +905,7 @@ enable_valgrind_annotations
 with_stabs
 enable_multilib
 enable_multiarch
+with_stack_clash_protection_guard_size
 enable___cxa_atexit
 enable_decimal_float
 enable_fixed_point
@@ -1724,6 +1725,9 @@ Optional Packages:
   --with-gnu-as   arrange to work with GNU as
   --with-as   arrange to use the specified as (full pathname)
   --with-stabsarrange to use stabs instead of host debug format
+  --with-stack-clash-protection-guard-size=size
+  Set the default stack clash protection guard size
+  for specific targets as a power of two in bytes.
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
   --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
@@ -7436,6 +7440,34 @@ $as_echo "$enable_multiarch$ma_msg_suffix" >&6; }
 
 
 
+# default stack clash protection guard size as power of twos in bytes.
+# Please keep these in sync with params.def.
+stk_clash_min=12
+stk_clash_max=30
+
+# Keep the default value when the option is not used to 0, this allows us to
+# distinguish between the cases where the user specifically set a value via
+# configure and when the normal default value is used.
+
+# Check whether --with-stack-clash-protection-guard-size was given.
+if test "${with_stack_clash_protection_guard_size+set}" = set; then :
+  withval=$with_stack_clash_protection_guard_size; DEFAULT_STK_CLASH_GUARD_SIZE="$with_stack_clash_protection_guard_size"
+else
+  DEFAULT_STK_CLASH_GUARD_SIZE=0
+fi
+
+if test $DEFAULT_STK_CLASH_GUARD_SIZE -ne 0 \
+ && (test $DEFAULT_STK_CLASH_GUARD_SIZE -lt $stk_clash_min \
+	 || test $DEFAULT_STK_CLASH_GUARD_SIZE -gt $stk_clash_max); then
+  as_fn_error "Invalid value $DEFAULT_STK_CLASH_GUARD_SIZE for --with-stack-clash-protection-guard-size. Must be between $stk_clash_min and $stk_clash_max." "$LINENO" 5
+fi
+
+
+cat >>confdefs.h <<_ACEOF
+#define DEFAULT_STK_CLASH_GUARD_SIZE $DEFAULT_STK_CLASH_GUARD_SIZE
+_ACEOF
+
+
 # Enable __cxa_atexit for C++.
 # Check whether --enable-__cxa_atexit was given.
 if test 

RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-07-25 Thread Tamar Christina
Hi All,

Attached is an updated patch that clarifies some of the comments in the patch 
and adds comments to the individual testcases
as requested.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-25  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* config/aarch64/aarch64.md (cmp,
probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-07-25  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* gcc.target/aarch64/stack-check-prologue-9.c: New.
* gcc.target/aarch64/stack-check-prologue.h: New.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Add AArch64.

> -Original Message-
> From: Jeff Law 
> Sent: Monday, July 16, 2018 18:03
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/6)]
> 
> On 07/16/2018 07:54 AM, Tamar Christina wrote:
> > The 07/13/2018 17:46, Jeff Law wrote:
> >> On 07/12/2018 11:39 AM, Tamar Christina wrote:
> > +
> > +  /* Round size to the nearest multiple of guard_size, and calculate
> > + the residual as the difference between the original size and the
> > + rounded size. */
> > +  HOST_WIDE_INT rounded_size = size & -guard_size;
> > +  HOST_WIDE_INT residual = size - rounded_size;
> > +
> > +  /* We can handle a small number of allocations/probes inline.
> > + Otherwise punt to a loop.  */
> > +  if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
> > +{
> > +  for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
> > +   {
> > + aarch64_sub_sp (NULL, temp2, guard_size, true);
> > + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > +  STACK_CLASH_CALLER_GUARD));
> > +   }
>  So the only concern with this code is that it'll be inefficient and
>  possibly incorrect for probe sizes larger than ARITH_FACTOR.
>  Ultimately, that's a case I don't think we really care that much about.
>  I wouldn't lose sleep if the port clamped the requested probing
>  interval at ARITH_FACTOR.
> 
> >>> I'm a bit confused here, the ARITH_FACTOR seems to have to do with
> >>> the Ada stack probing implementation, which isn't used by this new
> >>> code aside from the part that emits the actual probe when doing a
> >>> variable or large allocation in aarch64_output_probe_stack_range.
> >>>
> >>> Clamping the probing interval at ARITH_FACTOR means we can't do 64KB
> >>> probing intervals.
> >> It may have been a misunderstanding on my part.  My understanding is
> >> that ARITH_FACTOR represents the largest immediate constant we could
> >> handle in this code using a single insn.  Anything above ARITH_FACTOR
> >> needed a scratch register and I couldn't convince myself that we had
> >> a suitable scratch register available.
> >>
> >> But I'm really not well versed on the aarch64 architecture or the
> >> various implementation details in aarch64.c.  So I'm happy to defer
> >> to you and others @ ARM on what's best to do here.
> >
> > Ah no, that 12 bit immediate for str offset is unscaled. Scaled it's 15
> > bits for the 64bit store case.
> > So the actual limit is 32760, so it's quite a bit larger than ARITH_FACTOR.
> >
> > The value of STACK_CLASH_CALLER_GUARD is fixed in the back-end and
> > can't be changed, and if it's made too big will just give a compile error.
> ACK.  Thanks for explaining.
> 
> 
> >
> >>
> >>
> 

Re: [PATCH, debug] Add fkeep-vars-live

2018-07-25 Thread Jakub Jelinek
On Tue, Jul 24, 2018 at 05:04:06PM +0200, Tom de Vries wrote:
> 
> > +Add artificial use for each local variable at the end of the
> > declaration scope
> 
> Is this a better option description?

Yes (with a period at the end).  Or perhaps "its" instead of "the".

Looks ok to me, just would like to ask one more question, does this prevent
tail-calls or not?  If we tail call optimize some call, then I think there
is no need to keep the vars live, because the caller doesn't appear anymore
in the backtrace, on the other side if it is not a tail call, we want to
keep the vars live across the call so that they can be inspected.
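
E.g. (sketch):

  int compute (void);
  int bar (int);

  int
  foo (void)
  {
    int local = compute ();  /* want LOCAL inspectable ...          */
    int r = bar (local);     /* ... across this non-tail call, but  */
    return bar (r);          /* after this tail call foo's frame is
                                gone, so keeping LOCAL live would
                                only pessimize the code.  */
  }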

Jakub


Re: [14/46] Make STMT_VINFO_VEC_STMT a stmt_vec_info

2018-07-25 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Jul 24, 2018 at 11:58 AM Richard Sandiford
>  wrote:
>>
>> This patch changes STMT_VINFO_VEC_STMT from a gimple stmt to a
>> stmt_vec_info and makes the vectorizable_* routines pass back
>> a stmt_vec_info to vect_transform_stmt.
>
> OK, but - I don't think we ever "use" that stmt_info on vectorized stmts apart
> from the chaining via related-stmt?  I'd also like to get rid of that chaining
> and instead do sth similar to SLP where we simply have a vec<> of
> vectorized stmts.

Yeah, agree that would be better.

Thanks,
Richard

>
> Richard.
>
>>
>> 2018-07-24  Richard Sandiford  
>>
>> gcc/
>> * tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Change from
>> a gimple stmt to a stmt_vec_info.
>> (vectorizable_condition, vectorizable_live_operation)
>> (vectorizable_reduction, vectorizable_induction): Pass back the
>> vectorized statement as a stmt_vec_info.
>> * tree-vect-data-refs.c (vect_record_grouped_load_vectors): Update
>> use of STMT_VINFO_VEC_STMT.
>> * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise,
>> accumulating the inner phis that feed the STMT_VINFO_VEC_STMT
>> as stmt_vec_infos rather than gimple stmts.
>> (vectorize_fold_left_reduction): Change vec_stmt from a gimple stmt
>> to a stmt_vec_info.
>> (vectorizable_live_operation): Likewise.
>> (vectorizable_reduction, vectorizable_induction): Likewise,
>> updating use of STMT_VINFO_VEC_STMT.
>> * tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Update use
>> of STMT_VINFO_VEC_STMT.
>> (vect_build_gather_load_calls, vectorizable_bswap, vectorizable_call)
>> (vectorizable_simd_clone_call, vectorizable_conversion)
>> (vectorizable_assignment, vectorizable_shift, vectorizable_operation)
>> (vectorizable_store, vectorizable_load, vectorizable_condition)
>> (vectorizable_comparison, can_vectorize_live_stmts): Change vec_stmt
>> from a gimple stmt to a stmt_vec_info.
>> (vect_transform_stmt): Update use of STMT_VINFO_VEC_STMT.  Pass a
>> pointer to a stmt_vec_info to the vectorizable_* routines.
>>
>> Index: gcc/tree-vectorizer.h
>> ===
>> --- gcc/tree-vectorizer.h   2018-07-24 10:22:44.297185652 +0100
>> +++ gcc/tree-vectorizer.h   2018-07-24 10:22:47.489157307 +0100
>> @@ -812,7 +812,7 @@ struct _stmt_vec_info {
>>tree vectype;
>>
>>/* The vectorized version of the stmt.  */
>> -  gimple *vectorized_stmt;
>> +  stmt_vec_info vectorized_stmt;
>>
>>
>>/* The following is relevant only for stmts that contain a non-scalar
>> @@ -1560,7 +1560,7 @@ extern void vect_remove_stores (gimple *
>>  extern bool vect_analyze_stmt (gimple *, bool *, slp_tree, slp_instance,
>>stmt_vector_for_cost *);
>>  extern bool vectorizable_condition (gimple *, gimple_stmt_iterator *,
>> -   gimple **, tree, int, slp_tree,
>> +   stmt_vec_info *, tree, int, slp_tree,
>> stmt_vector_for_cost *);
>>  extern void vect_get_load_cost (stmt_vec_info, int, bool,
>> unsigned int *, unsigned int *,
>> @@ -1649,13 +1649,13 @@ extern tree vect_get_loop_mask (gimple_s
>>  extern struct loop *vect_transform_loop (loop_vec_info);
>> extern loop_vec_info vect_analyze_loop_form (struct loop *, vec_info_shared *);
>>  extern bool vectorizable_live_operation (gimple *, gimple_stmt_iterator *,
>> -slp_tree, int, gimple **,
>> +slp_tree, int, stmt_vec_info *,
>>  stmt_vector_for_cost *);
>>  extern bool vectorizable_reduction (gimple *, gimple_stmt_iterator *,
>> -   gimple **, slp_tree, slp_instance,
>> +   stmt_vec_info *, slp_tree, slp_instance,
>> stmt_vector_for_cost *);
>>  extern bool vectorizable_induction (gimple *, gimple_stmt_iterator *,
>> -   gimple **, slp_tree,
>> +   stmt_vec_info *, slp_tree,
>> stmt_vector_for_cost *);
>>  extern tree get_initial_def_for_reduction (gimple *, tree, tree *);
>>  extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
>> Index: gcc/tree-vect-data-refs.c
>> ===
>> --- gcc/tree-vect-data-refs.c   2018-07-24 10:22:44.285185759 +0100
>> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:22:47.485157343 +0100
>> @@ -6401,18 +6401,17 @@ vect_record_grouped_load_vectors (gimple
>>  {
>>if (!DR_GROUP_SAME_DR_STMT (vinfo_for_stmt (next_stmt)))
>>   

Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-25 Thread Jakub Jelinek
On Tue, Jul 24, 2018 at 04:11:11PM -0300, Alexandre Oliva wrote:
> On Jul 24, 2018, Tom de Vries  wrote:
> 
> > This patch adds fake uses of user variables at the point where they go out
> > of scope, to keep user variables inspectable throughout the application.
> 
> I suggest also adding such uses before sets, so that variables aren't
> regarded as dead and get optimized out in ranges between the end of a
> live range and a subsequent assignment.

But that can be done incrementally, right, and perhaps being controllable
by a level of this option, because such extra uses might make it even more
costly.  Though, if the extra uses and sets aren't in the same stmt, then
the optimizers could still move them apart (in addition to making the IL
larger).
We need to think about the different cases: non-gimple reg vars are probably
ok, they just live in memory unless converted (sra etc.) into gimple reg
vars; then what to do about SSA_NAMEs with an underlying user variable, both
when it doesn't have overlapping ranges and when it does.
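
E.g. (sketch):

  extern int compute1 (void), compute2 (void);
  extern void use (int);

  void
  f (void)
  {
    int x = compute1 ();
    use (x);          /* x's live range ends here; without an extra use
                         x may show as <optimized out> in the gap ...  */
    x = compute2 ();  /* ... until this set starts a new range.  */
    use (x);
  }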

Jakub


Re: [PATCH] Add initial version of C++17 <memory_resource> header

2018-07-25 Thread Jonathan Wakely

On 24/07/18 22:12 +0100, Jonathan Wakely wrote:

This is missing the synchronized_pool_resource and
unsynchronized_pool_resource classes but is otherwise complete.

This is a new implementation, not based on the existing code in
<experimental/memory_resource>, but memory_resource and
polymorphic_allocator ended up looking almost the same anyway.

The constant_init kluge in src/c++17/memory_resource.cc is apparently
due to Richard Smith and ensures that the objects are constructed during
constant initialization phase and not destroyed (because the
constant_init destructor doesn't destroy the union member and the
storage is not reused).
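
A quick usage sketch of the new pieces (assumes a toolchain that ships
this header):

  #include <memory_resource>
  #include <vector>

  int main()
  {
    char buf[256];
    // Allocations are carved out of buf, falling back to the upstream
    // resource once it is exhausted; everything is released at once
    // when the pool is destroyed.
    std::pmr::monotonic_buffer_resource pool(buf, sizeof(buf),
                                             std::pmr::new_delete_resource());
    std::vector<int, std::pmr::polymorphic_allocator<int>> v(&pool);
    for (int i = 0; i < 100; ++i)
      v.push_back(i);
  }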

* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* include/Makefile.am: Add new <memory_resource> header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include <memory_resource> for C++17.
* include/std/memory_resource: New header.
(memory_resource, polymorphic_allocator, new_delete_resource)
(null_memory_resource, set_default_resource, get_default_resource)
(pool_options, monotonic_buffer_resource): Define.
* src/Makefile.am: Add c++17 directory.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.am: Fix comment.
* src/c++17/Makefile.am: Add makefile for new sub-directory.
* src/c++17/Makefile.in: Generate.
* src/c++17/memory_resource.cc: New.
(newdel_res_t, null_res_t, constant_init, newdel_res, null_res)
(default_res, new_delete_resource, null_memory_resource)
(set_default_resource, get_default_resource): Define.
* testsuite/20_util/memory_resource/1.cc: New test.
* testsuite/20_util/memory_resource/2.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/1.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/release.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
New test.
* testsuite/20_util/polymorphic_allocator/1.cc: New test.
* testsuite/20_util/polymorphic_allocator/resource.cc: New test.
* testsuite/20_util/polymorphic_allocator/select.cc: New test.
* testsuite/util/testsuite_allocator.h (__gnu_test::memory_resource):
Define concrete memory resource for testing.
(__gnu_test::default_resource_mgr): Define RAII helper for changing
default resource.

Tested powerpc64le-linux, committed to trunk.


I missed a change to acinclude.m4 that should have gone with this
patch. Now also committed to trunk.


commit cdaaa2b47d3fa093c741086e86d631d420a93663
Author: Jonathan Wakely 
Date:   Wed Jul 25 11:52:19 2018 +0100

Add new src/c++17 directory to list in acinclude.m4

* acinclude.m4 (glibcxx_SUBDIRS): Add src/c++17.
* src/Makefile.am: Add comment.
* src/c++17/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index bbf3c8df3e1..6d68e907426 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
   # Keep these sync'd with the list in Makefile.am.  The first provides an
   # expandable list at autoconf time; the second provides an expandable list
   # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/filesystem doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/filesystem doc po testsuite python])
   SUBDIRS='glibcxx_SUBDIRS'
 
   # These need to be absolute paths, yet at the same time need to
diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index 09edcdbc471..9a2fe297ddb 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -28,6 +28,7 @@ else
 filesystem_dir =
 endif
 
+## Keep this list sync'd with acinclude.m4:GLIBCXX_CONFIGURE.
 SUBDIRS = c++98 c++11 c++17 $(filesystem_dir)
 
 # Cross compiler support.


Re: [Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-25 Thread Ramana Radhakrishnan
On Tue, Jul 24, 2018 at 10:55 PM, Steve Ellcey  wrote:
> On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote:
>>
>>
>> I'd say this patch isn't desirable for trunk. I'd be interested in use cases
>> that need a static decision on presence of LSE that are not better expressed
>> using higher level language features.
>>
>> Thanks,
>> James
>
> How about when building the higher level features?


> Right now,
> in sysdeps/aarch64/atomic-machine.h, we
> hardcode ATOMIC_EXCHANGE_USES_CAS to 0.  If we had __ARM_FEATURE_LSE we
> could use that to determine if we wanted to set
> ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call
> generated in nptl/pthread_spin_lock.c.  That would be useful if we
> built a lipthread specifically for a platform that had LSE.


No, you don't need to define ATOMIC_EXCHANGE_USES_CAS=1 to get LSE
instructions in libpthread. You get that already with
-march=armv8-a+lse on the command line.

ATOMIC_EXCHANGE_USES_CAS=1 in glibc implies that a CAS is faster than
a SWP, and on AArch64 that is a per-micro-architectural decision, *not*
an architectural decision for the port, unless someone can
categorically say that the majority of implementations glibc
cares about *have* better CAS performance than SWP performance.  Both
the SWP and CAS instructions are part of v8.1-A / LSE, so all you
need to build libpthread with LSE is the command line option
-march=armv8-a+lse.  So, no, you don't need this macro to build
libpthread for v8.1 or LSE.  You need that macro to statically choose
a CAS implementation over a SWP implementation.

See comment in include/atomic.h :

/* ATOMIC_EXCHANGE_USES_CAS is non-zero if atomic_exchange operations
   are implemented based on a CAS loop; otherwise, this is zero and we assume
   that the atomic_exchange operations could provide better performance
   than a CAS loop.  */
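
For instance (illustrative only), for

  #include <stdatomic.h>

  int
  take_lock (atomic_int *lock)
  {
    return atomic_exchange_explicit (lock, 1, memory_order_acquire);
  }

building with -march=armv8-a+lse already lets GCC emit a single LSE
swap (SWPA) here; ATOMIC_EXCHANGE_USES_CAS only selects CAS versus SWP,
it is not what enables LSE instructions.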


regards
Ramana


>
> Steve Ellcey
> sell...@cavium.com
>


[PATCH] Move std::unique_lock definition to a separate header

2018-07-25 Thread Jonathan Wakely

This will allow std::mutex and std::lock_guard to be used elsewhere in
the library without pulling in the whole of <mutex>.

Previously the whole of <bits/std_mutex.h> was conditional on the
_GLIBCXX_USE_C99_STDINT_TR1 macro, but only the std::unique_lock members
that use <chrono> facilities should depend on that. std::mutex only
needs to depend on _GLIBCXX_HAS_GTHREADS and std::lock_guard can be
defined unconditionally.

Some parts of <bits/std_mutex.h> and <bits/unique_lock.h> are based on code in
<mutex> which dates from 2003. However, the std::unique_lock
implementation was added in 2008 by r135007, without using any earlier
code. Therefore the new header file has copyright years 2008-2018.
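
The practical effect for library-internal code, roughly (a sketch;
<bits/std_mutex.h> is an internal header, not for direct use in user
code):

  #include <bits/std_mutex.h>  // mutex + lock_guard, without <chrono>

  std::mutex m;
  int counter;

  void bump()
  {
    // Works even when _GLIBCXX_USE_C99_STDINT_TR1 is not defined.
    std::lock_guard<std::mutex> lock(m);
    ++counter;
  }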

* include/Makefile.am: Add new <bits/unique_lock.h> header.
* include/Makefile.in: Regenerate.
* include/bits/std_mutex.h [!_GLIBCXX_USE_C99_STDINT_TR1] (mutex)
(lock_guard): Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
(unique_lock): Move definition to ...
* include/bits/unique_lock.h: New header.
[!_GLIBCXX_USE_C99_STDINT_TR1] (unique_lock): Define unconditionally.
[_GLIBCXX_USE_C99_STDINT_TR1] (unique_lock(mutex_type&, time_point))
(unique_lock(mutex_type&, duration), unique_lock::try_lock_until)
(unique_lock::try_lock_for): Define only when <chrono> is usable.
* include/std/condition_variable: Include <bits/unique_lock.h>.
* include/std/mutex: Likewise.

Tested powerpc64le-linux, committed to trunk.


commit c84285a3a0e3d252d3f8e1ffec6dd56997a87fe8
Author: Jonathan Wakely 
Date:   Wed Jul 25 10:50:33 2018 +0100

Move std::unique_lock definition to a separate header

This will allow std::mutex and std::lock_guard to be used elsewhere in
the library without pulling in the whole of <mutex>.

Previously the whole of <bits/std_mutex.h> was conditional on the
_GLIBCXX_USE_C99_STDINT_TR1 macro, but only the std::unique_lock members
that use <chrono> facilities should depend on that. std::mutex only
needs to depend on _GLIBCXX_HAS_GTHREADS and std::lock_guard can be
defined unconditionally.

Some parts of <bits/std_mutex.h> and <bits/unique_lock.h> are based on code in
<mutex> which dates from 2003. However, the std::unique_lock
implementation was added in 2008 by r135007, without using any earlier
code. Therefore the new header file has copyright years 2008-2018.

* include/Makefile.am: Add new <bits/unique_lock.h> header.
* include/Makefile.in: Regenerate.
* include/bits/std_mutex.h [!_GLIBCXX_USE_C99_STDINT_TR1] (mutex)
(lock_guard): Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
(unique_lock): Move definition to ...
* include/bits/unique_lock.h: New header.
[!_GLIBCXX_USE_C99_STDINT_TR1] (unique_lock): Define unconditionally.
[_GLIBCXX_USE_C99_STDINT_TR1] (unique_lock(mutex_type&, time_point))
(unique_lock(mutex_type&, duration), unique_lock::try_lock_until)
(unique_lock::try_lock_for): Define only when <chrono> is usable.
* include/std/condition_variable: Include <bits/unique_lock.h>.
* include/std/mutex: Likewise.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 9daa8856e70..70db3cb6260 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -199,6 +199,7 @@ bits_headers = \
${bits_srcdir}/stringfwd.h \
${bits_srcdir}/string_view.tcc \
${bits_srcdir}/uniform_int_dist.h \
+   ${bits_srcdir}/unique_lock.h \
${bits_srcdir}/unique_ptr.h \
${bits_srcdir}/unordered_map.h \
${bits_srcdir}/unordered_set.h \
diff --git a/libstdc++-v3/include/bits/std_mutex.h 
b/libstdc++-v3/include/bits/std_mutex.h
index 34d22907a06..41a9b30636a 100644
--- a/libstdc++-v3/include/bits/std_mutex.h
+++ b/libstdc++-v3/include/bits/std_mutex.h
@@ -39,9 +39,6 @@
 #include <system_error>
 #include <bits/functexcept.h>
 #include <bits/gthr.h>
-#include <bits/move.h> // for std::swap
-
-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -174,200 +171,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   mutex_type&  _M_device;
 };
 
-  /** @brief A movable scoped lock type.
-   *
-   * A unique_lock controls mutex ownership within a scope. Ownership of the
-   * mutex can be delayed until after construction and can be transferred
-   * to another unique_lock by move construction or move assignment. If a
-   * mutex lock is owned when the destructor runs ownership will be released.
-   */
-  template<typename _Mutex>
-class unique_lock
-{
-public:
-  typedef _Mutex mutex_type;
-
-  unique_lock() noexcept
-  : _M_device(0), _M_owns(false)
-  { }
-
-  explicit unique_lock(mutex_type& __m)
-  : _M_device(std::__addressof(__m)), _M_owns(false)
-  {
-   lock();
-   _M_owns = true;
-  }
-
-  unique_lock(mutex_type& __m, defer_lock_t) noexcept
-  : _M_device(std::__addressof(__m)), _M_owns(false)
-  { }
-
-  unique_lock(mutex_type& __m, try_to_lock_t)
-  : _M_device(std::__addressof(__m)), _M_owns(_M_device->try_lock())
-  { }
-
-  

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-25 Thread Richard Biener
On Wed, Jul 25, 2018 at 11:50 AM Segher Boessenkool
 wrote:
>
> On Wed, Jul 25, 2018 at 10:28:30AM +0200, Richard Biener wrote:
> > On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
> >  wrote:
> > >
> > > This patch allows combine to combine two insns into two.  This helps
> > > in many cases, by reducing instruction path length, and also allowing
> > > further combinations to happen.  PR85160 is a typical example of code
> > > that it can improve.
> > >
> > > This patch does not allow such combinations if either of the original
> > > instructions was a simple move instruction.  In those cases combining
> > > the two instructions increases register pressure without improving the
> > > code.  With this move test, register pressure no longer increases
> > > noticeably as far as I can tell.
> > >
> > > (At first I also didn't allow either of the resulting insns to be a
> > > move instruction.  But that is actually a very good thing to have, as
> > > should have been obvious).
> > >
> > > Tested for many months; tested on about 30 targets.
> > >
> > > I'll commit this later this week if there are no objections.
> >
> > Sounds good - but, _any_ testcase?  Please! ;)
>
> I only have target-specific ones.

Works for me.

>  Most *simple* ones will already be
> optimised by current code (via 3->2 combination).  But I've now got one
> that trunk does not optimise, and it can be confirmed with looking at
> the resulting machine code even (no need to look at the combine dump,
> which is a very good thing).  And it is a proper thing to test even: it
> tests that some source is compiled to properly optimised machine code.
>
> Any other kind of testcase is worse than useless, of course.
>
> Testing it results in working code isn't very feasible or useful either.
>
>
> Segher


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-25 Thread Richard Biener
On Wed, Jul 25, 2018 at 11:49 AM Richard Earnshaw (lists)
 wrote:
>
> On 24/07/18 18:26, Richard Biener wrote:
> > On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
> >  wrote:
> >>
> >>
> >> This patch defines a new intrinsic function
> >> __builtin_speculation_safe_value.  A generic default implementation is
> >> defined which will attempt to use the backend pattern
> >> "speculation_safe_barrier".  If this pattern is not defined, or if it
> >> is not available, then the compiler will emit a warning, but
> >> compilation will continue.
> >>
> >> Note that the test spec-barrier-1.c will currently fail on all
> >> targets.  This is deliberate, the failure will go away when
> >> appropriate action is taken for each target backend.
> >
> > So given this series is supposed to be backported I question
> >
> > +rtx
> > +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
> > +   rtx result, rtx val,
> > +   rtx failval ATTRIBUTE_UNUSED)
> > +{
> > +  emit_move_insn (result, val);
> > +#ifdef HAVE_speculation_barrier
> > +  /* Assume the target knows what it is doing: if it defines a
> > + speculation barrier, but it is not enabled, then assume that one
> > + isn't needed.  */
> > +  if (HAVE_speculation_barrier)
> > +emit_insn (gen_speculation_barrier ());
> > +
> > +#else
> > +  warning_at (input_location, 0,
> > + "this target does not define a speculation barrier; "
> > + "your program will still execute correctly, but speculation "
> > + "will not be inhibited");
> > +#endif
> > +  return result;
> >
> > which makes all but aarch64 archs warn on __bultin_speculation_safe_value
> > uses, even those that do not suffer from Spectre like all those embedded 
> > targets
> > where implementations usually do not speculate at all.
> >
> > In fact for those targets the builtin stays in the way of optimization on 
> > GIMPLE
> > as well so we should fold it away early if neither the target hook is
> > implemented
> > nor there is a speculation_barrier insn.
> >
> > So, please make resolve_overloaded_builtin return a no-op on such targets
> > which means you can remove the above warning.  Maybe such targets
> > shouldn't advertise / initialize the builtins at all?
>
> I disagree with your approach here.  Why would users not want to know
> when the compiler is failing to implement a security feature when it
> should?  As for targets that don't need something, they can easily
> define the hook as described to suppress the warning.
>
> Or are you just suggesting moving the warning to resolve overloaded builtin.

Well.  You could argue that; I say we shouldn't even support
__builtin_speculation_safe_value
for archs that do not need it or have not implemented it.  That way users can
decide:

#if __HAVE_SPECULATION_SAFE_VALUE
  ... use __builtin_speculation_safe_value ...
#else
#warning oops // or nothing
#endif

> Other ports will need to take action, but in general, it can be as
> simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
> simpler still if nothing is needed for that architecture.

Then that should be the default.  You might argue we'll only see
__builtin_speculation_safe_value uses for things like Firefox which
is unlikely built for AVR (just to make an example).  But people
are going to test build just on x86 and if they build with -Werror
this will break builds on all targets that didn't even get the chance
to implement this feature.

> There is a test which is intended to fail to targets that have not yet
> been patched - I thought that was better than hard-failing the build,
> especially given that we want to back-port.
>
> Port maintainers DO need to decide what to do about speculation, even if
> it is explicitly that no mitigation is needed.

Agreed.  But I didn't yet see a request for maintainers to decide that?

> >
> > The builtins also have no attributes which mean they are assumed to be
> > 1) calling back into the CU via exported functions, 2) possibly throwing
> > exceptions, 3) affecting memory state.  I think you at least want
> > to use ATTR_NOTHROW_LEAF_LIST.
> >
> > The builtins are not designed to be optimization or memory barriers as
> > far as I can see and should thus be CONST as well.
> >
>
> I think they should be barriers.  They do need to ensure that they can't
> be moved past other operations that might depend on the speculation
> state.  Consider, for example,

That makes eliding them for targets that do not need mitigation even
more important.

>   ...
>   t = untrusted_value;
>   ...
>   if (t + 5 < limit)
>     {
>       v = mem[__builtin_speculation_safe_value (untrusted_value)];
>       ...
>
> The compiler must never lift the builtin outside the bounds check as
> that is part of the speculation state.

OK, so you are relying on the fact that with the current setup GCC has
to assume the builtin has side-effects (GCC may not move it to a place that
the original location is not post-dominated on).  It 

Re: [38/46] Pass stmt_vec_infos instead of data_references where relevant

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
 wrote:
>
> This patch makes various routines (mostly in tree-vect-data-refs.c)
> take stmt_vec_infos rather than data_references.  The affected routines
> are really dealing with the way that an access is going to vectorised
> for a particular stmt_vec_info, rather than with the original scalar
> access described by the data_reference.

Similar.  Doesn't it make more sense to pass both stmt_info and DR to
the functions?

We currently cannot handle aggregate copies in the to-be-vectorized IL
but rely on SRA and friends to elide those.  That's the only two-DR
stmt I can think of for vectorization.  Maybe aggregate by-value / return
function calls with OMP SIMD if that supports this somehow.
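
(For reference, the aggregate-copy case is a single statement with two
DRs, e.g.

  struct s { double d[4]; };

  void copy (struct s *a, struct s *b)
  {
    *a = *b;  /* one gimple assignment, one load DR plus one store DR */
  }

which we currently expect SRA and friends to break up before
vectorization.)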

Richard.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_supportable_dr_alignment): Take
> a stmt_vec_info rather than a data_reference.
> * tree-vect-data-refs.c (vect_calculate_target_alignment)
> (vect_compute_data_ref_alignment, vect_update_misalignment_for_peel)
> (verify_data_ref_alignment, vector_alignment_reachable_p)
> (vect_get_data_access_cost, vect_get_peeling_costs_all_drs)
> (vect_peeling_supportable, vect_analyze_group_access_1)
> (vect_analyze_group_access, vect_analyze_data_ref_access)
> (vect_vfa_segment_size, vect_vfa_access_size, vect_small_gap_p)
> (vectorizable_with_step_bound_p, vect_duplicate_ssa_name_ptr_info)
> (vect_supportable_dr_alignment): Likewise.  Update calls to other
> functions for which the same change is being made.
> (vect_verify_datarefs_alignment, vect_find_same_alignment_drs)
> (vect_analyze_data_refs_alignment): Update calls accordingly.
> (vect_slp_analyze_and_verify_node_alignment): Likewise.
> (vect_analyze_data_ref_accesses): Likewise.
> (vect_prune_runtime_alias_test_list): Likewise.
> (vect_create_addr_base_for_vector_ref): Likewise.
> (vect_create_data_ref_ptr): Likewise.
> (_vect_peel_info::dr): Replace with...
> (_vect_peel_info::stmt_info): ...this new field.
> (vect_peeling_hash_get_most_frequent): Update _vect_peel_info uses
> accordingly, and update after above interface changes.
> (vect_peeling_hash_get_lowest_cost): Likewise.
> (vect_peeling_hash_choose_best_peeling): Likewise.
> (vect_enhance_data_refs_alignment): Likewise.
> (vect_peeling_hash_insert): Likewise.  Take a stmt_vec_info
> rather than a data_reference.
> * tree-vect-stmts.c (vect_get_store_cost, vect_get_load_cost)
> (get_negative_load_store_type): Update calls to
> vect_supportable_dr_alignment.
> (vect_get_data_ptr_increment, ensure_base_align): Take a
> stmt_vec_info instead of a data_reference.
> (vectorizable_store, vectorizable_load): Update calls after
> above interface changes.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:24:05.744462369 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:24:08.924434128 +0100
> @@ -1541,7 +1541,7 @@ extern tree vect_get_mask_type_for_stmt
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
>  extern enum dr_alignment_support vect_supportable_dr_alignment
> -   (struct data_reference *, bool);
> +  (stmt_vec_info, bool);
>  extern tree vect_get_smallest_scalar_type (stmt_vec_info, HOST_WIDE_INT *,
> HOST_WIDE_INT *);
>  extern bool vect_analyze_data_ref_dependences (loop_vec_info, unsigned int *);
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:24:05.740462405 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:24:08.924434128 +0100
> @@ -858,19 +858,19 @@ vect_record_base_alignments (vec_info *v
>  }
>  }
>
> -/* Return the target alignment for the vectorized form of DR.  */
> +/* Return the target alignment for the vectorized form of the load or store
> +   in STMT_INFO.  */
>
>  static unsigned int
> -vect_calculate_target_alignment (struct data_reference *dr)
> +vect_calculate_target_alignment (stmt_vec_info stmt_info)
>  {
> -  stmt_vec_info stmt_info = vect_dr_stmt (dr);
>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>return targetm.vectorize.preferred_vector_alignment (vectype);
>  }
>
>  /* Function vect_compute_data_ref_alignment
>
> -   Compute the misalignment of the data reference DR.
> +   Compute the misalignment of the load or store in STMT_INFO.
>
> Output:
> 1. dr_misalignment (STMT_INFO) is defined.
> @@ -879,9 +879,9 @@ vect_calculate_target_alignment (struct
> only for trivial cases. TODO.  */
>
>  static void

Re: [37/46] Associate alignment information with stmt_vec_infos

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:08 PM Richard Sandiford
 wrote:
>
> Alignment information is really a property of a stmt_vec_info
> (and the way we want to vectorise it) rather than the original scalar dr.
> I think that was true even before the recent dr sharing.

But that is only so as long as we handle only stmts with a single DR.
In reality alignment info _is_ a property of the DR and not of the stmt.

So you're doing a shortcut here, shouldn't we rename
dr_misalignment to stmt_dr_misalignment then?

Otherwise I don't see how this makes sense semantically.

> This patch therefore makes the alignment-related interfaces take
> stmt_vec_infos rather than data_references.
>
>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (STMT_VINFO_TARGET_ALIGNMENT): New macro.
> (DR_VECT_AUX, DR_MISALIGNMENT, SET_DR_MISALIGNMENT)
> (DR_TARGET_ALIGNMENT): Delete.
> (set_dr_misalignment, dr_misalignment, aligned_access_p)
> (known_alignment_for_access_p, vect_known_alignment_in_bytes)
> (vect_dr_behavior): Take a stmt_vec_info rather than a data_reference.
> * tree-vect-data-refs.c (vect_calculate_target_alignment)
> (vect_compute_data_ref_alignment, vect_update_misalignment_for_peel)
> (vector_alignment_reachable_p, vect_get_peeling_costs_all_drs)
> (vect_peeling_supportable, vect_enhance_data_refs_alignment)
> (vect_duplicate_ssa_name_ptr_info): Update after above changes.
> (vect_create_addr_base_for_vector_ref, vect_create_data_ref_ptr)
> (vect_setup_realignment, vect_supportable_dr_alignment): Likewise.
> * tree-vect-loop-manip.c (get_misalign_in_elems): Likewise.
> (vect_gen_prolog_loop_niters): Likewise.
> * tree-vect-stmts.c (vect_get_store_cost, vect_get_load_cost)
> (compare_step_with_zero, get_group_load_store_type): Likewise.
> (vect_get_data_ptr_increment, ensure_base_align, vectorizable_store)
> (vectorizable_load): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:24:02.364492386 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:24:05.744462369 +0100
> @@ -1031,6 +1031,9 @@ #define STMT_VINFO_NUM_SLP_USES(S)(S)->
>  #define STMT_VINFO_REDUC_TYPE(S)   (S)->reduc_type
>  #define STMT_VINFO_REDUC_DEF(S)(S)->reduc_def
>
> +/* Only defined once dr_misalignment is defined.  */
> +#define STMT_VINFO_TARGET_ALIGNMENT(S) (S)->dr_aux.target_alignment
> +
>  #define DR_GROUP_FIRST_ELEMENT(S)  (gcc_checking_assert ((S)->data_ref_info), (S)->first_element)
>  #define DR_GROUP_NEXT_ELEMENT(S)   (gcc_checking_assert ((S)->data_ref_info), (S)->next_element)
>  #define DR_GROUP_SIZE(S)   (gcc_checking_assert ((S)->data_ref_info), (S)->size)
> @@ -1048,8 +1051,6 @@ #define HYBRID_SLP_STMT(S)
>  #define PURE_SLP_STMT(S)  ((S)->slp_type == pure_slp)
>  #define STMT_SLP_TYPE(S)   (S)->slp_type
>
> -#define DR_VECT_AUX(dr) (&vinfo_for_stmt (DR_STMT (dr))->dr_aux)
> -
>  #define VECT_MAX_COST 1000
>
>  /* The maximum number of intermediate steps required in multi-step type
> @@ -1256,73 +1257,72 @@ add_stmt_costs (void *data, stmt_vector_
>  #define DR_MISALIGNMENT_UNKNOWN (-1)
>  #define DR_MISALIGNMENT_UNINITIALIZED (-2)
>
> +/* Record that the vectorized form of the data access in STMT_INFO
> +   will be misaligned by VAL bytes wrt its target alignment.
> +   Negative values have the meanings above.  */
> +
>  inline void
> -set_dr_misalignment (struct data_reference *dr, int val)
> +set_dr_misalignment (stmt_vec_info stmt_info, int val)
>  {
> -  dataref_aux *data_aux = DR_VECT_AUX (dr);
> -  data_aux->misalignment = val;
> +  stmt_info->dr_aux.misalignment = val;
>  }
>
> +/* Return the misalignment in bytes of the vectorized form of the data
> +   access in STMT_INFO, relative to its target alignment.  Negative
> +   values have the meanings above.  */
> +
>  inline int
> -dr_misalignment (struct data_reference *dr)
> +dr_misalignment (stmt_vec_info stmt_info)
>  {
> -  int misalign = DR_VECT_AUX (dr)->misalignment;
> +  int misalign = stmt_info->dr_aux.misalignment;
>gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>return misalign;
>  }
>
> -/* Reflects actual alignment of first access in the vectorized loop,
> -   taking into account peeling/versioning if applied.  */
> -#define DR_MISALIGNMENT(DR) dr_misalignment (DR)
> -#define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
> -
> -/* Only defined once DR_MISALIGNMENT is defined.  */
> -#define DR_TARGET_ALIGNMENT(DR) DR_VECT_AUX (DR)->target_alignment
> -
> -/* Return true if data access DR is aligned to its target alignment
> -   (which may be less than a full vector).  */
> +/* Return true if the vectorized form of the data access in STMT_INFO is
> +   aligned to its target alignment (which may be less than a full vector).  */

Re: [36/46] Add a pattern_stmt_p field to stmt_vec_info

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:07 PM Richard Sandiford
 wrote:
>
> This patch adds a pattern_stmt_p field to stmt_vec_info, so that it's
> possible to tell whether the statement is a pattern statement without
> referring to other statements.  The new field goes in what was
> previously a hole in the structure, so the size is the same as before.

Not sure what the advantage is?  is_pattern_stmt_p () looks nicer
than ->is_pattern_p

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_stmt_vec_info::pattern_stmt_p): New field.
> (is_pattern_stmt_p): Delete.
> * tree-vect-patterns.c (vect_init_pattern_stmt): Set pattern_stmt_p
> on pattern statements.
> (vect_split_statement, vect_mark_pattern_stmts): Use the new
> pattern_stmt_p field instead of is_pattern_stmt_p.
> * tree-vect-data-refs.c (vect_preserves_scalar_order_p): Likewise.
> * tree-vect-loop.c (vectorizable_live_operation): Likewise.
> * tree-vect-slp.c (vect_build_slp_tree_2): Likewise.
> (vect_find_last_scalar_stmt_in_slp, vect_remove_slp_scalar_calls)
> (vect_schedule_slp): Likewise.
> * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
> (vectorizable_call, vectorizable_simd_clone_call, vectorizable_shift)
> (vectorizable_store, vect_remove_stores): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:56.440544995 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:24:02.364492386 +0100
> @@ -791,6 +791,12 @@ struct _stmt_vec_info {
>/* Stmt is part of some pattern (computation idiom)  */
>bool in_pattern_p;
>
> +  /* True if the statement was created during pattern recognition as
> + part of the replacement for RELATED_STMT.  This implies that the
> + statement isn't part of any basic block, although for convenience
> + its gimple_bb is the same as for RELATED_STMT.  */
> +  bool pattern_stmt_p;
> +
>/* Is this statement vectorizable or should it be skipped in (partial)
>   vectorization.  */
>bool vectorizable;
> @@ -1151,16 +1157,6 @@ get_later_stmt (stmt_vec_info stmt1_info
>  return stmt2_info;
>  }
>
> -/* Return TRUE if a statement represented by STMT_INFO is a part of a
> -   pattern.  */
> -
> -static inline bool
> -is_pattern_stmt_p (stmt_vec_info stmt_info)
> -{
> -  stmt_vec_info related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
> -  return related_stmt_info && STMT_VINFO_IN_PATTERN_P (related_stmt_info);
> -}
> -
>  /* Return true if BB is a loop header.  */
>
>  static inline bool
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2018-07-24 10:23:59.408518638 +0100
> +++ gcc/tree-vect-patterns.c2018-07-24 10:24:02.360492422 +0100
> @@ -108,6 +108,7 @@ vect_init_pattern_stmt (gimple *pattern_
>  pattern_stmt_info = orig_stmt_info->vinfo->add_stmt (pattern_stmt);
>gimple_set_bb (pattern_stmt, gimple_bb (orig_stmt_info->stmt));
>
> +  pattern_stmt_info->pattern_stmt_p = true;
>STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
>STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>  = STMT_VINFO_DEF_TYPE (orig_stmt_info);
> @@ -630,7 +631,7 @@ vect_recog_temp_ssa_var (tree type, gimp
>  vect_split_statement (stmt_vec_info stmt2_info, tree new_rhs,
>   gimple *stmt1, tree vectype)
>  {
> -  if (is_pattern_stmt_p (stmt2_info))
> +  if (stmt2_info->pattern_stmt_p)
>  {
>/* STMT2_INFO is part of a pattern.  Get the statement to which
>  the pattern is attached.  */
> @@ -4726,7 +4727,7 @@ vect_mark_pattern_stmts (stmt_vec_info o
>gimple *def_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
>
>gimple *orig_pattern_stmt = NULL;
> -  if (is_pattern_stmt_p (orig_stmt_info))
> +  if (orig_stmt_info->pattern_stmt_p)
>  {
>/* We're replacing a statement in an existing pattern definition
>  sequence.  */
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:53.204573732 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:24:02.356492457 +0100
> @@ -212,9 +212,9 @@ vect_preserves_scalar_order_p (stmt_vec_
>   (but could happen later) while reads will happen no later than their
>   current position (but could happen earlier).  Reordering is therefore
>   only possible if the first access is a write.  */
> -  if (is_pattern_stmt_p (stmtinfo_a))
> +  if (stmtinfo_a->pattern_stmt_p)
>  stmtinfo_a = STMT_VINFO_RELATED_STMT (stmtinfo_a);
> -  if (is_pattern_stmt_p (stmtinfo_b))
> +  if (stmtinfo_b->pattern_stmt_p)
>  stmtinfo_b = STMT_VINFO_RELATED_STMT (stmtinfo_b);
>stmt_vec_info earlier_stmt_info = get_earlier_stmt (stmtinfo_a, 
> stmtinfo_b);
>return 

Re: [35/46] Alter interfaces within vect_pattern_recog

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:06 PM Richard Sandiford
 wrote:
>
> vect_pattern_recog_1 took a gimple_stmt_iterator as argument, but was
> only interested in the gsi_stmt, not anything else.  This patch makes
> the associated routines operate directly on stmt_vec_infos.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-patterns.c (vect_mark_pattern_stmts): Take the
> original stmt as a stmt_vec_info rather than a gimple stmt.
> (vect_pattern_recog_1): Take the statement directly as a
> stmt_vec_info, rather than via a gimple_stmt_iterator.
> Update call to vect_mark_pattern_stmts.
> (vect_pattern_recog): Update calls accordingly.
>
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2018-07-24 10:23:50.004602150 +0100
> +++ gcc/tree-vect-patterns.c2018-07-24 10:23:59.408518638 +0100
> @@ -4720,29 +4720,29 @@ const unsigned int NUM_PATTERNS = ARRAY_
>  /* Mark statements that are involved in a pattern.  */
>
>  static inline void
> -vect_mark_pattern_stmts (gimple *orig_stmt, gimple *pattern_stmt,
> +vect_mark_pattern_stmts (stmt_vec_info orig_stmt_info, gimple *pattern_stmt,
>   tree pattern_vectype)
>  {
> -  stmt_vec_info orig_stmt_info = vinfo_for_stmt (orig_stmt);
>gimple *def_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
>
> -  bool old_pattern_p = is_pattern_stmt_p (orig_stmt_info);
> -  if (old_pattern_p)
> +  gimple *orig_pattern_stmt = NULL;
> +  if (is_pattern_stmt_p (orig_stmt_info))
>  {
>/* We're replacing a statement in an existing pattern definition
>  sequence.  */
> +  orig_pattern_stmt = orig_stmt_info->stmt;
>if (dump_enabled_p ())
> {
>   dump_printf_loc (MSG_NOTE, vect_location,
>"replacing earlier pattern ");
> - dump_gimple_stmt (MSG_NOTE, TDF_SLIM, orig_stmt, 0);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, orig_pattern_stmt, 0);
> }
>
>/* To keep the book-keeping simple, just swap the lhs of the
>  old and new statements, so that the old one has a valid but
>  unused lhs.  */
> -  tree old_lhs = gimple_get_lhs (orig_stmt);
> -  gimple_set_lhs (orig_stmt, gimple_get_lhs (pattern_stmt));
> +  tree old_lhs = gimple_get_lhs (orig_pattern_stmt);
> +  gimple_set_lhs (orig_pattern_stmt, gimple_get_lhs (pattern_stmt));
>gimple_set_lhs (pattern_stmt, old_lhs);
>
>if (dump_enabled_p ())
> @@ -4755,7 +4755,8 @@ vect_mark_pattern_stmts (gimple *orig_st
>orig_stmt_info = STMT_VINFO_RELATED_STMT (orig_stmt_info);
>
>/* We shouldn't be replacing the main pattern statement.  */
> -  gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) != orig_stmt);
> +  gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info)->stmt
> + != orig_pattern_stmt);
>  }
>
>if (def_seq)
> @@ -4763,13 +4764,14 @@ vect_mark_pattern_stmts (gimple *orig_st
>  !gsi_end_p (si); gsi_next (&si))
>vect_init_pattern_stmt (gsi_stmt (si), orig_stmt_info, 
> pattern_vectype);
>
> -  if (old_pattern_p)
> +  if (orig_pattern_stmt)
>  {
>vect_init_pattern_stmt (pattern_stmt, orig_stmt_info, pattern_vectype);
>
>/* Insert all the new pattern statements before the original one.  */
>gimple_seq *orig_def_seq = &STMT_VINFO_PATTERN_DEF_SEQ 
> (orig_stmt_info);
> -  gimple_stmt_iterator gsi = gsi_for_stmt (orig_stmt, orig_def_seq);
> +  gimple_stmt_iterator gsi = gsi_for_stmt (orig_pattern_stmt,
> +  orig_def_seq);
>gsi_insert_seq_before_without_update (&gsi, def_seq, GSI_SAME_STMT);
>gsi_insert_before_without_update (&gsi, pattern_stmt, GSI_SAME_STMT);
>
> @@ -4785,12 +4787,12 @@ vect_mark_pattern_stmts (gimple *orig_st
> Input:
> PATTERN_RECOG_FUNC: A pointer to a function that detects a certain
>  computation pattern.
> -   STMT: A stmt from which the pattern search should start.
> +   STMT_INFO: A stmt from which the pattern search should start.
>
> If PATTERN_RECOG_FUNC successfully detected the pattern, it creates
> a sequence of statements that has the same functionality and can be
> -   used to replace STMT.  It returns the last statement in the sequence
> -   and adds any earlier statements to STMT's STMT_VINFO_PATTERN_DEF_SEQ.
> +   used to replace STMT_INFO.  It returns the last statement in the sequence
> +   and adds any earlier statements to STMT_INFO's STMT_VINFO_PATTERN_DEF_SEQ.
> PATTERN_RECOG_FUNC also sets *TYPE_OUT to the vector type of the final
> statement, having first checked that the target supports the new operation
> in that type.
> @@ -4799,10 +4801,10 @@ vect_mark_pattern_stmts (gimple *orig_st
> for vect_recog_pattern.  */
>
>  static void
> -vect_pattern_recog_1 (vect_recog_func *recog_func, 

Re: [34/46] Alter interface to vect_get_vec_def_for_stmt_copy

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:06 PM Richard Sandiford
 wrote:
>
> This patch makes vect_get_vec_def_for_stmt_copy take a vec_info
> rather than a vect_def_type.  If the vector operand passed in is
> defined in the vectorised region, we should look for copies in
> the normal way.  If it's defined in an external statement
> (such as by vect_init_vector_1) we should just use the original value.

Ok, that works for non-SLP (which this is all about).

Would be nice to refactor this to an iterator interface somehow...

> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_get_vec_defs_for_stmt_copy)
> (vect_get_vec_def_for_stmt_copy): Take a vec_info rather than
> a vect_def_type for the first argument.
> * tree-vect-stmts.c (vect_get_vec_defs_for_stmt_copy): Likewise.
> (vect_get_vec_def_for_stmt_copy): Likewise.  Return the original
> operand if it isn't defined by a vectorized statement.
> (vect_build_gather_load_calls): Remove the mask_dt argument and
> update calls to vect_get_vec_def_for_stmt_copy.
> (vectorizable_bswap): Likewise the dt argument.
> (vectorizable_call): Update calls to vectorizable_bswap and
> vect_get_vec_def_for_stmt_copy.
> (vectorizable_simd_clone_call, vectorizable_assignment)
> (vectorizable_shift, vectorizable_operation, vectorizable_condition)
> (vectorizable_comparison): Update calls to
> vect_get_vec_def_for_stmt_copy.
> (vectorizable_store): Likewise.  Remove now-unnecessary calls to
> vect_is_simple_use.
> (vect_get_loop_based_defs): Remove dt argument and update call
> to vect_get_vec_def_for_stmt_copy.
> (vectorizable_conversion): Update calls to vect_get_loop_based_defs
> and vect_get_vec_def_for_stmt_copy.
> (vectorizable_load): Update calls to vect_build_gather_load_calls
> and vect_get_vec_def_for_stmt_copy.
> * tree-vect-loop.c (vect_create_epilog_for_reduction)
> (vectorizable_reduction, vectorizable_live_operation): Update calls
> to vect_get_vec_def_for_stmt_copy.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:50.008602115 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:56.440544995 +0100
> @@ -1514,11 +1514,11 @@ extern tree vect_get_vec_def_for_operand
>  extern tree vect_get_vec_def_for_operand (tree, stmt_vec_info, tree = NULL);
>  extern void vect_get_vec_defs (tree, tree, stmt_vec_info, vec<tree> *,
>vec<tree> *, slp_tree);
> -extern void vect_get_vec_defs_for_stmt_copy (enum vect_def_type *,
> +extern void vect_get_vec_defs_for_stmt_copy (vec_info *,
>  vec<tree> *, vec<tree> *);
>  extern tree vect_init_vector (stmt_vec_info, tree, tree,
>gimple_stmt_iterator *);
> -extern tree vect_get_vec_def_for_stmt_copy (enum vect_def_type, tree);
> +extern tree vect_get_vec_def_for_stmt_copy (vec_info *, tree);
>  extern bool vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
>   bool *, slp_tree, slp_instance);
>  extern void vect_remove_stores (stmt_vec_info);
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-07-24 10:23:50.008602115 +0100
> +++ gcc/tree-vect-stmts.c   2018-07-24 10:23:56.440544995 +0100
> @@ -1580,8 +1580,7 @@ vect_get_vec_def_for_operand (tree op, s
> created in case the vectorized result cannot fit in one vector, and 
> several
> copies of the vector-stmt are required.  In this case the vector-def is
> retrieved from the vector stmt recorded in the STMT_VINFO_RELATED_STMT 
> field
> -   of the stmt that defines VEC_OPRND.
> -   DT is the type of the vector def VEC_OPRND.
> +   of the stmt that defines VEC_OPRND.  VINFO describes the vectorization.
>
> Context:
>  In case the vectorization factor (VF) is bigger than the number
> @@ -1625,29 +1624,24 @@ vect_get_vec_def_for_operand (tree op, s
> STMT_VINFO_RELATED_STMT field of 'VS1.0' we obtain the next copy - 
> 'VS1.1',
> and return its def ('vx.1').
> Overall, to create the above sequence this function will be called 3 
> times:
> -vx.1 = vect_get_vec_def_for_stmt_copy (dt, vx.0);
> -vx.2 = vect_get_vec_def_for_stmt_copy (dt, vx.1);
> -vx.3 = vect_get_vec_def_for_stmt_copy (dt, vx.2);  */
> +   vx.1 = vect_get_vec_def_for_stmt_copy (vinfo, vx.0);
> +   vx.2 = vect_get_vec_def_for_stmt_copy (vinfo, vx.1);
> +   vx.3 = vect_get_vec_def_for_stmt_copy (vinfo, vx.2);  */
>
>  tree
> -vect_get_vec_def_for_stmt_copy (enum vect_def_type dt, tree vec_oprnd)
> +vect_get_vec_def_for_stmt_copy (vec_info *vinfo, tree vec_oprnd)
>  {
> -  gimple *vec_stmt_for_operand;
> -  stmt_vec_info 
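The quoted diff is cut off before the body of the new function.  A
minimal sketch of the behaviour described above, assuming lookup_def
returns null for operands not defined inside the vectorised region:

  /* Sketch only: if VEC_OPRND is not defined by a vectorized statement,
     reuse it unchanged; otherwise step to the next copy of the defining
     statement via STMT_VINFO_RELATED_STMT.  */
  tree
  vect_get_vec_def_for_stmt_copy (vec_info *vinfo, tree vec_oprnd)
  {
    stmt_vec_info def_stmt_info = vinfo->lookup_def (vec_oprnd);
    if (!def_stmt_info)
      /* Defined externally (e.g. by vect_init_vector_1): there are no
	 copies, so the original value is right for every copy of the use.  */
      return vec_oprnd;

    def_stmt_info = STMT_VINFO_RELATED_STMT (def_stmt_info);
    gcc_assert (def_stmt_info);
    if (gphi *phi = dyn_cast <gphi *> (def_stmt_info->stmt))
      return PHI_RESULT (phi);
    return gimple_get_lhs (def_stmt_info->stmt);
  }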

Re: [32/46] Use stmt_vec_info in function interfaces (part 2)

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:06 PM Richard Sandiford
 wrote:
>
> This second part handles the mechanical change from a gimple stmt
> argument to a stmt_vec_info argument.  It updates the function
> comments if they referred to the argument by name, but it doesn't
> try to retrofit mentions to other functions.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (nested_in_vect_loop_p): Move further down
> file and take a stmt_vec_info instead of a gimple stmt.
> (supportable_widening_operation, vect_finish_replace_stmt)
> (vect_finish_stmt_generation, vect_get_store_rhs)
> (vect_get_vec_def_for_operand_1, vect_get_vec_def_for_operand)
> (vect_get_vec_defs, vect_init_vector, vect_transform_stmt)
> (vect_remove_stores, vect_analyze_stmt, vectorizable_condition)
> (vect_get_smallest_scalar_type, vect_check_gather_scatter)
> (vect_create_data_ref_ptr, bump_vector_ptr)
> (vect_permute_store_chain, vect_setup_realignment)
> (vect_transform_grouped_load, vect_record_grouped_load_vectors)
> (vect_create_addr_base_for_vector_ref, vectorizable_live_operation)
> (vectorizable_reduction, vectorizable_induction)
> (get_initial_def_for_reduction, is_simple_and_all_uses_invariant)
> (vect_get_place_in_interleaving_chain): Take stmt_vec_infos rather
> than gimple stmts as arguments.
> * tree-vect-data-refs.c (vect_get_smallest_scalar_type)
> (vect_preserves_scalar_order_p, vect_slp_analyze_node_dependences)
> (can_group_stmts_p, vect_check_gather_scatter)
> (vect_create_addr_base_for_vector_ref, vect_create_data_ref_ptr)
> (bump_vector_ptr, vect_permute_store_chain, vect_setup_realignment)
> (vect_permute_load_chain, vect_shift_permute_load_chain)
> (vect_transform_grouped_load)
> (vect_record_grouped_load_vectors): Likewise.
> * tree-vect-loop.c (vect_fixup_reduc_chain)
> (get_initial_def_for_reduction, vect_create_epilog_for_reduction)
> (vectorize_fold_left_reduction, is_nonwrapping_integer_induction)
> (vectorizable_reduction, vectorizable_induction)
> (vectorizable_live_operation, vect_loop_kill_debug_uses): Likewise.
> * tree-vect-patterns.c (type_conversion_p, adjust_bool_stmts)
> (vect_get_load_store_mask): Likewise.
> * tree-vect-slp.c (vect_get_place_in_interleaving_chain)
> (vect_analyze_slp_instance, vect_mask_constant_operand_p): Likewise.
> * tree-vect-stmts.c (vect_mark_relevant)
> (is_simple_and_all_uses_invariant)
> (exist_non_indexing_operands_for_use_p, process_use)
> (vect_init_vector_1, vect_init_vector, vect_get_vec_def_for_operand_1)
> (vect_get_vec_def_for_operand, vect_get_vec_defs)
> (vect_finish_stmt_generation_1, vect_finish_replace_stmt)
> (vect_finish_stmt_generation, vect_truncate_gather_scatter_offset)
> (compare_step_with_zero, vect_get_store_rhs, 
> get_group_load_store_type)
> (get_negative_load_store_type, get_load_store_type)
> (vect_check_load_store_mask, vect_check_store_rhs)
> (vect_build_gather_load_calls, vect_get_strided_load_store_ops)
> (vectorizable_bswap, vectorizable_call, vectorizable_simd_clone_call)
> (vect_create_vectorized_demotion_stmts, vectorizable_conversion)
> (vectorizable_assignment, vectorizable_shift, vectorizable_operation)
> (get_group_alias_ptr_type, vectorizable_store, hoist_defs_of_uses)
> (vectorizable_load, vectorizable_condition, vectorizable_comparison)
> (vect_analyze_stmt, vect_transform_stmt, vect_remove_stores)
> (supportable_widening_operation): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:35.384731983 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:50.008602115 +0100
> @@ -627,13 +627,6 @@ loop_vec_info_for_loop (struct loop *loo
>return (loop_vec_info) loop->aux;
>  }
>
> -static inline bool
> -nested_in_vect_loop_p (struct loop *loop, gimple *stmt)
> -{
> -  return (loop->inner
> -  && (loop->inner == (gimple_bb (stmt))->loop_father));
> -}
> -
>  typedef struct _bb_vec_info : public vec_info
>  {
>_bb_vec_info (gimple_stmt_iterator, gimple_stmt_iterator, vec_info_shared 
> *);
> @@ -1119,6 +1112,13 @@ set_vinfo_for_stmt (gimple *stmt, stmt_v
>  }
>  }
>
> +static inline bool
> +nested_in_vect_loop_p (struct loop *loop, stmt_vec_info stmt_info)
> +{
> +  return (loop->inner
> + && (loop->inner == (gimple_bb (stmt_info->stmt))->loop_father));
> +}
> +
>  /* Return the earlier statement between STMT1_INFO and STMT2_INFO.  */
>
>  static inline stmt_vec_info
> @@ -1493,8 +1493,8 @@ extern bool vect_is_simple_use (tree, ve
>  extern bool vect_is_simple_use (tree, vec_info *, 

Re: [33/46] Use stmt_vec_infos instead of vec_info/gimple stmt pairs

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:06 PM Richard Sandiford
 wrote:
>
> This patch makes vect_record_max_nunits and vect_record_base_alignment
> take a stmt_vec_info instead of a vec_info/gimple pair.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_record_base_alignment): Replace vec_info
> and gimple stmt arguments with a stmt_vec_info.
> (vect_record_base_alignments): Update calls accordingly.
> * tree-vect-slp.c (vect_record_max_nunits): Replace vec_info
> and gimple stmt arguments with a stmt_vec_info.
> (vect_build_slp_tree_1): Remove vinfo argument and update call
> to vect_record_max_nunits.
> (vect_build_slp_tree_2): Update calls to vect_build_slp_tree_1
> and vect_record_max_nunits.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:50.000602186 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:53.204573732 +0100
> @@ -794,14 +794,14 @@ vect_slp_analyze_instance_dependence (sl
>return res;
>  }
>
> -/* Record in VINFO the base alignment guarantee given by DRB.  STMT is
> -   the statement that contains DRB, which is useful for recording in the
> -   dump file.  */
> +/* Record the base alignment guarantee given by DRB, which occurs
> +   in STMT_INFO.  */
>
>  static void
> -vect_record_base_alignment (vec_info *vinfo, gimple *stmt,
> +vect_record_base_alignment (stmt_vec_info stmt_info,
> innermost_loop_behavior *drb)
>  {
> +  vec_info *vinfo = stmt_info->vinfo;
>bool existed;
>innermost_loop_behavior *&entry
>  = vinfo->base_alignments.get_or_insert (drb->base_address, &existed);
> @@ -820,7 +820,7 @@ vect_record_base_alignment (vec_info *vi
>"  misalignment: %d\n", drb->base_misalignment);
>   dump_printf_loc (MSG_NOTE, vect_location,
>"  based on: ");
> - dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
> }
>  }
>  }
> @@ -847,13 +847,13 @@ vect_record_base_alignments (vec_info *v
>   && STMT_VINFO_VECTORIZABLE (stmt_info)
>   && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> {
> - vect_record_base_alignment (vinfo, stmt_info, &DR_INNERMOST (dr));
> + vect_record_base_alignment (stmt_info, _INNERMOST (dr));
>
>   /* If DR is nested in the loop that is being vectorized, we can also
>  record the alignment of the base wrt the outer loop.  */
>   if (loop && nested_in_vect_loop_p (loop, stmt_info))
> vect_record_base_alignment
> -   (vinfo, stmt_info, &STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info));
> + (stmt_info, &STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info));
> }
>  }
>  }
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-24 10:23:50.004602150 +0100
> +++ gcc/tree-vect-slp.c 2018-07-24 10:23:53.204573732 +0100
> @@ -609,14 +609,14 @@ compatible_calls_p (gcall *call1, gcall
>  }
>
>  /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
> -   caller's attempt to find the vector type in STMT with the narrowest
> +   caller's attempt to find the vector type in STMT_INFO with the narrowest
> element type.  Return true if VECTYPE is nonnull and if it is valid
> -   for VINFO.  When returning true, update MAX_NUNITS to reflect the
> -   number of units in VECTYPE.  VINFO, GORUP_SIZE and MAX_NUNITS are
> -   as for vect_build_slp_tree.  */
> +   for STMT_INFO.  When returning true, update MAX_NUNITS to reflect the
> +   number of units in VECTYPE.  GROUP_SIZE and MAX_NUNITS are as for
> +   vect_build_slp_tree.  */
>
>  static bool
> -vect_record_max_nunits (vec_info *vinfo, gimple *stmt, unsigned int 
> group_size,
> +vect_record_max_nunits (stmt_vec_info stmt_info, unsigned int group_size,
> tree vectype, poly_uint64 *max_nunits)
>  {
>if (!vectype)
> @@ -625,7 +625,8 @@ vect_record_max_nunits (vec_info *vinfo,
> {
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"Build SLP failed: unsupported data-type in ");
> - dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> + dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +   stmt_info->stmt, 0);
>   dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> }
>/* Fatal mismatch.  */
> @@ -636,7 +637,7 @@ vect_record_max_nunits (vec_info *vinfo,
>   before adjusting *max_nunits for basic-block vectorization.  */
>poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>unsigned HOST_WIDE_INT const_nunits;
> -  if (is_a <bb_vec_info> (vinfo)
> +  if (STMT_VINFO_BB_VINFO (stmt_info)
>&& (!nunits.is_constant (&const_nunits)

Re: [31/46] Use stmt_vec_info in function interfaces (part 1)

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:05 PM Richard Sandiford
 wrote:
>
> This first (less mechanical) part handles cases that involve changes in
> the callers or non-trivial changes in the functions themselves.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_describe_gather_scatter_call): Take
> a stmt_vec_info instead of a gcall.
> (vect_check_gather_scatter): Update call accordingly.
> * tree-vect-loop-manip.c (iv_phi_p): Take a stmt_vec_info instead
> of a gphi.
> (vect_can_advance_ivs_p, vect_update_ivs_after_vectorizer)
> (slpeel_update_phi_nodes_for_loops): Update calls accordingly.
> * tree-vect-loop.c (vect_transform_loop_stmt): Take a stmt_vec_info
> instead of a gimple stmt.
> (vect_transform_loop): Update calls accordingly.
> * tree-vect-slp.c (vect_split_slp_store_group): Take and return
> stmt_vec_infos instead of gimple stmts.
> (vect_analyze_slp_instance): Update use accordingly.
> * tree-vect-stmts.c (read_vector_array, write_vector_array)
> (vect_clobber_variable, vect_stmt_relevant_p, permute_vec_elements)
> (vect_use_strided_gather_scatters_p, vect_build_all_ones_mask)
> (vect_build_zero_merge_argument, vect_get_gather_scatter_ops)
> (vect_gen_widened_results_half, vect_get_loop_based_defs)
> (vect_create_vectorized_promotion_stmts, can_vectorize_live_stmts):
> Take a stmt_vec_info instead of a gimple stmt and pass stmt_vec_infos
> down to subroutines.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:35.376732054 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:46.108636749 +0100
> @@ -3621,13 +3621,14 @@ vect_gather_scatter_fn_p (bool read_p, b
>return true;
>  }
>
> -/* CALL is a call to an internal gather load or scatter store function.
> +/* STMT_INFO is a call to an internal gather load or scatter store function.
> Describe the operation in INFO.  */
>
>  static void
> -vect_describe_gather_scatter_call (gcall *call, gather_scatter_info *info)
> +vect_describe_gather_scatter_call (stmt_vec_info stmt_info,
> +  gather_scatter_info *info)
>  {
> -  stmt_vec_info stmt_info = vinfo_for_stmt (call);
> +  gcall *call = as_a <gcall *> (stmt_info->stmt);
>tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
>
> @@ -3672,7 +3673,7 @@ vect_check_gather_scatter (gimple *stmt,
>ifn = gimple_call_internal_fn (call);
>if (internal_gather_scatter_fn_p (ifn))
> {
> - vect_describe_gather_scatter_call (call, info);
> + vect_describe_gather_scatter_call (stmt_info, info);
>   return true;
> }
>masked_p = (ifn == IFN_MASK_LOAD || ifn == IFN_MASK_STORE);
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2018-07-24 10:23:35.376732054 +0100
> +++ gcc/tree-vect-loop-manip.c  2018-07-24 10:23:46.112636713 +0100
> @@ -1335,16 +1335,16 @@ find_loop_location (struct loop *loop)
>return dump_user_location_t ();
>  }
>
> -/* Return true if PHI defines an IV of the loop to be vectorized.  */
> +/* Return true if the phi described by STMT_INFO defines an IV of the
> +   loop to be vectorized.  */
>
>  static bool
> -iv_phi_p (gphi *phi)
> +iv_phi_p (stmt_vec_info stmt_info)
>  {
> +  gphi *phi = as_a <gphi *> (stmt_info->stmt);
>if (virtual_operand_p (PHI_RESULT (phi)))
>  return false;
>
> -  stmt_vec_info stmt_info = vinfo_for_stmt (phi);
> -  gcc_assert (stmt_info != NULL_STMT_VEC_INFO);
>if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
>|| STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
>  return false;
> @@ -1388,7 +1388,7 @@ vect_can_advance_ivs_p (loop_vec_info lo
>  virtual defs/uses (i.e., memory accesses) are analyzed elsewhere.
>
>  Skip reduction phis.  */
> -  if (!iv_phi_p (phi))
> +  if (!iv_phi_p (phi_info))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1509,7 +1509,7 @@ vect_update_ivs_after_vectorizer (loop_v
> }
>
>/* Skip reduction and virtual phis.  */
> -  if (!iv_phi_p (phi))
> +  if (!iv_phi_p (phi_info))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -2088,7 +2088,8 @@ slpeel_update_phi_nodes_for_loops (loop_
>tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
>/* Generate lcssa PHI node for the first loop.  */
>gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> -  if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi))
> +  stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt 

Re: [30/46] Use stmt_vec_infos rather than gimple stmts for worklists

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:05 PM Richard Sandiford
 wrote:
>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_analyze_scalar_cycles_1): Change the type
> of the worklist from a vector of gimple stmts to a vector of
> stmt_vec_infos.
> * tree-vect-stmts.c (vect_mark_relevant, process_use)
> (vect_mark_stmts_to_be_vectorized): Likewise

OK

> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-24 10:23:38.964700191 +0100
> +++ gcc/tree-vect-loop.c2018-07-24 10:23:42.472669038 +0100
> @@ -474,7 +474,7 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
>  {
>basic_block bb = loop->header;
>tree init, step;
> -  auto_vec<gimple *, 64> worklist;
> +  auto_vec<stmt_vec_info, 64> worklist;
>gphi_iterator gsi;
>bool double_reduc;
>
> @@ -543,9 +543,9 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
>/* Second - identify all reductions and nested cycles.  */
>while (worklist.length () > 0)
>  {
> -  gimple *phi = worklist.pop ();
> +  stmt_vec_info stmt_vinfo = worklist.pop ();
> +  gphi *phi = as_a <gphi *> (stmt_vinfo->stmt);
>tree def = PHI_RESULT (phi);
> -  stmt_vec_info stmt_vinfo = vinfo_for_stmt (phi);
>
>if (dump_enabled_p ())
>  {
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-07-24 10:23:38.968700155 +0100
> +++ gcc/tree-vect-stmts.c   2018-07-24 10:23:42.472669038 +0100
> @@ -194,7 +194,7 @@ vect_clobber_variable (gimple *stmt, gim
> Mark STMT as "relevant for vectorization" and add it to WORKLIST.  */
>
>  static void
> -vect_mark_relevant (vec<gimple *> *worklist, gimple *stmt,
> +vect_mark_relevant (vec<stmt_vec_info> *worklist, gimple *stmt,
> enum vect_relevant relevant, bool live_p)
>  {
>stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> @@ -453,7 +453,7 @@ exist_non_indexing_operands_for_use_p (t
>
>  static bool
>  process_use (gimple *stmt, tree use, loop_vec_info loop_vinfo,
> -enum vect_relevant relevant, vec<gimple *> *worklist,
> +enum vect_relevant relevant, vec<stmt_vec_info> *worklist,
>  bool force)
>  {
>stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
> @@ -618,16 +618,14 @@ vect_mark_stmts_to_be_vectorized (loop_v
>basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
>unsigned int nbbs = loop->num_nodes;
>gimple_stmt_iterator si;
> -  gimple *stmt;
>unsigned int i;
> -  stmt_vec_info stmt_vinfo;
>basic_block bb;
>bool live_p;
>enum vect_relevant relevant;
>
>DUMP_VECT_SCOPE ("vect_mark_stmts_to_be_vectorized");
>
> -  auto_vec<gimple *, 64> worklist;
> +  auto_vec<stmt_vec_info, 64> worklist;
>
>/* 1. Init worklist.  */
>for (i = 0; i < nbbs; i++)
> @@ -665,17 +663,17 @@ vect_mark_stmts_to_be_vectorized (loop_v
>use_operand_p use_p;
>ssa_op_iter iter;
>
> -  stmt = worklist.pop ();
> +  stmt_vec_info stmt_vinfo = worklist.pop ();
>if (dump_enabled_p ())
> {
> -  dump_printf_loc (MSG_NOTE, vect_location, "worklist: examine stmt: 
> ");
> -  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "worklist: examine stmt: ");
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_vinfo->stmt, 0);
> }
>
>/* Examine the USEs of STMT. For each USE, mark the stmt that defines 
> it
>  (DEF_STMT) as relevant/irrelevant according to the relevance property
>  of STMT.  */
> -  stmt_vinfo = vinfo_for_stmt (stmt);
>relevant = STMT_VINFO_RELEVANT (stmt_vinfo);
>
>/* Generally, the relevance property of STMT (in STMT_VINFO_RELEVANT) 
> is


Re: [Patch] [Aarch64] PR 86538 - Define __ARM_FEATURE_LSE if LSE is available

2018-07-25 Thread Richard Earnshaw (lists)
On 24/07/18 22:55, Steve Ellcey wrote:
> On Tue, 2018-07-24 at 22:04 +0100, James Greenhalgh wrote:
>>  
>>
>> I'd say this patch isn't desirable for trunk. I'd be interested in use cases
>> that need a static decision on presence of LSE that are not better expressed
>> using higher level language features.
>>
>> Thanks,
>> James
> 
> How about when building the higher level features?  Right now,
> in sysdeps/aarch64/atomic-machine.h, we
> hardcode ATOMIC_EXCHANGE_USES_CAS to 0.  If we had __ARM_FEATURE_LSE we
> could use that to determine if we wanted to set
> ATOMIC_EXCHANGE_USES_CAS to 0 or 1 which would affect the call
> generated in nptl/pthread_spin_lock.c.  That would be useful if we
> built a libpthread specifically for a platform that had LSE.
> 
> Steve Ellcey
> sell...@cavium.com
> 

If there is a case for such a define, it needs to be made with the ACLE
specification maintainers.  I don't think GCC should be ploughing a
separate furrow here.

So make your case to the ACLE maintainers.  If the ACLE adopts a pre-define,
then implementing it in GCC would go through on the nod.

R.
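
For concreteness, the build-time selection Steve describes would be
shaped like the following sketch.  It is hypothetical: whether an LSE
platform wants ATOMIC_EXCHANGE_USES_CAS to be 0 or 1 is exactly the
decision left open above, so the values below are placeholders.

  /* Hypothetical sketch of keying an atomic-machine.h-style header off
     a compiler pre-define instead of hardcoding the value.  */
  #ifdef __ARM_FEATURE_LSE
  # define ATOMIC_EXCHANGE_USES_CAS 1	/* placeholder choice for LSE */
  #else
  # define ATOMIC_EXCHANGE_USES_CAS 0	/* today's hardcoded value */
  #endif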


Re: [29/46] Use stmt_vec_info instead of gimple stmts internally (part 2)

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:04 PM Richard Sandiford
 wrote:
>
> This second part handles the less mechanical cases, i.e. those that don't
> just involve swapping a gimple stmt for an existing stmt_vec_info.

OK.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_analyze_loop_operations): Look up the
> statement before passing it to vect_analyze_stmt.
> (vect_create_epilog_for_reduction): Use a stmt_vec_info to walk
> the chain of phi vector definitions.  Track the exit phi via its
> stmt_vec_info.
> (vectorizable_reduction): Set cond_stmt_vinfo directly from the
> STMT_VINFO_REDUC_DEF.
> * tree-vect-slp.c (vect_get_place_in_interleaving_chain): Use
> stmt_vec_infos to handle the statement chains.
> (vect_get_slp_defs): Record the first statement in the node
> using a stmt_vec_info.
> * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Look up
> statements here and pass their stmt_vec_info down to subroutines.
> (vect_init_vector_1): Hoist call to vinfo_for_stmt and pass it
> down to vect_finish_stmt_generation.
> (vect_init_vector, vect_get_vec_defs, vect_finish_replace_stmt)
> (vect_finish_stmt_generation): Call vinfo_for_stmt and pass
> stmt_vec_infos to subroutines.
> (vect_remove_stores): Use stmt_vec_infos to handle the statement
> chains.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-24 10:23:35.376732054 +0100
> +++ gcc/tree-vect-loop.c2018-07-24 10:23:38.964700191 +0100
> @@ -1629,8 +1629,9 @@ vect_analyze_loop_operations (loop_vec_i
>  {
>   gimple *stmt = gsi_stmt (si);
>   if (!gimple_clobber_p (stmt)
> - && !vect_analyze_stmt (stmt, &need_to_vectorize, NULL, NULL,
> -&cost_vec))
> + && !vect_analyze_stmt (loop_vinfo->lookup_stmt (stmt),
> +&need_to_vectorize,
> +NULL, NULL, &cost_vec))
> return false;
>  }
>  } /* bbs */
> @@ -4832,11 +4833,11 @@ vect_create_epilog_for_reduction (vec
>tree first_vect = PHI_RESULT (new_phis[0]);
>gassign *new_vec_stmt = NULL;
>vec_dest = vect_create_destination_var (scalar_dest, vectype);
> -  gimple *next_phi = new_phis[0];
> +  stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]);
>for (int k = 1; k < ncopies; ++k)
> {
> - next_phi = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (next_phi));
> - tree second_vect = PHI_RESULT (next_phi);
> + next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info);
> + tree second_vect = PHI_RESULT (next_phi_info->stmt);
>tree tem = make_ssa_name (vec_dest, new_vec_stmt);
>new_vec_stmt = gimple_build_assign (tem, code,
>   first_vect, second_vect);
> @@ -5573,11 +5574,12 @@ vect_create_epilog_for_reduction (vec
>else
>  ratio = 1;
>
> +  stmt_vec_info epilog_stmt_info = NULL;
>for (k = 0; k < group_size; k++)
>  {
>if (k % ratio == 0)
>  {
> -  epilog_stmt = new_phis[k / ratio];
> + epilog_stmt_info = loop_vinfo->lookup_stmt (new_phis[k / ratio]);
>   reduction_phi_info = reduction_phis[k / ratio];
>   if (double_reduc)
> inner_phi = inner_phis[k / ratio];
> @@ -5623,8 +5625,7 @@ vect_create_epilog_for_reduction (vec
>   if (double_reduc)
> STMT_VINFO_VEC_STMT (exit_phi_vinfo) = inner_phi;
>   else
> -   STMT_VINFO_VEC_STMT (exit_phi_vinfo)
> - = vinfo_for_stmt (epilog_stmt);
> +   STMT_VINFO_VEC_STMT (exit_phi_vinfo) = epilog_stmt_info;
>if (!double_reduc
>|| STMT_VINFO_DEF_TYPE (exit_phi_vinfo)
>!= vect_double_reduction_def)
> @@ -6070,7 +6071,7 @@ vectorizable_reduction (gimple *stmt, gi
>optab optab;
>tree new_temp = NULL_TREE;
>enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
> -  gimple *cond_reduc_def_stmt = NULL;
> +  stmt_vec_info cond_stmt_vinfo = NULL;
>enum tree_code cond_reduc_op_code = ERROR_MARK;
>tree scalar_type;
>bool is_simple_use;
> @@ -6348,7 +6349,7 @@ vectorizable_reduction (gimple *stmt, gi
>   && is_nonwrapping_integer_induction (def_stmt_info, loop))
> {
>   cond_reduc_dt = dt;
> - cond_reduc_def_stmt = def_stmt_info;
> + cond_stmt_vinfo = def_stmt_info;
> }
> }
>  }
> @@ -6454,7 +6455,6 @@ vectorizable_reduction (gimple *stmt, gi
> }
>else if (cond_reduc_dt == vect_induction_def)
> {
> - stmt_vec_info cond_stmt_vinfo = vinfo_for_stmt 
> 

Re: GCC 8.2 Status Report (2018-07-19), branch frozen for release

2018-07-25 Thread Richard Biener
On Wed, 25 Jul 2018, Richard Earnshaw (lists) wrote:

> On 24/07/18 17:30, Richard Biener wrote:
> > On July 24, 2018 5:50:33 PM GMT+02:00, Ramana Radhakrishnan 
> >  wrote:
> >> On Thu, Jul 19, 2018 at 10:11 AM, Richard Biener 
> >> wrote:
> >>>
> >>> Status
> >>> ==
> >>>
> >>> The GCC 8 branch is frozen for preparation of the GCC 8.2 release.
> >>> All changes to the branch now require release manager approval.
> >>>
> >>>
> >>> Previous Report
> >>> ===
> >>>
> >>> https://gcc.gnu.org/ml/gcc/2018-07/msg00194.html
> >>
> >> Is there any chance we can get some of the spectrev1 mitigation
> >> patches reviewed and into 8.2 .
> > 
> > It's now too late for that and it has to wait for 8.3.
> 
> Why?  It was waiting on YOU for a review and was posted before you
> announced the freeze.

Posting before the freeze doesn't buy you extra time, unlike when we
transition to stage3.  You are requesting a feature backport anyway.

Anyway, you got a review from ME that showed the series isn't ready,
esp. for going into a stable code-base last minute before a release.
But I assume "YOU" addressed all volunteer GCC reviewers.

Richard.


Re: [PATCH][GCC][Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

2018-07-25 Thread Thomas Preudhomme
Hi Tamar,

On Mon, 23 Jul 2018 at 17:56, Tamar Christina  wrote:
>
> Hi All,
>
> My previous patch changed arm_can_change_mode_class to allow subregs of
> 64-bit registers on arm big-endian.  However it seems that we can't do this
> because the data in 64-bit VFP registers is stored in little-endian order,
> even on big-endian.
>
> Allowing this change had a knock-on effect that caused GCC's no-op detection
> to think that loading from the first lane on arm big-endian is a no-op.  This
> is because we can't describe the weird ordering we have on D registers on 
> big-endian.
>
> The original issue comes from the fact that the code does
>
> ... foo (... bar)
> {
>   return bar;
> }
>
> The expansion of the return statement causes GCC to try to return the value in
> a register.  GCC will try to emit the move then, from MEM to REG (due to the 
> SSA
> temporary.).  It checks for a mov optab for this which isn't available and
> then tries to do the move in bits using emit_move_multi_word.
>
> emit_move_multi_word will split the move into sub parts, but then needs to get
> the sub parts and does this using subregs, but it's told it can't do subregs!
>
> The compiler is now stuck in an infinite loop.
>
> The way this is worked around in the back-end is that we have move patterns in
> neon.md that usually just force the register instead of checking with the
> back-end. This prevents emit_move_multi_word from being needed.  However the
> pattern for V4HF and V8HF were guarded by TARGET_NEON && TARGET_FP16.
>
> I don't believe the TARGET_FP16 guard to be needed, because the pattern 
> doesn't
> actually generate code and requires another pattern for that, and a reg to 
> reg move
> should always be possible anyway. So allowing the force to register here is 
> safe
> and it allows the compiler to generate a correct error instead of ICEing in an
> infinite loop.

How about subreg to subreg move? Doesn't that expand to more insns
(subreg to reg and reg to subreg)? Couldn't you improve the logic to
check whether there is actually a mode change, so that if there isn't
(like moving from one subreg to another) it expands to a single move?

Best regards,

Thomas

>
> This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without 
> introducing
> any regressions while fixing
>
> gcc.dg/vect/vect-nop-move.c execution test
> g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test
> g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test
> g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test
>
> Regtested on armeb-none-eabi and no regressions.
> Bootstrapped on arm-none-linux-gnueabihf and no issues.
>
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
> gcc/
> 2018-07-23  Tamar Christina  
>
> PR target/84711
> * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.
> * config/arm/neon.md (movv4hf, movv8hf): Refactored to..
> (mov): ..this and enable unconditionally.
>
> --
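
A minimal reproducer in the shape Tamar describes would look like the
sketch below.  It is an illustration rather than the testcase from
PR 84711, and it assumes a toolchain where float16x4_t is available.

  /* Compile for big-endian Arm with Neon: returning the argument forces
     a MEM-to-REG move of a V4HF value, the case that used to send
     expansion into the loop described above.  */
  #include <arm_neon.h>

  float16x4_t
  foo (float16x4_t bar)
  {
    return bar;
  }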


Re: [PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-25 Thread Richard Earnshaw (lists)
On 24/07/18 18:26, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 6:40 PM Richard Earnshaw
>  wrote:
>>
>>
>> This patch defines a new intrinsic function
>> __builtin_speculation_safe_value.  A generic default implementation is
>> defined which will attempt to use the backend pattern
>> "speculation_safe_barrier".  If this pattern is not defined, or if it
>> is not available, then the compiler will emit a warning, but
>> compilation will continue.
>>
>> Note that the test spec-barrier-1.c will currently fail on all
>> targets.  This is deliberate, the failure will go away when
>> appropriate action is taken for each target backend.
> 
> So given this series is supposed to be backported I question
> 
> +rtx
> +default_speculation_safe_value (machine_mode mode ATTRIBUTE_UNUSED,
> +   rtx result, rtx val,
> +   rtx failval ATTRIBUTE_UNUSED)
> +{
> +  emit_move_insn (result, val);
> +#ifdef HAVE_speculation_barrier
> +  /* Assume the target knows what it is doing: if it defines a
> + speculation barrier, but it is not enabled, then assume that one
> + isn't needed.  */
> +  if (HAVE_speculation_barrier)
> +emit_insn (gen_speculation_barrier ());
> +
> +#else
> +  warning_at (input_location, 0,
> + "this target does not define a speculation barrier; "
> + "your program will still execute correctly, but speculation "
> + "will not be inhibited");
> +#endif
> +  return result;
> 
> which makes all but aarch64 archs warn on __builtin_speculation_safe_value
> uses, even those that do not suffer from Spectre like all those embedded 
> targets
> where implementations usually do not speculate at all.
> 
> In fact for those targets the builtin stays in the way of optimization on 
> GIMPLE
> as well so we should fold it away early if neither the target hook is
> implemented
> nor there is a speculation_barrier insn.
> 
> So, please make resolve_overloaded_builtin return a no-op on such targets
> which means you can remove the above warning.  Maybe such targets
> shouldn't advertise / initialize the builtins at all?

I disagree with your approach here.  Why would users not want to know
when the compiler is failing to implement a security feature when it
should?  As for targets that don't need something, they can easily
define the hook as described to suppress the warning.

Or are you just suggesting moving the warning to resolve_overloaded_builtin?

Other ports will need to take action, but in general, it can be as
simple as, eg patch 2 or 3 do for the Arm and AArch64 backends - or
simpler still if nothing is needed for that architecture.

There is a test which is intended to fail on targets that have not yet
been patched - I thought that was better than hard-failing the build,
especially given that we want to back-port.

Port maintainers DO need to decide what to do about speculation, even if
it is explicitly that no mitigation is needed.

> 
> The builtins also have no attributes which mean they are assumed to be
> 1) calling back into the CU via exported functions, 2) possibly throwing
> exceptions, 3) affecting memory state.  I think you at least want
> to use ATTR_NOTHROW_LEAF_LIST.
> 
> The builtins are not designed to be optimization or memory barriers as
> far as I can see and should thus be CONST as well.
> 

I think they should be barriers.  They do need to ensure that they can't
be moved past other operations that might depend on the speculation
state.  Consider, for example,

 ...
 t = untrusted_value;
 ...
 if (t + 5 < limit)
 {
   v = mem[__builtin_speculation_safe_value (untrusted_value)];
   ...

The compiler must never lift the builtin outside the bounds check as
that is part of the speculation state.
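
To make that concrete, a sketch of the intended usage (the names here
are illustrative, and it assumes the builtin's documented default
failure value of 0):

  /* The builtin must stay inside the bounds check: the check is what
     establishes the speculation state it relies on.  */
  unsigned char
  load_guarded (unsigned char *mem, unsigned long untrusted,
		unsigned long limit)
  {
    if (untrusted < limit)
      /* On a mis-speculated path where untrusted >= limit, the builtin
	 yields 0 rather than the out-of-bounds index.  */
      return mem[__builtin_speculation_safe_value (untrusted)];
    return 0;
  }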


> BUILT_IN_SPECULATION_SAFE_VALUE_PTR is declared but
> nowhere generated?  Maybe
> 
> +case BUILT_IN_SPECULATION_SAFE_VALUE_N:
> +  {
> +   int n = speculation_safe_value_resolve_size (function, params);
> +   tree new_function, first_param, result;
> +   enum built_in_function fncode;
> +
> +   if (n == -1)
> + return error_mark_node;
> +   else if (n == 0)
> + fncode = (enum built_in_function)((int)orig_code + 1);
> +   else
> + fncode
> +   = (enum built_in_function)((int)orig_code + exact_log2 (n) + 2);
> 
> resolve_size does that?  Why can that not return the built_in_function
> itself or BUILT_IN_NONE on error to make that clearer?
> 
> Otherwise it looks reasonable but C FE maintainers should comment.
> I miss C++ testcases (or rather testcases should be in c-c++-common).
> 
> Richard.
> 
>> gcc:
>> * builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
>> (BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
>> (BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
>> * builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
>> (BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal 

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-25 Thread Segher Boessenkool
On Wed, Jul 25, 2018 at 10:28:30AM +0200, Richard Biener wrote:
> On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
>  wrote:
> >
> > This patch allows combine to combine two insns into two.  This helps
> > in many cases, by reducing instruction path length, and also allowing
> > further combinations to happen.  PR85160 is a typical example of code
> > that it can improve.
> >
> > This patch does not allow such combinations if either of the original
> > instructions was a simple move instruction.  In those cases combining
> > the two instructions increases register pressure without improving the
> > code.  With this move test, register pressure no longer increases
> > noticeably as far as I can tell.
> >
> > (At first I also didn't allow either of the resulting insns to be a
> > move instruction.  But that is actually a very good thing to have, as
> > should have been obvious).
> >
> > Tested for many months; tested on about 30 targets.
> >
> > I'll commit this later this week if there are no objections.
> 
> Sounds good - but, _any_ testcase?  Please! ;)

I only have target-specific ones.  Most *simple* ones will already be
optimised by current code (via 3->2 combination).  But I've now got one
that trunk does not optimise, and it can be confirmed just by looking at
the resulting machine code (no need to look at the combine dump,
which is a very good thing).  And it is a proper thing to test even: it
tests that some source is compiled to properly optimised machine code.

Any other kind of testcase is worse than useless, of course.

Testing it results in working code isn't very feasible or useful either.


Segher


Re: GCC 8.2 Status Report (2018-07-19), branch frozen for release

2018-07-25 Thread Richard Earnshaw (lists)
On 24/07/18 17:30, Richard Biener wrote:
> On July 24, 2018 5:50:33 PM GMT+02:00, Ramana Radhakrishnan 
>  wrote:
>> On Thu, Jul 19, 2018 at 10:11 AM, Richard Biener 
>> wrote:
>>>
>>> Status
>>> ==
>>>
>>> The GCC 8 branch is frozen for preparation of the GCC 8.2 release.
>>> All changes to the branch now require release manager approval.
>>>
>>>
>>> Previous Report
>>> ===
>>>
>>> https://gcc.gnu.org/ml/gcc/2018-07/msg00194.html
>>
>> Is there any chance we can get some of the spectrev1 mitigation
>> patches reviewed and into 8.2 .
> 
> It's now too late for that and it has to wait for 8.3.

Why?  It was waiting on YOU for a review and was posted before you
announced the freeze.

R.

> 
>> It would be quite useful to get these into a release as I see that the
>> reviews are kinda petering out and there hasn't been any objection to
>> the approach.
> 
> It's not that people only use release tarballs.
> 
> Richard. 
>>
>>
>> regards
>> Ramana
> 



Re: [28/46] Use stmt_vec_info instead of gimple stmts internally (part 1)

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:04 PM Richard Sandiford
 wrote:
>
> This first part makes functions use stmt_vec_infos instead of
> gimple stmts in cases where the stmt_vec_info was already available
> and where the change is mechanical.  Most of it is just replacing
> "stmt" with "stmt_info".

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
> (vect_check_gather_scatter, vect_create_data_ref_ptr, bump_vector_ptr)
> (vect_permute_store_chain, vect_setup_realignment)
> (vect_permute_load_chain, vect_shift_permute_load_chain)
> (vect_transform_grouped_load): Use stmt_vec_info rather than gimple
> stmts internally, and when passing values to other vectorizer 
> routines.
> * tree-vect-loop-manip.c (vect_can_advance_ivs_p): Likewise.
> * tree-vect-loop.c (vect_analyze_scalar_cycles_1)
> (vect_analyze_loop_operations, get_initial_def_for_reduction)
> (vect_create_epilog_for_reduction, vectorize_fold_left_reduction)
> (vectorizable_reduction, vectorizable_induction)
> (vectorizable_live_operation, vect_transform_loop_stmt)
> (vect_transform_loop): Likewise.
> * tree-vect-patterns.c (vect_reassociating_reduction_p)
> (vect_recog_widen_op_pattern, vect_recog_mixed_size_cond_pattern)
> (vect_recog_bool_pattern, vect_recog_gather_scatter_pattern): 
> Likewise.
> * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
> (vect_slp_analyze_node_operations_1): Likewise.
> * tree-vect-stmts.c (vect_mark_relevant, process_use)
> (exist_non_indexing_operands_for_use_p, vect_init_vector_1)
> (vect_mark_stmts_to_be_vectorized, vect_get_vec_def_for_operand)
> (vect_finish_stmt_generation_1, get_group_load_store_type)
> (get_load_store_type, vect_build_gather_load_calls)
> (vectorizable_bswap, vectorizable_call, vectorizable_simd_clone_call)
> (vect_create_vectorized_demotion_stmts, vectorizable_conversion)
> (vectorizable_assignment, vectorizable_shift, vectorizable_operation)
> (vectorizable_store, vectorizable_load, vectorizable_condition)
> (vectorizable_comparison, vect_analyze_stmt, vect_transform_stmt)
> (supportable_widening_operation): Likewise.
> (vect_get_vector_types_for_stmt): Likewise.
> * tree-vectorizer.h (vect_dr_behavior): Likewise.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:31.736764378 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:35.376732054 +0100
> @@ -712,7 +712,7 @@ vect_slp_analyze_node_dependences (slp_i
>  been sunk to (and we verify if we can do that as well).  */
>   if (gimple_visited_p (stmt))
> {
> - if (stmt != last_store)
> + if (stmt_info != last_store)
> continue;
>   unsigned i;
>   stmt_vec_info store_info;
> @@ -3666,7 +3666,7 @@ vect_check_gather_scatter (gimple *stmt,
>
>/* See whether this is already a call to a gather/scatter internal 
> function.
>   If not, see whether it's a masked load or store.  */
> -  gcall *call = dyn_cast <gcall *> (stmt);
> +  gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
>if (call && gimple_call_internal_p (call))
>  {
>ifn = gimple_call_internal_fn (call);
> @@ -4677,8 +4677,8 @@ vect_create_data_ref_ptr (gimple *stmt,
>if (loop_vinfo)
>  {
>loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
> -  containing_loop = (gimple_bb (stmt))->loop_father;
> +  nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt_info);
> +  containing_loop = (gimple_bb (stmt_info->stmt))->loop_father;
>pe = loop_preheader_edge (loop);
>  }
>else
> @@ -4786,7 +4786,7 @@ vect_create_data_ref_ptr (gimple *stmt,
>
>/* Create: (&(base[init_val+offset]+byte_offset) in the loop preheader.  */
>
> -  new_temp = vect_create_addr_base_for_vector_ref (stmt, &new_stmt_list,
> +  new_temp = vect_create_addr_base_for_vector_ref (stmt_info, &new_stmt_list,
>offset, byte_offset);
>if (new_stmt_list)
>  {
> @@ -4934,7 +4934,7 @@ bump_vector_ptr (tree dataref_ptr, gimpl
>  new_dataref_ptr = make_ssa_name (TREE_TYPE (dataref_ptr));
>incr_stmt = gimple_build_assign (new_dataref_ptr, POINTER_PLUS_EXPR,
>dataref_ptr, update);
> -  vect_finish_stmt_generation (stmt, incr_stmt, gsi);
> +  vect_finish_stmt_generation (stmt_info, incr_stmt, gsi);
>
>/* Copy the points-to information if it exists. */
>if (DR_PTR_INFO (dr))
> @@ -5282,7 +5282,7 @@ vect_permute_store_chain (vec<tree> dr_c
>   data_ref = make_temp_ssa_name (vectype, NULL, "vect_shuffle3_low");
>   

Re: [27/46] Remove duplicated stmt_vec_info lookups

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:03 PM Richard Sandiford
 wrote:
>
> Various places called vect_dr_stmt or vinfo_for_stmt multiple times
> on the same input.  This patch makes them reuse the earlier result.
> It also splits a couple of single vinfo_for_stmt calls out into
> separate statements so that they can be reused in later patches.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence)
> (vect_slp_analyze_node_dependences, vect_analyze_data_ref_accesses)
> (vect_permute_store_chain, vect_permute_load_chain)
> (vect_shift_permute_load_chain, vect_transform_grouped_load): Avoid
> repeated stmt_vec_info lookups.
> * tree-vect-loop-manip.c (vect_can_advance_ivs_p): Likewise.
> (vect_update_ivs_after_vectorizer): Likewise.
> * tree-vect-loop.c (vect_is_simple_reduction): Likewise.
> (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise.
> * tree-vect-patterns.c (adjust_bool_stmts): Likewise.
> * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
> (vect_bb_slp_scalar_cost): Likewise.
> * tree-vect-stmts.c (get_group_alias_ptr_type): Likewise.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:28.452793542 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:31.736764378 +0100
> @@ -472,8 +472,7 @@ vect_analyze_data_ref_dependence (struct
> ... = a[i];
> a[i+1] = ...;
>  where loads from the group interleave with the store.  */
> - if (!vect_preserves_scalar_order_p (vect_dr_stmt(dra),
> - vect_dr_stmt (drb)))
> + if (!vect_preserves_scalar_order_p (stmtinfo_a, stmtinfo_b))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -673,6 +672,7 @@ vect_slp_analyze_node_dependences (slp_i
>   in NODE verifying we can sink them up to the last stmt in the
>   group.  */
>stmt_vec_info last_access_info = vect_find_last_scalar_stmt_in_slp (node);
> +  vec_info *vinfo = last_access_info->vinfo;
>for (unsigned k = 0; k < SLP_INSTANCE_GROUP_SIZE (instance); ++k)
>  {
>stmt_vec_info access_info = SLP_TREE_SCALAR_STMTS (node)[k];
> @@ -691,7 +691,8 @@ vect_slp_analyze_node_dependences (slp_i
>
>   /* If we couldn't record a (single) data reference for this
>  stmt we have to resort to the alias oracle.  */
> - data_reference *dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
> + stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
> + data_reference *dr_b = STMT_VINFO_DATA_REF (stmt_info);
>   if (!dr_b)
> {
>   /* We are moving a store or sinking a load - this means
> @@ -2951,7 +2952,7 @@ vect_analyze_data_ref_accesses (vec_info
>   || data_ref_compare_tree (DR_BASE_ADDRESS (dra),
> DR_BASE_ADDRESS (drb)) != 0
>   || data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)) != 0
> - || !can_group_stmts_p (vect_dr_stmt (dra), vect_dr_stmt (drb)))
> + || !can_group_stmts_p (stmtinfo_a, stmtinfo_b))
> break;
>
>   /* Check that the data-refs have the same constant size.  */
> @@ -3040,11 +3041,11 @@ vect_analyze_data_ref_accesses (vec_info
>   /* Link the found element into the group list.  */
>   if (!DR_GROUP_FIRST_ELEMENT (stmtinfo_a))
> {
> - DR_GROUP_FIRST_ELEMENT (stmtinfo_a) = vect_dr_stmt (dra);
> + DR_GROUP_FIRST_ELEMENT (stmtinfo_a) = stmtinfo_a;
>   lastinfo = stmtinfo_a;
> }
> - DR_GROUP_FIRST_ELEMENT (stmtinfo_b) = vect_dr_stmt (dra);
> - DR_GROUP_NEXT_ELEMENT (lastinfo) = vect_dr_stmt (drb);
> + DR_GROUP_FIRST_ELEMENT (stmtinfo_b) = stmtinfo_a;
> + DR_GROUP_NEXT_ELEMENT (lastinfo) = stmtinfo_b;
>   lastinfo = stmtinfo_b;
> }
>  }
> @@ -5219,9 +5220,10 @@ vect_permute_store_chain (vec<tree> dr_c
>   gimple_stmt_iterator *gsi,
>   vec *result_chain)
>  {
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>tree vect1, vect2, high, low;
>gimple *perm_stmt;
> -  tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>tree perm_mask_low, perm_mask_high;
>tree data_ref;
>tree perm3_mask_low, perm3_mask_high;
> @@ -5840,11 +5842,12 @@ vect_permute_load_chain (vec<tree> dr_ch
>  gimple_stmt_iterator *gsi,
>  vec *result_chain)
>  {
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>tree data_ref, first_vect, second_vect;
>tree perm_mask_even, 

Re: [26/46] Make more use of dyn_cast in tree-vect*

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:03 PM Richard Sandiford
 wrote:
>
> If we use stmt_vec_infos to represent statements in the vectoriser,
> it's then more natural to use dyn_cast when processing the statement
> as an assignment, call, etc.  This patch does that in a few more places.

OK.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_check_gather_scatter): Pass the
> gcall rather than the generic gimple stmt to gimple_call_internal_fn.
> (vect_get_smallest_scalar_type, can_group_stmts_p): Use dyn_cast
> to get gassigns and gcalls, rather than operating on generc gimple
> stmts.
> * tree-vect-stmts.c (exist_non_indexing_operands_for_use_p)
> (vect_mark_stmts_to_be_vectorized, vectorizable_store)
> (vectorizable_load, vect_analyze_stmt): Likewise.
> * tree-vect-loop.c (vectorizable_reduction): Likewise gphi.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:25.228822172 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:28.452793542 +0100
> @@ -130,15 +130,16 @@ vect_get_smallest_scalar_type (gimple *s
>
>lhs = rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
>
> -  if (is_gimple_assign (stmt)
> -  && (gimple_assign_cast_p (stmt)
> -  || gimple_assign_rhs_code (stmt) == DOT_PROD_EXPR
> -  || gimple_assign_rhs_code (stmt) == WIDEN_SUM_EXPR
> -  || gimple_assign_rhs_code (stmt) == WIDEN_MULT_EXPR
> -  || gimple_assign_rhs_code (stmt) == WIDEN_LSHIFT_EXPR
> -  || gimple_assign_rhs_code (stmt) == FLOAT_EXPR))
> +  gassign *assign = dyn_cast <gassign *> (stmt);
> +  if (assign
> +  && (gimple_assign_cast_p (assign)
> + || gimple_assign_rhs_code (assign) == DOT_PROD_EXPR
> + || gimple_assign_rhs_code (assign) == WIDEN_SUM_EXPR
> + || gimple_assign_rhs_code (assign) == WIDEN_MULT_EXPR
> + || gimple_assign_rhs_code (assign) == WIDEN_LSHIFT_EXPR
> + || gimple_assign_rhs_code (assign) == FLOAT_EXPR))
>  {
> -  tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> +  tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (assign));
>
>rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (rhs_type));
>if (rhs < lhs)
> @@ -2850,21 +2851,23 @@ can_group_stmts_p (gimple *stmt1, gimple
>if (gimple_assign_single_p (stmt1))
>  return gimple_assign_single_p (stmt2);
>
> -  if (is_gimple_call (stmt1) && gimple_call_internal_p (stmt1))
> +  gcall *call1 = dyn_cast <gcall *> (stmt1);
> +  if (call1 && gimple_call_internal_p (call1))
>  {
>/* Check for two masked loads or two masked stores.  */
> -  if (!is_gimple_call (stmt2) || !gimple_call_internal_p (stmt2))
> +  gcall *call2 = dyn_cast <gcall *> (stmt2);
> +  if (!call2 || !gimple_call_internal_p (call2))
> return false;
> -  internal_fn ifn = gimple_call_internal_fn (stmt1);
> +  internal_fn ifn = gimple_call_internal_fn (call1);
>if (ifn != IFN_MASK_LOAD && ifn != IFN_MASK_STORE)
> return false;
> -  if (ifn != gimple_call_internal_fn (stmt2))
> +  if (ifn != gimple_call_internal_fn (call2))
> return false;
>
>/* Check that the masks are the same.  Cope with casts of masks,
>  like those created by build_mask_conversion.  */
> -  tree mask1 = gimple_call_arg (stmt1, 2);
> -  tree mask2 = gimple_call_arg (stmt2, 2);
> +  tree mask1 = gimple_call_arg (call1, 2);
> +  tree mask2 = gimple_call_arg (call2, 2);
>if (!operand_equal_p (mask1, mask2, 0))
> {
>   mask1 = strip_conversion (mask1);
> @@ -3665,7 +3668,7 @@ vect_check_gather_scatter (gimple *stmt,
>gcall *call = dyn_cast <gcall *> (stmt);
>if (call && gimple_call_internal_p (call))
>  {
> -  ifn = gimple_call_internal_fn (stmt);
> +  ifn = gimple_call_internal_fn (call);
>if (internal_gather_scatter_fn_p (ifn))
> {
>   vect_describe_gather_scatter_call (call, info);
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-07-24 10:23:22.260848529 +0100
> +++ gcc/tree-vect-stmts.c   2018-07-24 10:23:28.456793506 +0100
> @@ -389,30 +389,31 @@ exist_non_indexing_operands_for_use_p (t
>   Therefore, all we need to check is if STMT falls into the
>   first case, and whether var corresponds to USE.  */
>
> -  if (!gimple_assign_copy_p (stmt))
> +  gassign *assign = dyn_cast <gassign *> (stmt);
> +  if (!assign || !gimple_assign_copy_p (assign))
>  {
> -  if (is_gimple_call (stmt)
> - && gimple_call_internal_p (stmt))
> +  gcall *call = dyn_cast <gcall *> (stmt);
> +  if (call && gimple_call_internal_p (call))
> {
> - internal_fn ifn = gimple_call_internal_fn (stmt);
> + internal_fn ifn = gimple_call_internal_fn (call);
>   int mask_index = 

Re: [25/46] Make get_earlier/later_stmt take and return stmt_vec_infos

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:03 PM Richard Sandiford
 wrote:
>
> ...and also make vect_find_last_scalar_stmt_in_slp return a stmt_vec_info.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (get_earlier_stmt, get_later_stmt): Take and
> return stmt_vec_infos rather than gimple stmts.  Do not accept
> null arguments.
> (vect_find_last_scalar_stmt_in_slp): Return a stmt_vec_info instead
> of a gimple stmt.
> * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Likewise.
> Update use of get_later_stmt.
> (vect_get_constant_vectors): Update call accordingly.
> (vect_schedule_slp_instance): Likewise.
> * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Likewise.
> (vect_slp_analyze_instance_dependence): Likewise.
> (vect_preserves_scalar_order_p): Update use of get_earlier_stmt.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:22.264848493 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:25.232822136 +0100
> @@ -1119,68 +1119,36 @@ set_vinfo_for_stmt (gimple *stmt, stmt_v
>  }
>  }
>
> -/* Return the earlier statement between STMT1 and STMT2.  */
> +/* Return the earlier statement between STMT1_INFO and STMT2_INFO.  */
>
> -static inline gimple *
> -get_earlier_stmt (gimple *stmt1, gimple *stmt2)
> +static inline stmt_vec_info
> +get_earlier_stmt (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info)
>  {
> -  unsigned int uid1, uid2;
> +  gcc_checking_assert ((STMT_VINFO_IN_PATTERN_P (stmt1_info)
> +   || !STMT_VINFO_RELATED_STMT (stmt1_info))
> +  && (STMT_VINFO_IN_PATTERN_P (stmt2_info)
> +  || !STMT_VINFO_RELATED_STMT (stmt2_info)));
>
> -  if (stmt1 == NULL)
> -return stmt2;
> -
> -  if (stmt2 == NULL)
> -return stmt1;
> -
> -  uid1 = gimple_uid (stmt1);
> -  uid2 = gimple_uid (stmt2);
> -
> -  if (uid1 == 0 || uid2 == 0)
> -return NULL;
> -
> -  gcc_assert (uid1 <= stmt_vec_info_vec->length ()
> - && uid2 <= stmt_vec_info_vec->length ());
> -  gcc_checking_assert ((STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt1))
> -   || !STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt1)))
> -  && (STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt2))
> -  || !STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt2))));
> -
> -  if (uid1 < uid2)
> -return stmt1;
> +  if (gimple_uid (stmt1_info->stmt) < gimple_uid (stmt2_info->stmt))
> +return stmt1_info;
>else
> -return stmt2;
> +return stmt2_info;
>  }
>
> -/* Return the later statement between STMT1 and STMT2.  */
> +/* Return the later statement between STMT1_INFO and STMT2_INFO.  */
>
> -static inline gimple *
> -get_later_stmt (gimple *stmt1, gimple *stmt2)
> +static inline stmt_vec_info
> +get_later_stmt (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info)
>  {
> -  unsigned int uid1, uid2;
> -
> -  if (stmt1 == NULL)
> -return stmt2;
> -
> -  if (stmt2 == NULL)
> -return stmt1;
> -
> -  uid1 = gimple_uid (stmt1);
> -  uid2 = gimple_uid (stmt2);
> -
> -  if (uid1 == 0 || uid2 == 0)
> -return NULL;
> -
> -  gcc_assert (uid1 <= stmt_vec_info_vec->length ()
> - && uid2 <= stmt_vec_info_vec->length ());
> -  gcc_checking_assert ((STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt1))
> -   || !STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt1)))
> -  && (STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (stmt2))
> -  || !STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt2))));
> +  gcc_checking_assert ((STMT_VINFO_IN_PATTERN_P (stmt1_info)
> +   || !STMT_VINFO_RELATED_STMT (stmt1_info))
> +  && (STMT_VINFO_IN_PATTERN_P (stmt2_info)
> +  || !STMT_VINFO_RELATED_STMT (stmt2_info)));
>
> -  if (uid1 > uid2)
> -return stmt1;
> +  if (gimple_uid (stmt1_info->stmt) > gimple_uid (stmt2_info->stmt))
> +return stmt1_info;
>else
> -return stmt2;
> +return stmt2_info;
>  }
>
>  /* Return TRUE if a statement represented by STMT_INFO is a part of a
> @@ -1674,7 +1642,7 @@ extern bool vect_make_slp_decision (loop
>  extern void vect_detect_hybrid_slp (loop_vec_info);
> extern void vect_get_slp_defs (vec<tree>, slp_tree, vec<vec<tree> > *);
>  extern bool vect_slp_bb (basic_block);
> -extern gimple *vect_find_last_scalar_stmt_in_slp (slp_tree);
> +extern stmt_vec_info vect_find_last_scalar_stmt_in_slp (slp_tree);
>  extern bool is_simple_and_all_uses_invariant (gimple *, loop_vec_info);
>  extern bool can_duplicate_and_interleave_p (unsigned int, machine_mode,
> unsigned int * = NULL,
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 
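
A usage sketch (mine, under the declarations above): a caller such as
vect_find_last_scalar_stmt_in_slp can now fold get_later_stmt over the
stmt_vec_infos of an SLP node directly; the gimple UIDs assigned by the
vectorizer give the ordering.  last_stmt_in_node is a hypothetical
helper:

  static stmt_vec_info
  last_stmt_in_node (slp_tree node)
  {
    stmt_vec_info last = SLP_TREE_SCALAR_STMTS (node)[0];
    stmt_vec_info stmt_info;
    unsigned i;
    FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
      last = get_later_stmt (last, stmt_info);
    return last;
  }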

Re: [24/46] Make stmt_info_for_cost use a stmt_vec_info

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:02 PM Richard Sandiford wrote:
>
> This patch makes stmt_info_for_cost carry a stmt_vec_info instead
> of a gimple stmt.  The structure is internal to the vectoriser,
> so targets aren't affected.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (stmt_info_for_cost::stmt): Replace with...
> (stmt_info_for_cost::stmt_info): ...this new field.
> (add_stmt_costs): Update accordingly.
> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost)
> (vect_get_known_peeling_cost): Likewise.
> (vect_estimate_min_profitable_iters): Likewise.
> * tree-vect-stmts.c (record_stmt_cost): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:18.856878757 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:22.264848493 +0100
> @@ -116,7 +116,7 @@ struct stmt_info_for_cost {
>int count;
>enum vect_cost_for_stmt kind;
>enum vect_cost_model_location where;
> -  gimple *stmt;
> +  stmt_vec_info stmt_info;
>int misalign;
>  };
>
> @@ -1282,10 +1282,7 @@ add_stmt_costs (void *data, stmt_vector_
>stmt_info_for_cost *cost;
>unsigned i;
>FOR_EACH_VEC_ELT (*cost_vec, i, cost)
> -add_stmt_cost (data, cost->count, cost->kind,
> -  (cost->stmt
> -   ? vinfo_for_stmt (cost->stmt)
> -   : NULL_STMT_VEC_INFO),
> +add_stmt_cost (data, cost->count, cost->kind, cost->stmt_info,
>cost->misalign, cost->where);
>  }
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-24 10:23:12.060939107 +0100
> +++ gcc/tree-vect-loop.c2018-07-24 10:23:22.260848529 +0100
> @@ -1136,13 +1136,9 @@ vect_compute_single_scalar_iteration_cos
>int j;
>FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
> j, si)
> -{
> -  struct _stmt_vec_info *stmt_info
> -   = si->stmt ? vinfo_for_stmt (si->stmt) : NULL_STMT_VEC_INFO;
> -  (void) add_stmt_cost (target_cost_data, si->count,
> -   si->kind, stmt_info, si->misalign,
> -   vect_body);
> -}
> +(void) add_stmt_cost (target_cost_data, si->count,
> + si->kind, si->stmt_info, si->misalign,
> + vect_body);
>unsigned dummy, body_cost = 0;
>finish_cost (target_cost_data, &dummy, &body_cost, &dummy);
>destroy_cost_data (target_cost_data);
> @@ -3344,24 +3340,16 @@ vect_get_known_peeling_cost (loop_vec_in
>int j;
>if (peel_iters_prologue)
>  FOR_EACH_VEC_ELT (*scalar_cost_vec, j, si)
> -   {
> - stmt_vec_info stmt_info
> -   = si->stmt ? vinfo_for_stmt (si->stmt) : NULL_STMT_VEC_INFO;
> - retval += record_stmt_cost (prologue_cost_vec,
> - si->count * peel_iters_prologue,
> - si->kind, stmt_info, si->misalign,
> - vect_prologue);
> -   }
> +  retval += record_stmt_cost (prologue_cost_vec,
> + si->count * peel_iters_prologue,
> + si->kind, si->stmt_info, si->misalign,
> + vect_prologue);
>if (*peel_iters_epilogue)
>  FOR_EACH_VEC_ELT (*scalar_cost_vec, j, si)
> -   {
> - stmt_vec_info stmt_info
> -   = si->stmt ? vinfo_for_stmt (si->stmt) : NULL_STMT_VEC_INFO;
> - retval += record_stmt_cost (epilogue_cost_vec,
> - si->count * *peel_iters_epilogue,
> - si->kind, stmt_info, si->misalign,
> - vect_epilogue);
> -   }
> +  retval += record_stmt_cost (epilogue_cost_vec,
> + si->count * *peel_iters_epilogue,
> + si->kind, si->stmt_info, si->misalign,
> + vect_epilogue);
>
>return retval;
>  }
> @@ -3497,13 +3485,9 @@ vect_estimate_min_profitable_iters (loop
>   int j;
>   FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
> j, si)
> -   {
> - struct _stmt_vec_info *stmt_info
> -   = si->stmt ? vinfo_for_stmt (si->stmt) : NULL_STMT_VEC_INFO;
> - (void) add_stmt_cost (target_cost_data, si->count,
> -   si->kind, stmt_info, si->misalign,
> -   vect_epilogue);
> -   }
> +   (void) add_stmt_cost (target_cost_data, si->count,
> + si->kind, si->stmt_info, si->misalign,
> + vect_epilogue);
> }
>  }
>else if (npeel < 0)
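
For illustration, a sketch (assumptions mine: cost_vec, ncopies and
stmt_info in scope) of the producer side after this change.  Analysis
code records a cost by storing the stmt_vec_info itself, per the
record_stmt_cost signature in tree-vect-stmts.c:

  /* Record NCOPIES vector_stmt costs against STMT_INFO in the loop
     body; no vinfo_for_stmt lookup is needed when replaying them.  */
  record_stmt_cost (cost_vec, ncopies, vector_stmt, stmt_info,
                    0 /* misalign */, vect_body);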

Re: [23/46] Make LOOP_VINFO_MAY_MISALIGN_STMTS use stmt_vec_info

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:02 PM Richard Sandiford wrote:
>
> This patch changes LOOP_VINFO_MAY_MISALIGN_STMTS from an
> auto_vec<gimple *> to an auto_vec<stmt_vec_info>.

OK.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_loop_vec_info::may_misalign_stmts): Change
> from an auto_vec<gimple *> to an auto_vec<stmt_vec_info>.
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Update
> accordingly.
> * tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:15.756906285 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:18.856878757 +0100
> @@ -472,7 +472,7 @@ typedef struct _loop_vec_info : public v
>
>/* Statements in the loop that have data references that are candidates for a
>   runtime (loop versioning) misalignment check.  */
> -  auto_vec<gimple *> may_misalign_stmts;
> +  auto_vec<stmt_vec_info> may_misalign_stmts;
>
>/* Reduction cycles detected in the loop. Used in loop-aware SLP.  */
>auto_vec reductions;
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:08.532970436 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:18.856878757 +0100
> @@ -2231,16 +2231,15 @@ vect_enhance_data_refs_alignment (loop_v
>
>if (do_versioning)
>  {
> -  vec<gimple *> may_misalign_stmts
> +  vec<stmt_vec_info> may_misalign_stmts
>  = LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo);
> -  gimple *stmt;
> +  stmt_vec_info stmt_info;
>
>/* It can now be assumed that the data references in the statements
>   in LOOP_VINFO_MAY_MISALIGN_STMTS will be aligned in the version
>   of the loop being vectorized.  */
> -  FOR_EACH_VEC_ELT (may_misalign_stmts, i, stmt)
> +  FOR_EACH_VEC_ELT (may_misalign_stmts, i, stmt_info)
>  {
> -  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>dr = STMT_VINFO_DATA_REF (stmt_info);
>   SET_DR_MISALIGNMENT (dr, 0);
>   if (dump_enabled_p ())
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2018-07-24 10:23:04.029010432 +0100
> +++ gcc/tree-vect-loop-manip.c  2018-07-24 10:23:18.856878757 +0100
> @@ -2772,9 +2772,9 @@ vect_create_cond_for_align_checks (loop_
> tree *cond_expr,
>gimple_seq *cond_expr_stmt_list)
>  {
> -  vec<gimple *> may_misalign_stmts
> +  vec<stmt_vec_info> may_misalign_stmts
>  = LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo);
> -  gimple *ref_stmt;
> +  stmt_vec_info stmt_info;
>int mask = LOOP_VINFO_PTR_MASK (loop_vinfo);
>tree mask_cst;
>unsigned int i;
> @@ -2795,23 +2795,22 @@ vect_create_cond_for_align_checks (loop_
>/* Create expression (mask & (dr_1 || ... || dr_n)) where dr_i is the address
>   of the first vector of the i'th data reference. */
>
> -  FOR_EACH_VEC_ELT (may_misalign_stmts, i, ref_stmt)
> +  FOR_EACH_VEC_ELT (may_misalign_stmts, i, stmt_info)
>  {
>gimple_seq new_stmt_list = NULL;
>tree addr_base;
>tree addr_tmp_name;
>tree new_or_tmp_name;
>gimple *addr_stmt, *or_stmt;
> -  stmt_vec_info stmt_vinfo = vinfo_for_stmt (ref_stmt);
> -  tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>bool negative = tree_int_cst_compare
> -   (DR_STEP (STMT_VINFO_DATA_REF (stmt_vinfo)), size_zero_node) < 0;
> +   (DR_STEP (STMT_VINFO_DATA_REF (stmt_info)), size_zero_node) < 0;
>tree offset = negative
> ? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) : size_zero_node;
>
>/* create: addr_tmp = (int)(address_of_first_vector) */
>addr_base =
> -   vect_create_addr_base_for_vector_ref (ref_stmt, &new_stmt_list,
> +   vect_create_addr_base_for_vector_ref (stmt_info, &new_stmt_list,
>   offset);
>if (new_stmt_list != NULL)
> gimple_seq_add_seq (cond_expr_stmt_list, new_stmt_list);


Re: [22/46] Make DR_GROUP_SAME_DR_STMT a stmt_vec_info

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:02 PM Richard Sandiford wrote:
>
> This patch changes STMT_VINFO_SAME_DR_STMT from a gimple stmt to a
> stmt_vec_info.

OK.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_stmt_vec_info::same_dr_stmt): Change from
> a gimple stmt to a stmt_vec_info.
> * tree-vect-stmts.c (vectorizable_load): Update accordingly.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:12.060939107 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:15.756906285 +0100
> @@ -876,7 +876,7 @@ struct _stmt_vec_info {
>stmt_vec_info next_element;
>/* For data-refs, in case that two or more stmts share data-ref, this is the
>   pointer to the previously detected stmt with the same dr.  */
> -  gimple *same_dr_stmt;
> +  stmt_vec_info same_dr_stmt;
>/* The size of the group.  */
>unsigned int size;
>/* For stores, number of stores from this group seen. We vectorize the last
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-07-24 10:23:08.536970400 +0100
> +++ gcc/tree-vect-stmts.c   2018-07-24 10:23:15.756906285 +0100
> @@ -7590,8 +7590,7 @@ vectorizable_load (gimple *stmt, gimple_
>  we have to give up.  */
>if (DR_GROUP_SAME_DR_STMT (stmt_info)
>   && (STMT_SLP_TYPE (stmt_info)
> - != STMT_SLP_TYPE (vinfo_for_stmt
> -(DR_GROUP_SAME_DR_STMT (stmt_info)))))
> + != STMT_SLP_TYPE (DR_GROUP_SAME_DR_STMT (stmt_info))))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,


Re: [21/46] Make grouped_stores and reduction_chains use stmt_vec_infos

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:01 PM Richard Sandiford wrote:
>
> This patch changes the SLP lists grouped_stores and reduction_chains
> from auto_vec<gimple *> to auto_vec<stmt_vec_info>.  It was easier
> to do them together due to the way vect_analyze_slp is structured.

OK.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vec_info::grouped_stores): Change from
> an auto_vec<gimple *> to an auto_vec<stmt_vec_info>.
> (_loop_vec_info::reduction_chains): Likewise.
> * tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns): Update
> accordingly.
> * tree-vect-slp.c (vect_analyze_slp): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:08.536970400 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:12.060939107 +0100
> @@ -259,7 +259,7 @@ struct vec_info {
>
>/* All interleaving chains of stores, represented by the first
>   stmt in the chain.  */
> -  auto_vec<gimple *> grouped_stores;
> +  auto_vec<stmt_vec_info> grouped_stores;
>
>/* Cost data used by the target cost model.  */
>void *target_cost_data;
> @@ -479,7 +479,7 @@ typedef struct _loop_vec_info : public v
>
>/* All reduction chains in the loop, represented by the first
>   stmt in the chain.  */
> -  auto_vec<gimple *> reduction_chains;
> +  auto_vec<stmt_vec_info> reduction_chains;
>
>/* Cost vector for a single scalar iteration.  */
>auto_vec<stmt_info_for_cost> scalar_cost_vec;
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-24 10:23:08.532970436 +0100
> +++ gcc/tree-vect-loop.c2018-07-24 10:23:12.060939107 +0100
> @@ -677,13 +677,13 @@ vect_fixup_reduc_chain (gimple *stmt)
>  static void
>  vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
>  {
> -  gimple *first;
> +  stmt_vec_info first;
>unsigned i;
>
>FOR_EACH_VEC_ELT (LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo), i, first)
> -if (STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (first)))
> +if (STMT_VINFO_IN_PATTERN_P (first))
>{
> -   stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (vinfo_for_stmt (first));
> +   stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (first);
> while (next)
>   {
> if (! STMT_VINFO_IN_PATTERN_P (next))
> @@ -696,7 +696,7 @@ vect_fixup_scalar_cycles_with_patterns (
>   {
> vect_fixup_reduc_chain (first);
> LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo)[i]
> - = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (first));
> + = STMT_VINFO_RELATED_STMT (first);
>   }
>}
>  }
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-24 10:23:08.536970400 +0100
> +++ gcc/tree-vect-slp.c 2018-07-24 10:23:12.060939107 +0100
> @@ -2202,7 +2202,7 @@ vect_analyze_slp_instance (vec_info *vin
>  vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>  {
>unsigned int i;
> -  gimple *first_element;
> +  stmt_vec_info first_element;
>
>DUMP_VECT_SCOPE ("vect_analyze_slp");
>
> @@ -2220,17 +2220,15 @@ vect_analyze_slp (vec_info *vinfo, unsig
>  max_tree_size))
>   {
> /* Dissolve reduction chain group.  */
> -   gimple *stmt = first_element;
> -   while (stmt)
> +   stmt_vec_info vinfo = first_element;
> +   while (vinfo)
>   {
> -   stmt_vec_info vinfo = vinfo_for_stmt (stmt);
> stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (vinfo);
> REDUC_GROUP_FIRST_ELEMENT (vinfo) = NULL;
> REDUC_GROUP_NEXT_ELEMENT (vinfo) = NULL;
> -   stmt = next;
> +   vinfo = next;
>   }
> -   STMT_VINFO_DEF_TYPE (vinfo_for_stmt (first_element))
> - = vect_internal_def;
> +   STMT_VINFO_DEF_TYPE (first_element) = vect_internal_def;
>   }
> }
>
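
As a hedged sketch (assumptions mine), walking one of these chains now
stays entirely in stmt_vec_info space; reduc_chain_length is a
hypothetical helper, not a GCC API:

  /* Count the members of the reduction chain headed by FIRST,
     following the group links directly.  */
  static unsigned
  reduc_chain_length (stmt_vec_info first)
  {
    unsigned n = 0;
    for (stmt_vec_info s = first; s; s = REDUC_GROUP_NEXT_ELEMENT (s))
      n++;
    return n;
  }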


Re: [20/46] Make *FIRST_ELEMENT and *NEXT_ELEMENT stmt_vec_infos

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:01 PM Richard Sandiford wrote:
>
> This patch changes {REDUC,DR}_GROUP_{FIRST,NEXT}_ELEMENT from a
> gimple stmt to a stmt_vec_info.

OK.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_stmt_vec_info::first_element): Change from
> a gimple stmt to a stmt_vec_info.
> (_stmt_vec_info::next_element): Likewise.
> * tree-vect-data-refs.c (vect_update_misalignment_for_peel)
> (vect_slp_analyze_and_verify_node_alignment)
> (vect_analyze_group_access_1, vect_analyze_group_access)
> (vect_small_gap_p, vect_prune_runtime_alias_test_list)
> (vect_create_data_ref_ptr, vect_record_grouped_load_vectors)
> (vect_supportable_dr_alignment): Update accordingly.
> * tree-vect-loop.c (vect_fixup_reduc_chain): Likewise.
> (vect_fixup_scalar_cycles_with_patterns, vect_is_slp_reduction)
> (vect_is_simple_reduction, vectorizable_reduction): Likewise.
> * tree-vect-patterns.c (vect_reassociating_reduction_p): Likewise.
> * tree-vect-slp.c (vect_build_slp_tree_1)
> (vect_attempt_slp_rearrange_stmts, vect_supported_load_permutation_p)
> (vect_split_slp_store_group, vect_analyze_slp_instance)
> (vect_analyze_slp, vect_transform_slp_perm_load): Likewise.
> * tree-vect-stmts.c (vect_model_store_cost, vect_model_load_cost)
> (get_group_load_store_type, get_load_store_type)
> (get_group_alias_ptr_type, vectorizable_store, vectorizable_load)
> (vect_transform_stmt, vect_remove_stores): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:04.033010396 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:08.536970400 +0100
> @@ -871,9 +871,9 @@ struct _stmt_vec_info {
>
>/* Interleaving and reduction chains info.  */
>/* First element in the group.  */
> -  gimple *first_element;
> +  stmt_vec_info first_element;
>/* Pointer to the next element in the group.  */
> -  gimple *next_element;
> +  stmt_vec_info next_element;
>/* For data-refs, in case that two or more stmts share data-ref, this is 
> the
>   pointer to the previously detected stmt with the same dr.  */
>gimple *same_dr_stmt;
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:04.029010432 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:08.532970436 +0100
> @@ -1077,7 +1077,7 @@ vect_update_misalignment_for_peel (struc
>   /* For interleaved data accesses the step in the loop must be multiplied by
>   the size of the interleaving group.  */
>if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> -dr_size *= DR_GROUP_SIZE (vinfo_for_stmt (DR_GROUP_FIRST_ELEMENT (stmt_info)));
> +dr_size *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>if (STMT_VINFO_GROUPED_ACCESS (peel_stmt_info))
>  dr_peel_size *= DR_GROUP_SIZE (peel_stmt_info);
>
> @@ -2370,12 +2370,11 @@ vect_slp_analyze_and_verify_node_alignme
>   the node is permuted in which case we start from the first
>   element in the group.  */
>stmt_vec_info first_stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
> -  gimple *first_stmt = first_stmt_info->stmt;
>data_reference_p first_dr = STMT_VINFO_DATA_REF (first_stmt_info);
>if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
> -first_stmt = DR_GROUP_FIRST_ELEMENT (first_stmt_info);
> +first_stmt_info = DR_GROUP_FIRST_ELEMENT (first_stmt_info);
>
> -  data_reference_p dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
> +  data_reference_p dr = STMT_VINFO_DATA_REF (first_stmt_info);
>vect_compute_data_ref_alignment (dr);
>/* For creating the data-ref pointer we need alignment of the
>   first element anyway.  */
> @@ -2520,11 +2519,11 @@ vect_analyze_group_access_1 (struct data
>if (DR_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info)
>  {
>/* First stmt in the interleaving chain. Check the chain.  */
> -  gimple *next = DR_GROUP_NEXT_ELEMENT (stmt_info);
> +  stmt_vec_info next = DR_GROUP_NEXT_ELEMENT (stmt_info);
>struct data_reference *data_ref = dr;
>unsigned int count = 1;
>tree prev_init = DR_INIT (data_ref);
> -  gimple *prev = stmt_info;
> +  stmt_vec_info prev = stmt_info;
>HOST_WIDE_INT diff, gaps = 0;
>
>/* By construction, all group members have INTEGER_CST DR_INITs.  */
> @@ -2535,8 +2534,7 @@ vect_analyze_group_access_1 (struct data
>   stmt, and the rest get their vectorized loads from the first
>   one.  */
>if (!tree_int_cst_compare (DR_INIT (data_ref),
> - DR_INIT (STMT_VINFO_DATA_REF (
> -  vinfo_for_stmt (next)
> +
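
A minimal sketch (mine) of the resulting idiom, assuming the DR_GROUP_*
accessors from tree-vectorizer.h; PROCESS is a placeholder for whatever
the caller does with each member, not a GCC function:

  /* Visit every member of STMT_INFO's interleaving group without any
     vinfo_for_stmt round trips.  */
  for (stmt_vec_info member = DR_GROUP_FIRST_ELEMENT (stmt_info);
       member; member = DR_GROUP_NEXT_ELEMENT (member))
    process (member);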

Re: [19/46] Make vect_dr_stmt return a stmt_vec_info

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:01 PM Richard Sandiford wrote:
>
> This patch makes vect_dr_stmt return a stmt_vec_info instead of a
> gimple stmt.  Rather than retain a separate gimple stmt variable
> in cases where both existed, the patch replaces uses of the gimple
> variable with the uses of the stmt_vec_info.  Later patches do this
> more generally.
>
> Many things that are keyed off a data_reference would these days
> be better keyed off a stmt_vec_info, but it's more convenient
> to do that later in the series.  The vect_dr_size calls that are
> left over do still benefit from this patch.

OK

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_dr_stmt): Return a stmt_vec_info rather
> than a gimple stmt.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence)
> (vect_slp_analyze_data_ref_dependence, vect_record_base_alignments)
> (vect_calculate_target_alignment, vect_compute_data_ref_alignment)
> (vect_update_misalignment_for_peel, vect_verify_datarefs_alignment)
> (vector_alignment_reachable_p, vect_get_data_access_cost)
> (vect_get_peeling_costs_all_drs, vect_peeling_hash_get_lowest_cost)
> (vect_peeling_supportable, vect_enhance_data_refs_alignment)
> (vect_find_same_alignment_drs, vect_analyze_data_refs_alignment)
> (vect_analyze_group_access_1, vect_analyze_group_access)
> (vect_analyze_data_ref_access, vect_analyze_data_ref_accesses)
> (vect_vfa_access_size, vect_small_gap_p, vect_analyze_data_refs)
> (vect_supportable_dr_alignment): Remove vinfo_for_stmt from the
> result of vect_dr_stmt and use the stmt_vec_info instead of
> the associated gimple stmt.
> * tree-vect-loop-manip.c (get_misalign_in_elems): Likewise.
> (vect_gen_prolog_loop_niters): Likewise.
> * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:23:00.401042649 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:04.033010396 +0100
> @@ -1370,7 +1370,7 @@ vect_dr_behavior (data_reference *dr)
> a pattern this returns the corresponding pattern stmt.  Otherwise
> DR_STMT is returned.  */
>
> -inline gimple *
> +inline stmt_vec_info
>  vect_dr_stmt (data_reference *dr)
>  {
>gimple *stmt = DR_STMT (dr);
> @@ -1379,7 +1379,7 @@ vect_dr_stmt (data_reference *dr)
>  return STMT_VINFO_RELATED_STMT (stmt_info);
>/* DR_STMT should never refer to a stmt in a pattern replacement.  */
>gcc_checking_assert (!STMT_VINFO_RELATED_STMT (stmt_info));
> -  return stmt;
> +  return stmt_info;
>  }
>
>  /* Return true if the vect cost model is unlimited.  */
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-24 10:23:00.397042684 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-24 10:23:04.029010432 +0100
> @@ -294,8 +294,8 @@ vect_analyze_data_ref_dependence (struct
>struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>struct data_reference *dra = DDR_A (ddr);
>struct data_reference *drb = DDR_B (ddr);
> -  stmt_vec_info stmtinfo_a = vinfo_for_stmt (vect_dr_stmt (dra));
> -  stmt_vec_info stmtinfo_b = vinfo_for_stmt (vect_dr_stmt (drb));
> +  stmt_vec_info stmtinfo_a = vect_dr_stmt (dra);
> +  stmt_vec_info stmtinfo_b = vect_dr_stmt (drb);
>lambda_vector dist_v;
>unsigned int loop_depth;
>
> @@ -627,9 +627,9 @@ vect_slp_analyze_data_ref_dependence (st
>
>/* If dra and drb are part of the same interleaving chain consider
>   them independent.  */
> -  if (STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (vect_dr_stmt (dra)))
> -  && (DR_GROUP_FIRST_ELEMENT (vinfo_for_stmt (vect_dr_stmt (dra)))
> - == DR_GROUP_FIRST_ELEMENT (vinfo_for_stmt (vect_dr_stmt (drb)
> +  if (STMT_VINFO_GROUPED_ACCESS (vect_dr_stmt (dra))
> +  && (DR_GROUP_FIRST_ELEMENT (vect_dr_stmt (dra))
> + == DR_GROUP_FIRST_ELEMENT (vect_dr_stmt (drb
>  return false;
>
>/* Unknown data dependence.  */
> @@ -841,19 +841,18 @@ vect_record_base_alignments (vec_info *v
>unsigned int i;
>FOR_EACH_VEC_ELT (vinfo->shared->datarefs, i, dr)
>  {
> -  gimple *stmt = vect_dr_stmt (dr);
> -  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  stmt_vec_info stmt_info = vect_dr_stmt (dr);
>if (!DR_IS_CONDITIONAL_IN_STMT (dr)
>   && STMT_VINFO_VECTORIZABLE (stmt_info)
>   && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> {
> - vect_record_base_alignment (vinfo, stmt, &DR_INNERMOST (dr));
> + vect_record_base_alignment (vinfo, stmt_info, &DR_INNERMOST (dr));
>
>   /* If DR is nested in the loop that is being vectorized, we can also
>  record the alignment of the base wrt the outer loop.  */
> - if (loop && 
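
In usage-sketch form (not from the patch), a caller inside
tree-vect-data-refs.c now goes straight from a data_reference to its
stmt_vec_info:

  /* One lookup instead of vect_dr_stmt plus vinfo_for_stmt.  */
  stmt_vec_info stmt_info = vect_dr_stmt (dr);
  if (STMT_VINFO_VECTORIZABLE (stmt_info)
      && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
    vect_compute_data_ref_alignment (dr);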

Re: [18/46] Make SLP_TREE_SCALAR_STMTS a vec

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 12:01 PM Richard Sandiford wrote:
>
> This patch changes SLP_TREE_SCALAR_STMTS from a vec<gimple *> to
> a vec<stmt_vec_info>.  It's longer than the previous conversions
> but mostly mechanical.

OK.  I don't remember exactly, but vect_external_def SLP nodes have an
empty stmts vector then?  I realize we only have those for defs that
are in the vectorized region.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_slp_tree::stmts): Change from a vec<gimple *>
> to a vec<stmt_vec_info>.
> * tree-vect-slp.c (vect_free_slp_tree): Update accordingly.
> (vect_create_new_slp_node): Take a vec<stmt_vec_info> instead of a
> vec<gimple *>.
> (_slp_oprnd_info::def_stmts): Change from a vec<gimple *>
> to a vec<stmt_vec_info>.
> (bst_traits::value_type, bst_traits::value_type): Likewise.
> (bst_traits::hash): Update accordingly.
> (vect_get_and_check_slp_defs): Change the stmts parameter from
> a vec<gimple *> to a vec<stmt_vec_info>.
> (vect_two_operations_perm_ok_p, vect_build_slp_tree_1): Likewise.
> (vect_build_slp_tree): Likewise.
> (vect_build_slp_tree_2): Likewise.  Update uses of
> SLP_TREE_SCALAR_STMTS.
> (vect_print_slp_tree): Update uses of SLP_TREE_SCALAR_STMTS.
> (vect_mark_slp_stmts, vect_mark_slp_stmts_relevant)
> (vect_slp_rearrange_stmts, vect_attempt_slp_rearrange_stmts)
> (vect_supported_load_permutation_p, vect_find_last_scalar_stmt_in_slp)
> (vect_detect_hybrid_slp_stmts, vect_slp_analyze_node_operations_1)
> (vect_slp_analyze_node_operations, vect_slp_analyze_operations)
> (vect_bb_slp_scalar_cost, vect_slp_analyze_bb_1)
> (vect_get_constant_vectors, vect_get_slp_defs)
> (vect_transform_slp_perm_load, vect_schedule_slp_instance)
> (vect_remove_slp_scalar_calls, vect_schedule_slp): Likewise.
> (vect_analyze_slp_instance): Build up a vec of stmt_vec_infos
> instead of gimple stmts.
> * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Change
> the stores parameter from a vec<gimple *> to a vec<stmt_vec_info>.
> (vect_slp_analyze_instance_dependence): Update uses of
> SLP_TREE_SCALAR_STMTS.
> (vect_slp_analyze_and_verify_node_alignment): Likewise.
> (vect_slp_analyze_and_verify_instance_alignment): Likewise.
> * tree-vect-loop.c (neutral_op_for_slp_reduction): Likewise.
> (get_initial_defs_for_reduction): Likewise.
> (vect_create_epilog_for_reduction): Likewise.
> (vectorize_fold_left_reduction): Likewise.
> * tree-vect-stmts.c (vect_prologue_cost_for_slp_op): Likewise.
> (vect_model_simple_cost, vectorizable_shift, vectorizable_load)
> (can_vectorize_live_stmts): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:22:57.277070390 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:23:00.401042649 +0100
> @@ -138,7 +138,7 @@ struct _slp_tree {
>/* Nodes that contain def-stmts of this node statements operands.  */
>vec<slp_tree> children;
>/* A group of scalar stmts to be vectorized together.  */
> -  vec<gimple *> stmts;
> +  vec<stmt_vec_info> stmts;
>/* Load permutation relative to the stores, NULL if there is no
>   permutation.  */
>vec<unsigned> load_permutation;
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-24 10:22:57.277070390 +0100
> +++ gcc/tree-vect-slp.c 2018-07-24 10:23:00.401042649 +0100
> @@ -66,11 +66,11 @@ vect_free_slp_tree (slp_tree node, bool
>   statements would be redundant.  */
>if (!final_p)
>  {
> -  gimple *stmt;
> -  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
> +  stmt_vec_info stmt_info;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> {
> - gcc_assert (STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmt)) > 0);
> - STMT_VINFO_NUM_SLP_USES (vinfo_for_stmt (stmt))--;
> + gcc_assert (STMT_VINFO_NUM_SLP_USES (stmt_info) > 0);
> + STMT_VINFO_NUM_SLP_USES (stmt_info)--;
> }
>  }
>
> @@ -99,21 +99,21 @@ vect_free_slp_instance (slp_instance ins
>  /* Create an SLP node for SCALAR_STMTS.  */
>
>  static slp_tree
> -vect_create_new_slp_node (vec<gimple *> scalar_stmts)
> +vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts)
>  {
>slp_tree node;
> -  gimple *stmt = scalar_stmts[0];
> +  stmt_vec_info stmt_info = scalar_stmts[0];
>unsigned int nops;
>
> -  if (is_gimple_call (stmt))
> +  if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
>  nops = gimple_call_num_args (stmt);
> -  else if (is_gimple_assign (stmt))
> +  else if (gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt))
>  {
>nops = gimple_num_ops (stmt) - 1;
>if (gimple_assign_rhs_code (stmt) == COND_EXPR)
> nops++;
>  }
> -  else if (gimple_code (stmt) == GIMPLE_PHI)
> +  else if (is_a <gphi *> (stmt_info->stmt))
>  
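
A short sketch (assumptions mine: NODE is an slp_tree and dump_file is
open) of the new iteration idiom; the underlying gimple stmt is one
dereference away:

  stmt_vec_info stmt_info;
  unsigned i;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
    print_gimple_stmt (dump_file, stmt_info->stmt, 0, TDF_SLIM);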

Re: [17/46] Make LOOP_VINFO_REDUCTIONS an auto_vec

2018-07-25 Thread Richard Biener
On Tue, Jul 24, 2018 at 11:59 AM Richard Sandiford wrote:
>
> This patch changes LOOP_VINFO_REDUCTIONS from an auto_vec<gimple *>
> to an auto_vec<stmt_vec_info>.  It also changes the associated
> vect_force_simple_reduction so that it takes and returns stmt_vec_infos
> instead of gimple stmts.

OK.

Highlights that reduction detection needs refactoring to be usable outside
of the vectorizer (see tree-parloops.c).  Exposing vinfos doesn't make the
situation better here...

Richard.

>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_loop_vec_info::reductions): Change from an
> auto_vec<gimple *> to an auto_vec<stmt_vec_info>.
> (vect_force_simple_reduction): Take and return stmt_vec_infos rather
> than gimple stmts.
> * tree-parloops.c (valid_reduction_p): Take a stmt_vec_info instead
> of a gimple stmt.
> (gather_scalar_reductions): Update after above interface changes.
> * tree-vect-loop.c (vect_analyze_scalar_cycles_1): Likewise.
> (vect_is_simple_reduction): Take and return stmt_vec_infos rather
> than gimple stmts.
> (vect_force_simple_reduction): Likewise.
> * tree-vect-patterns.c (vect_pattern_recog_1): Update use of
> LOOP_VINFO_REDUCTIONS.
> * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-24 10:22:53.909100298 +0100
> +++ gcc/tree-vectorizer.h   2018-07-24 10:22:57.277070390 +0100
> @@ -475,7 +475,7 @@ typedef struct _loop_vec_info : public v
>auto_vec<gimple *> may_misalign_stmts;
>
>/* Reduction cycles detected in the loop. Used in loop-aware SLP.  */
> -  auto_vec<gimple *> reductions;
> +  auto_vec<stmt_vec_info> reductions;
>
>/* All reduction chains in the loop, represented by the first
>   stmt in the chain.  */
> @@ -1627,8 +1627,8 @@ extern tree vect_create_addr_base_for_ve
>
>  /* In tree-vect-loop.c.  */
>  /* FORNOW: Used in tree-parloops.c.  */
> -extern gimple *vect_force_simple_reduction (loop_vec_info, gimple *,
> -   bool *, bool);
> +extern stmt_vec_info vect_force_simple_reduction (loop_vec_info, stmt_vec_info,
> + bool *, bool);
>  /* Used in gimple-loop-interchange.c.  */
>  extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
>   enum tree_code);
> Index: gcc/tree-parloops.c
> ===
> --- gcc/tree-parloops.c 2018-06-27 10:27:09.778650686 +0100
> +++ gcc/tree-parloops.c 2018-07-24 10:22:57.273070426 +0100
> @@ -2570,15 +2570,14 @@ set_reduc_phi_uids (reduction_info **slo
>return 1;
>  }
>
> -/* Return true if the type of reduction performed by STMT is suitable
> +/* Return true if the type of reduction performed by STMT_INFO is suitable
> for this pass.  */
>
>  static bool
> -valid_reduction_p (gimple *stmt)
> +valid_reduction_p (stmt_vec_info stmt_info)
>  {
>/* Parallelization would reassociate the operation, which isn't
>   allowed for in-order reductions.  */
> -  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>vect_reduction_type reduc_type = STMT_VINFO_REDUC_TYPE (stmt_info);
>return reduc_type != FOLD_LEFT_REDUCTION;
>  }
> @@ -2615,10 +2614,11 @@ gather_scalar_reductions (loop_p loop, r
>if (simple_iv (loop, loop, res, &iv, true))
> continue;
>
> -  gimple *reduc_stmt
> -   = vect_force_simple_reduction (simple_loop_info, phi,
> +  stmt_vec_info reduc_stmt_info
> +   = vect_force_simple_reduction (simple_loop_info,
> +  simple_loop_info->lookup_stmt (phi),
> &double_reduc, true);
> -  if (!reduc_stmt || !valid_reduction_p (reduc_stmt))
> +  if (!reduc_stmt_info || !valid_reduction_p (reduc_stmt_info))
> continue;
>
>if (double_reduc)
> @@ -2627,11 +2627,11 @@ gather_scalar_reductions (loop_p loop, r
> continue;
>
>   double_reduc_phis.safe_push (phi);
> - double_reduc_stmts.safe_push (reduc_stmt);
> + double_reduc_stmts.safe_push (reduc_stmt_info->stmt);
>   continue;
> }
>
> -  build_new_reduction (reduction_list, reduc_stmt, phi);
> +  build_new_reduction (reduction_list, reduc_stmt_info->stmt, phi);
>  }
>delete simple_loop_info;
>
> @@ -2661,12 +2661,15 @@ gather_scalar_reductions (loop_p loop, r
> &iv, true))
> continue;
>
> - gimple *inner_reduc_stmt
> -   = vect_force_simple_reduction (simple_loop_info, inner_phi,
> + stmt_vec_info inner_phi_info
> +   = simple_loop_info->lookup_stmt (inner_phi);
> + stmt_vec_info inner_reduc_stmt_info
> +   = vect_force_simple_reduction (simple_loop_info,
> +
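
Restated as a sketch (mine), the tree-parloops.c call sequence after
this change maps the phi to its stmt_vec_info first and stays in
stmt_vec_info space until the reduction is recorded:

  stmt_vec_info phi_info = simple_loop_info->lookup_stmt (phi);
  stmt_vec_info reduc_info
    = vect_force_simple_reduction (simple_loop_info, phi_info,
                                   &double_reduc, true);
  if (reduc_info && valid_reduction_p (reduc_info))
    build_new_reduction (reduction_list, reduc_info->stmt, phi);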
