Re: [PATCH] Change PRED_LOOP_EXIT from 92 to 85.

2016-06-17 Thread Andrew Pinski
On Fri, Jun 17, 2016 at 7:29 AM, Martin Liška  wrote:
> Hello.
>
> After we've recently applied various changes (fixes) to predict.c, SPEC2006
> shows that PRED_LOOP_EXIT value should be amended.


This caused a 1% performance decrease on CoreMark on
aarch64-linux-gnu on ThunderX.

Thanks,
Andrew

>
> Survives regression tests & bootstrap on x86_64-linux.
> Pre-approved by Honza, installed as r237556.
>
> Thanks,
> Martin


[target/71338]: enable mulu for RL78/G10

2016-06-17 Thread DJ Delorie

Reverts https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01538.html - G10
supports MULU but not other multiplication methods.  Committed.

PR target/71338
* config/rl78/rl78-expand.c (umulqihi3): Enable for G10.
* config/rl78/rl78-virtual.c (umulhi3_shift_virt): Likewise.
(umulqihi3_virt): Likewise.
* config/rl78/rl78-real.c (umulhi3_shift_real): Likewise.
(umulqihi3_real): Likewise.

Index: gcc/config/rl78/rl78-expand.md
===
--- gcc/config/rl78/rl78-expand.md  (revision 237565)
+++ gcc/config/rl78/rl78-expand.md  (working copy)
@@ -156,13 +156,13 @@
 )
 
 (define_expand "umulqihi3"
   [(set (match_operand:HI 0 "register_operand")
 (mult:HI (zero_extend:HI (match_operand:QI 1 "register_operand"))
  (zero_extend:HI (match_operand:QI 2 "register_operand"]
-  "!TARGET_G10"
+  ""
   ""
 )
 
 (define_expand "andqi3"
   [(set (match_operand:QI 0 "rl78_nonimmediate_operand")
(and:QI (match_operand:QI 1 "rl78_general_operand")
Index: gcc/config/rl78/rl78-real.md
===
--- gcc/config/rl78/rl78-real.md(revision 237565)
+++ gcc/config/rl78/rl78-real.md(working copy)
@@ -176,23 +176,23 @@
 )
 
 (define_insn "*umulhi3_shift_real"
   [(set (match_operand:HI 0 "register_operand" "=A,A")
 (mult:HI (match_operand:HI 1 "rl78_nonfar_operand" "0,0")
  (match_operand:HI 2 "rl78_24_operand" "N,i")))]
-  "rl78_real_insns_ok () && !TARGET_G10"
+  "rl78_real_insns_ok ()"
   "@
shlw\t%0, 1
shlw\t%0, 2"
 )
 
 (define_insn "*umulqihi3_real"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=A")
 (mult:HI (zero_extend:HI (match_operand:QI 1 "general_operand" "%a"))
  (zero_extend:HI (match_operand:QI 2 "general_operand" "x"]
-  "rl78_real_insns_ok () && !TARGET_G10"
+  "rl78_real_insns_ok ()"
   "mulu\t%2"
 )
 
 (define_insn "*andqi3_real"
   [(set (match_operand:QI 0 "rl78_nonimmediate_operand"  
"=WsfWsaWhlWab,A,R,vWsa")
(and:QI (match_operand:QI 1 "rl78_general_operand"   "%0,0,0,0")
Index: gcc/config/rl78/rl78-virt.md
===
--- gcc/config/rl78/rl78-virt.md(revision 237565)
+++ gcc/config/rl78/rl78-virt.md(working copy)
@@ -113,22 +113,22 @@
 )
 
 (define_insn "*umulhi3_shift_virt"
   [(set (match_operand:HI  0 "register_operand" "=v")
 (mult:HI (match_operand:HI 1 "rl78_nonfar_operand" "%vim")
  (match_operand:HI 2 "rl78_24_operand" "Ni")))]
-  "rl78_virt_insns_ok () && !TARGET_G10"
+  "rl78_virt_insns_ok ()"
   "v.mulu\t%0, %1, %2"
   [(set_attr "valloc" "umul")]
 )
 
 (define_insn "*umulqihi3_virt"
   [(set (match_operand:HI  0 "register_operand" "=v")
 (mult:HI (zero_extend:HI (match_operand:QI 1 "rl78_nonfar_operand" "%vim"))
  (zero_extend:HI (match_operand:QI 2 "general_operand" "vim"))))]
-  "rl78_virt_insns_ok () && !TARGET_G10"
+  "rl78_virt_insns_ok ()"
   "v.mulu\t%0, %2"
   [(set_attr "valloc" "umul")]
 )
 
 (define_insn "*andqi3_virt"
  [(set (match_operand:QI 0 "rl78_nonimmediate_operand" "=vm,  *Wfr,  vY")


[PATCH] input.c: add lexing selftests and a test matrix for line_table states

2016-06-17 Thread David Malcolm
This patch adds explicit testing of lexing a source file,
generalizing this (and the test of ordinary line maps) over
a 2-dimensional test matrix covering:

  (1) line_table->default_range_bits: some frontends use a non-zero value
  and others use zero

  (2) the fallback modes within line-map.c: there are various threshold
      values for source_location/location_t beyond which line-map.c changes
      behavior (disabling the range-packing optimization, disabling
      column-tracking).  We exercise these by starting the line_table
      at interesting values at or near these thresholds.

This helps ensure that location data works in all of these states,
and that (I hope) we don't have lingering bugs relating to the
transition between line_table states.
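
A minimal sketch of the resulting test driver (the LINE_MAP_* constants
are the ones moved to line-map.h by this patch; the exact loop shape and
the line_table_case constructor are assumptions for illustration):

  for (int range_bits : {0, 5})
    for (location_t start : {(location_t) 0,
                             LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES,
                             LINE_MAP_MAX_LOCATION_WITH_COLS})
      {
        /* One cell of the 2-D matrix described above.  */
        line_table_case c (range_bits, start);
        test_accessing_ordinary_linemaps (c);
        test_lexer (c);
      }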

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu;
successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0.

OK for trunk?  (I can self-approve much of this, but it's probably
worth having another pair of eyes look at it, if nothing else).

gcc/ChangeLog:
* input.c: Include cpplib.h.
(selftest::temp_source_file): New class.
(selftest::temp_source_file::temp_source_file): New ctor.
(selftest::temp_source_file::~temp_source_file): New dtor.
(selftest::should_have_column_data_p): New function.
(selftest::test_should_have_column_data_p): New function.
(selftest::temp_line_table): New class.
(selftest::temp_line_table::temp_line_table): New ctor.
(selftest::temp_line_table::~temp_line_table): New dtor.
(selftest::test_accessing_ordinary_linemaps): Add case_ param; use
it to create a temp_line_table.
(selftest::assert_loceq): Only verify LOCATION_COLUMN for
locations that are known to have column data.
(selftest::line_table_case): New struct.
(selftest::test_reading_source_line): Move tempfile handling
to class temp_source_file.
(ASSERT_TOKEN_AS_TEXT_EQ): New macro.
(selftest::assert_token_loc_eq): New function.
(ASSERT_TOKEN_LOC_EQ): New macro.
(selftest::test_lexer): New function.
(selftest::boundary_locations): New array.
(selftest::input_c_tests): Call test_should_have_column_data_p.
Loop over a test matrix of interesting values of location and
default_range_bits, calling test_lexer on each case in the matrix.
Move call to test_accessing_ordinary_linemaps into the matrix.
* selftest.h (ASSERT_EQ): Reimplement in terms of...
(ASSERT_EQ_AT): New macro.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/location_overflow_plugin.c (plugin_init): Avoid
hardcoding the values of LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES
and LINE_MAP_MAX_LOCATION_WITH_COLS.

libcpp/ChangeLog:
* include/line-map.h (LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES):
Move here from line-map.c.
(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
* line-map.c (LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Move from
here to line-map.h.
(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
---
 gcc/input.c| 323 +++--
 gcc/selftest.h |  12 +-
 .../gcc.dg/plugin/location_overflow_plugin.c   |   4 +-
 libcpp/include/line-map.h  |  10 +
 libcpp/line-map.c  |  12 -
 5 files changed, 327 insertions(+), 34 deletions(-)

diff --git a/gcc/input.c b/gcc/input.c
index 3fb4a25..0016555 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "diagnostic-core.h"
 #include "selftest.h"
+#include "cpplib.h"
 
 /* This is a cache used by get_next_line to store the content of a
file to be searched for file lines.  */
@@ -1144,6 +1145,74 @@ namespace selftest {
 
 /* Selftests of location handling.  */
 
+/* A class for writing out a temporary sourcefile for use in selftests
+   of input handling.  */
+
+class temp_source_file
+{
+ public:
+  temp_source_file (const location &loc, const char *suffix,
+   const char *content);
+  ~temp_source_file ();
+
+  const char *get_filename () const { return m_filename; }
+
+ private:
+  char *m_filename;
+};
+
+/* Constructor.  Create a tempfile using SUFFIX, and write CONTENT to
+   it.  Abort if anything goes wrong, using LOC as the effective
+   location in the problem report.  */
+
+temp_source_file::temp_source_file (const location &loc, const char *suffix,
+   const char *content)
+{
+  m_filename = make_temp_file (suffix);
+  ASSERT_NE (m_filename, NULL);
+
+  FILE *out = fopen (m_filename, "w");
+  if (!out)
+::selftest::fail_formatted (loc, "unable to open tempfile: %s",
+   m_filename);
+  fprintf (out, "%s", content);
+  fclose (out);
+}
+
+/* Destructor.  Delete the tempfile.  */
+
+temp_source_file::~te

[PATCH/AARCH64] Accept vulcan as a cpu name for the AArch64 port of GCC

2016-06-17 Thread Virendra Pathak
Hi,

Please find the patch for introducing vulcan as a cpu name for the
AArch64 port of GCC.
Broadcom's vulcan is an armv8.1-a aarch64 server processor.

Since vulcan is the first armv8.1-a processor to be introduced in
aarch64-cores.def, I have created a new section in the file for the
armv8.1-based processors.  Kindly let me know if that is okay.

Tested the patch with a cross aarch64-linux-gnu build, bootstrapped
natively on aarch64-unknown-linux-gnu, and ran make check (gcc, ld,
gas, binutils, gdb).  No new regression failures are introduced by
this patch.

In addition, tested the -mcpu=vulcan and -mtune=vulcan flags by passing
them via the command line.  Also verified that the above flags pass the
armv8.1-a option to the assembler (as).

At present we are using the scheduling & cost model of cortex-a57, but
we will soon be submitting one for vulcan.

Please review the patch.
Ok for trunk?


gcc/ChangeLog:

Virendra Pathak 

* config/aarch64/aarch64-cores.def (vulcan): New core.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document vulcan as an available option.




with regards,
Virendra Pathak
From be0c77cce98d6dffe7b8d607df25ecb4386a1d34 Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Mon, 13 Jun 2016 03:18:08 -0700
Subject: [PATCH] [AArch64] Accept vulcan as a cpu name for the AArch64 port of
 GCC

---
 gcc/config/aarch64/aarch64-cores.def | 4 ++++
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 4 ++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 251a3eb..ced8f94 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -49,6 +49,10 @@ AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  
AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, "0x50", "0x000")
 
+/* V8.1 Architecture Processors.  */
+
+AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | 
AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516")
+
 /* V8 big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index cbc6f48..8c4a0e9 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+   "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa11209..2666592 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13063,8 +13063,8 @@ Specify the name of the target processor for which GCC 
should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
 @samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
-@samp{xgene1}, @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
-@samp{native}.
+@samp{xgene1}, @samp{vulcan}, @samp{cortex-a57.cortex-a53},
+@samp{cortex-a72.cortex-a53}, @samp{native}.
 
 The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}
 specify that GCC should tune for a big.LITTLE system.
-- 
2.1.0



[PATCH] C++ FE: Show both locations in string literal concatenation error

2016-06-17 Thread David Malcolm
We can use rich_location and the new diagnostic_show_locus to print
both locations when complaining about a bogus string concatenation
in the C++ FE, giving e.g.:

test.C:3:24: error: unsupported non-standard concatenation of string literals
 const void *s = u8"a"  u"b";
                 ~~~~~  ^~~~

Earlier versions of this were posted as part of
  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00730.html
"[PATCH 10/22] C++ FE: Use token ranges for various diagnostics"
and:
  https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01497.html
though the implementation has changed slightly.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu;
adds 7 PASS results to g++.sum.

OK for trunk?

gcc/cp/ChangeLog:
* parser.c (cp_parser_string_literal): Convert non-standard
concatenation error to directly use a rich_location, and
use that to add the location of the first literal to the
diagnostic.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/string-literal-concat.C: New test case.
---
 gcc/cp/parser.c| 15 +-
 .../g++.dg/diagnostic/string-literal-concat.C  | 23 ++
 2 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 632b25f..e1e9271 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3893,13 +3893,12 @@ cp_parser_string_literal (cp_parser *parser, bool 
translate, bool wide_ok,
 }
   else
 {
-  location_t last_tok_loc;
+  location_t last_tok_loc = tok->location;
   gcc_obstack_init (&str_ob);
   count = 0;
 
   do
{
- last_tok_loc = tok->location;
  cp_lexer_consume_token (parser->lexer);
  count++;
  str.text = (const unsigned char *)TREE_STRING_POINTER (string_tree);
@@ -3931,13 +3930,19 @@ cp_parser_string_literal (cp_parser *parser, bool 
translate, bool wide_ok,
  if (type == CPP_STRING)
type = curr_type;
  else if (curr_type != CPP_STRING)
-   error_at (tok->location,
- "unsupported non-standard concatenation "
- "of string literals");
+   {
+ rich_location rich_loc (line_table, tok->location);
+ rich_loc.add_range (last_tok_loc, false);
+ error_at_rich_loc (&rich_loc,
+"unsupported non-standard concatenation "
+"of string literals");
+   }
}
 
  obstack_grow (&str_ob, &str, sizeof (cpp_string));
 
+ last_tok_loc = tok->location;
+
  tok = cp_lexer_peek_token (parser->lexer);
  if (cpp_userdef_string_p (tok->type))
{
diff --git a/gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C 
b/gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C
new file mode 100644
index 000..4ede799
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C
@@ -0,0 +1,23 @@
+/* { dg-options "-fdiagnostics-show-caret -std=c++11" } */
+
+const void *s = u8"a"  u"b";  // { dg-error "24: non-standard concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s = u8"a"  u"b";
+                 ~~~~~  ^~~~
+   { dg-end-multiline-output "" } */
+
+const void *s2 = u"a"  u"b"  u8"c";  // { dg-error "30: non-standard 
concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s2 = u"a"  u"b"  u8"c";
+  ^
+  { dg-end-multiline-output "" } */
+
+#define TEST_U8_LITERAL u8"a"
+
+const void *s3 = TEST_U8_LITERAL u8"b";
+
+const void *s4 = TEST_U8_LITERAL u"b"; // { dg-error "34: non-standard concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s4 = TEST_U8_LITERAL u"b";
+                                  ^~~~
+  { dg-end-multiline-output "" } */
-- 
1.8.5.3



[PATCH] libstdc++/71545 fix debug checks in binary search algorithms

2016-06-17 Thread Jonathan Wakely

PR libstdc++/71545
* include/bits/stl_algobase.h (lower_bound, lexicographical_compare):
Remove irreflexive checks.
* include/bits/stl_algo.h (lower_bound, upper_bound, equal_range,
binary_search): Likewise.
* testsuite/25_algorithms/equal_range/partitioned.cc: New test.
* testsuite/25_algorithms/lexicographical_compare/71545.cc: New test.
* testsuite/25_algorithms/lower_bound/partitioned.cc: New test.
* testsuite/25_algorithms/upper_bound/partitioned.cc: New test.
* testsuite/util/testsuite_iterators.h (__gnu_test::test_container):
Add constructor from array.

The binary search algos and lexicographical_compare do not require the
comparison function to be irreflexive, so the recently-added debug
mode checks need to be removed.
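
As a hedged illustration (example mine, not from the testsuite): with a
heterogeneous comparator, an irreflexive self-check of the form
comp (*first, *first) cannot even be instantiated, although the call
itself is perfectly valid:

  #include <algorithm>

  struct X { int id; };
  static bool cmp (const X &x, int val) { return x.id < val; }

  int main ()
  {
    X xs[] = { {1}, {2}, {3} };
    // Valid: the range is partitioned w.r.t. cmp, but cmp (*i, *i)
    // would not compile, so no irreflexive check can be performed.
    std::lower_bound (xs, xs + 3, 2, cmp);
  }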

Tested x86_64-linux, committed to trunk. gcc-6-branch commit to
follow.
commit e775b35ff6cb0b6843ec2f1c8bf3a136deb898dd
Author: Jonathan Wakely 
Date:   Fri Jun 17 11:09:18 2016 +0100

libstdc++/71545 fix debug checks in binary search algorithms

PR libstdc++/71545
* include/bits/stl_algobase.h (lower_bound, lexicographical_compare):
Remove irreflexive checks.
* include/bits/stl_algo.h (lower_bound, upper_bound, equal_range,
binary_search): Likewise.
* testsuite/25_algorithms/equal_range/partitioned.cc: New test.
* testsuite/25_algorithms/lexicographical_compare/71545.cc: New test.
* testsuite/25_algorithms/lower_bound/partitioned.cc: New test.
* testsuite/25_algorithms/upper_bound/partitioned.cc: New test.
* testsuite/util/testsuite_iterators.h (__gnu_test::test_container):
Add constructor from array.

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index fbd03a7..c2ac031 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2026,7 +2026,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
typename iterator_traits<_ForwardIterator>::value_type, _Tp>)
   __glibcxx_requires_partitioned_lower_pred(__first, __last,
__val, __comp);
-  __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
 
   return std::__lower_bound(__first, __last, __val,
__gnu_cxx::__ops::__iter_comp_val(__comp));
@@ -2080,7 +2079,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_LessThanOpConcept<
_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
-  __glibcxx_requires_irreflexive2(__first, __last);
 
   return std::__upper_bound(__first, __last, __val,
__gnu_cxx::__ops::__val_less_iter());
@@ -2112,7 +2110,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
__val, __comp);
-  __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
 
   return std::__upper_bound(__first, __last, __val,
__gnu_cxx::__ops::__val_comp_iter(__comp));
@@ -2186,7 +2183,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_lower(__first, __last, __val);
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
-  __glibcxx_requires_irreflexive2(__first, __last);
 
   return std::__equal_range(__first, __last, __val,
__gnu_cxx::__ops::__iter_less_val(),
@@ -2225,7 +2221,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__val, __comp);
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
__val, __comp);
-  __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
 
   return std::__equal_range(__first, __last, __val,
__gnu_cxx::__ops::__iter_comp_val(__comp),
@@ -2255,7 +2250,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_lower(__first, __last, __val);
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
-  __glibcxx_requires_irreflexive2(__first, __last);
 
   _ForwardIterator __i
= std::__lower_bound(__first, __last, __val,
@@ -2291,7 +2285,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__val, __comp);
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
__val, __comp);
-  __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
 
   _ForwardIterator __i
= std::__lower_bound(__first, __last, 

Re: [C++ PATCH] Fix some DECL_BUILT_IN uses in C++ FE

2016-06-17 Thread Jason Merrill
OK.

Jason


Re: [C++ Patch] One more error + error to error + inform and a subtler issue

2016-06-17 Thread Jason Merrill
On Wed, Jun 15, 2016 at 5:15 AM, Paolo Carlini  wrote:
> +  /* Likewise for the constexpr specifier, in case t is a specialization
> + and we are emitting an error about an incompatible redeclaration.  */

It doesn't need to be in an error about a redeclaration; in general a
specialization can differ in 'constexpr' from its template.  OK with
the second line removed from the comment.

Jason
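
For illustration, a minimal example (mine, not from the patch) of a
specialization validly differing in 'constexpr' from its template:

  // The primary template is constexpr...
  template <typename T>
  constexpr T twice (T t) { return t + t; }

  // ...but its explicit specialization may drop constexpr without error.
  template <>
  int twice<int> (int t) { return 2 * t; }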


Re: [C++ Patch] One more error + error to error + inform

2016-06-17 Thread Jason Merrill
OK.

Jason


Re: [PATCH, libgcc/ARM 1a/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-06-17 Thread Thomas Preudhomme
On Wednesday 01 June 2016 10:00:52 Ramana Radhakrishnan wrote:
> Please fix up the macros, post back and redo the test. Otherwise this
> is ok from a quick read.

What about the updated patch in attachment? As for the original patch, I've 
checked that code generation does not change for a number of combinations of 
ISAs (ARM/Thumb), optimization levels (Os/O2), and architectures (armv4, 
armv4t, armv5, armv5t, armv5te, armv6, armv6j, armv6k, armv6s-m, armv6kz, 
armv6t2, armv6z, armv6zk, armv7, armv7-a, armv7e-m, armv7-m, armv7-r, armv7ve, 
armv8-a, armv8-a+crc, iwmmxt and iwmmxt2).

Note, I renumbered this patch 1a to not make the numbering of other patches 
look strange. The CLZ part is now in patch 1b/7.

ChangeLog entries are now as follow:


*** gcc/ChangeLog ***

2016-05-23  Thomas Preud'homme  

* config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
decide whether to prevent some libgcc routines being included for some
multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
link between this condition and the one in
libgcc/config/arm/lib1func.S.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
__ARM_ARCH_ISA_ARM to test for Cortex-M devices.


*** libgcc/ChangeLog ***

2016-06-01  Thomas Preud'homme  

* config/arm/bpabi-v6m.S: Clarify which architectures the
implementation is suitable for.
* config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
for all Thumb-1 only targets.
(NOT_ISA_TARGET_32BIT): Define for Thumb-1 only targets.
(THUMB_LDIV0): Test for NOT_ISA_TARGET_32BIT rather than
__ARM_ARCH_6M__.
(EQUIV): Likewise.
(ARM_FUNC_ALIAS): Likewise.
(umodsi3): Add a check for __ARM_ARCH_ISA_THUMB != 1 to guard the idiv
version.
(modsi3): Likewise.
(clzsi2): Test for NOT_ISA_TARGET_32BIT rather than __ARM_ARCH_6M__.
(clzdi2): Likewise.
(ctzsi2): Likewise.
(L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
__ARM_ARCH_6M__ in guard for checking whether it is defined.
(final includes): Test for NOT_ISA_TARGET_32BIT rather than
__ARM_ARCH_6M__ and add comment to indicate the connection between
this condition and the one in gcc/config/arm/elf.h.
* config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
__ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
* config/arm/t-softfp: Likewise.


Best regards,

Thomas

diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h
index 77f30554d5286bd83aeab0c8dc308cfd44e732dc..246de5492665ba2a0292736a9c53fbaaef184d72 100644
--- a/gcc/config/arm/elf.h
+++ b/gcc/config/arm/elf.h
@@ -148,8 +148,9 @@
   while (0)
 
 /* Horrible hack: We want to prevent some libgcc routines being included
-   for some multilibs.  */
-#ifndef __ARM_ARCH_6M__
+   for some multilibs.  The condition should match the one in
+   libgcc/config/arm/lib1funcs.S.  */
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
 #undef L_fixdfsi
 #undef L_fixunsdfsi
 #undef L_truncdfsf2
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 04ca17656f2f26dda710e8a0f9ca77dd963ab39b..38151375c29cd007f1cc34ead3aa495606224061 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3320,10 +3320,8 @@ proc check_effective_target_arm_cortex_m { } {
 	return 0
 }
 return [check_no_compiler_messages arm_cortex_m assembly {
-	#if !defined(__ARM_ARCH_7M__) \
-&& !defined (__ARM_ARCH_7EM__) \
-&& !defined (__ARM_ARCH_6M__)
-	#error !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && !__ARM_ARCH_6M__
+	#if defined(__ARM_ARCH_ISA_ARM)
+	#error __ARM_ARCH_ISA_ARM is defined
 	#endif
 	int i;
 } "-mthumb"]
diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S
index 5d35aa6afca224613c94cf923f8a2ee8dac949f2..27f33a4e8ced2cb2da8e38f5d78501954ee7363b 100644
--- a/libgcc/config/arm/bpabi-v6m.S
+++ b/libgcc/config/arm/bpabi-v6m.S
@@ -1,4 +1,5 @@
-/* Miscellaneous BPABI functions.  ARMv6M implementation
+/* Miscellaneous BPABI functions.  Thumb-1 implementation, suitable for ARMv4T,
+   ARMv6-M and ARMv8-M Baseline like ISA variants.
 
Copyright (C) 2006-2016 Free Software Foundation, Inc.
Contributed by CodeSourcery.
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 375a5135110895faa44267ebee045fd315515027..951dcda1c3bf7f323423a3e2813bdf0501653016 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -124,10 +124,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  && !defined(__thumb2__)		\
  && (!defined(__THUMB_INTERWORK__)	\
 	 || defined (__OPTIMIZE_SIZE__)	\
-	 || defined(__ARM_ARCH_6M__)))
+	 || !__ARM_ARCH_ISA_ARM))
 # def

Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-17 Thread Jeff Law

On 06/17/2016 07:07 AM, Jakub Sejdak wrote:

So at least in the immediate term let's get you write privileges so you can
commit approved changes and on the path towards maintaining the Phoenix-RTOS
configurations.


Do I have to apply for this permission somewhere? The provided page states
only that it has to be granted by an existing maintainer.

Yes, there's a link to this form:

https://sourceware.org/cgi-bin/pdw/ps_form.cgi

List my email address (l...@redhat.com) as approving your request for 
write access.


jeff



Re: [PATCH] Fix memory leak in tree-ssa-reassoc.c

2016-06-17 Thread Jeff Law

On 06/17/2016 07:14 AM, Martin Liška wrote:

Hi.

Following simple patch fixes a newly introduced memory leak.

Patch survives regression tests and bootstraps on x86_64-linux.

Ready from trunk?
Thanks,
Martin


0001-Fix-memory-leak-in-tree-ssa-reassoc.c.patch


From a2e6be16d7079b744db4d383b8317226ab53ff58 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 17 Jun 2016 12:26:58 +0200
Subject: [PATCH] Fix memory leak in tree-ssa-reassoc.c

gcc/ChangeLog:

2016-06-17  Martin Liska  

* tree-ssa-reassoc.c (transform_add_to_multiply): Use auto_vec.

OK.

And more generally, conversion from vec to auto_vec to fix memory leaks 
or eliminate explicit memory management is pre-approved.
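
A hedged illustration of that pre-approved pattern (example mine, not an
actual patch hunk, using GCC's own containers):

  {
    auto_vec<tree> ops;   /* storage owned by the auto_vec...           */
    ops.safe_push (op);
    /* ... use ops ...                                                  */
  }                       /* ...and released here automatically; a bare
                             vec<tree> would leak unless ops.release ()
                             were called explicitly.                    */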


Jeff



Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Jeff Law

On 06/17/2016 08:33 AM, Ilya Enkovich wrote:


Hmm, there seems to be a level of indirection I'm missing here.  We're
smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
the original loop to the vectorized epilogue.  What am I missing?  Rather
than smuggling around in the aux field, is there some inherent reason why we
can't just copy the info from the original loop directly into
LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?


LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
 - mark this loop as epilogue
 - get VF of original loop (required for both mask and nomask modes)
 - get decision about epilogue masking

That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
LOOP_VINFO_ORIG_LOOP_INFO.  When Loop has no LOOP_VINFO associated I have no
place to bind it with the original loop and therefore I use vacant loop->aux
for that.  Any other way to bind epilogue with its original loop would work
as well.  I just chose loop->aux to avoid new fields and data structures.
I was starting to draw the conclusion that the smuggling in the aux
field was for cases when there was no LOOP_VINFO.  But it was rather late
at night and I didn't follow that idea through the code.  Thanks for
clarifying.
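
A hedged sketch of that binding scheme (the macro and type names are from
the patch; the surrounding code is assumed):

  /* While the loop has a LOOP_VINFO, loop->aux points at it, and the
     original loop's vinfo hangs off LOOP_VINFO_ORIG_LOOP_INFO.  */
  loop_vec_info vinfo = (loop_vec_info) loop->aux;
  loop_vec_info orig = LOOP_VINFO_ORIG_LOOP_INFO (vinfo);

  /* destroy_loop_vec_info then parks the original vinfo directly in the
     vacant aux field, keeping the epilogue bound to its original loop.  */
  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (vinfo);
  free (vinfo);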





And something just occurred to me -- is there some inherent reason why SLP
doesn't vectorize the epilogue, particularly for the cases where we can
vectorize the epilogue using smaller vectors?  Sorry if you've already
answered this somewhere or it's a dumb question.


IIUC this may happen only if we unroll epilogue into a single BB which happens
only when epilogue iterations count is known. Right?
Probably.  The need to make sure the epilogue is unrolled probably makes 
this a non-starter.


I have a soft spot for SLP as I stumbled on the idea while rewriting a 
presentation in the wee hours of the morning for the next day. 
Essentially it was a "poor man's" vectorizer that could be done for 
dramatically less engineering cost than a traditional vectorizer.  The 
MIT paper outlining the same ideas came out a couple years later...




+   /* Add new loop to a processing queue.  To make it easier

+  to match loop and its epilogue vectorization in dumps
+  put new loop as the next loop to process.  */
+   if (new_loop)
+ {
+   loops.safe_insert (i + 1, new_loop->num);
+   vect_loops_num = number_of_loops (cfun);
+ }
+


So just to be clear, the only reason to do this is for dumps -- other than
processing the loop before its epilogue, there's no other inherently
necessary ordering of the loops, right?


Right, I don't see other reasons to do it.

Perfect.  Thanks for confirming.

jeff



C++ PATCH for c++/71143, 71209 (bogus error with dependent base)

2016-06-17 Thread Jason Merrill
Now that we have stopped treating *this as a dependent scope, we need
to avoid giving errors for not finding things when we have dependent
bases.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit d553bc7ff104a8d973c3f48c005457038422db26
Author: Jason Merrill 
Date:   Fri Jun 17 12:16:00 2016 -0400

PR c++/71209 - wrong error with dependent base

* typeck.c (finish_class_member_access_expr): Avoid "not a base"
warning when there are dependent bases.

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 2ccd2da..3704b88 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2797,6 +2797,8 @@ finish_class_member_access_expr (cp_expr object, tree 
name, bool template_p,
return error_mark_node;
  if (!access_path)
{
+ if (any_dependent_bases_p (object_type))
+   goto dependent;
  if (complain & tf_error)
error ("%qT is not a base of %qT", scope, object_type);
  return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/template/dependent-base1.C 
b/gcc/testsuite/g++.dg/template/dependent-base1.C
new file mode 100644
index 000..392305b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/dependent-base1.C
@@ -0,0 +1,10 @@
+// PR c++/71209
+
+struct A {
+  int table_clear;
+};
+
+template 
+struct B : T {
+  B() { this->A::table_clear; }
+};


Re: RFC: pass to warn on questionable uses of alloca().

2016-06-17 Thread Jeff Law

On 06/16/2016 02:32 AM, Aldy Hernandez wrote:

Hi folks!

I've been working on a plugin to warn on unbounded uses of alloca() to
help find questionable uses in glibc and other libraries.  It occurred
to me that the broader community could benefit from it, as it has found
quite a few interesting cases. So, I've reimplemented it as an actual
pass, lest it be lost in plugin la-la land and bit-rot.

And just to provide more background.

In my time caretaking glibc for Red Hat, unbound allocas were the single 
most commonly exploited problem in glibc.  They can be used for stack 
shifting or for under-allocating objects, which in turn allows the bad 
guys to start scribbling data into memory at locations under the 
attacker's control.


In fact, I saw this enough that I'm of the opinion that we as developers 
simply aren't capable of using alloca correctly and that its explicit 
use ought to be banned by policy.  Anyway.





Before I sink any more time into cleaning it up, would this be something
acceptable in the compiler?  It doesn't have anything glibc-specific,
except possibly the following idiom, which I allow:

I strongly believe it ought to be cleaned up and brought into GCC.



p.s. The pass currently warns on all uses of VLAs.  I'm not completely
sold on this idea, so perhaps we could remove it, or gate it with a flag.
A VLA where the size is under attacker control is no different from an 
unbound or overflowing alloca.  Negative sizes in particular are easy to 
exploit, though the same effect can be achieved by overflowing the actual 
size computation.
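
A hedged illustration (example mine) of the pattern being warned about:
an alloca whose size comes straight from untrusted input, so a negative
or huge N moves the stack pointer arbitrarily:

  #include <alloca.h>
  #include <cstring>

  void copy_request (const char *src, int n)  /* n: attacker-controlled */
  {
    char *buf = (char *) alloca (n);  /* unbounded: no sanity check on n */
    std::memcpy (buf, src, n);        /* writes at an attacker-chosen spot */
  }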


So I think this problem turns into whether or not we can see the size of
the allocated object and use that to guide the warning.  This also
introduces the idea of somehow marking objects which are under user
control (and propagating that property) and using that to help guide the
analysis.


Jeff




Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Jeff Law

On 06/17/2016 08:29 AM, Michael Matz wrote:

Hi,

On Fri, 17 Jun 2016, Bernd Schmidt wrote:


On 06/17/2016 04:03 PM, Michael Matz wrote:

But does this really improve something?  Essentially you're replacing

   <random bytes1> 0xc9 0xc3

(the end of a function containing "leave;ret") with

   <random bytes1> 0xe9 <random bytes2>

where the four random bytes are different for each rewritten function
return (but correlated as they differ exactly by their position
difference).

I'm not sure why the latter sequence is better?


I think I'm missing what you're trying to say. The latter sequence does not
contain a return opcode hence it ought to be better?


The "0xe9 " essentially is the leave+return opcode,
after all it jumps to them (let's ignore the possibility that the jump
target address might contain a 0xc3 byte).  So if the attacker finds some
interesting gadget in <random bytes2> I don't see how the change from
leave+ret to jump-to-leave+ret changes anything from a threat avoidance
perspective.  It's fully possible that I don't understand the threat
vector of ROP correctly, in which case I'd also like to know :)

A couple things to note.

I expect that we'll be doing work in the assembler and linker to address 
cases where 0xc3 shows up in immediate displacements, absolute addresses 
and the like.  The easiest ones are when 0xc3 shows up as a byte 
displacement for pc-relative jumps, but there are others.


I haven't looked at the random bytes stuff Bernd has done, but it's 
likely to ensure that the bad guys can't jump into the middle of an 
instruction prior to the leave and use that to skip the leave.


Jeff


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Bin.Cheng
On Fri, Jun 17, 2016 at 4:37 PM, Jeff Law  wrote:
> On 06/17/2016 08:48 AM, Bin.Cheng wrote:


> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
> + Don't try to vectorize epilogue because it will require
> + additional alias checks.  */


 Are the alias checks here redundant with the ones done for the original
 loop?  If so won't DOM eliminate them?
>>>
>>>
>>> I revisited this part recently and thought it should actually be safe to
>>> assume we have no aliasing in epilogue because we are dominated by alias
>>> checks of the original loop.  So I prepared a patch to remove this
>>> restriction
>>> and avoid alias checks generation for epilogues (so we compute aliases
>>> checks
>>> required but don't emit them).  I didn't send this patch yet.
>>> Do you think it is a valid assumption?
>>
>> I recently visited that part and agree it's valid, unless epilogue
>> loop is vectorized in larger vector-units, but that would be unlikely
>> to happen, right?  BTW, does this patch start all over analyzing
>> epilogue loop?  As you said the alias checks will be computed.
>
> I think we're OK either way.  If you emit the checks, DOM ought to eliminate
> them as they'd be dominated by the earlier check.
Unfortunately DOM probably can't.  In particular, constant offsets are
folded deep in expressions and they could be different under smaller
vector-units.  Even if it can, it will introduce a long live range since
the check result will be combined with some others.  Not sure if all
checks can be avoided; alignment checks should be ok too?

Thanks,
bin
>
> But I'm a fan of not generating dumb code for later passes to clean up, so I
> think we should just avoid generating the additional checks if we can
> reasonably do so in the vectorizer.
>
> I can't envision a scenario where we'd want a larger vector size in the
> epilogue than the main loop.
>
> Jeff
>


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Jeff Law

On 06/17/2016 04:06 AM, Bernd Schmidt wrote:

This is another step to flesh out -mmitigate-rop for i386 a little more.
The basic idea was (I think) Richard Henderson's: if we could arrange to
have every return preceded by a leave instruction, it would make it
harder to construct an attack since it takes away a certain amount of
control over the stack pointer. I extended this to move the leave/ret
pair to libgcc, preceded by a sequence of nops, so as to take away the
possibility of jumping into the middle of an instruction preceding the
leave/ret pair and thereby skipping the leave.
I don't think anyone on our team can take credit for the idea.  We found 
that folks working in this space were calling out leave;ret as being 
harder to exploit.


The key being that to use leave;ret they have to control the frame 
pointer and the saved return address.  Typically they have control of 
just the saved return address.




This has a performance impact when -mmitigate-rop is enabled, I made
some measurements a while ago and it looks like it's about twice the
impact of -fno-omit-frame-pointer.
Right.  My idea is to use this mitigation for functions which aren't 
protected by SSP (fixing the SSP epilogues is a distinct project, 
Florian should have some details on what we need to do to make those 
difficult to attack).  So we're not paying the cost on every function, 
just those which aren't protected by SSP.


Jeff


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Jeff Law

On 06/17/2016 08:16 AM, Ilya Enkovich wrote:


I do think you've got a legitimate question though.   Ilya, can you give any
insights here based on your KNL and Haswell testing or data/insights from
the LLVM and/or ICC teams?


I have no information about LLVM.  As I said in the other thread, ICC uses
all options (masked epilogue, combined loop, vectorized epilogue with
smaller vector size).  It also may generate different versions (e.g.
combined and with masked epilogue) and choose dynamically depending on
iteration count.
Any guidance from the ICC team on the costing model to choose between 
the different approaches?


I'm a bit surprised that there's enough value in doing this much work to 
vectorize the epilogue, but that appears to be the case...


jeff


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Jeff Law

On 06/17/2016 08:48 AM, Bin.Cheng wrote:



+  /* FORNOW: Currently alias checks are not inherited for epilogues.
+ Don't try to vectorize epilogue because it will require
+ additional alias checks.  */


Are the alias checks here redundant with the ones done for the original
loop?  If so won't DOM eliminate them?


I revisited this part recently and thought it should actually be safe to
assume we have no aliasing in epilogue because we are dominated by alias
checks of the original loop.  So I prepared a patch to remove this restriction
and avoid alias checks generation for epilogues (so we compute aliases checks
required but don't emit them).  I didn't send this patch yet.
Do you think it is a valid assumption?

I recently visited that part and agree it's valid, unless epilogue
loop is vectorized in larger vector-units, but that would be unlikely
to happen, right?  BTW, does this patch start all over analyzing
epilogue loop?  As you said the alias checks will be computed.
I think we're OK either way.  If you emit the checks, DOM ought to 
eliminate them as they'd be dominated by the earlier check.


But I'm a fan of not generating dumb code for later passes to clean up, 
so I think we should just avoid generating the additional checks if we 
can reasonably do so in the vectorizer.


I can't envision a scenario where we'd want a larger vector size in the 
epilogue than the main loop.


Jeff



Re: [DOC PATCH] Rewrite docs for inline asm

2016-06-17 Thread Andrew Haley
On 04/04/14 20:48, dw wrote:
> I do not have write permissions to check this patch in.

We must fix that.

Andrew.



Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Ilya Enkovich
2016-06-17 17:48 GMT+03:00 Bin.Cheng :
> On Fri, Jun 17, 2016 at 3:33 PM, Ilya Enkovich  wrote:
>> 2016-06-16 9:00 GMT+03:00 Jeff Law :
>>> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:

 Hi,

 This patch introduces changes required to run vectorizer on loop epilogue.
 This also enables epilogue vectorization using a vector of smaller size.

 Thanks,
 Ilya
 --
 gcc/

 2016-05-19  Ilya Enkovich  

 * tree-if-conv.c (tree_if_conversion): Make public.
 * tree-if-conv.h: New file.
 * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
 try to enhance alignment for epilogues.
 * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
 created loop.
 * tree-vect-loop.c: include tree-if-conv.h.
 (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
 loop->aux.
 (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
 loop->aux.
 (vect_analyze_loop): Reset loop->aux.
 (vect_transform_loop): Check if created epilogue should be
 returned
 for further vectorization.  If-convert epilogue if required.
 * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
 process and insert vectorized loop epilogues into this queue.
 * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
 created
 loop.
 (vect_transform_loop): Return created loop.
>>>
>>> As Richi noted, the additional calls into the if-converter are unfortunate.
>>> I'm not sure how else to avoid them though.  It looks like we can run
>>> if-conversion on just the epilogue, so maybe that's not too bad.
>>>
>>>
 @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
 bool clean_stmts)
destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
loop_vinfo->scalar_cost_vec.release ();

 +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
free (loop_vinfo);
 -  loop->aux = NULL;
  }
>>>
>>> Hmm, there seems to be a level of indirection I'm missing here.  We're
>>> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
>>> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
>>> the original loop to the vectorized epilogue.  What am I missing?  Rather
>>> than smuggling around in the aux field, is there some inherent reason why we
>>> can't just copy the info from the original loop directly into
>>> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>>
>> LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
>>  - mark this loop as epilogue
>>  - get VF of original loop (required for both mask and nomask modes)
>>  - get decision about epilogue masking
>>
>> That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
>> vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
>> loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
>> LOOP_VINFO_ORIG_LOOP_INFO.  When Loop has no LOOP_VINFO associated I have no
>> place to bind it with the original loop and therefore I use vacant loop->aux
>> for that.  Any other way to bind epilogue with its original loop would work
>> as well.  I just chose loop->aux to avoid new fields and data structures.
>>
>>>
 +  /* FORNOW: Currently alias checks are not inherited for epilogues.
 + Don't try to vectorize epilogue because it will require
 + additional alias checks.  */
>>>
>>> Are the alias checks here redundant with the ones done for the original
>>> loop?  If so won't DOM eliminate them?
>>
>> I revisited this part recently and thought it should actually be safe to
>> assume we have no aliasing in epilogue because we are dominated by alias
>> checks of the original loop.  So I prepared a patch to remove this 
>> restriction
>> and avoid alias checks generation for epilogues (so we compute aliases checks
>> required but don't emit them).  I didn't send this patch yet.
>> Do you think it is a valid assumption?
> I recently visited that part and agree it's valid, unless epilogue
> loop is vectorized in larger vector-units, but that would be unlikely
> to happen, right?  BTW, does this patch start all over analyzing
> epilogue loop?  As you said the alias checks will be computed.

Original loop is vectorized for the max possible vector size and we can't
(and don't want to) choose a bigger one.

We don't preserve any info for epilogue.  Actually even when we try various
vector sizes for a single loop we recompute everything for each vector size.

Thanks,
Ilya

>
> Thanks,
> bin
>>
>>>
>>>
>>> And something just occurred to me -- is there some inherent reason why SLP
>>> doesn't vectorize the epilogue, particularly for the cases where we can
>>> vectorize the epilogue using smaller vectors?  Sorry if you've already
>>> a

Re: [Patch AArch64] Fixup to fcvt patterns added in r237200

2016-06-17 Thread Christophe Lyon
On 17 June 2016 at 16:44, James Greenhalgh  wrote:
> On Fri, Jun 17, 2016 at 04:25:31PM +0200, Christophe Lyon wrote:
>> On 10 June 2016 at 14:29, James Greenhalgh  wrote:
>> >
>> > Hi,
>> >
>> > My autotester picked up some issues with the vcvt{ds}_n_* intrinsics
>> > added in r237200.
>> >
>> Hi,
>>
>> What tests does your autotester perform? I haven't noticed these
>> problems when running the GCC testsuite on the usual aarch64
>> targets. I'm interested in increasing coverage, if doable.
>
> Hi Christophe,
>
> I think we've spoken about this before [1], but the autotester is using
> an internal testsuite that is not feasible to share upstream.

Ha, indeed. I wasn't sure you were referring to the same tests.

>
> To see the sorts of tests that we're running, have a look at the LLVM
> testsuite. If the layout of the testsuite hasn't changed since I last
> looked, you should be able to find an example at:
>
> 
> /SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c
>
> Hope that helps,
> James
>
> [1]: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00775.html
>


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Bin.Cheng
On Fri, Jun 17, 2016 at 3:33 PM, Ilya Enkovich  wrote:
> 2016-06-16 9:00 GMT+03:00 Jeff Law :
>> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> This patch introduces changes required to run vectorizer on loop epilogue.
>>> This also enables epilogue vectorization using a vector of smaller size.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2016-05-19  Ilya Enkovich  
>>>
>>> * tree-if-conv.c (tree_if_conversion): Make public.
>>> * tree-if-conv.h: New file.
>>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>>> try to enhance alignment for epilogues.
>>> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>>> created loop.
>>> * tree-vect-loop.c: include tree-if-conv.h.
>>> (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>>> loop->aux.
>>> (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>>> loop->aux.
>>> (vect_analyze_loop): Reset loop->aux.
>>> (vect_transform_loop): Check if created epilogue should be
>>> returned
>>> for further vectorization.  If-convert epilogue if required.
>>> * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>>> process and insert vectorized loop epilogues into this queue.
>>> * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
>>> created
>>> loop.
>>> (vect_transform_loop): Return created loop.
>>
>> As Richi noted, the additional calls into the if-converter are unfortunate.
>> I'm not sure how else to avoid them though.  It looks like we can run
>> if-conversion on just the epilogue, so maybe that's not too bad.
>>
>>
>>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
>>> bool clean_stmts)
>>>destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>>loop_vinfo->scalar_cost_vec.release ();
>>>
>>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>>free (loop_vinfo);
>>> -  loop->aux = NULL;
>>>  }
>>
>> Hmm, there seems to be a level of indirection I'm missing here.  We're
>> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
>> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
>> the original loop to the vectorized epilogue.  What am I missing?  Rather
>> than smuggling around in the aux field, is there some inherent reason why we
>> can't just copy the info from the original loop directly into
>> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>
> LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
>  - mark this loop as epilogue
>  - get VF of original loop (required for both mask and nomask modes)
>  - get decision about epilogue masking
>
> That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
> vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
> loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
> LOOP_VINFO_ORIG_LOOP_INFO.  When Loop has no LOOP_VINFO associated I have no
> place to bind it with the original loop and therefore I use vacant loop->aux
> for that.  Any other way to bind epilogue with its original loop would work
> as well.  I just chose loop->aux to avoid new fields and data structures.
>
>>
>>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>>> + Don't try to vectorize epilogue because it will require
>>> + additional alias checks.  */
>>
>> Are the alias checks here redundant with the ones done for the original
>> loop?  If so won't DOM eliminate them?
>
> I revisited this part recently and thought it should actually be safe to
> assume we have no aliasing in epilogue because we are dominated by alias
> checks of the original loop.  So I prepared a patch to remove this restriction
> and avoid alias checks generation for epilogues (so we compute aliases checks
> required but don't emit them).  I didn't send this patch yet.
> Do you think it is a valid assumption?
I recently visited that part and agree it's valid, unless epilogue
loop is vectorized in larger vector-units, but that would be unlikely
to happen, right?  BTW, does this patch start all over analyzing
epilogue loop?  As you said the alias checks will be computed.

Thanks,
bin
>
>>
>>
>> And something just occurred to me -- is there some inherent reason why SLP
>> doesn't vectorize the epilogue, particularly for the cases where we can
>> vectorize the epilogue using smaller vectors?  Sorry if you've already
>> answered this somewhere or it's a dumb question.
>
> IIUC this may happen only if we unroll epilogue into a single BB which happens
> only when epilogue iterations count is known. Right?
>
>>
>>
>>
>>>
>>> +   /* Add new loop to a processing queue.  To make it easier
>>> +  to match loop and its epilogue vectorization in dumps
>>> +  put new loop as the next loop to process.  */
>>> +   if (new_loop)
>>> + 

Re: [Patch AArch64] Fixup to fcvt patterns added in r237200

2016-06-17 Thread James Greenhalgh
On Fri, Jun 17, 2016 at 04:25:31PM +0200, Christophe Lyon wrote:
> On 10 June 2016 at 14:29, James Greenhalgh  wrote:
> >
> > Hi,
> >
> > My autotester picked up some issues with the vcvt{ds}_n_* intrinsics
> > added in r237200.
> >
> Hi,
> 
> What tests does your autotester perform? I haven't noticed these
> problems when running the GCC testsuite on the usual aarch64
> targets. I'm interested in increasing coverage, if doable.

Hi Christophe,

I think we've spoken about this before [1], but the autotester is using
an internal testsuite that is not feasible to share upstream.

To see the sorts of tests that we're running, have a look at the LLVM
testsuite. If the layout of the testsuite hasn't changed since I last
looked, you should be able to find an example at:


/SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

Hope that helps,
James

[1]: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00775.html



Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-17 Thread Jakub Jelinek
On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> The second set of changes involves teaching the gimplifier to error when
> it detects a function call to an non-acc routines inside an OpenACC
> offloaded region. Actually, I relaxed non-acc routines by excluding
> calls to builtin functions, including those prefixed with _gfortran_.
> Nvptx does have a newlib c library, and it also has a subset of
> libgfortran. Still, this solution is probably not optimal.

I don't really like that, hardcoding prefixes or whatever is available
(you have quite some subset of libc, libm etc. available too) in the
compiler looks very hackish.  What is wrong with complaining during
linking of the offloaded code?

> Next, I had to modify the openacc header files in libgomp to mark
> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> build the openacc.mod module for gfortran with -fopenacc. But doing
> that caused gcc to stream offloaded code to the openacc.o object
> file. So, I've updated the behavior of flag_generate_offload such that
> minus one indicates that the user specified -foffload=disable, and that
> will prevent gcc from streaming offloaded lto code. The alternative was
> to hack libtool to build libgomp with -foffload=disable.

This also looks wrong.  I'd say the right thing is when loading modules
that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
handled this well) into CU with the corresponding flags unset (-fopenacc,
-fopenmp, -fopenmp-simd here, depending on which bit it is), then
IMHO the module loading code should just ignore it, pretend it wasn't there.
Similarly e.g. to how lto1 with -g0 should ignore debug statements that
could be in the LTO inputs.

Jakub


Re: [PATCH][ARM] Delete thumb_reload_in_h

2016-06-17 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00821.html

Thanks,
Kyrill
On 10/06/16 15:55, Kyrill Tkachov wrote:

Hi all,

This function just ICEs and isn't actually called from anywhere.
It was introduced back in 2000 as part of a large merge introducing Thumb 
support
and was aborting even then. I don't think having it around is of any benefit.

Tested on arm-none-eabi.

Ok for trunk?

Thanks,
Kyrill

2016-06-10  Kyrylo Tkachov  

* config/arm/arm.c (thumb_reload_in_hi): Delete.
* config/arm/arm-protos.h (thumb_reload_in_hi): Delete prototype.




Re: OpenACC wait clause

2016-06-17 Thread Jakub Jelinek
On Thu, Jun 16, 2016 at 08:22:29PM -0700, Cesar Philippidis wrote:
> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -677,7 +677,6 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
> && gfc_match ("async") == MATCH_YES)
>   {
> c->async = true;
> -   needs_space = false;
> if (gfc_match (" ( %e )", &c->async_expr) != MATCH_YES)
>   {
> c->async_expr
> @@ -685,6 +684,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
>gfc_default_integer_kind,
>&gfc_current_locus);
> mpz_set_si (c->async_expr->value.integer, GOMP_ASYNC_NOVAL);
> +   needs_space = true;
>   }
> continue;
>   }
> @@ -1328,7 +1328,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
> && gfc_match ("wait") == MATCH_YES)
>   {
> c->wait = true;
> -   match_oacc_expr_list (" (", &c->wait_list, false);
> +   if (match_oacc_expr_list (" (", &c->wait_list, false) == MATCH_NO)
> + needs_space = true;
> continue;
>   }
> if ((mask & OMP_CLAUSE_WORKER)

I think it is still problematic.  Most of the parsing fortran FE errors are 
deferred,
meaning that if you don't reject the whole gfc_match_omp_clauses, then no
diagnostics is actually emitted.  Both
gfc_match (" ( %e )", &c->async_expr) and match_oacc_expr_list (" (", 
&c->wait_list, false)
IMHO can return MATCH_YES, MATCH_NO and MATCH_ERROR, and I believe you need
to do different actions in each case.
In particular, if something is optional, then for MATCH_YES you should
accept it (continue) and not set needs_space, because after ) you don't need
space.  If MATCH_NO, then you should accept it too (because it is optional),
and set needs_space = true; first and perhaps do whatever else you need to
do.  If MATCH_ERROR, then you should make sure not to accept it, e.g. by
doing break; or making sure continue will not be done (which one depends on
whether it might be validly parsed as some other clause, which is very
likely not the case).  In the above changes, you do it all except for the
MATCH_ERROR handling, where you still do continue; and thus I bet
diagnostics for it won't be reported.
E.g. for
!$omp acc parallel async(&abc)
!$omp acc end parallel
end
no diagnostics is reported.  Looking around, there are many more issues like
that, e.g. match_oacc_clause_gang(c) (note, wrong formatting) also ignores
MATCH_ERROR, etc.
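In code, the shape being asked for looks roughly like this (a sketch only,
not the actual gfortran code; names follow the quoted patch):

  /* Inside the clause-matching loop, for a clause with an optional
     argument.  */
  match m = gfc_match (" ( %e )", &c->async_expr);
  if (m == MATCH_ERROR)
    break;            /* Reject, so the deferred error is emitted.  */
  if (m == MATCH_NO)
    {
      /* Optional argument absent: substitute the default expression
         (as the quoted patch does) and require a separator before
         the next clause name.  */
      needs_space = true;
    }
  continue;           /* MATCH_YES or MATCH_NO: clause accepted.  */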

> @@ -1649,7 +1650,7 @@ gfc_match_oacc_wait (void)
>gfc_expr_list *wait_list = NULL, *el;
>  
>match_oacc_expr_list (" (", &wait_list, true);
> -  gfc_match_omp_clauses (&c, OACC_WAIT_CLAUSES, false, false, true);
> +  gfc_match_omp_clauses (&c, OACC_WAIT_CLAUSES, false, true, true);
>  
>if (gfc_match_omp_eos () != MATCH_YES)
>  {

Can you explain this change?  I bet it again suffers from the
above-mentioned issue.  If match_oacc_expr_list returns MATCH_YES, I believe
you want false, false, true, as you don't need a space between the closing
) of the wait_list and the name of the next clause.  Note, does OpenACC also
allow a comma in that case?
!$acc wait (whatever),async
?
If match_oacc_expr_list returns MATCH_NO, then IMHO it should be
true, true, true, because you don't want to accept
!$acc waitasync
and also don't want to accept
!$acc wait,async
And if match_oacc_expr_list returns MATCH_ERROR, you should reject it, so
that the diagnostic is emitted.

Jakub


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Bernd Schmidt

On 06/17/2016 04:29 PM, Michael Matz wrote:

On Fri, 17 Jun 2016, Bernd Schmidt wrote:

On 06/17/2016 04:03 PM, Michael Matz wrote:

But does this really improve something?  Essentially you're replacing

   0xc9 0xc3 <random bytes 1>

(the end of a function containing "leave;ret") with

   0xe9 <random bytes 2>

where the four random bytes are different for each rewritten function
return (but correlated as they differ exactly by their position
difference).

I'm not sure why the latter sequence is better?


I think I'm missing what you're trying to say. The latter sequence does not
contain a return opcode hence it ought to be better?


The "0xe9 " essentially is the leave+return opcode,
after all it jumps to them (let's ignore the possibility that the jump
target address might contain a 0xc3 byte).  So if the attacker finds some
interesting gadget in <random bytes 2> I don't see how the change from
leave+ret to jump-to-leave+ret changes anything from a threat avoidance
perspective.  It's fully possible that I don't understand the threat
vector of ROP correctly, in which case I'd also like to know :)


The advantage is that this way the attack can't skip the leave opcode by 
jumping into the "random bytes1" in your first sequence. Hence, we 
ensure the return path will always overwrite esp first, which is what's 
supposed to make the attack harder since now you need to control ebp as 
well.
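In bytes, the difference looks schematically like this ("xx" stands for
function-specific bytes; purely illustrative, not actual compiler output):

  /* before:  ... xx xx c9 c3        inline leave; ret - an attacker can
                                     aim directly at the c3 and skip c9
     after:   ... e9 xx xx xx xx     jmp __rop_ret; the rel32 differs
                                     per return site
   __rop_ret: 90 ... 90 c9 c3        nop padding; leave; ret - every
                                     path to the ret goes through leave  */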



Bernd


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Ilya Enkovich
2016-06-16 9:00 GMT+03:00 Jeff Law :
> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces changes required to run vectorizer on loop epilogue.
>> This also enables epilogue vectorization using a vector of smaller size.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * tree-if-conv.c (tree_if_conversion): Make public.
>> * tree-if-conv.h: New file.
>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>> try to enhance alignment for epilogues.
>> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>> created loop.
>> * tree-vect-loop.c: include tree-if-conv.h.
>> (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>> loop->aux.
>> (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>> loop->aux.
>> (vect_analyze_loop): Reset loop->aux.
>> (vect_transform_loop): Check if created epilogue should be
>> returned
>> for further vectorization.  If-convert epilogue if required.
>> * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>> process and insert vectorized loop epilogues into this queue.
>> * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
>> created
>> loop.
>> (vect_transform_loop): Return created loop.
>
> As Richi noted, the additional calls into the if-converter are unfortunate.
> I'm not sure how else to avoid them though.  It looks like we can run
> if-conversion on just the epilogue, so maybe that's not too bad.
>
>
>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
>> bool clean_stmts)
>>destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>loop_vinfo->scalar_cost_vec.release ();
>>
>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>free (loop_vinfo);
>> -  loop->aux = NULL;
>>  }
>
> Hmm, there seems to be a level of indirection I'm missing here.  We're
> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
> the original loop to the vectorized epilogue.  What am I missing?  Rather
> than smuggling around in the aux field, is there some inherent reason why we
> can't just copy the info from the original loop directly into
> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?

LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
 - mark this loop as epilogue
 - get VF of original loop (required for both mask and nomask modes)
 - get decision about epilogue masking

That's all.  When the epilogue is created it has no LOOP_VINFO.  Also, when
we vectorize a loop we create and destroy its LOOP_VINFO multiple times.
While a loop has a LOOP_VINFO, loop->aux points to it and the original
LOOP_VINFO is in LOOP_VINFO_ORIG_LOOP_INFO.  When a loop has no LOOP_VINFO
associated, I have no place to bind it to the original loop and therefore
use the vacant loop->aux for that.  Any other way to bind the epilogue to
its original loop would work as well.  I just chose loop->aux to avoid new
fields and data structures.
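The binding trick in miniature (toy stand-ins, not the vectorizer's actual
types):

struct loop_info;                    /* stand-in for loop_vec_info */
struct loop_node { void *aux; };     /* stand-in for struct loop */

static void
bind_epilogue (struct loop_node *epilogue, struct loop_info *orig)
{
  /* No LOOP_VINFO exists yet, so the vacant aux slot carries the link
     back to the original loop's info.  */
  epilogue->aux = orig;
}

static struct loop_info *
original_loop_info (struct loop_node *epilogue)
{
  return (struct loop_info *) epilogue->aux;
}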

>
>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>> + Don't try to vectorize epilogue because it will require
>> + additional alias checks.  */
>
> Are the alias checks here redundant with the ones done for the original
> loop?  If so won't DOM eliminate them?

I revisited this part recently and thought it should actually be safe to
assume we have no aliasing in the epilogue because we are dominated by the
alias checks of the original loop.  So I prepared a patch to remove this
restriction and avoid alias check generation for epilogues (so we compute
the alias checks required but don't emit them).  I didn't send this patch yet.
Do you think it is a valid assumption?
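For clarity, the structure in question, as a plain C sketch (illustrative
only, not generated code):

#include <stddef.h>

/* The epilogue runs only on the path where the main loop's runtime
   alias check already passed, so re-checking there is redundant.  */
void
axpy (float *a, const float *b, size_t n)
{
  if (a + n <= b || b + n <= a)        /* runtime alias check */
    {
      size_t i = 0;
      for (; i + 8 <= n; i += 8)       /* vectorized main loop */
        for (size_t j = 0; j < 8; j++)
          a[i + j] += b[i + j];
      for (; i < n; i++)               /* epilogue: dominated by the check */
        a[i] += b[i];
    }
  else
    for (size_t i = 0; i < n; i++)     /* scalar fallback */
      a[i] += b[i];
}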

>
>
> And something just occurred to me -- is there some inherent reason why SLP
> doesn't vectorize the epilogue, particularly for the cases where we can
> vectorize the epilogue using smaller vectors?  Sorry if you've already
> answered this somewhere or it's a dumb question.

IIUC this may happen only if we unroll the epilogue into a single BB, which
happens only when the epilogue iteration count is known. Right?

>
>
>
>>
>> +   /* Add new loop to a processing queue.  To make it easier
>> +  to match loop and its epilogue vectorization in dumps
>> +  put new loop as the next loop to process.  */
>> +   if (new_loop)
>> + {
>> +   loops.safe_insert (i + 1, new_loop->num);
>> +   vect_loops_num = number_of_loops (cfun);
>> + }
>> +
>
> So just to be clear, the only reason to do this is for dumps -- other than
> processing the loop before its epilogue, there's no other inherently
> necessary ordering of the loops, right?

Right, I don't see other reasons to do it.

Thanks,
Ilya

>
>
> Jeff


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Michael Matz
Hi,

On Fri, 17 Jun 2016, Bernd Schmidt wrote:

> On 06/17/2016 04:03 PM, Michael Matz wrote:
> > But does this really improve something?  Essentially you're replacing
> > 
> >0xc9 0xc3 <random bytes 1>
> > 
> > (the end of a function containing "leave;ret") with
> > 
> >0xe9 <random bytes 2>
> > 
> > where the four random bytes are different for each rewritten function
> > return (but correlated as they differ exactly by their position
> > difference).
> > 
> > I'm not sure why the latter sequence is better?
> 
> I think I'm missing what you're trying to say. The latter sequence does not
> contain a return opcode hence it ought to be better?

The "0xe9 " essentially is the leave+return opcode, 
after all it jumps to them (let's ignore the possibility that the jump 
target address might contain a 0xc3 byte).  So if the attacker finds some 
interesting gadget in <random bytes 2> I don't see how the change from
leave+ret to jump-to-leave+ret changes anything from a threat avoidance 
perspective.  It's fully possible that I don't understand the threat 
vector of ROP correctly, in which case I'd also like to know :)


Ciao,
Michael.


[PATCH] Change PRED_LOOP_EXIT from 92 to 85.

2016-06-17 Thread Martin Liška
Hello.

After we've recently applied various changes (fixes) to predict.c, SPEC2006
shows that PRED_LOOP_EXIT value should be amended.

Survives regression tests & bootstrap on x86_64-linux.
Pre-approved by Honza, installed as r237556.

Thanks,
Martin
From 849c2e064bcadc269f82656d15722f28d1b1fe73 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 17 Jun 2016 14:44:24 +0200
Subject: [PATCH] Change PRED_LOOP_EXIT from 92 to 85.

contrib/ChangeLog:

2016-06-17  Martin Liska  

	* analyze_brprob.py: Fix columns of script output.

gcc/ChangeLog:

2016-06-17  Martin Liska  

	* predict.def: PRED_LOOP_EXIT from 92 to 85.

gcc/testsuite/ChangeLog:

2016-06-17  Martin Liska  

	* gcc.dg/predict-9.c: Fix dump scanning.
---
 contrib/analyze_brprob.py| 4 ++--
 gcc/predict.def  | 2 +-
 gcc/testsuite/gcc.dg/predict-9.c | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/contrib/analyze_brprob.py b/contrib/analyze_brprob.py
index 9808c46..2526623 100755
--- a/contrib/analyze_brprob.py
+++ b/contrib/analyze_brprob.py
@@ -119,10 +119,10 @@ class Profile:
 elif sorting == 'coverage':
 sorter = lambda x: x[1].count
 
-print('%-36s %8s %6s  %-16s %14s %8s %6s' % ('HEURISTICS', 'BRANCHES', '(REL)',
+print('%-40s %8s %6s  %-16s %14s %8s %6s' % ('HEURISTICS', 'BRANCHES', '(REL)',
   'HITRATE', 'COVERAGE', 'COVERAGE', '(REL)'))
 for (k, v) in sorted(self.heuristics.items(), key = sorter):
-print('%-36s %8i %5.1f%% %6.2f%% / %6.2f%% %14i %8s %5.1f%%' %
+print('%-40s %8i %5.1f%% %6.2f%% / %6.2f%% %14i %8s %5.1f%%' %
 (k, v.branches, percentage(v.branches, self.branches_max ()),
  percentage(v.hits, v.count), percentage(v.fits, v.count),
  v.count, v.count_formatted(), percentage(v.count, self.count_max()) ))
diff --git a/gcc/predict.def b/gcc/predict.def
index a0d0ba9..d3bc757 100644
--- a/gcc/predict.def
+++ b/gcc/predict.def
@@ -89,7 +89,7 @@ DEF_PREDICTOR (PRED_COLD_FUNCTION, "cold function call", PROB_VERY_LIKELY,
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Edge causing loop to terminate is probably not taken.  */
-DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (92),
+DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (85),
 	   PRED_FLAG_FIRST_MATCH)
 
 /* Edge causing loop to terminate by computing value used by later
diff --git a/gcc/testsuite/gcc.dg/predict-9.c b/gcc/testsuite/gcc.dg/predict-9.c
index a613961..196e31c 100644
--- a/gcc/testsuite/gcc.dg/predict-9.c
+++ b/gcc/testsuite/gcc.dg/predict-9.c
@@ -19,5 +19,5 @@ void foo (int base)
   }
 }
 
-/* { dg-final { scan-tree-dump-times "first match heuristics: 2.0%" 3 "profile_estimate"} } */
-/* { dg-final { scan-tree-dump-times "first match heuristics: 4.0%" 1 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "first match heuristics: 3.0%" 3 "profile_estimate"} } */
+/* { dg-final { scan-tree-dump-times "first match heuristics: 7.5%" 1 "profile_estimate"} } */
-- 
2.8.3



Re: [ARM][testsuite] Make arm_neon_fp16 depend on arm_neon_ok

2016-06-17 Thread Kyrill Tkachov

Hi Christophe,

On 17/06/16 11:47, Christophe Lyon wrote:

Hi,

As discussed some time ago with Kyrylo (on IRC IIRC), the attached
patch makes sure that arm_neon_fp16_ok and arm_neonv2_ok effective
targets imply that arm_neon_ok passes, and use the corresponding
flags.

Without this patch, the 3 effective targets have different, possibly
inconsistent conditions. For instance, arm_neon_ok makes sure that
__ARM_ARCH >= 7, but arm_neon_fp16_ok does not.

This led to failures on configurations not supporting neon, but where
arm_neon_fp16_ok passes as the test is less strict.
Rather than duplicating the same tests, I preferred to call
arm_neon_ok from the other places.

We then use the union of flags needed for arm_neon_ok and
arm_neon_fp16_ok to pass.

Tested on many arm configurations with no harm. It prevents
arm_neon_fp16 tests from passing when forcing -march=armv5t, which
seems coherent.

OK?


Ok with a ChangeLog nit below.
Thanks,
Kyrill


Christophe


2016-06-17  Christophe Lyon

* lib/target-supports.exp
(check_effective_target_arm_neon_fp16_ok_nocache): Call
arm_neon_ok and merge flags. Fix temporary test name.


I believe the rule in ChangeLogs is also to use two spaces after a full stop.
So two spaces before "Fix temporary..."



Re: [Patch AArch64] Fixup to fcvt patterns added in r237200

2016-06-17 Thread Christophe Lyon
On 10 June 2016 at 14:29, James Greenhalgh  wrote:
>
> Hi,
>
> My autotester picked up some issues with the vcvt{ds}_n_* intrinsics
> added in r237200.
>
Hi,

What tests does your autotester perform? I haven't noticed these
problems when running the GCC testsuite on the usual aarch64
targets. I'm interested in increasing coverage, if doable.

Thanks

Christophe

> The iterators in this pattern do not resolve, as they have not been
> explicitly tied to the mode iterator (rather than the code iterator)
> used by the pattern.
>
> This fixup adds the attribute tags, allowing the patterns to work
> correctly.
>
> Additionally, the types assigned to these instructions were wrong, and
> would permit the immediate operand to be in a register. This will then
> develop into an ICE as the patterns require an immediate operand, and so
> won't match. The ICE can be exposed by writing a wrapping function around
> the vcvtd_n_* intrinsics, which forces the immediate operand to a register.
> We have the infrastructure to error to the user rather than ICEing, but it
> needs some different types, which this patch adds.
>
> I've checked this with an aarch64-none-elf test run, and run it through
> several rounds of my autotester for aarch64-none-elf and
> aarch64_be-none-elf.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2016-06-10  James Greenhalgh  
>
> * config/aarch64/aarch64.md
> (3): Add attributes to
> iterators.
> (3): Likewise.  Correct
> attributes.
> * config/aarch64/aarch64-builtins.c
> (aarch64_types_binop_uss_qualifiers): Delete.
> (TYPES_BINOP_USS): Likewise.
> (aarch64_types_binop_sus_qualifiers): Likewise.
> (TYPES_BINOP_SUS): Likewise.
> (aarch64_types_fcvt_from_unsigned_qualifiers): New.
> (TYPES_FCVTIMM_SUS): Likewise.
> * config/aarch64/aarch64-simd-builtins.def (scvtf): Use SHIFTIMM
> rather than BINOP.
> (ucvtf): Use FCVTIMM_SUS rather than BINOP_SUS.
> (fcvtzs): Use SHIFTIMM rather than BINOP.
> (fcvtzu): Use SHIFTIMM_USS rather than BINOP_USS.
>


Re: move increase_alignment from simple to regular ipa pass

2016-06-17 Thread Prathamesh Kulkarni
On 14 June 2016 at 18:31, Prathamesh Kulkarni
 wrote:
> On 13 June 2016 at 16:13, Jan Hubicka  wrote:
>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>> index ecafe63..41ac408 100644
>>> --- a/gcc/cgraph.h
>>> +++ b/gcc/cgraph.h
>>> @@ -1874,6 +1874,9 @@ public:
>>>   if we did not do any inter-procedural code movement.  */
>>>unsigned used_by_single_function : 1;
>>>
>>> +  /* Set if -fsection-anchors is set.  */
>>> +  unsigned section_anchor : 1;
>>> +
>>>  private:
>>>/* Assemble thunks and aliases associated to varpool node.  */
>>>void assemble_aliases (void);
>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>>> index 4bfcad7..e75d5c0 100644
>>> --- a/gcc/cgraphunit.c
>>> +++ b/gcc/cgraphunit.c
>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
>>>   it is available to notice_global_symbol.  */
>>>node->definition = true;
>>>notice_global_symbol (decl);
>>> +
>>> +  node->section_anchor = flag_section_anchors;
>>> +
>>>if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
>>>/* Traditionally we do not eliminate static variables when not
>>>optimizing and when not doing toplevel reoder.  */
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index f0d7196..e497795 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -1590,6 +1590,10 @@ fira-algorithm=
>>>  Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) 
>>> Init(IRA_ALGORITHM_CB) Optimization
>>>  -fira-algorithm=[CB|priority] Set the used IRA algorithm.
>>>
>>> +fipa-increase_alignment
>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
>>> +Option to gate increase_alignment ipa pass.
>>> +
>>>  Enum
>>>  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA 
>>> algorithm %qs)
>>>
>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) 
>>> Init(1) Optimization
>>>  Enable the dependent count heuristic in the scheduler.
>>>
>>>  fsection-anchors
>>> -Common Report Var(flag_section_anchors) Optimization
>>> +Common Report Var(flag_section_anchors)
>>>  Access data in the same section from shared anchor points.
>>>
>>>  fsee
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index a0db3a4..1482566 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void)
>>>
>>>aarch64_register_fma_steering ();
>>>
>>> +  /* Enable increase_alignment pass.  */
>>> +  flag_ipa_increase_alignment = 1;
>>
>> I would rather enable it always on targets that do support anchors.
> AFAIK aarch64 supports section anchors.
>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
>>> index ce9e146..7f09f3a 100644
>>> --- a/gcc/lto/lto-symtab.c
>>> +++ b/gcc/lto/lto-symtab.c
>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node 
>>> *entry)
>>>   The type compatibility checks or the completing of types has properly
>>>   dealt with most issues.  */
>>>
>>> +  /* ??? is this assert necessary ?  */
>>> +  varpool_node *v_prevailing = dyn_cast <varpool_node *> (prevailing);
>>> +  varpool_node *v_entry = dyn_cast <varpool_node *> (entry);
>>> +  gcc_assert (v_prevailing && v_entry);
>>> +  /* section_anchor of prevailing_decl wins.  */
>>> +  v_entry->section_anchor = v_prevailing->section_anchor;
>>> +
>> Other flags are merged in lto_varpool_replace_node so please move this there.
> Ah indeed, thanks for the pointers.
> I wonder though if we need to set
> prevailing_node->section_anchor = vnode->section_anchor ?
> IIUC, the function merges flags from vnode into prevailing_node
> and removes vnode. However we want prevailing_node->section_anchor
> to always take precedence.
>>> +/* Return true if alignment should be increased for this vnode.
>>> +   This is done if every function that references/referring to vnode
>>> +   has flag_tree_loop_vectorize set.  */
>>> +
>>> +static bool
>>> +increase_alignment_p (varpool_node *vnode)
>>> +{
>>> +  ipa_ref *ref;
>>> +
>>> +  for (int i = 0; vnode->iterate_reference (i, ref); i++)
>>> +if (cgraph_node *cnode = dyn_cast <cgraph_node *> (ref->referred))
>>> +  {
>>> + struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>> + if (!opts->x_flag_tree_loop_vectorize)
>>> +   return false;
>>> +  }
>>
>> If you take the address of a function that has the vectorizer enabled, that
>> probably doesn't imply a need to increase alignment of that var.  So please
>> drop the loop.
>>
>> You only want functions that read/write or take the address of the symbol.
>> But on the other hand, you need to walk all aliases of the symbol by
>> call_for_symbol_and_aliases
>>> +
>>> +  for (int i = 0; vnode->iterate_referring (i, ref); i++)
>>> +if (cgraph_node *cnode = dyn_cast <cgraph_node *> (ref->referring))
>>> +  {
>>> + struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>> + if (!opts->x_flag_tree_loop_vectorize)
>>> +   return false;
>>> +  }
>>> +
>>> +  return true;
>>> +}
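Putting those two review comments together, the predicate would look roughly
like this (a hedged sketch against GCC's internal IPA API, untested):

static bool
referring_fn_not_vectorized_p (varpool_node *vnode, void *)
{
  ipa_ref *ref;
  for (int i = 0; vnode->iterate_referring (i, ref); i++)
    if (cgraph_node *cnode = dyn_cast <cgraph_node *> (ref->referring))
      if (!opts_for_fn (cnode->decl)->x_flag_tree_loop_vectorize)
	return true;	/* Returning true stops the walk.  */
  return false;
}

static bool
increase_alignment_p (varpool_node *vnode)
{
  /* Walk vnode and all its aliases; increase alignment only if every
     referring function has -ftree-loop-vectorize enabled.  */
  return !vnode->call_for_symbol_and_aliases (referring_fn_not_vectorized_p,
					      NULL, true);
}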

[patch] allow --target=e500v[12]-* in configure

2016-06-17 Thread Jérôme Lambourg
Hello,

An initial patch has been integrated into gnu-config to translate triplets like
e500v2-*- into powerpc-*-spe.

The spe extension to the os is expected for targets such as e500v[12]-*-linux
(translated as powerpc-*-linux-gnuspe) or eabi targets.

This patch integrates the gnu-config patch (config.sub), takes care of
the vxworks case, and properly sets the default value for --with-cpu.

I checked that this works with the following targets:
* e500v2-wrs-vxworks
* e500v2-gnu-linux

Thanks in advance for your feedback,

- Jérôme

Author: Jerome Lambourg 
Date:   Tue Jun 14 10:57:06 2016 +0200

P614-008: support e500v[12] configuration as PPC with cpu 854[08]

toplevel/
* config.sub: merge with gnu-config trunk. Accept e500v[12] cpu names, and
canonicalize to powerpc, and add a "spe" suffix to the os name.
* gcc/config.gcc: determine with_cpu from the non canonical target name, and
make sure the powerpc-wrs-vxworks*spe is properly handled.
* libgcc/config.host: accept vxworks*spe when configuring libgcc.



gcc-e500v12-config.diff
Description: Binary data


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Bernd Schmidt

On 06/17/2016 04:03 PM, Michael Matz wrote:

But does this really improve something?  Essentially you're replacing

   0xc9 0xc3 <random bytes 1>

(the end of a function containing "leave;ret") with

   0xe9 <random bytes 2>

where the four random bytes are different for each rewritten function
return (but correlated as they differ exactly by their position
difference).

I'm not sure why the latter sequence is better?


I think I'm missing what you're trying to say. The latter sequence does 
not contain a return opcode hence it ought to be better?



Bernd


Re: [PATCH, vec-tails 03/10] Support epilogues vectorization with no masking

2016-06-17 Thread Ilya Enkovich
2016-06-16 8:22 GMT+03:00 Jeff Law :
> On 06/15/2016 05:03 AM, Richard Biener wrote:
>>
>> On Thu, May 19, 2016 at 9:39 PM, Ilya Enkovich
>>  wrote:
>>>
>>> Hi,
>>>
>>> This patch introduces changes required to run vectorizer on loop
>>> epilogue. This also enables epilogue vectorization using a vector
>>> of smaller size.
>>
>>
>> While the idea of epilogue vectorization sounds straight-forward the
>> implementation is somewhat icky with all the ->aux stuff, "redundant"
>> if-conversion and loop iteration stuff.
>>
>> So I was thinking of when epilogue vectorization is beneficial which
>> is obviously when the overall loop trip count is low.  We are not
>> good in optimizing for that case generally (too much peeling for
>> alignment, using expensive avx256 vectorization, etc.), so I wonder
>> if versioning for that case would be a better idea
>> (performance-wise).
>>
>> Thus - what cases were you looking at when deciding that vectorizing
>> the epilogue (with a smaller vector size) is profitable?  Do other
>> compilers generally do this?
>
> I would think it's better stated that the relative benefits of vectorizing
> the epilogue are greater the shorter the loop, but that's nit-picking the
> discussion.
>
> I do think you've got a legitimate question though.   Ilya, can you give any
> insights here based on your KNL and Haswell testing or data/insights from
> the LLVM and/or ICC teams?

I have no information about LLVM.  As I said in another thread, ICC uses all
options (masked epilogue, combined loop, vectorized epilogue with smaller
vector size).  It also may generate different versions (e.g. combined and
with masked epilogue) and choose dynamically depending on the iteration count.

Thanks,
Ilya

>
> Jeff


Re: [openacc] clean up acc directive matching in fortran

2016-06-17 Thread Jakub Jelinek
On Fri, Jun 17, 2016 at 10:40:40AM +0200, Tobias Burnus wrote:
> Cesar Philippidis wrote:
> > On 06/16/2016 08:30 PM, Cesar Philippidis wrote:
> > > This patch introduces a match_acc function to the fortran FE. It's
> > > almost identical to match_omp, but it passes openacc = true to
> > > gfc_match_omp_clauses. I supposed I could have consolidated those two
> > > functions, but they are reasonably simple so I left them separate. Maybe
> > > a follow up patch can consolidate them. I was able to eliminate a lot of
> > > duplicate code with this function.
> > > 
> > > Is this ok for trunk and gcc-6?
> 
> > And here's the patch.
> 
> The patch seems to be reversed. If I regard the "-" lines as additions
> and the "+" lines as deletions, it makes sense and is in line with
> the ChangeLog and what you wrote above.
> 
> Otherwise, it looks good to me.

Yeah, patch -R + commit is ok with me.

Jakub


[committed] Further OpenMP C++ mapping of struct elements with reference to struct as base fixes

2016-06-17 Thread Jakub Jelinek
On Thu, Jun 16, 2016 at 09:05:40PM +0200, Jakub Jelinek wrote:
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, and
> tested with x86_64-intelmicemul-linux offloading on x86_64-linux, committed
> to trunk.

And the following testcase shows similar issues in the array section
handling path.
Tested on x86_64-linux and i686-linux, plus with intelmicemul offloading,
committed to trunk.

2016-06-17  Jakub Jelinek  

* semantics.c (handle_omp_array_sections_1): Don't ICE when
processing_template_decl when checking for bitfields and unions.
Look through REFERENCE_REF_P as base of COMPONENT_REF.
(finish_omp_clauses): Look through REFERENCE_REF_P even for
array sections with COMPONENT_REF bases.

* testsuite/libgomp.c++/target-21.C: New test.

--- gcc/cp/semantics.c.jj   2016-06-16 17:29:53.0 +0200
+++ gcc/cp/semantics.c  2016-06-17 14:34:21.929203440 +0200
@@ -4487,7 +4487,8 @@ handle_omp_array_sections_1 (tree c, tre
  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FROM)
  && !type_dependent_expression_p (t))
{
- if (DECL_BIT_FIELD (TREE_OPERAND (t, 1)))
+ if (TREE_CODE (TREE_OPERAND (t, 1)) == FIELD_DECL
+ && DECL_BIT_FIELD (TREE_OPERAND (t, 1)))
{
  error_at (OMP_CLAUSE_LOCATION (c),
"bit-field %qE in %qs clause",
@@ -4496,7 +4497,8 @@ handle_omp_array_sections_1 (tree c, tre
}
  while (TREE_CODE (t) == COMPONENT_REF)
{
- if (TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 0))) == UNION_TYPE)
+ if (TREE_TYPE (TREE_OPERAND (t, 0))
+ && TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 0))) == UNION_TYPE)
{
  error_at (OMP_CLAUSE_LOCATION (c),
"%qE is a member of a union", t);
@@ -4504,6 +4506,8 @@ handle_omp_array_sections_1 (tree c, tre
}
  t = TREE_OPERAND (t, 0);
}
+ if (REFERENCE_REF_P (t))
+   t = TREE_OPERAND (t, 0);
}
   if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
{
@@ -6623,6 +6627,8 @@ finish_omp_clauses (tree clauses, enum c
{
  while (TREE_CODE (t) == COMPONENT_REF)
t = TREE_OPERAND (t, 0);
+ if (REFERENCE_REF_P (t))
+   t = TREE_OPERAND (t, 0);
  if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
break;
  if (bitmap_bit_p (&map_head, DECL_UID (t)))
--- libgomp/testsuite/libgomp.c++/target-21.C.jj	2016-06-17 13:18:59.684314656 +0200
+++ libgomp/testsuite/libgomp.c++/target-21.C	2016-06-17 15:10:21.860516966 +0200
@@ -0,0 +1,173 @@
+extern "C" void abort ();
+struct T { char t[270]; };
+struct S { int (&x)[10]; int *&y; T t; int &z; S (); ~S (); };
+
+template <int N>
+void
+foo (S s)
+{
+  int err;
+  #pragma omp target map (s.x[0:N], s.y[0:N]) map (s.t.t[16:3]) map (from: err)
+  {
+err = s.x[2] != 28 || s.y[2] != 37 || s.t.t[17] != 81;
+s.x[2]++;
+s.y[2]++;
+s.t.t[17]++;
+  }
+  if (err || s.x[2] != 29 || s.y[2] != 38 || s.t.t[17] != 82)
+abort ();
+}
+
+template <int N>
+void
+bar (S s)
+{
+  int err;
+  #pragma omp target map (s.x, s.z)map(from:err)
+  {
+err = s.x[2] != 29 || s.z != 6;
+s.x[2]++;
+s.z++;
+  }
+  if (err || s.x[2] != 30 || s.z != 7)
+abort ();
+}
+
+template <int N>
+void
+foo2 (S &s)
+{
+  int err;
+  #pragma omp target map (s.x[N:10], s.y[N:10]) map (from: err) map (s.t.t[N+16:N+3])
+  {
+err = s.x[2] != 30 || s.y[2] != 38 || s.t.t[17] != 81;
+s.x[2]++;
+s.y[2]++;
+s.t.t[17]++;
+  }
+  if (err || s.x[2] != 31 || s.y[2] != 39 || s.t.t[17] != 82)
+abort ();
+}
+
+template <int N>
+void
+bar2 (S &s)
+{
+  int err;
+  #pragma omp target map (s.x, s.z)map(from:err)
+  {
+err = s.x[2] != 31 || s.z != 7;
+s.x[2]++;
+s.z++;
+  }
+  if (err || s.x[2] != 32 || s.z != 8)
+abort ();
+}
+
+template <typename U>
+void
+foo3 (U s)
+{
+  int err;
+  #pragma omp target map (s.x[0:10], s.y[0:10]) map (from: err) map (s.t.t[16:3])
+  {
+err = s.x[2] != 32 || s.y[2] != 39 || s.t.t[17] != 82;
+s.x[2]++;
+s.y[2]++;
+s.t.t[17]++;
+  }
+  if (err || s.x[2] != 33 || s.y[2] != 40 || s.t.t[17] != 83)
+abort ();
+}
+
+template <typename U>
+void
+bar3 (U s)
+{
+  int err;
+  #pragma omp target map (s.x, s.z)map(from:err)
+  {
+err = s.x[2] != 33 || s.z != 8;
+s.x[2]++;
+s.z++;
+  }
+  if (err || s.x[2] != 34 || s.z != 9)
+abort ();
+}
+
+template <typename U>
+void
+foo4 (U &s)
+{
+  int err;
+  #pragma omp target map (s.x[0:10], s.y[0:10]) map (from: err) map (s.t.t[16:3])
+  {
+err = s.x[2] != 34 || s.y[2] != 40 || s.t.t[17] != 82;
+s.x[2]++;
+s.y[2]++;
+s.t.t[17]++;
+  }
+  if (err || s.x[2] != 35 || s.y[2] != 41 || s.t.t[17] != 83)
+abort ();
+}
+
+template <typename U>
+void
+bar4 (U &s)
+{
+  int err;
+  #pragma 

Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Michael Matz
Hi,

On Fri, 17 Jun 2016, Bernd Schmidt wrote:

> This is another step to flesh out -mmitigate-rop for i386 a little more. 
> The basic idea was (I think) Richard Henderson's: if we could arrange to 
> have every return preceded by a leave instruction, it would make it 
> harder to construct an attack since it takes away a certain amount of 
> control over the stack pointer. I extended this to move the leave/ret 
> pair to libgcc, preceded by a sequence of nops, so as to take away the 
> possibility of jumping into the middle of an instruction preceding the 
> leave/ret pair and thereby skipping the leave.

But does this really improve something?  Essentially you're replacing

   0xc9 0xc3 <random bytes 1>

(the end of a function containing "leave;ret") with

   0xe9 <random bytes 2>

where the four random bytes are different for each rewritten function 
return (but correlated as they differ exactly by their position 
difference).

I'm not sure why the latter sequence is better?


Ciao,
Michael.


[PATCH] Fix memory leak in tree-ssa-reassoc.c

2016-06-17 Thread Martin Liška
Hi.

Following simple patch fixes a newly introduced memory leak.

Patch survives regression tests and bootstraps on x86_64-linux.

Ready from trunk?
Thanks,
Martin
From a2e6be16d7079b744db4d383b8317226ab53ff58 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 17 Jun 2016 12:26:58 +0200
Subject: [PATCH] Fix memory leak in tree-ssa-reassoc.c

gcc/ChangeLog:

2016-06-17  Martin Liska  

	* tree-ssa-reassoc.c (transform_add_to_multiply): Use auto_vec.
---
 gcc/tree-ssa-reassoc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index e32d503..cdfe06f 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1807,7 +1807,7 @@ transform_add_to_multiply (vec<operand_entry *> *ops)
   tree op = NULL_TREE;
   int j;
   int i, start = -1, end = 0, count = 0;
-  vec<std::pair <int, int> > indxs = vNULL;
+  auto_vec<std::pair <int, int> > indxs;
   bool changed = false;
 
   if (!INTEGRAL_TYPE_P (TREE_TYPE ((*ops)[0]->op))
-- 
2.8.3
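For readers outside the GCC tree: vec<T> is not std::vector - it is a plain
buffer that must be freed explicitly via release (), so any early return
leaks it, while auto_vec<T> releases in its destructor.  A toy analogue of
the two (stand-ins only, not GCC's real types):

#include <cstdlib>

template <typename T>
struct toy_vec
{
  T *data = nullptr;
  void release () { std::free (data); data = nullptr; }
  /* Deliberately no destructor: like GCC's vec<T>, freeing is the
     caller's job.  */
};

template <typename T>
struct toy_auto_vec : toy_vec<T>
{
  ~toy_auto_vec () { this->release (); }  /* RAII: freed on every path */
};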



Re: [PATCH] Add port for Phoenix-RTOS on ARM platform.

2016-06-17 Thread Jakub Sejdak
> So at least in the immediate term let's get you write privileges so you can
> commit approved changes and on the path towards maintaining the Phoenix-RTOS
> configurations.

Do I have to apply for this permission somewhere? The provided page states
only that it has to be granted by an existing maintainer.

2016-06-16 18:28 GMT+02:00 Jeff Law :
> On 06/16/2016 02:59 AM, Jakub Sejdak wrote:
>>
>> Actually, if possible, I would skip the "arm" part, because we plan to
>> port Phoenix-RTOS for other platforms. It will be easier to do it
>> once.
>
> Generally we prefer to see an ongoing commitment to the GCC project along
> with regular high quality contributions to appoint maintainers.
>
> So at least in the immediate term let's get you write privileges so you can
> commit approved changes and on the path towards maintaining the Phoenix-RTOS
> configurations.
>
> https://www.gnu.org/software/gcc/svnwrite.html
>
> jeff
>



-- 
Jakub Sejdak
Software Engineer
Phoenix Systems (www.phoesys.com)
+48 608 050 163


Re: Fix loop size estimate in tree-ssa-loop-ivcanon

2016-06-17 Thread Christophe Lyon
On 16 June 2016 at 14:56, Jan Hubicka  wrote:
> Hi,
> tree_estimate_loop_size contains one extra else that prevents it from 
> determining
> that the induction variable comparison is going to be eliminated in both the 
> peeled
> copies as well as the last copy.  This patch fixes it
> (it really removes one else, but need to reformat the conditional)
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> Honza
>
> * g++.dg/vect/pr36648.cc: Disable cunrolli
> * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Fix estimation
> of comparisons in the last iteration.

Hi,

This patch makes
FAIL: gcc.target/arm/unsigned-extend-2.c scan-assembler ands
on arm-none-linux-gnueabi --with-cpu=cortex-a9

Christophe




> Index: testsuite/g++.dg/vect/pr36648.cc
> ===
> --- testsuite/g++.dg/vect/pr36648.cc(revision 237477)
> +++ testsuite/g++.dg/vect/pr36648.cc(working copy)
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_float } */
> +// { dg-additional-options "-fdisable-tree-cunrolli" }
>
>  struct vector
>  {
> Index: tree-ssa-loop-ivcanon.c
> ===
> --- tree-ssa-loop-ivcanon.c (revision 237477)
> +++ tree-ssa-loop-ivcanon.c (working copy)
> @@ -255,69 +255,73 @@ tree_estimate_loop_size (struct loop *lo
>
>   /* Look for reasons why we might optimize this stmt away. */
>
> - if (gimple_has_side_effects (stmt))
> -   ;
> - /* Exit conditional.  */
> - else if (exit && body[i] == exit->src
> -  && stmt == last_stmt (exit->src))
> + if (!gimple_has_side_effects (stmt))
> {
> - if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "   Exit condition will be eliminated "
> -"in peeled copies.\n");
> - likely_eliminated_peeled = true;
> -   }
> - else if (edge_to_cancel && body[i] == edge_to_cancel->src
> -  && stmt == last_stmt (edge_to_cancel->src))
> -   {
> - if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "   Exit condition will be eliminated "
> -"in last copy.\n");
> - likely_eliminated_last = true;
> -   }
> - /* Sets of IV variables  */
> - else if (gimple_code (stmt) == GIMPLE_ASSIGN
> - && constant_after_peeling (gimple_assign_lhs (stmt), stmt, 
> loop))
> -   {
> - if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "   Induction variable computation will"
> -" be folded away.\n");
> - likely_eliminated = true;
> -   }
> - /* Assignments of IV variables.  */
> - else if (gimple_code (stmt) == GIMPLE_ASSIGN
> -  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
> -  && constant_after_peeling (gimple_assign_rhs1 (stmt), stmt,
> - loop)
> -  && (gimple_assign_rhs_class (stmt) != GIMPLE_BINARY_RHS
> -  || constant_after_peeling (gimple_assign_rhs2 (stmt),
> - stmt, loop)))
> -   {
> - size->constant_iv = true;
> - if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file,
> -"   Constant expression will be folded away.\n");
> - likely_eliminated = true;
> -   }
> - /* Conditionals.  */
> - else if ((gimple_code (stmt) == GIMPLE_COND
> -   && constant_after_peeling (gimple_cond_lhs (stmt), stmt,
> -  loop)
> -   && constant_after_peeling (gimple_cond_rhs (stmt), stmt,
> -  loop)
> -   /* We don't simplify all constant compares so make sure
> -  they are not both constant already.  See PR70288.  */
> -   && (! is_gimple_min_invariant (gimple_cond_lhs (stmt))
> -   || ! is_gimple_min_invariant (gimple_cond_rhs 
> (stmt
> -  || (gimple_code (stmt) == GIMPLE_SWITCH
> -  && constant_after_peeling (gimple_switch_index (
> -   as_a  (stmt)),
> + /* Exit conditional.  */
> + if (exit && body[i] == exit->src
> + && stmt == last_stmt (exit->src))
> +   {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "   Exit condition will be eliminated 
> "
> +"in peeled copies.\n");
> + likely_eliminated_peeled = true;
> +   }
> +  

Re: [Patch, avr] Fix PR 71151

2016-06-17 Thread Georg-Johann Lay

Senthil Kumar Selvaraj wrote:

Hi,

  This patch fixes PR 71151 by eliminating the
  TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
  JUMP_TABLES_IN_TEXT_SECTION to 1.

  As described in the bugzilla entry, this hook assumed it would get
  called only for jumptable rodata for functions. This was true until
  6.1, when a commit in varasm.c started calling the hook for mergeable
  string/constant data as well.

  This resulted in string constants ending up in a section intended for
  jumptables (flash), and broke code using those constants, which
  expects them to be present in rodata (SRAM).
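The failure mode in miniature (illustrative only, not the PR's actual
testcase):

/* A mergeable string constant.  With the old hook behavior, this could
   be routed into the flash section intended for jump tables, while code
   reading it through a plain pointer expects SRAM rodata.  */
const char *
greeting (void)
{
  return "hello, world";
}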

  Given that the original reason for placing jumptables in a section was
  fixed by Johann in PR 63323, this patch restores the original
  behavior. Reg testing on both gcc-6-branch and trunk showed no regressions.

  As pointed out by Johann, this may end up increasing code
  size if there are lots of branches that cross the jump tables. I
  intend to propose a separate patch that gives additional information
  to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
  what type of function rodata is coming in. Johann also suggested
  handling jump table generation ourselves - I'll experiment with that
  some more.

  If ok, could someone commit please? Could you also backport to
  gcc-6-branch?

Regards
Senthil

gcc/ChangeLog

2016-06-03  Senthil Kumar Selvaraj  



Missing PR target/71151


* config/avr/avr.c (avr_asm_function_rodata_section): Remove.
* config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.

gcc/testsuite/ChangeLog

2016-06-03  Senthil Kumar Selvaraj  



Missing PR target/71151


* gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
* gcc/testsuite/gcc.target/avr/pr71151-2.c: New.



With the PR entry in the ChangeLog / commit message it might be easier 
to find the change, and the respective bugzilla PR will get an automatic 
entry pointing to the commit.


Thanks,  Johann


[C++ Patch] One more error + error to error + inform

2016-06-17 Thread Paolo Carlini

Hi,

one more I missed. Tested x86_64-linux. Should be obvious too...

Thanks,
Paolo.

PS: I still have pending: 
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01116.html


//
/cp
2016-06-17  Paolo Carlini  

* decl.c (grokfndecl): Change pair of errors to error + inform.

/testsuite
2016-06-17  Paolo Carlini  

* g++.dg/cpp0x/defaulted31.C: Adjust for dg-message vs dg-error.
Index: cp/decl.c
===
--- cp/decl.c   (revision 237547)
+++ cp/decl.c   (working copy)
@@ -8295,7 +8295,8 @@ grokfndecl (tree ctype,
  else if (DECL_DEFAULTED_FN (old_decl))
{
  error ("definition of explicitly-defaulted %q+D", decl);
- error ("%q+#D explicitly defaulted here", old_decl);
+ inform (DECL_SOURCE_LOCATION (old_decl),
+ "%q#D explicitly defaulted here", old_decl);
  return NULL_TREE;
}
 
Index: testsuite/g++.dg/cpp0x/defaulted31.C
===
--- testsuite/g++.dg/cpp0x/defaulted31.C(revision 237547)
+++ testsuite/g++.dg/cpp0x/defaulted31.C(working copy)
@@ -4,7 +4,7 @@
 struct A
 {
   A() { }  // { dg-message "defined" }
-  ~A() = default;  // { dg-error "defaulted" }
+  ~A() = default;  // { dg-message "defaulted" }
 };
 
 A::A() = default;  // { dg-error "redefinition" }


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Bernd Schmidt

On 06/17/2016 12:37 PM, Jakub Jelinek wrote:


Do you really need to require frame pointer for this?
I mean, couldn't you instead use what you do if a function needs frame
pointer and otherwise just replace the original ret with
pushq   %rbp
movq%rsp, %rbp
jmp __rop_ret
?  Or would that defeat the purpose of the mitigation?


Yes, kind of, because then you can jump into code before this little 
sequence and the whole pushq/movq/jmp/leave/ret would just behave like a 
normal ret. This is admittedly a concern for smaller functions that look 
a lot like this; maybe we need to pad function entry points as well.



As for __rop_ret, if you do a non-PLT jmp to it, I bet it must be in the same
executable or shared library as the code branching to it, so it should be
.hidden.  Is libgcc.a really the best place for it though?


I declare myself agnostic.


Looking at nop; nop; 1: jmp 1b; leave; ret
if you branch into the middle of the jmp insn (0x3 below), there is:
   0:   90  nop
   1:   90  nop
   2:   eb fe   jmp0x2
   4:   c9  leaveq
   5:   c3  retq
and thus:
   3:   fe c9   dec%cl
   5:   c3  retq
and thus if you don't mind decreasing %cl, you still have retq without leave
before it.  But I very likely just don't understand the ROP threat stuff
enough.


You'd also have to find useful code before this sequence, and in any 
case it's just a single ret where we used to have many. But maybe 
there's a one-byte trap that could be used instead.



Bernd


[ARM][testsuite] Make arm_neon_fp16 depend on arm_neon_ok

2016-06-17 Thread Christophe Lyon
Hi,

As discussed some time ago with Kyrylo (on IRC IIRC), the attached
patch makes sure that arm_neon_fp16_ok and arm_neonv2_ok effective
targets imply that arm_neon_ok passes, and use the corresponding
flags.

Without this patch, the 3 effective targets have different, possibly
inconsistent conditions. For instance, arm_neon_ok makes sure that
__ARM_ARCH >= 7, but arm_neon_fp16_ok does not.

This led to failures on configurations not supporting neon, but where
arm_neon_fp16_ok passes as the test is less strict.
Rather than duplicating the same tests, I preferred to call
arm_neon_ok from the other places.

We then use the union of flags needed for arm_neon_ok and
arm_neon_fp16_ok to pass.

Tested on many arm configurations with no harm. It prevents
arm_neon_fp16 tests from passing when forcing -march=armv5t, which
seems coherent.

OK?

Christophe
gcc/testsuite/ChangeLog:

2016-06-17  Christophe Lyon  

* lib/target-supports.exp
(check_effective_target_arm_neon_fp16_ok_nocache): Call
arm_neon_ok and merge flags. Fix temporary test name.
(check_effective_target_arm_neonv2_ok_nocache): Call arm_neon_ok
and merge flags.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f4cb276..bbb5343 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2990,23 +2990,25 @@ proc check_effective_target_arm_crc_ok { } {
 
 proc check_effective_target_arm_neon_fp16_ok_nocache { } {
 global et_arm_neon_fp16_flags
+global et_arm_neon_flags
 set et_arm_neon_fp16_flags ""
-if { [check_effective_target_arm32] } {
+if { [check_effective_target_arm32]
+&& [check_effective_target_arm_neon_ok] } {
foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
   "-mfpu=neon-fp16 -mfloat-abi=softfp"
   "-mfp16-format=ieee"
   "-mfloat-abi=softfp -mfp16-format=ieee"
   "-mfpu=neon-fp16 -mfp16-format=ieee"
   "-mfpu=neon-fp16 -mfloat-abi=softfp -mfp16-format=ieee"} 
{
-   if { [check_no_compiler_messages_nocache arm_neon_fp_16_ok object {
+   if { [check_no_compiler_messages_nocache arm_neon_fp16_ok object {
#include "arm_neon.h"
float16x4_t
foo (float32x4_t arg)
{
   return vcvt_f16_f32 (arg);
}
-   } "$flags"] } {
-   set et_arm_neon_fp16_flags $flags
+   } "$et_arm_neon_flags $flags"] } {
+   set et_arm_neon_fp16_flags [concat $et_arm_neon_flags $flags]
return 1
}
}
@@ -3085,8 +3087,10 @@ proc check_effective_target_arm_v8_neon_ok { } {
 
 proc check_effective_target_arm_neonv2_ok_nocache { } {
 global et_arm_neonv2_flags
+global et_arm_neon_flags
 set et_arm_neonv2_flags ""
-if { [check_effective_target_arm32] } {
+if { [check_effective_target_arm32]
+&& [check_effective_target_arm_neon_ok] } {
foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-vfpv4" 
"-mfpu=neon-vfpv4 -mfloat-abi=softfp"} {
if { [check_no_compiler_messages_nocache arm_neonv2_ok object {
#include "arm_neon.h"
@@ -3095,8 +3099,8 @@ proc check_effective_target_arm_neonv2_ok_nocache { } {
 {
   return vfma_f32 (a, b, c);
 }
-   } "$flags"] } {
-   set et_arm_neonv2_flags $flags
+   } "$et_arm_neon_flags $flags"] } {
+   set et_arm_neonv2_flags [concat $et_arm_neon_flags $flags]
return 1
}
}


Re: [PATCH, vec-tails 01/10] New compiler options

2016-06-17 Thread Ilya Enkovich
2016-06-16 8:06 GMT+03:00 Jeff Law :
> On 05/20/2016 05:40 AM, Ilya Enkovich wrote:
>>
>> 2016-05-20 14:17 GMT+03:00 Richard Biener :
>>>
>>> On Fri, May 20, 2016 at 11:50 AM, Ilya Enkovich 
>>> wrote:

 2016-05-20 12:26 GMT+03:00 Richard Biener :
>
> On Thu, May 19, 2016 at 9:36 PM, Ilya Enkovich 
> wrote:
>>
>> Hi,
>>
>> This patch introduces new options used for loop epilogues
>> vectorization.
>
>
> Why's that?  This is a bit too much for the casual user and if it is
> really necessary
> to control this via options then it is not fine-grained enough.
>
> Why doesn't the vectorizer/backend have enough info to decide this
> itself?


 I don't expect casual user to decide which modes to choose.  These
 controls are
 added for debugging and performance measurement purposes.  I see now I
 miss
 -ftree-vectorize-epilogues aliased to -ftree-vectorize-epilogues=all.
 Surely
 I expect epilogues and short loops vectorization be enabled by default
 on -O3
 or by -ftree-vectorize-loops.
>>>
>>>
>>> Can you make all these --params then?  I think to be useful to users we'd
>>> want
>>> them to be loop pragmas rather than options.
>>
>>
>> OK, I'll change it to params.  I didn't think about control via
>> pragmas but will do now.
>
> So the questions I'd like to see answered:
>
> 1. You've got 3 modes for epilogue vectorization.  Is this an artifact of
> not really having good heuristics yet for which mode to apply to a
> particular loop at this time?
>
> 2. Similarly for cost models.

All three modes are profitable in different situations.  Which mode is
profitable depends on the loop structure and target capabilities.  The
ultimate goal is to have all three modes enabled by default.  I can't state
the current heuristics are good enough for all cases and targets and
therefore don't enable epilogue vectorization by default for now.  This is
to be measured, analyzed and tuned in time for GCC 7.1.

I added the cost model option simply to have an ability to force epilogue
vectorization for stability testing (force some mode of epilogue
vectorization and check nothing fails) and performance testing/tuning (try
to find cases where we may benefit from epilogue vectorization but don't
due to a bad cost model).  Also I don't want to force epilogue vectorization
for all loops for which vectorization is forced using the unlimited cost
model, because that may hurt performance for simd loops.

>
> In the cover message you indicated you were getting expected gains of KNL,
> but not on Haswell.  Do you have any sense yet why you're not getting good
> resuls on Haswell yet?  For KNL are you getting those speedups with a
> generic set of options or are those with a custom set of options to set the
> mode & cost models?

Currently I have numbers collected on various suites for a KNL machine.
Masking mode (-ftree-vectorize-epilogues=mask) shows decent results (dynamic
cost model, -Ofast -flto -funroll-loops).  I don't see significant losses
and there are a few significant gains.  For combine and nomask modes the
result is not good enough yet - there are several significant performance
losses.  My guess is that the current threshold for combine is way too high,
and that for the nomask variant we'd better choose the smallest vector size
for epilogues instead of the next available one (use zmm for the body and
xmm for the epilogue instead of zmm for the body and ymm for the epilogue).
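For reference, the nomask shape under discussion, in plain C (illustrative;
no intrinsics, not actual vectorizer output):

void
add1 (float *a, int n)
{
  int i = 0;
  for (; i + 16 <= n; i += 16)   /* main loop, e.g. zmm width */
    for (int j = 0; j < 16; j++)
      a[i + j] += 1.0f;
  for (; i + 4 <= n; i += 4)     /* vectorized epilogue, e.g. xmm width */
    for (int j = 0; j < 4; j++)
      a[i + j] += 1.0f;
  for (; i < n; i++)             /* scalar remainder */
    a[i] += 1.0f;
}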

ICC shows better results in these modes, which makes me believe we can tune
them as well.  Overall the nomask mode shows worse results compared to the
masking options, which is quite expected for KNL.

Unfortunately some big gains demonstrated by ICC are not reproducible
using GCC because GCC can't vectorize the required hot loops in the first
place.  E.g. on 200.sixtrack GCC gains nothing and ICC gains ~40% in all
three modes.

I don't have full statistics for Haswell, but synthetic tests show the
situation is really different from KNL.  Even for the 'perfect' iteration
count (VF * 2 - 1), the scalar version of the epilogue shows the same result
as a masked one.  It means the ratio of vector code performance to scalar
code performance is not as high as for KNL (KNL is more vector oriented and
has weaker scalar performance; the doubled vector size also matters here)
and the masking cost is higher on Haswell.  We still focus more on AVX-512
targets because of their rich masking capabilities and wider vectors.

Thanks,
Ilya

>
> jeff


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Jakub Jelinek
On Fri, Jun 17, 2016 at 12:06:48PM +0200, Bernd Schmidt wrote:
> This is another step to flesh out -mmitigate-rop for i386 a little more. The
> basic idea was (I think) Richard Henderson's: if we could arrange to have
> every return preceded by a leave instruction, it would make it harder to
> construct an attack since it takes away a certain amount of control over the
> stack pointer. I extended this to move the leave/ret pair to libgcc,
> preceded by a sequence of nops, so as to take away the possibility of
> jumping into the middle of an instruction preceding the leave/ret pair and
> thereby skipping the leave.

Do you really need to require frame pointer for this?
I mean, couldn't you instead use what you do if a function needs frame
pointer and otherwise just replace the original ret with
pushq   %rbp
movq%rsp, %rbp
jmp __rop_ret
?  Or would that defeat the purpose of the mitigation?
Though, I think it is very common to have functions that just don't do
anything in many libraries, and so the pushq %rbp; movq %rsp, %rbp; jmp __rop_ret
sequence would still very likely appear somewhere.

As for __rop_ret, if you do a non-PLT jmp to it, I bet it must be in the same
executable or shared library as the code branching to it, so it should be
.hidden.  Is libgcc.a really the best place for it though?  I mean, in
various cases we don't even link libgcc.a (sometimes we only link
libgcc_s.so.1).  Wouldn't it be better to emit the __rop_ret stuff as a
comdat section into every CU, like we do e.g. for the i686 PIC pads?
Is this stuff meant only for -m64, or also for 32-bit code?  If the latter,
then e.g. the i686 PIC pads is something where you also have ret without
leave before it.

Looking at nop; nop; 1: jmp 1b; leave; ret
if you branch into the middle of the jmp insn (0x3 below), there is:
   0:   90  nop
   1:   90  nop
   2:   eb fe   jmp0x2
   4:   c9  leaveq 
   5:   c3  retq   
and thus:
   3:   fe c9   dec%cl
   5:   c3  retq   
and thus if you don't mind decreasing %cl, you still have retq without leave
before it.  But I very likely just don't understand the ROP threat stuff
enough.

Jakub


i386/prologues: ROP mitigation for normal function epilogues

2016-06-17 Thread Bernd Schmidt
This is another step to flesh out -mmitigate-rop for i386 a little more. 
The basic idea was (I think) Richard Henderson's: if we could arrange to 
have every return preceded by a leave instruction, it would make it 
harder to construct an attack since it takes away a certain amount of 
control over the stack pointer. I extended this to move the leave/ret 
pair to libgcc, preceded by a sequence of nops, so as to take away the 
possibility of jumping into the middle of an instruction preceding the 
leave/ret pair and thereby skipping the leave.


Outside of the i386 changes, this adds a new optional prologue component 
that is always placed at function entry. There's already a use for this in 
the static-chain-on-stack functionality.


This has survived a bootstrap and test both normally and with 
flag_mitigate_rop enabled by force in ix86_option_override. The former 
is completely clean. In the latter case, there are all sorts of 
scan-assembler testcases that fail, but that is to be expected. There's 
also some effect on guality, but other than that everything seems to be 
working.
These tests were with a very slightly earlier version that was missing 
the have_entry_prologue test in function.c; will retest with this one as 
well.


This has a performance impact when -mmitigate-rop is enabled; I made 
some measurements a while ago and it looks like it's about twice the 
impact of -fno-omit-frame-pointer.



Bernd
	* config/i386/i386-protos.h (ix86_expand_entry_prologue): Declare.
	* config/i386/i386.c (ix86_frame_pointer_required): True if
	flag_mitigate_rop.
	(ix86_compute_frame_layout): Determine whether to use ROP returns,
	and adjust save_regs_using_mov for it.
	(ix86_expand_entry_prologue): New function.
	(ix86_expand_prologue): Move parts from here into it.  Deal with
	the rop return variant.
	(ix86_expand_epilogue): Deal with the rop return variant.
	(ix86_expand_call): For sibcalls with flag_mitigate_rop, show a
	clobber and use of the hard frame pointer.
	(ix86_output_call_insn): For sibling calls, if using rop returns,
	emit a leave.
	(ix86_pad_returns): Skip if using rop returns.
	* config/i386/i386.h (struct machine_function): New field
	use_rop_ret.
	* config/i386/i386.md (sibcall peepholes): Disallow loads from
	memory locations involving the hard frame pointer.
	(return): Explicitly call gen_simple_return.
	(simple_return): Generate simple_return_leave_internal if
	necessary.
	(simple_return_internal): Assert we're not using rop returns.
	(simple_return_leave_internal): New pattern.
	(entry_prologue): New pattern.
	* function.c (make_entry_prologue_seq): New static function.
	(thread_prologue_and_epilogue_insns): Call it and emit the
	sequence.
	* target-insns.def (entry_prologue): Add.

libgcc/
	* config/i386/t-linux (LIB2ADD_ST): New, to add ropret.S.
	* config/i386/ropret.S: New file.

Index: gcc/config/i386/i386-protos.h
===
--- gcc/config/i386/i386-protos.h	(revision 237310)
+++ gcc/config/i386/i386-protos.h	(working copy)
@@ -33,6 +33,7 @@ extern void ix86_expand_prologue (void);
 extern void ix86_maybe_emit_epilogue_vzeroupper (void);
 extern void ix86_expand_epilogue (int);
 extern void ix86_expand_split_stack_prologue (void);
+extern void ix86_expand_entry_prologue (void);
 
 extern void ix86_output_addr_vec_elt (FILE *, int);
 extern void ix86_output_addr_diff_elt (FILE *, int, int);
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 237310)
+++ gcc/config/i386/i386.c	(working copy)
@@ -11529,6 +11530,9 @@ ix86_can_use_return_insn_p (void)
 static bool
 ix86_frame_pointer_required (void)
 {
+  if (flag_mitigate_rop)
+return true;
+
   /* If we accessed previous frames, then the generated code expects
  to be able to access the saved ebp value in our frame.  */
   if (cfun->machine->accesses_prev_frame)
@@ -12102,11 +12106,21 @@ ix86_compute_frame_layout (struct ix86_f
 	   = !expensive_function_p (count);
 }
 
-  frame->save_regs_using_mov
-= (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
-   /* If static stack checking is enabled and done with probes,
-	  the registers need to be saved before allocating the frame.  */
-   && flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
+  cfun->machine->use_rop_ret = (flag_mitigate_rop
+&& !TARGET_SEH
+&& !stack_realign_drap
+&& crtl->args.pops_args == 0
+&& !crtl->calls_eh_return
+&& !ix86_static_chain_on_stack);
+
+  if (cfun->machine->use_rop_ret)
+frame->save_regs_using_mov = true;
+  else
+frame->save_regs_using_mov
+  = (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
+	 /* If static stack checking is enabled and done with probes,
+	the registers need to be saved before allocating the frame.  */
+	 && flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
 
   /* Skip return addre

Re: PR 71181 Avoid rehash after reserve

2016-06-17 Thread Jonathan Wakely

On 16/06/16 21:29 +0200, François Dumont wrote:

Here is a new version incorporating all your feedback.

   PR libstdc++/71181
   * include/tr1/hashtable_policy.h
   (_Prime_rehash_policy::_M_next_bkt): Make past-the-end iterator
   dereferenceable to avoid check on lower_bound result.
   (_Prime_rehash_policy::_M_bkt_for_elements): Call latter.
   (_Prime_rehash_policy::_M_need_rehash): Likewise.
   * src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
   Always return a value greater than input value. Set _M_next_resize to
   max value when reaching highest prime number.
   * src/shared/hashtable-aux.cc (__prime_list): Add comment about
   sentinel being now useless.
   * testsuite/23_containers/unordered_set/hash_policy/71181.cc: New.
   * testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc
   (test02): New.
   * testsuite/23_containers/unordered_set/hash_policy/prime_rehash.cc:
   New.
   * testsuite/23_containers/unordered_set/hash_policy/rehash.cc:
   Fix indentation.


Great - OK for trunk, thanks.



On 15/06/2016 10:29, Jonathan Wakely wrote:

On 14/06/16 22:34 +0200, François Dumont wrote:

  const unsigned long* __next_bkt =
-  std::lower_bound(__prime_list + 5, __prime_list + __n_primes, __n);
+  std::lower_bound(__prime_list + 6, __prime_list_end, __n);
+
+if (*__next_bkt == __n && __next_bkt != __prime_list_end)
+  ++__next_bkt;


Can we avoid this check by searching for __n + 1 instead of __n with
the lower_bound call?


Yes, that's another option, I will give it a try.


I did some comparisons and this version seems to execute fewer
instructions in some simple tests, according to cachegrind.

The only drawback is that calling _M_next_bkt(size_t(-1)) doesn't give
the right result. But a value that large is unlikely to occur in real
use cases, so it is acceptable.


Yes, good point. I don't think we can ever have that many buckets,
because each bucket requires more than one byte, so we can't fit that
many buckets in memory anyway.
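
To make the trade-off concrete, here is a minimal standalone sketch of
the two lookup schemes discussed above (illustration only, not the
libstdc++ sources; the prime table is abbreviated and the function
names are mine):

#include <algorithm>
#include <cstddef>

// Abbreviated stand-in for libstdc++'s __prime_list.
static const unsigned long primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };
static const unsigned long* const primes_end
  = primes + sizeof (primes) / sizeof (primes[0]);

// Old scheme: lower_bound(n) finds the first prime >= n, so an exact
// hit needs an extra branch to return a strictly greater value.
unsigned long next_bkt_with_check (std::size_t n)
{
  const unsigned long* p = std::lower_bound (primes, primes_end, n);
  if (p != primes_end && *p == n)
    ++p;
  return p != primes_end ? *p : primes_end[-1];
}

// New scheme: searching for n + 1 yields the first prime strictly
// greater than n with no post-check.  As noted above, n == size_t(-1)
// wraps to 0 here and gives the wrong answer, which the thread deems
// acceptable since that many buckets cannot fit in memory anyway.
unsigned long next_bkt_without_check (std::size_t n)
{
  const unsigned long* p = std::lower_bound (primes, primes_end, n + 1);
  return p != primes_end ? *p : primes_end[-1];
}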



[PATCH] [ARC] Add simple shift/rotate ops.

2016-06-17 Thread Claudiu Zissulescu
Basic ARC CPUs have only simple shift operations. Here they are.

OK to apply?
Claudiu

gcc/
2016-06-09  Claudiu Zissulescu  

* config/arc/arc.md (*rotrsi3_cnt1): New pattern.
(*ashlsi2_cnt1, *lshrsi3_cnt1, *ashrsi3_cnt1): Likewise.
---
 gcc/config/arc/arc.md | 40 
 1 file changed, 40 insertions(+)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 852f0e0..a095ba1 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -6219,6 +6219,46 @@
   (zero_extract:SI (match_dup 1) (match_dup 5) (match_dup 7)))])
(match_dup 1)])
 
+(define_insn "*rotrsi3_cnt1"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=w")
+   (rotatert:SI (match_operand:SI 1 "register_operand" "c")
+(const_int 1)))]
+  ""
+  "ror %0,%1%&"
+  [(set_attr "type" "shift")
+   (set_attr "predicable" "no")
+   (set_attr "length" "4")])
+
+(define_insn "*ashlsi2_cnt1"
+  [(set (match_operand:SI 0 "dest_reg_operand"   "=Rcqq,w")
+   (ashift:SI (match_operand:SI 1 "register_operand" "Rcqq,c")
+  (const_int 1)))]
+  ""
+  "asl%? %0,%1%&"
+  [(set_attr "type" "shift")
+   (set_attr "iscompact" "maybe,false")
+   (set_attr "predicable" "no,no")])
+
+(define_insn "*lshrsi3_cnt1"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,w")
+   (lshiftrt:SI (match_operand:SI 1 "register_operand" "Rcqq,c")
+(const_int 1)))]
+  ""
+  "lsr%? %0,%1%&"
+  [(set_attr "type" "shift")
+   (set_attr "iscompact" "maybe,false")
+   (set_attr "predicable" "no,no")])
+
+(define_insn "*ashrsi3_cnt1"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,w")
+   (ashiftrt:SI (match_operand:SI 1 "register_operand" "Rcqq,c")
+(const_int 1)))]
+  ""
+  "asr%? %0,%1%&"
+  [(set_attr "type" "shift")
+   (set_attr "iscompact" "maybe,false")
+   (set_attr "predicable" "no,no")])
+
 ;; include the arc-FPX instructions
 (include "fpx.md")
 
-- 
1.9.1
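
For reference, these count-1 patterns correspond to the usual C-level
single-bit idioms; a hedged illustration follows (function names are
mine, and the rotate idiom assumes a 32-bit unsigned type):

unsigned rot1 (unsigned x)  { return (x >> 1) | (x << 31); }  /* *rotrsi3_cnt1 -> ror */
unsigned shl1 (unsigned x)  { return x << 1; }                /* *ashlsi2_cnt1 -> asl */
unsigned lshr1 (unsigned x) { return x >> 1; }                /* *lshrsi3_cnt1 -> lsr */
int      ashr1 (int x)      { return x >> 1; }                /* *ashrsi3_cnt1 -> asr */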



Re: [openacc] clean up acc directive matching in fortran

2016-06-17 Thread Tobias Burnus
Cesar Philippidis wrote:
> On 06/16/2016 08:30 PM, Cesar Philippidis wrote:
> > This patch introduces a match_acc function to the fortran FE. It's
> > almost identical to match_omp, but it passes openacc = true to
> > gfc_match_omp_clauses. I supposed I could have consolidated those two
> > functions, but they are reasonably simple so I left them separate. Maybe
> > a follow up patch can consolidate them. I was able to eliminate a lot of
> > duplicate code with this function.
> > 
> > Is this ok for trunk and gcc-6?

> And here's the patch.

The patch seems to be reversed. If I regard the "-" lines as additions
and the "+" lines as deletions, it makes sense and is in line with
the ChangeLog and what you wrote above.

Otherwise, it looks good to me.

Tobias
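
For readers following along, the helper described above presumably has
roughly this shape (a sketch only, not the committed patch; the exact
gfc_match_omp_clauses signature and parameter order are assumptions):

/* Match an OpenACC directive: identical in structure to match_omp,
   but with the openacc flag set when matching clauses.  */
static match
match_acc (gfc_exec_op op, uint64_t mask)
{
  gfc_omp_clauses *c;
  if (gfc_match_omp_clauses (&c, mask, false, false, true) != MATCH_YES)
    return MATCH_ERROR;
  new_st.op = op;
  new_st.ext.omp_clauses = c;
  return MATCH_YES;
}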


Re: [Patch ARM arm_neon.h] s/__FAST_MATH/__FAST_MATH__/g

2016-06-17 Thread Ramana Radhakrishnan
On Thu, Jun 16, 2016 at 6:18 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> As subject, config/arm/arm_neon.h currently uses __FAST_MATH, but:
>
>   $ gcc -E -dM - -ffast-math < /dev/null | grep FAST_MATH
>   #define __FAST_MATH__ 1
>
> It should be spelled as __FAST_MATH__.
>
> I've made that change, and confirmed that it causes the preprocessor to
> do what was intended for these intrinsics under -ffast-math.
>
> Currently bootstrapped on arm-none-linux-gnueabihf with no issues.

OK, thanks for finding this - I'm not sure how that got messed up, and
given it's so far back I don't have any records left.

>
> This could also be backported to release branches. I think Ramana's patch
> went in for GCC 5.0, so backports to gcc_5_branch and gcc_6_branch would
> be feasible.
>

It's not a regression fix, so let's just keep this on trunk.

Ramana

> Thanks,
> James
>
> ---
> 2016-06-16  James Greenhalgh  
>
> * config/arm/arm_neon.h (vadd_f32): Replace __FAST_MATH with
> __FAST_MATH__.
> (vaddq_f32): Likewise.
> (vmul_f32): Likewise.
> (vmulq_f32): Likewise.
> (vsub_f32): Likewise.
> (vsubq_f32): Likewise.
>
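
For readers without the patch body at hand, the shape of the fix is
roughly as follows (a reconstruction of vadd_f32 as it reads inside
arm_neon.h after the change, not a quote from the patch; only the
guard's spelling changes):

__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vadd_f32 (float32x2_t __a, float32x2_t __b)
{
#ifdef __FAST_MATH__   /* was: #ifdef __FAST_MATH, which is never defined */
  return __a + __b;
#else
  return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b);
#endif
}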