Re: PR libstdc++/78420 Make std::less etc. yield total order for pointers

2018-03-17 Thread Jonathan Wakely
On 14 March 2018 at 23:01, Jonathan Wakely wrote:
> Here's a very different patch. This gets rid of the __ptr_rel_ops and
> just puts the special handling for pointers directly in the
> std::less<_Tp*> etc. specializations. Then std::less uses
> std::less for some pointer type P*. I've also added a ton of ugly
> metaprogramming to detect the "if the call operator calls a built-in
> operator comparing pointers" condition. The idea is borrowed from
> Casey Carter and Eric Niebler's Ranges work, but basically it checks
> if operator<(T,U) or T.operator<(U) can be called, and if not the
> comparison must be using a built-in operator. If both T and U are
> convertible to pointers (specifically, to const volatile void* which
> is the most accepting of all pointers) then we assume we're using a
> built-in operator comparing pointers, and delegate to std::less volatile void*>, which ensures a total order.
>
> Tested powerpc64le-linux, committed to trunk. This fixes a regression,
> but I'm not sure about backporting it yet, I haven't even tried
> testing it on the branches.

This adjusts the new test to avoid using 'auto' so it works as C++98 too.
commit be039c0c67a6d18ddeb86bddc50cd9704a120f75
Author: Jonathan Wakely 
Date:   Sat Mar 17 13:35:50 2018 +

Fix new test that fails in C++98 mode

* testsuite/20_util/function_objects/comparisons_pointer.cc: Adjust
to compile as C++98.

diff --git 
a/libstdc++-v3/testsuite/20_util/function_objects/comparisons_pointer.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/comparisons_pointer.cc
index 6b2b8be0f51..aad0bc3dd2e 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/comparisons_pointer.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/comparisons_pointer.cc
@@ -27,7 +27,7 @@ int a[8];
 void
 test01()
 {
-  auto p = a + 8;
+  int* p = a + 8;
   std::greater gt;
 
   std::stringstream ss;
@@ -44,7 +44,7 @@ test01()
   ss.str("");
   ss.clear();
   sum = 0;
-  auto p2 = a + 8;
+  int* p2 = a + 8;
   std::greater<> gt2;
   ss << gt2(p2, b) << ' ' << gt2(b, p2) << ' ' << (!gt2(p2, b) && !gt2(b, p2));
   while (ss >> n)
@@ -59,7 +59,7 @@ test01()
 void
 test02()
 {
-  auto p = a + 8;
+  int* p = a + 8;
   std::less lt;
 
   std::stringstream ss;
@@ -76,7 +76,7 @@ test02()
   ss.str("");
   ss.clear();
   sum = 0;
-  auto p2 = a + 8;
+  int* p2 = a + 8;
   std::less<> lt2;
   ss << lt2(p2, b) << ' ' << lt2(b, p2) << ' ' << (!lt2(p2, b) && !lt2(b, p2));
   while (ss >> n)
@@ -91,7 +91,7 @@ test02()
 void
 test03()
 {
-  auto p = a + 8;
+  int* p = a + 8;
   std::greater_equal ge;
 
   std::stringstream ss;
@@ -109,7 +109,7 @@ test03()
   ss.str("");
   ss.clear();
   sum = 0;
-  auto p2 = a + 8;
+  int* p2 = a + 8;
   std::greater_equal<> ge2;
   ss << !ge2(p2, b) << ' ' << !ge2(b, p2) << ' ' << (ge2(p2, b) && ge2(b, p2));
   while (ss >> n)
@@ -125,7 +125,7 @@ test03()
 void
 test04()
 {
-  auto p = a + 8;
+  int* p = a + 8;
   std::less_equal le;
 
   std::stringstream ss;
@@ -143,7 +143,7 @@ test04()
   ss.str("");
   ss.clear();
   sum = 0;
-  auto p2 = a + 8;
+  int* p2 = a + 8;
   std::less_equal<> le2;
   ss << !le2(p2, b) << ' ' << !le2(b, p2) << ' ' << (le2(p2, b) && le2(b, p2));
   while (ss >> n)
@@ -167,7 +167,7 @@ void
 test05()
 {
   std::less lt;
-  auto p = y + 4;
+  X* p = y + 4;
   std::stringstream ss;
   ss << lt(x, p) << ' ' << lt(p, x) << ' ' << (!lt(p, x) && !lt(x, p));
   int sum = 0;
@@ -183,7 +183,7 @@ test05()
   ss.str("");
   ss.clear();
   sum = 0;
-  auto p2 = y + 4;
+  X* p2 = y + 4;
   std::less<> lt2;
   ss << lt2(x, p2) << ' ' << lt2(p2, x) << ' ' << (!lt2(x, p2) && !lt2(p2, x));
   while (ss >> n)


PING^2 [PATCH] i386: Avoid PLT when shadow stack is enabled directly

2018-03-17 Thread H.J. Lu
On Fri, Dec 8, 2017 at 5:02 AM, H.J. Lu  wrote:
> On Tue, Oct 24, 2017 at 5:31 AM, H.J. Lu  wrote:
>> PLT should be avoided with shadow stack in 32-bit mode if more than 2
>> parameters are passed in registers since only 2 parameters can be passed
>> in registers for external function calls via PLT with shadow stack
>> enabled.
>>
>> OK for trunk if there is no regressions?
>
> Here is the updated patch.
>
> PING.
>

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00485.html

PING.

-- 
H.J.


Re: Don't try to vectorise COND_EXPR reduction chains (PR 84913)

2018-03-17 Thread Richard Biener
On March 17, 2018 11:21:18 AM GMT+01:00, Richard Sandiford 
 wrote:
>The testcase ICEd for both SVE and AVX512 because we were trying
>to vectorise a chain of COND_EXPRs as a reduction and getting
>confused by reduc_index == -1.
>
>We normally handle reduction chains by vectorising all statements
>except the last one normally, but that wouldn't work here for the
>reasons explained in the comment.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK. 

Richard. 

>Richard
>
>
>2018-03-17  Richard Sandiford  
>
>gcc/
>   PR tree-optimization/84913
>   * tree-vect-loop.c (vectorizable_reduction): Don't try to
>   vectorize chains of COND_EXPRs.
>
>gcc/testsuite/
>   PR tree-optimization/84913
>   * gfortran.dg/vect/pr84913.f90: New test.
>
>Index: gcc/tree-vect-loop.c
>===
>--- gcc/tree-vect-loop.c   2018-03-01 08:20:43.850526185 +
>+++ gcc/tree-vect-loop.c   2018-03-17 10:19:19.289947549 +
>@@ -6788,6 +6788,30 @@ vectorizable_reduction (gimple *stmt, gi
>/* If we have a condition reduction, see if we can simplify it further.
> */
>   if (v_reduc_type == COND_REDUCTION)
> {
>+  /* TODO: We can't yet handle reduction chains, since we need to
>treat
>+   each COND_EXPR in the chain specially, not just the last one.
>+   E.g. for:
>+
>+  x_1 = PHI 
>+  x_2 = a_2 ? ... : x_1;
>+  x_3 = a_3 ? ... : x_2;
>+
>+   we're interested in the last element in x_3 for which a_2 || a_3
>+   is true, whereas the current reduction chain handling would
>+   vectorize x_2 as a normal VEC_COND_EXPR and only treat x_3
>+   as a reduction operation.  */
>+  if (reduc_index == -1)
>+  {
>+if (dump_enabled_p ())
>+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>+   "conditional reduction chains not supported\n");
>+return false;
>+  }
>+
>+  /* vect_is_simple_reduction ensured that operand 2 is the
>+   loop-carried operand.  */
>+  gcc_assert (reduc_index == 2);
>+
>   /* Loop peeling modifies initial value of reduction PHI, which
>makes the reduction stmt to be transformed different to the
>original stmt analyzed.  We need to record reduction code for
>Index: gcc/testsuite/gfortran.dg/vect/pr84913.f90
>===
>--- /dev/null  2018-03-17 08:19:33.716019995 +
>+++ gcc/testsuite/gfortran.dg/vect/pr84913.f90 2018-03-17
>10:19:19.286947657 +
>@@ -0,0 +1,13 @@
>+! { dg-do compile }
>+
>+function foo(a, b, c, n)
>+  integer :: a(n), b(n), c(n), n, i, foo
>+  foo = 0
>+  do i = 1, n
>+if (a(i) .eq. b(i)) then
>+  foo = 1
>+else if (a(i) .eq. c(i)) then
>+  foo = 2
>+end if
>+  end do
>+end function foo



PING: [PATCH] i386: Insert ENDBR before the profiling counter call

2018-03-17 Thread H.J. Lu
On Tue, Oct 24, 2017 at 10:58 AM, H.J. Lu  wrote:
> On Tue, Oct 24, 2017 at 10:40 AM, Andi Kleen  wrote:
>> "H.J. Lu"  writes:
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/pr82699-4.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
>>> +/* { dg-options "-O2 -fpic -fcf-protection -mcet -pg -mfentry 
>>> -fasynchronous-unwind-tables" } */
>>> +/* { dg-final { scan-assembler-times {\t\.cfi_startproc\n\tendbr} 1 }
>>> } */
>>
>> Would add test cases for -mnop-mcount and -mrecord-mcount too
>>
>
> Here is the updated patch to add a testcase for -mnop-mcount -mrecord-mcount.
> No other changes otherwise.
>
>

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01741.html

PING.

-- 
H.J.


Re: [PR c++/71965] silence multi-dim array init sorry without tf_error

2018-03-17 Thread Alexandre Oliva
On Mar 17, 2018, Alexandre Oliva  wrote:

> We shouldn't substitute templates into short-circuited-out concepts
> constraints, but we do, and to add insult to injury, we issue a
> sorry() error when a concept that shouldn't even have been substituted
> attempts to perform a multi-dimensional array initialization with a
> new{} expression.

> Although fixing the requirements short-circuiting is probably too
> risky at this point, we can get closer to the intended effect by
> silencing that sorry just as we silence other errors.

Err...  Regstrapped on i686- and x86_64-linux-gnu.  Ok to install?

> for  gcc/cp/ChangeLog

>   PR c++/71965
>   * init.c (build_vec_init): Silence sorry without tf_error.

> for  gcc/testsuite/ChangeLog

>   PR c++/71965
>   * g++.dg/concepts/pr71965.C: New.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PR c++/71251] out-of-range parms in tmpl arg substitution

2018-03-17 Thread Alexandre Oliva
As we go through each of the template parameters, substituting it with
corresponding template arguments, an incorrect argument list might
cause us to index argument vectors past their length (or fail in the
preceding tree checks).  Avoid such dereferences and instead issue an
error (if requested) if we find the argument index to be past the
parameter vector length.

Regstrapped on i686- and x86_64-linux-gnu.  Ok to install?

for  gcc/cp/ChangeLog

PR c++/71251
* pt.c (tsubst): Test for and report out-of-range template
parms.

for  gcc/testsuite/ChangeLog

PR c++/71251
* g++.dg/cpp0x/pr71251.C: New.
---
 gcc/cp/pt.c  |   27 +++
 gcc/testsuite/g++.dg/cpp0x/pr71251.C |9 +
 2 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr71251.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fa9bfb12c297..4cc18d0abe78 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13976,11 +13976,30 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
if (level <= levels
&& TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0)
  {
-   arg = TMPL_ARG (args, level, idx);
+   if (TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > idx)
+ {
+   arg = TMPL_ARG (args, level, idx);
 
-   /* See through ARGUMENT_PACK_SELECT arguments. */
-   if (arg && TREE_CODE (arg) == ARGUMENT_PACK_SELECT)
- arg = argument_pack_select_arg (arg);
+   /* See through ARGUMENT_PACK_SELECT arguments. */
+   if (arg && TREE_CODE (arg) == ARGUMENT_PACK_SELECT)
+ arg = argument_pack_select_arg (arg);
+ }
+   else
+ {
+   if (complain & tf_error)
+ {
+   /* ??? We could use a better location for this
+  message.  It takes the context of the closing
+  bracket of the innermost template we're
+  substituting, but the error may very well be
+  related with some enclosing context.  */
+   error ("missing template argument at this"
+  " or enclosing context");
+   error_at (DECL_SOURCE_LOCATION (TEMPLATE_PARM_DECL (t)),
+ "for substitution of template parameter");
+ }
+   arg = error_mark_node;
+ }
  }
 
if (arg == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr71251.C 
b/gcc/testsuite/g++.dg/cpp0x/pr71251.C
new file mode 100644
index ..f16fb130
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr71251.C
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++11 } }
+
+template template using U=void; // { dg-error "parameter" }
+
+template struct S1;
+
+template struct S1>{ template struct S2:S2{}; }; // { dg-error "missing" }
+//^ the error should be here, but it is^here, thus the weird line break


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PR c++/71965] silence multi-dim array init sorry without tf_error

2018-03-17 Thread Alexandre Oliva
We shouldn't substitute templates into short-circuited-out concepts
constraints, but we do, and to add insult to injury, we issue a
sorry() error when a concept that shouldn't even have been substituted
attempts to perform a multi-dimensional array initialization with a
new{} expression.

Although fixing the requirements short-circuiting is probably too
risky at this point, we can get closer to the intended effect by
silencing that sorry just as we silence other errors.

for  gcc/cp/ChangeLog

PR c++/71965
* init.c (build_vec_init): Silence sorry without tf_error.

for  gcc/testsuite/ChangeLog

PR c++/71965
* g++.dg/concepts/pr71965.C: New.
---
 gcc/cp/init.c   |   19 ---
 gcc/testsuite/g++.dg/concepts/pr71965.C |   27 +++
 2 files changed, 39 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/pr71965.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index cb62f4886e6d..dcaad730dc86 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -4384,12 +4384,17 @@ build_vec_init (tree base, tree maxindex, tree init,
   else if (TREE_CODE (type) == ARRAY_TYPE)
{
  if (init && !BRACE_ENCLOSED_INITIALIZER_P (init))
-   sorry
- ("cannot initialize multi-dimensional array with initializer");
- elt_init = build_vec_init (build1 (INDIRECT_REF, type, base),
-0, init,
-explicit_value_init_p,
-0, complain);
+   {
+ if ((complain & tf_error))
+   sorry ("cannot initialize multi-dimensional"
+  " array with initializer");
+ elt_init = error_mark_node;
+   }
+ else
+   elt_init = build_vec_init (build1 (INDIRECT_REF, type, base),
+  0, init,
+  explicit_value_init_p,
+  0, complain);
}
   else if (explicit_value_init_p)
{
@@ -4455,7 +4460,7 @@ build_vec_init (tree base, tree maxindex, tree init,
}
 
   current_stmt_tree ()->stmts_are_full_exprs_p = 1;
-  if (elt_init)
+  if (elt_init && !errors)
finish_expr_stmt (elt_init);
   current_stmt_tree ()->stmts_are_full_exprs_p = 0;
 
diff --git a/gcc/testsuite/g++.dg/concepts/pr71965.C 
b/gcc/testsuite/g++.dg/concepts/pr71965.C
new file mode 100644
index ..6bfaef19211f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr71965.C
@@ -0,0 +1,27 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "-fconcepts" }
+
+template 
+concept bool Destructible() {
+return false;
+}
+
+template 
+concept bool ConstructibleObject =
+// Concept evaluation should short-circuit even the template
+// substitution, so we shouldn't even substitute into the requires
+// constraint and the unimplemented multi-dimensional new T{...}
+// initialization.  ATM we do, but as long as we don't output the
+// sorry() message we used to for such constructs when asked not
+// to issue errors, this shouldn't be a problem for this and
+// similar cases.
+Destructible() && requires (Args&&...args) {
+new T{ (Args&&)args... };
+};
+
+int main() {
+using T = int[2][2];
+// GCC has not implemented initialization of multi-dimensional
+// arrays with new{} expressions.
+static_assert(!ConstructibleObject);
+}


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Use SCEV information when aligning for vectorisation (PR 84005)

2018-03-17 Thread Richard Sandiford
This PR is another regression caused by the removal of the simple_iv
check in dr_analyze_innermost for BB analysis.  Without splitting out
the step, we weren't able to find an underlying object whose alignment
could be increased.

As with PR81635, I think the simple_iv was only handling one special
case of something that ought to be more general.  The more general
thing here is that if the address can be analysed as a scalar
evolution, and if all updates preserve alignment N, it's possible
to align the address to N by increasing the alignment of the base
object to N.  That applies also to outer loops, and to both loop
and BB analysis.

I wasn't sure where the new functions ought to live, but tree-data-ref.c
seemed OK since (a) that already does scev analysis on addresses and
(b) you'd want to use dr_analyze_innermost first if you were analysing
a reference.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2018-03-17  Richard Sandiford  

gcc/
PR tree-optimization/84005
* tree-data-ref.h (get_base_for_alignment): Declare.
* tree-data-ref.c (get_base_for_alignment_1): New function.
(get_base_for_alignment): Likewise.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
get_base_for_alignment to find a suitable base object, instead
of always using drb->base_address.

gcc/testsuite/
PR tree-optimization/84005
* gcc.dg/vect/bb-slp-1.c: Make sure there is no message about
failing to force the alignment.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2018-01-13 18:02:00.948360274 +
+++ gcc/tree-data-ref.h 2018-03-17 10:43:42.544669549 +
@@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec
 extern tree find_data_references_in_bb (struct loop *, basic_block,
 vec *);
 extern unsigned int dr_alignment (innermost_loop_behavior *);
+extern tree get_base_for_alignment (tree, unsigned int *);
 
 /* Return the alignment in bytes that DR is guaranteed to have at all
times.  */
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +
+++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +
@@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d
   return alignment;
 }
 
+/* If BASE is a pointer-typed SSA name, try to find the object that it
+   is based on.  Return this object X on success and store the alignment
+   in bytes of BASE -  in *ALIGNMENT_OUT.  */
+
+static tree
+get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
+{
+  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
+return NULL_TREE;
+
+  gimple *def = SSA_NAME_DEF_STMT (base);
+  base = analyze_scalar_evolution (loop_containing_stmt (def), base);
+
+  /* Peel chrecs and record the minimum alignment preserved by
+ all steps.  */
+  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
+{
+  unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));
+  alignment = MIN (alignment, step_alignment);
+  base = CHREC_LEFT (base);
+}
+
+  /* Punt if the expression is too complicated to handle.  */
+  if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base)))
+return NULL_TREE;
+
+  /* Analyze the base to which the steps we peeled were applied.  */
+  poly_int64 bitsize, bitpos, bytepos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep;
+  tree offset;
+  base = get_inner_reference (build_fold_indirect_ref (base),
+ , , , ,
+ , , );
+  if (!base || !multiple_p (bitpos, BITS_PER_UNIT, ))
+return NULL_TREE;
+
+  /* Restrict the alignment to that guaranteed by the offsets.  */
+  unsigned int bytepos_alignment = known_alignment (bytepos);
+  if (bytepos_alignment != 0)
+alignment = MIN (alignment, bytepos_alignment);
+  if (offset)
+{
+  unsigned int offset_alignment = highest_pow2_factor (offset);
+  alignment = MIN (alignment, offset_alignment);
+}
+
+  *alignment_out = alignment;
+  return base;
+}
+
+/* Return the object whose alignment would need to be changed in order
+   to increase the alignment of ADDR.  Store the maximum achievable
+   alignment in *MAX_ALIGNMENT.  */
+
+tree
+get_base_for_alignment (tree addr, unsigned int *max_alignment)
+{
+  tree base = get_base_for_alignment_1 (addr, max_alignment);
+  if (base)
+return base;
+
+  if (TREE_CODE (addr) == ADDR_EXPR)
+addr = TREE_OPERAND (addr, 0);
+  *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  return addr;
+}
+
 /* Recursive helper function.  */
 
 static bool
Index: gcc/tree-vect-data-refs.c

Don't try to vectorise COND_EXPR reduction chains (PR 84913)

2018-03-17 Thread Richard Sandiford
The testcase ICEd for both SVE and AVX512 because we were trying
to vectorise a chain of COND_EXPRs as a reduction and getting
confused by reduc_index == -1.

We normally handle reduction chains by vectorising all statements
except the last one normally, but that wouldn't work here for the
reasons explained in the comment.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2018-03-17  Richard Sandiford  

gcc/
PR tree-optimization/84913
* tree-vect-loop.c (vectorizable_reduction): Don't try to
vectorize chains of COND_EXPRs.

gcc/testsuite/
PR tree-optimization/84913
* gfortran.dg/vect/pr84913.f90: New test.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2018-03-01 08:20:43.850526185 +
+++ gcc/tree-vect-loop.c2018-03-17 10:19:19.289947549 +
@@ -6788,6 +6788,30 @@ vectorizable_reduction (gimple *stmt, gi
   /* If we have a condition reduction, see if we can simplify it further.  */
   if (v_reduc_type == COND_REDUCTION)
 {
+  /* TODO: We can't yet handle reduction chains, since we need to treat
+each COND_EXPR in the chain specially, not just the last one.
+E.g. for:
+
+   x_1 = PHI 
+   x_2 = a_2 ? ... : x_1;
+   x_3 = a_3 ? ... : x_2;
+
+we're interested in the last element in x_3 for which a_2 || a_3
+is true, whereas the current reduction chain handling would
+vectorize x_2 as a normal VEC_COND_EXPR and only treat x_3
+as a reduction operation.  */
+  if (reduc_index == -1)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"conditional reduction chains not supported\n");
+ return false;
+   }
+
+  /* vect_is_simple_reduction ensured that operand 2 is the
+loop-carried operand.  */
+  gcc_assert (reduc_index == 2);
+
   /* Loop peeling modifies initial value of reduction PHI, which
 makes the reduction stmt to be transformed different to the
 original stmt analyzed.  We need to record reduction code for
Index: gcc/testsuite/gfortran.dg/vect/pr84913.f90
===
--- /dev/null   2018-03-17 08:19:33.716019995 +
+++ gcc/testsuite/gfortran.dg/vect/pr84913.f90  2018-03-17 10:19:19.286947657 
+
@@ -0,0 +1,13 @@
+! { dg-do compile }
+
+function foo(a, b, c, n)
+  integer :: a(n), b(n), c(n), n, i, foo
+  foo = 0
+  do i = 1, n
+if (a(i) .eq. b(i)) then
+  foo = 1
+else if (a(i) .eq. c(i)) then
+  foo = 2
+end if
+  end do
+end function foo


Re: [PATCH] Fix x86 -march/-mtune after recent icelake-* split (PR target/84902)

2018-03-17 Thread Uros Bizjak
On Fri, Mar 16, 2018 at 9:23 PM, Jakub Jelinek  wrote:
> Hi!
>
> We now have more than 32 enumerators in enum processor_type (well, we had
> 33 already before the icelake-* split, but the last one isn't really used),
> and while all the m_* macros were changed tom 1U< HOST_WIDE_INT_1U< vars weren't, so for PROCESSOR_ZNVER1 (-march=znver1 or -mtune=znver1)
> we now invoke undefined behavior, which can e.g. mean treat it like
> PROCESSOR_GENERIC, or no processor at all.
> Also, given that the m_* macros are using HOST_WIDE_INT_1U, it seems more
> consistent to use unsigned HOST_WIDE_INT type for the
> initial_ix86_*_features tables, rather than unsigned long long.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-03-16  Jakub Jelinek  
>
> PR target/84902
> * config/i386/i386.c (initial_ix86_tune_features,
> initial_ix86_arch_features): Use unsigned HOST_WIDE_INT rather than
> unsigned long long.
> (set_ix86_tune_features): Change ix86_tune_mask from unsigned int
> to unsigned HOST_WIDE_INT, initialize to HOST_WIDE_INT_1U << ix86_tune
> rather than 1u << ix86_tune.  Formatting fix.
> (ix86_option_override_internal): Change ix86_arch_mask from
> unsigned int to unsigned HOST_WIDE_INT, initialize to
> HOST_WIDE_INT_1U << ix86_arch rather than 1u << ix86_arch.
> (ix86_function_specific_restore): Likewise.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2018-03-15 22:07:57.964242564 +0100
> +++ gcc/config/i386/i386.c  2018-03-16 18:35:53.621105345 +0100
> @@ -183,7 +183,7 @@ unsigned char ix86_tune_features[X86_TUN
>
>  /* Feature tests against the various tunings used to create 
> ix86_tune_features
> based on the processor mask.  */
> -static unsigned long long initial_ix86_tune_features[X86_TUNE_LAST] = {
> +static unsigned HOST_WIDE_INT initial_ix86_tune_features[X86_TUNE_LAST] = {
>  #undef DEF_TUNE
>  #define DEF_TUNE(tune, name, selector) selector,
>  #include "x86-tune.def"
> @@ -195,7 +195,7 @@ unsigned char ix86_arch_features[X86_ARC
>
>  /* Feature tests against the various architecture variations, used to create
> ix86_arch_features based on the processor mask.  */
> -static unsigned long long initial_ix86_arch_features[X86_ARCH_LAST] = {
> +static unsigned HOST_WIDE_INT initial_ix86_arch_features[X86_ARCH_LAST] = {
>/* X86_ARCH_CMOV: Conditional move was added for pentiumpro.  */
>~(m_386 | m_486 | m_PENT | m_LAKEMONT | m_K6),
>
> @@ -3310,7 +3310,7 @@ parse_mtune_ctrl_str (bool dump)
>  static void
>  set_ix86_tune_features (enum processor_type ix86_tune, bool dump)
>  {
> -  unsigned int ix86_tune_mask = 1u << ix86_tune;
> +  unsigned HOST_WIDE_INT ix86_tune_mask = HOST_WIDE_INT_1U << ix86_tune;
>int i;
>
>for (i = 0; i < X86_TUNE_LAST; ++i)
> @@ -3318,7 +3318,8 @@ set_ix86_tune_features (enum processor_t
>if (ix86_tune_no_default)
>  ix86_tune_features[i] = 0;
>else
> -ix86_tune_features[i] = !!(initial_ix86_tune_features[i] & 
> ix86_tune_mask);
> +   ix86_tune_features[i]
> + = !!(initial_ix86_tune_features[i] & ix86_tune_mask);
>  }
>
>if (dump)
> @@ -3373,7 +3374,7 @@ ix86_option_override_internal (bool main
>struct gcc_options *opts_set)
>  {
>int i;
> -  unsigned int ix86_arch_mask;
> +  unsigned HOST_WIDE_INT ix86_arch_mask;
>const bool ix86_tune_specified = (opts->x_ix86_tune_string != NULL);
>
>const wide_int_bitmask PTA_3DNOW (HOST_WIDE_INT_1U << 0);
> @@ -4234,7 +4235,7 @@ ix86_option_override_internal (bool main
>XDELETEVEC (s);
>  }
>
> -  ix86_arch_mask = 1u << ix86_arch;
> +  ix86_arch_mask = HOST_WIDE_INT_1U << ix86_arch;
>for (i = 0; i < X86_ARCH_LAST; ++i)
>  ix86_arch_features[i] = !!(initial_ix86_arch_features[i] & 
> ix86_arch_mask);
>
> @@ -5159,7 +5160,7 @@ ix86_function_specific_restore (struct g
>  {
>enum processor_type old_tune = ix86_tune;
>enum processor_type old_arch = ix86_arch;
> -  unsigned int ix86_arch_mask;
> +  unsigned HOST_WIDE_INT ix86_arch_mask;
>int i;
>
>/* We don't change -fPIC.  */
> @@ -5210,7 +5211,7 @@ ix86_function_specific_restore (struct g
>/* Recreate the arch feature tests if the arch changed */
>if (old_arch != ix86_arch)
>  {
> -  ix86_arch_mask = 1u << ix86_arch;
> +  ix86_arch_mask = HOST_WIDE_INT_1U << ix86_arch;
>for (i = 0; i < X86_ARCH_LAST; ++i)
> ix86_arch_features[i]
>   = !!(initial_ix86_arch_features[i] & ix86_arch_mask);
>
> Jakub


Re: [PATCH AArch64]Fix test failure for pr84682-2.c

2018-03-17 Thread Richard Sandiford
Kyrill  Tkachov  writes:
> Hi Bin,
>
> On 16/03/18 11:42, Bin Cheng wrote:
>> Hi,
>> This simple patch fixes test case failure for pr84682-2.c by returning
>> false on wrong mode rtx in aarch64_classify_address, rather than assert.
>>
>> Bootstrap and test on aarch64.  Is it OK?
>>
>> Thanks,
>> bin
>>
>> 2018-03-16  Bin Cheng  
>>
>> * config/aarch64/aarch64.c (aarch64_classify_address): Return false
>> on wrong mode rtx, rather than assert.
>
> This looks ok to me in light of
> https://gcc.gnu.org/ml/gcc-patches/2018-03/msg00633.html
> This function is used to validate inline asm operands too, not just
> internally-generated addresses.
> Therefore all kinds of garbage must be rejected gracefully rather than ICEing.
>
> You'll need an approval from an AArch64 maintainer though.

IMO we should make address_operand itself check something like:

  (GET_MODE (x) == VOIDmode || SCALAR_INT_MODE_P (GET_MODE (x)))

Target-independent code fundamentally assumes that an address will not
be a float, so I think the check should be in target-independent code
rather than copied to each individual backend.

This was only caught on aarch64 because we added the assert, but I think
some backends ignore the mode of the address and so would actually accept
simple float rtxes.

Thanks,
Richard