Re: [PATCH] Define [range.cmp] comparisons for C++20

2019-10-23 Thread Jonathan Wakely
On Wed, 23 Oct 2019 at 00:33, Tam S. B.  wrote:
>
> > commit b948d3f92d7bbe4d53237cb20ff40a15fa123988
> > Author: Jonathan Wakely 
> > Date:   Thu Oct 17 15:20:38 2019 +0100
> >
> > Define [range.cmp] comparisons for C++20
> >
> > Define std::identity, std::ranges::equal_to, std::ranges::not_equal_to,
> > std::ranges::greater, std::ranges::less, std::ranges::greater_equal and
> > std::ranges::less_equal.
> >
> > * include/Makefile.am: Add new header.
> > * include/Makefile.in: Regenerate.
> > * include/bits/range_cmp.h: New header for C++20 function objects.
> > * include/std/functional: Include new header.
> > * testsuite/20_util/function_objects/identity/1.cc: New test.
> > * testsuite/20_util/function_objects/range.cmp/equal_to.cc: New test.
> > * testsuite/20_util/function_objects/range.cmp/greater.cc: New test.
> > * testsuite/20_util/function_objects/range.cmp/greater_equal.cc: New test.
> > * testsuite/20_util/function_objects/range.cmp/less.cc: New test.
> > * testsuite/20_util/function_objects/range.cmp/less_equal.cc: New test.
> > * testsuite/20_util/function_objects/range.cmp/not_equal_to.cc: New test.
> >
> > diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
> > index 35ee3cfcd34..9ff12f10fb1 100644
> > --- a/libstdc++-v3/include/Makefile.am
> > +++ b/libstdc++-v3/include/Makefile.am
> > @@ -152,6 +152,7 @@ bits_headers = \
> >   ${bits_srcdir}/random.h \
> >   ${bits_srcdir}/random.tcc \
> >   ${bits_srcdir}/range_access.h \
> > + ${bits_srcdir}/range_cmp.h \
> >   ${bits_srcdir}/refwrap.h \
> >   ${bits_srcdir}/regex.h \
> >   ${bits_srcdir}/regex.tcc \
> > diff --git a/libstdc++-v3/include/bits/range_cmp.h b/libstdc++-v3/include/bits/range_cmp.h
> > new file mode 100644
> > index 000..3e5bb8847ab
> > --- /dev/null
> > +++ b/libstdc++-v3/include/bits/range_cmp.h
> > @@ -0,0 +1,179 @@
> > +// Concept-constrained comparison implementations -*- C++ -*-
> > +
> > +// Copyright (C) 2019 Free Software Foundation, Inc.
> > +//
> > +// This file is part of the GNU ISO C++ Library.  This library is free
> > +// software; you can redistribute it and/or modify it under the
> > +// terms of the GNU General Public License as published by the
> > +// Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +
> > +// This library is distributed in the hope that it will be useful,
> > +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +// GNU General Public License for more details.
> > +
> > +// Under Section 7 of GPL version 3, you are granted additional
> > +// permissions described in the GCC Runtime Library Exception, version
> > +// 3.1, as published by the Free Software Foundation.
> > +
> > +// You should have received a copy of the GNU General Public License and
> > +// a copy of the GCC Runtime Library Exception along with this program;
> > +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +// .
> > +
> > +/** @file bits/ranges_function.h
>
> This does not match the actual filename. Seems like a typo?

Yes, I renamed the file and didn't fix the comment. It's already fixed
locally and will be committed with the next round of changes (probably
today).

> > + *  This is an internal header file, included by other library headers.
> > + *  Do not attempt to use it directly. @headername{functional}
> > + */
> > +
> > +#ifndef _RANGE_CMP_H
> > +#define _RANGE_CMP_H 1
> > +
> > +#if __cplusplus > 201703L
> > +# include 
> > +# include 
> > +
> > +namespace std _GLIBCXX_VISIBILITY(default)
> > +{
> > +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> > +
> > +  struct __is_transparent; // not defined
> > +
> > +  // Define std::identity here so that  and 
> > +  // don't need to include  to get it.
> > +
> > +  /// [func.identity] The identity function.
> > +  struct identity
> > +  {
> > +template
> > +  constexpr _Tp&&
> > +  operator()(_Tp&& __t) const noexcept
> > +  { return std::forward<_Tp>(__t); }
> > +
> > +using is_transparent = __is_transparent;
> > +  };
> > +
> > +namespace ranges
> > +{
> > +  namespace __detail
> > +  {
> > +// BUILTIN-PTR-CMP(T, ==, U)
> > +template
> > +  concept __eq_builtin_ptr_cmp
> > + = convertible_to<_Tp, const volatile void*>
> > +   && convertible_to<_Up, const volatile void*>
> > +   && (! requires(_Tp&& __t, _Up&& __u)
>
> The use of concepts is causing `#include ` to break on clang.

OK, thanks, I'll guard it with #if.
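[Editor's note: for readers following along, here is a minimal usage sketch
of the function objects this patch defines (my own example, not from the
patch; compile with -std=c++2a):

// Demo of std::identity and std::ranges::equal_to as added by this patch;
// the ranges comparison objects behave like the transparent std::
// equivalents but are concept-constrained.
#include <functional>

int main()
{
  int i = 42;
  int& r = std::identity{}(i);              // perfect-forwards its argument
  bool eq = std::ranges::equal_to{}(r, 42); // constrained comparison object
  return eq ? 0 : 1;
}
]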


Re: Ping: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-23 Thread luoxhu


Hi Feng, 
Thanks for the patch.  It works for me as expected.

I am not a reviewer, just a small comment after trying it.
This is quite a good case for newbies to go through the ipa-cp pass.
Would it be worth updating the test case a bit, as attached, to cover more
combinations in which the callee's aggregate is accessed both by value and
by reference while the caller's aggregate is passed either by value or by
reference?
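[Editor's note: a hypothetical illustration of those combinations
(illustrative only, not the attached test case):

/* The callee reads an aggregate both by value and by reference, while the
   caller passes its aggregate both ways, so IPA-CP can propagate s.a = 1
   into the callee.  */
struct S { int a; };

__attribute__((noinline)) static int
callee (struct S by_val, struct S *by_ref)
{
  return by_val.a + by_ref->a;
}

int
caller (void)
{
  struct S s = { 1 };
  return callee (s, &s);
}
]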


Xiong Hu
Thanks


On 2019/9/30 16:53, Feng Xue OS wrote:

Hi Honza & Martin,

And also hope your comments on this patch. Thanks.

Feng


From: Feng Xue OS 
Sent: Thursday, September 19, 2019 10:30 PM
To: Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org
Subject: [PATCH V4] Extend IPA-CP to support arithmetically-computed 
value-passing on by-ref argument (PR ipa/91682)

Fix a bug on unary/binary operation check.

Feng
---
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 33d52fe5537..f218f1093b8 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1244,23 +1244,23 @@ initialize_node_lattices (struct cgraph_node *node)
}
  }

-/* Return the result of a (possibly arithmetic) pass through jump function
-   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
-   to which the result is passed.  Return NULL_TREE if that cannot be
-   determined or be considered an interprocedural invariant.  */
+/* Return the result of a (possibly arithmetic) operation on the constant
+   value INPUT.  OPERAND is 2nd operand for binary operation.  RES_TYPE is
+   the type of the parameter to which the result is passed.  Return
+   NULL_TREE if that cannot be determined or be considered an
+   interprocedural invariant.  */

  static tree
-ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
-   tree res_type)
+ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand,
+tree res_type)
  {
tree res;

-  if (ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR)
+  if (opcode == NOP_EXPR)
  return input;
if (!is_gimple_ip_invariant (input))
  return NULL_TREE;

-  tree_code opcode = ipa_get_jf_pass_through_operation (jfunc);
if (!res_type)
  {
if (TREE_CODE_CLASS (opcode) == tcc_comparison)
> > @@ -1274,8 +1274,7 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
if (TREE_CODE_CLASS (opcode) == tcc_unary)
  res = fold_unary (opcode, res_type, input);
else
-res = fold_binary (opcode, res_type, input,
-  ipa_get_jf_pass_through_operand (jfunc));
+res = fold_binary (opcode, res_type, input, operand);

if (res && !is_gimple_ip_invariant (res))
  return NULL_TREE;
> > @@ -1283,6 +1282,21 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
return res;
  }

+/* Return the result of a (possibly arithmetic) pass through jump function
+   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
+   to which the result is passed.  Return NULL_TREE if that cannot be
+   determined or be considered an interprocedural invariant.  */
+
+static tree
+ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
+   tree res_type)
+{
+  return ipa_get_jf_arith_result (ipa_get_jf_pass_through_operation (jfunc),
+ input,
+ ipa_get_jf_pass_through_operand (jfunc),
+ res_type);
+}
+
  /* Return the result of an ancestor jump function JFUNC on the constant value
 INPUT.  Return NULL_TREE if that cannot be determined.  */

> > @@ -1416,6 +1430,146 @@ ipa_context_from_jfunc (ipa_node_params *info, cgraph_edge *cs, int csidx,
return ctx;
  }

+/* See if NODE is a clone with a known aggregate value at a given OFFSET of a
+   parameter with the given INDEX.  */
+
+static tree
+get_clone_agg_value (struct cgraph_node *node, HOST_WIDE_INT offset,
+int index)
+{
+  struct ipa_agg_replacement_value *aggval;
+
+  aggval = ipa_get_agg_replacements_for_node (node);
+  while (aggval)
+{
+  if (aggval->offset == offset
+ && aggval->index == index)
+   return aggval->value;
+  aggval = aggval->next;
+}
+  return NULL_TREE;
+}
+
+/* Determine whether ITEM, jump function for an aggregate part, evaluates to a
+   single known constant value and if so, return it.  Otherwise return NULL.
+   NODE and INFO describes the caller node or the one it is inlined to, and
+   its related info.  */
+
+static tree
+ipa_agg_value_from_node (class ipa_node_params *info,
+struct cgraph_node *node,
+struct ipa_agg_jf_item *item)
+{
+  tree value = NULL_TREE;
+  int src_idx;
+
+  if (item->offset < 0 || item->jftype == IPA_JF_UNKNOWN)
+return NULL_TREE;
+
+  if (item->jftype == IPA_JF_CONST)
+return item->value.constant;
+
+  gcc_checking_assert (item->jftyp

Re: Ping: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-23 Thread Feng Xue OS
Thanks for your comment, I will update the case accordingly.

Feng


From: luoxhu 
Sent: Wednesday, October 23, 2019 4:02 PM
To: Feng Xue OS; Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org
Subject: Re: Ping: [PATCH V4] Extend IPA-CP to support arithmetically-computed 
value-passing on by-ref argument (PR ipa/91682)



Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-23 Thread Christophe Lyon

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system is designed to run the GCC
testsuite after each of your patches, so I'll keep in mind not to report
regressions (I've noticed several already).


I can perform a manual validation taking your 29 patches as a single one and
compare the results with those of the revision preceding the one where you
committed patch #1. Do you think that would be useful?


Christophe




I think if you can filter out any that are removed by later patches and then 
report against the patch that caused the regression itself then that would be 
the best.  But I realise that would be more work for you, so a round-up against 
the combined set would be OK.

BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.




Hi Richard,

The validation of the whole set shows 1 regression, which was also reported by 
the validation of r277179 (early split most DImode comparison operations)

When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(with any -std=gnu++XX option)

I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces show that the code enters main, then _Znwj (operator new),
then _malloc_r.
The qemu traces end with:
IN: _malloc_r
0x00019224:  e3a00ffe  mov  r0, #0x3f8
0x00019228:  e3a0c07f  mov  ip, #0x7f
0x0001922c:  e3a0e07e  mov  lr, #0x7e
0x00019230:  eafffe41  b    #0x18b3c

R00=00049418 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=0060 R13=fffeb460 R14=00018b14 R15=00019224
PSR=2010 --C- A usr32

IN: _malloc_r
0x00018b3c:  e59f76f8  ldr  r7, [pc, #0x6f8]
0x00018b40:  e087  add  r0, r7, r0
0x00018b44:  e5903004  ldr  r3, [r0, #4]
0x00018b48:  e248  sub  r0, r0, #8
0x00018b4c:  e153  cmp  r0, r3
0x00018b50:  1a05  bne  #0x18b6c

R00=03f8 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b3c
PSR=2010 --C- A usr32
R00=00049c30 R01= R02=0554 R03=00049c30
R04= R05=0808 R06=00049418 R07=00049840
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b54
PSR=6010 -ZC- A usr32

IN: _malloc_r
0x00019120:  e1a02a0b  lsl  r2, fp, #0x14
0x00019124:  e1a02a22  lsr  r2, r2, #0x14
0x00019128:  e352  cmp  r2, #0
0x0001912c:  1afffee7  bne  #0x18cd0

R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001000 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00019120
PSR=6010 -ZC- A usr32

IN: _malloc_r
0x00019130:  e5974008  ldr  r4, [r7, #8]
0x00019134:  e0898008  add  r8, sb, r8
0x00019138:  e3888001  orr  r8, r8, #1
0x0001913c:  e5848004  str  r8, [r4, #4]
0x00019140:  ea14  b    #0x18d98

R00=0004b000 R01=08002108 R02= R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001000 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00019130
PSR=6010 -ZC- A usr32
R00=0004b000 R01=08002108 R02= R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001721 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00018d98
PSR=6010 -ZC- A usr32

Christophe



[patch] Fix PR tree-optimization/92131

2019-10-23 Thread Eric Botcazou
This is a regression present on mainline, 9 and 8 branches, but the underlying 
issue is probably latent on the 7 branch.  compare_values in tree-vrp.c can 
rely on undefined overflow to compare symbolic expressions with constants, so
we must make sure not to introduce such an overflow when combining ranges with 
symbolic expressions.
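
[Editor's note: the undefined-overflow assumption referred to here is the
usual one for signed arithmetic; a minimal illustration (my example, not
from the patch):

// For signed x, GCC may fold this comparison to true because signed
// overflow is undefined behavior; compare_values applies the same kind
// of reasoning to symbolic range bounds, which goes wrong if a combined
// bound's constant part has silently wrapped.
bool always_true (int x)
{
  return x + 1 > x;
}
]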

Tested on x86_64-suse-linux, OK for all active branches?


2019-10-23  Eric Botcazou  

PR tree-optimization/92131
* tree-vrp.c (extract_range_from_plus_minus_expr): If the resulting
range would be symbolic, drop to varying for any explicit overflow
in the constant part or if one of the ranges is not a singleton.


2019-10-23  Eric Botcazou  

* gcc.c-torture/execute/20191023-1.c: New test.

-- 
Eric Botcazou

Index: tree-vrp.c
===================================================================
--- tree-vrp.c	(revision 277149)
+++ tree-vrp.c	(working copy)
@@ -1652,7 +1652,7 @@ extract_range_from_plus_minus_expr (valu
   value_range_kind vr0_kind = vr0.kind (), vr1_kind = vr1.kind ();
   tree vr0_min = vr0.min (), vr0_max = vr0.max ();
   tree vr1_min = vr1.min (), vr1_max = vr1.max ();
-  tree min = NULL, max = NULL;
+  tree min = NULL_TREE, max = NULL_TREE;
 
   /* This will normalize things such that calculating
  [0,0] - VR_VARYING is not dropped to varying, but is
@@ -1715,18 +1715,19 @@ extract_range_from_plus_minus_expr (valu
   combine_bound (code, wmin, min_ovf, expr_type, min_op0, min_op1);
   combine_bound (code, wmax, max_ovf, expr_type, max_op0, max_op1);
 
-  /* If we have overflow for the constant part and the resulting
-	 range will be symbolic, drop to VR_VARYING.  */
-  if (((bool)min_ovf && sym_min_op0 != sym_min_op1)
-	  || ((bool)max_ovf && sym_max_op0 != sym_max_op1))
+  /* If the resulting range will be symbolic, we need to eliminate any
+	 explicit or implicit overflow introduced in the above computation
+	 because compare_values could make an incorrect use of it.  That's
+	 why we require one of the ranges to be a singleton.  */
+  if ((sym_min_op0 != sym_min_op1 || sym_max_op0 != sym_max_op1)
+	  && ((bool)min_ovf || (bool)max_ovf
+	  || (min_op0 != max_op0 && min_op1 != max_op1)))
 	{
 	  vr->set_varying (expr_type);
 	  return;
 	}
 
   /* Adjust the range for possible overflow.  */
-  min = NULL_TREE;
-  max = NULL_TREE;
   set_value_range_with_overflow (kind, min, max, expr_type,
  wmin, wmax, min_ovf, max_ovf);
   if (kind == VR_VARYING)
/* PR tree-optimization/92131 */
/* Testcase by Armin Rigo  */

long b, c, d, e, f, i;
char g, h, j, k;
int *aa;

static void error (void) __attribute__((noipa));
static void error (void) { __builtin_abort(); }

static void see_me_here (void) __attribute__((noipa));
static void see_me_here (void) {}

static void aaa (void) __attribute__((noipa));
static void aaa (void) {}

static void a (void) __attribute__((noipa));
static void a (void) {
  long am, ao;
  if (aa == 0) {
aaa();
if (j)
  goto ay;
  }
  return;
ay:
  aaa();
  if (k) {
aaa();
goto az;
  }
  return;
az:
  if (i)
if (g)
  if (h)
if (e)
  goto bd;
  return;
bd:
  am = 0;
  while (am < e) {
switch (c) {
case 8:
  goto bh;
case 4:
  return;
}
  bh:
if (am >= 0)
  b = -am;
ao = am + b;
f = ao & 7;
if (f == 0)
  see_me_here();
if (ao >= 0)
  am++;
else
  error();
  }
}

int main (void)
{
  j++;
  k++;
  i++;
  g++;
  h++;
  e = 1;
  a();
  return 0;
}


Re: [PATCH] Do not ICE in IPA inliner.

2019-10-23 Thread Richard Biener
On Tue, Oct 22, 2019 at 2:47 PM Martin Liška  wrote:
>
> Hi.
>
> We should not call to_gcov_type on a count that is uninitialized.
> That's the case for a THUNK cgraph_node that we inline.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.
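
[Editor's note: the fix Martin describes amounts to guarding the dump,
roughly as in this sketch (the dump text is hypothetical; the actual hunk
is not quoted in this digest):

/* Only convert the edge count when it actually carries a value; the
   count of a THUNK cgraph_node that we inline may be uninitialized.  */
if (dump_file && curr->count.initialized_p ())
  fprintf (dump_file, "   Inlined call count: %f\n",
           (double) curr->count.to_gcov_type ());
]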

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-10-22  Martin Liska  
>
> PR ipa/91969
> * ipa-inline.c (recursive_inlining): Do not print
> when curr->count is not initialized.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-22  Martin Liska  
>
> PR ipa/91969
> * g++.dg/ipa/pr91969.C: New test.
> ---
>  gcc/ipa-inline.c   |  2 +-
>  gcc/testsuite/g++.dg/ipa/pr91969.C | 38 ++
>  2 files changed, 39 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr91969.C
>
>
>


Re: [patch] Fix PR tree-optimization/92131

2019-10-23 Thread Richard Biener
On Wed, Oct 23, 2019 at 10:33 AM Eric Botcazou  wrote:
>
> This is a regression present on mainline, 9 and 8 branches, but the underlying
> issue is probably latent on the 7 branch.  compare_values in tree-vrp.c can
> rely on undefined overflow to compare symbolic expressions with constants so
> we must make sure not to introduce such an overflow when combining ranges with
> symbolic expressions.
>
> Tested on x86_64-suse-linux, OK for all active branches?

OK.

Thanks,
Richard.

>
> 2019-10-23  Eric Botcazou  
>
> PR tree-optimization/92131
> * tree-vrp.c (extract_range_from_plus_minus_expr): If the resulting
> range would be symbolic, drop to varying for any explicit overflow
> in the constant part or if one of the ranges is not a singleton.
>
>
> 2019-10-23  Eric Botcazou  
>
> * gcc.c-torture/execute/20191023-1.c: New test.
>
> --
> Eric Botcazou


Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-23 Thread Richard Biener
On Wed, Oct 23, 2019 at 5:36 AM Feng Xue OS  wrote:
>
> Michael,
>
> > I've only noticed a couple typos, and one minor remark.
> Typos corrected.
>
> > I just wonder why you duplicated these three loops instead of integrating
> > the real body into the existing LI_FROM_INNERMOST loop.  I would have
> > expected your "if (!optimize_loop_for_size_p && split_loop_on_cond)" block
> > to simply be the else block of the existing
> > "if (... conditions for normal loop splitting ...)" block.
> Adjusted to do two kinds of loop-split in same LI_FROM_INNERMOST loop.
>
> > From my perspective it's okay, but you still need the okay of a proper 
> > reviewer,
> > for which you might want to state the testing/regression state of this
> > patch relative to trunk.
>
> Richard,
>
>   Is it ok to commit this patch? Bootstrap and regression test passed. And for
> performance, we can get about 7% improvement on spec2017 omnetpp with this
> patch.

Can you please provide the corresponding ChangeLog entries as well and
attach the patch?  It seems to be garbled by some encoding.

Thanks,
Richard.

> Thanks,
> Feng
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1407d019d14..d41e5aa0215 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11481,6 +11481,19 @@ The maximum number of branches unswitched in a single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-loop-cond-split-insns
> +In a loop, if a branch of a conditional statement is selected since certain
> +loop iteration, any operand that contributes to computation of the conditional
> +expression remains unchanged in all following iterations, the statement is
> +semi-invariant, upon which we can do a kind of loop split transformation.
> +@option{max-loop-cond-split-insns} controls maximum number of insns to be
> +added due to loop split on semi-invariant conditional statement.
> +
> +@item min-loop-cond-split-prob
> +When FDO profile information is available, @option{min-loop-cond-split-prob}
> +specifies minimum threshold for probability of semi-invariant condition
> +statement to trigger loop split.
> +
>  @item iv-consider-all-candidates-bound
>  Bound on number of candidates for induction variables, below which
>  all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 322c37f8b96..73b59f7465e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -415,6 +415,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> "The maximum number of unswitchings in a single loop.",
> 3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_LOOP_COND_SPLIT_INSNS,
> +   "max-loop-cond-split-insns",
> +   "The maximum number of insns to be added due to loop split on "
> +   "semi-invariant condition statement.",
> +   100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_LOOP_COND_SPLIT_PROB,
> +   "min-loop-cond-split-prob",
> +   "The minimum threshold for probability of semi-invariant condition "
> +   "statement to trigger loop split.",
> +   30, 0, 100)
> +
>  /* The maximum number of insns in loop header duplicated by the copy loop
> headers pass.  */
>  DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include 
> +#include 
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map::iterator iter = m.begin (); iter != m.end (); ++iter)
> +{
> +  if (ga->empty)
> +ga->set (iter->second);
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +{
> +  if (b)
> +b = do_something();
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.

[PATCH] Fix PR92179

2019-10-23 Thread Richard Biener


More vectorizable_shift fallout - I've removed another disparity between
SLP and non-SLP, and allowed cases that change the type but not the mode
to go through for SLP (these originally caused the reported ICEs).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-10-23  Richard Biener  

PR tree-optimization/92179
* tree-vect-stmts.c (vectorizable_shift): For shift args
that are all the same remove type restriction in the SLP case.
Adjust SLP code to handle converting of the shift arg to
only apply in case the modes are different.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 277308)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -5661,6 +5661,7 @@ vectorizable_shift (stmt_vec_info stmt_i
 }
 
   /* Vector shifted by vector.  */
+  bool was_scalar_shift_arg = scalar_shift_arg;
   if (!scalar_shift_arg)
 {
   optab = optab_for_tree_code (code, vectype, optab_vector);
@@ -5720,16 +5721,6 @@ vectorizable_shift (stmt_vec_info stmt_i
  else if (!useless_type_conversion_p (TREE_TYPE (vectype),
   TREE_TYPE (op1)))
{
- if (slp_node
- && TYPE_MODE (TREE_TYPE (vectype))
-!= TYPE_MODE (TREE_TYPE (op1)))
-   {
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "unusable type for last operand in"
- " vector/vector shift/rotate.\n");
- return false;
-   }
  if (vec_stmt && !slp_node)
{
  op1 = fold_convert (TREE_TYPE (vectype), op1);
@@ -5831,18 +5822,36 @@ vectorizable_shift (stmt_vec_info stmt_i
   && !useless_type_conversion_p (TREE_TYPE (vectype),
  TREE_TYPE (op1)))
{
- /* Convert the scalar constant shift amounts in-place.  */
- slp_tree shift = SLP_TREE_CHILDREN (slp_node)[1];
- gcc_assert (SLP_TREE_DEF_TYPE (shift) == vect_constant_def);
- for (unsigned i = 0;
-  i < SLP_TREE_SCALAR_OPS (shift).length (); ++i)
+ if (was_scalar_shift_arg)
+   {
+ /* If the argument was the same in all lanes create
+the correctly typed vector shift amount directly.  */
+ op1 = fold_convert (TREE_TYPE (vectype), op1);
+ op1 = vect_init_vector (stmt_info, op1, TREE_TYPE (vectype),
+ !loop_vinfo ? gsi : NULL);
+ vec_oprnd1 = vect_init_vector (stmt_info, op1, vectype,
+!loop_vinfo ? gsi : NULL);
+  vec_oprnds1.create (slp_node->vec_stmts_size);
+ for (k = 0; k < slp_node->vec_stmts_size; k++)
+   vec_oprnds1.quick_push (vec_oprnd1);
+   }
+ else if (dt[1] == vect_constant_def)
{
- SLP_TREE_SCALAR_OPS (shift)[i]
-   = fold_convert (TREE_TYPE (vectype),
-   SLP_TREE_SCALAR_OPS (shift)[i]);
- gcc_assert ((TREE_CODE (SLP_TREE_SCALAR_OPS (shift)[i])
-  == INTEGER_CST));
+ /* Convert the scalar constant shift amounts in-place.  */
+ slp_tree shift = SLP_TREE_CHILDREN (slp_node)[1];
+ gcc_assert (SLP_TREE_DEF_TYPE (shift) == vect_constant_def);
+ for (unsigned i = 0;
+  i < SLP_TREE_SCALAR_OPS (shift).length (); ++i)
+   {
+ SLP_TREE_SCALAR_OPS (shift)[i]
+ = fold_convert (TREE_TYPE (vectype),
+ SLP_TREE_SCALAR_OPS (shift)[i]);
+ gcc_assert ((TREE_CODE (SLP_TREE_SCALAR_OPS (shift)[i])
+  == INTEGER_CST));
+   }
}
+ else
+   gcc_assert (TYPE_MODE (op1_vectype) == TYPE_MODE (vectype));
}
 
   /* vec_oprnd1 is available if operand 1 should be of a scalar-type


Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-23 Thread Feng Xue OS
Patch attached.

Feng


From: Richard Biener 
Sent: Wednesday, October 23, 2019 5:04 PM
To: Feng Xue OS
Cc: Michael Matz; Philipp Tomsich; gcc-patches@gcc.gnu.org; Christoph Müllner; 
erick.oc...@theobroma-systems.com
Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR 
tree-optimization/89134)


[PATCH 1/3][rs6000] Replace vsx_xvcdpsp by vsx_xvcvdpsp

2019-10-23 Thread Kewen.Lin
Hi,

I noticed that vsx_xvcdpsp and vsx_xvcvdpsp are almost the same, and
vsx_xvcdpsp looks replaceable with vsx_xvcvdpsp since it is only used
via its gen_* function.

Bootstrapped and regress tested on powerpc64le-linux-gnu.


gcc/ChangeLog

2019-10-23  Kewen Lin  

* config/rs6000/vsx.md (vsx_xvcdpsp): Remove define_insn.
(UNSPEC_VSX_XVCDPSP): Remove.
* config/rs6000/rs6000.c (rs6000_generate_float2_double_code):
Replace gen_vsx_xvcdpsp by gen_vsx_xvcvdpsp.

From 8c6309c131b7614ed8d6aeb4ca2d3d89ab0b8d38 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Tue, 8 Oct 2019 01:51:06 -0500
Subject: [PATCH 1/3] Replace vsx_xvcdpsp by vsx_xvcvdpsp

---
 gcc/config/rs6000/rs6000.c | 4 ++--
 gcc/config/rs6000/vsx.md   | 9 -
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c2834bd..23898b1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25549,8 +25549,8 @@ rs6000_generate_float2_double_code (rtx dst, rtx src1, rtx src2)
   rtx_tmp2 = gen_reg_rtx (V4SFmode);
   rtx_tmp3 = gen_reg_rtx (V4SFmode);
 
-  emit_insn (gen_vsx_xvcdpsp (rtx_tmp2, rtx_tmp0));
-  emit_insn (gen_vsx_xvcdpsp (rtx_tmp3, rtx_tmp1));
+  emit_insn (gen_vsx_xvcvdpsp (rtx_tmp2, rtx_tmp0));
+  emit_insn (gen_vsx_xvcvdpsp (rtx_tmp3, rtx_tmp1));
 
   if (BYTES_BIG_ENDIAN)
 emit_insn (gen_p8_vmrgew_v4sf (dst, rtx_tmp2, rtx_tmp3));
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f54d343..d6f079c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -301,7 +301,6 @@
UNSPEC_VSX_XVCVSXDDP
UNSPEC_VSX_XVCVUXDDP
UNSPEC_VSX_XVCVDPSXDS
-   UNSPEC_VSX_XVCDPSP
UNSPEC_VSX_XVCVDPUXDS
UNSPEC_VSX_SIGN_EXTEND
UNSPEC_VSX_XVCVSPSXWS
@@ -2367,14 +2366,6 @@
   "xvcvuxdsp %x0,%x1"
   [(set_attr "type" "vecdouble")])
 
-(define_insn "vsx_xvcdpsp"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
-   (unspec:V4SF [(match_operand:V2DF 1 "vsx_register_operand" "wa")]
-UNSPEC_VSX_XVCDPSP))]
-  "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "xvcvdpsp %x0,%x1"
-  [(set_attr "type" "vecdouble")])
-
 ;; Convert from 32-bit to 64-bit types
 ;; Provide both vector and scalar targets
 (define_insn "vsx_xvcvsxwdp"
-- 
2.7.4



[PATCH 2/3][rs6000] vector conversion RTL pattern update for same unit size

2019-10-23 Thread Kewen.Lin
Hi,

For fixed-point <-> floating-point vector conversions with the same element
unit size, such as SP <-> SI and DP <-> DI, it's fine to use the existing
RTL operations any_fix/any_float.

This patch updates them to use any_fix/any_float.

Bootstrapped and regress tested on powerpc64le-linux-gnu.


gcc/ChangeLog

2019-10-23  Kewen Lin  

* config/rs6000/vsx.md (UNSPEC_VSX_CV[SU]XWSP,
UNSPEC_VSX_XVCV[SU]XDDP, UNSPEC_VSX_XVCVDP[SU]XDS,
UNSPEC_VSX_XVCVSPSXWS): Remove.
(vsx_xvcv[su]xddp, vsx_xvcvdp[su]xds, vsx_xvcvsp[su]xws,
vsx_xvcv[su]xwsp): Update define_insn RTL patterns.
From 39ae875d4ae6ce22e170aeb456ef307a1f5fd1e0 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Wed, 23 Oct 2019 02:56:48 -0500
Subject: [PATCH 2/3] Update RTL pattern on vector SP<->[SU]W DP<->[SU]D
 conversion

---
 gcc/config/rs6000/vsx.md | 105 +--
 1 file changed, 28 insertions(+), 77 deletions(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index d6f079c..83e4071 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -277,8 +277,6 @@
UNSPEC_VSX_CVUXDSP
UNSPEC_VSX_CVSPSXDS
UNSPEC_VSX_CVSPUXDS
-   UNSPEC_VSX_CVSXWSP
-   UNSPEC_VSX_CVUXWSP
UNSPEC_VSX_FLOAT2
UNSPEC_VSX_UNS_FLOAT2
UNSPEC_VSX_FLOATE
@@ -298,12 +296,7 @@
UNSPEC_VSX_DIVSD
UNSPEC_VSX_DIVUD
UNSPEC_VSX_MULSD
-   UNSPEC_VSX_XVCVSXDDP
-   UNSPEC_VSX_XVCVUXDDP
-   UNSPEC_VSX_XVCVDPSXDS
-   UNSPEC_VSX_XVCVDPUXDS
UNSPEC_VSX_SIGN_EXTEND
-   UNSPEC_VSX_XVCVSPSXWS
UNSPEC_VSX_XVCVSPSXDS
UNSPEC_VSX_VSLO
UNSPEC_VSX_EXTRACT
@@ -2202,6 +2195,34 @@
 
 ;; Convert and scale (used by vec_ctf, vec_cts, vec_ctu for double/long long)
 
+(define_insn "vsx_xvcvxwsp"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+ (any_float:V4SF (match_operand:V4SI 1 "vsx_register_operand" "wa")))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvxwsp %x0,%x1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "vsx_xvcvxddp"
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
+(any_float:V2DF (match_operand:V2DI 1 "vsx_register_operand" "wa")))]
+  "VECTOR_UNIT_VSX_P (V2DFmode)"
+  "xvcvxddp %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
+(define_insn "vsx_xvcvspxws"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+(any_fix:V4SI (match_operand:V4SF 1 "vsx_register_operand" "wa")))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xvcvspxws %x0,%x1"
+  [(set_attr "type" "vecfloat")])
+
+(define_insn "vsx_xvcvdpxds"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+(any_fix:V2DI (match_operand:V2DF 1 "vsx_register_operand" "wa")))]
+  "VECTOR_UNIT_VSX_P (V2DFmode)"
+  "xvcvdpxds %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
 (define_expand "vsx_xvcvsxddp_scale"
   [(match_operand:V2DF 0 "vsx_register_operand")
(match_operand:V2DI 1 "vsx_register_operand")
@@ -2217,14 +2238,6 @@
   DONE;
 })
 
-(define_insn "vsx_xvcvsxddp"
-  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
-(unspec:V2DF [(match_operand:V2DI 1 "vsx_register_operand" "wa")]
- UNSPEC_VSX_XVCVSXDDP))]
-  "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "xvcvsxddp %x0,%x1"
-  [(set_attr "type" "vecdouble")])
-
 (define_expand "vsx_xvcvuxddp_scale"
   [(match_operand:V2DF 0 "vsx_register_operand")
(match_operand:V2DI 1 "vsx_register_operand")
@@ -2240,14 +2253,6 @@
   DONE;
 })
 
-(define_insn "vsx_xvcvuxddp"
-  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
-(unspec:V2DF [(match_operand:V2DI 1 "vsx_register_operand" "wa")]
- UNSPEC_VSX_XVCVUXDDP))]
-  "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "xvcvuxddp %x0,%x1"
-  [(set_attr "type" "vecdouble")])
-
 (define_expand "vsx_xvcvdpsxds_scale"
   [(match_operand:V2DI 0 "vsx_register_operand")
(match_operand:V2DF 1 "vsx_register_operand")
@@ -2270,26 +2275,6 @@
 })
 
 ;; convert vector of 64-bit floating point numbers to vector of
-;; 64-bit signed integer
-(define_insn "vsx_xvcvdpsxds"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
-(unspec:V2DI [(match_operand:V2DF 1 "vsx_register_operand" "wa")]
- UNSPEC_VSX_XVCVDPSXDS))]
-  "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "xvcvdpsxds %x0,%x1"
-  [(set_attr "type" "vecdouble")])
-
-;; convert vector of 32-bit floating point numbers to vector of
-;; 32-bit signed integer
-(define_insn "vsx_xvcvspsxws"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
-   (unspec:V4SI [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
-UNSPEC_VSX_XVCVSPSXWS))]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
-  "xvcvspsxws %x0,%x1"
-  [(set_attr "type" "vecfloat")])
-
-;; convert vector of 64-bit floating point numbers to vector of
 ;; 64-bit unsigned integer
 (define_expand "vsx_xvcvdpuxds_scale"
   [(match_operand:V2DI 0 "vsx_register_operand")
@@ -2312,24 +2297,6 @@
   DONE;
 })
 
-;; convert vector of 32-bit floating

[PATCH 3/3][rs6000] vector conversion RTL pattern update for diff unit size

2019-10-23 Thread Kewen.Lin
Hi,

Following the previous patch 2/3, this patch updates the vector
conversions between fixed point and floating point with different
element unit sizes, such as SP <-> DI and DP <-> SI.

Bootstrap and regression testing just launched.


gcc/ChangeLog

2019-10-23  Kewen Lin  

* config/rs6000/rs6000-modes.def (V2SF, V2SI): New modes.
* config/rs6000/vsx.md (UNSPEC_VSX_CVDPSXWS, UNSPEC_VSX_CVSXDSP, 
UNSPEC_VSX_CVUXDSP, UNSPEC_VSX_CVSPSXDS, UNSPEC_VSX_CVSPUXDS): Remove.
(vsx_xvcvspdp): New define_expand, old one split to...
(vsx_xvcvspdp_be): ... this.  New.  And...
(vsx_xvcvspdp_le): ... this.  New.
(vsx_xvcvdpsp): New define_expand, old one split to...
(vsx_xvcvdpsp_be): ... this.  New.  And...
(vsx_xvcvdpsp_le): ... this.  New.
(vsx_xvcvdp[su]xws): New define_expand, old one split to...
(vsx_xvcvdpxws_be): ... this.  New.  And...
(vsx_xvcvdpxws_le): ... this.  New.
(vsx_xvcv[su]xdsp): New define_expand, old one split to...
(vsx_xvcvxdsp_be): ... this.  New.  And...
(vsx_xvcvxdsp_le): ... this.  New.
(vsx_xvcv[su]xwdp): New define_expand, old one split to...
(vsx_xvcvxwdp_be): ... this.  New.  And...
(vsx_xvcvxwdp_le): ... this.  New.
(vsx_xvcvsp[su]xds): New define_expand, old one split to...
(vsx_xvcvspxds_be): ... this.  New.  And...
(vsx_xvcvspxds_le): ... this.  New.
From 5315810c391b75661de9027ea2848d31390e1d8b Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Wed, 23 Oct 2019 04:02:00 -0500
Subject: [PATCH 3/3] Update RTL pattern on vector fp/int 32bit <-> 64bit
 conversion

---
 gcc/config/rs6000/rs6000-modes.def |   4 +
 gcc/config/rs6000/vsx.md   | 240 +++--
 2 files changed, 181 insertions(+), 63 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index 677062c..449e176 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -74,6 +74,10 @@ VECTOR_MODES (FLOAT, 16); /*   V8HF  V4SF V2DF */
 VECTOR_MODES (INT, 32);   /* V32QI V16HI V8SI V4DI */
 VECTOR_MODES (FLOAT, 32); /*   V16HF V8SF V4DF */
 
+/* Half VMX/VSX vector (for select)  */
+VECTOR_MODE (FLOAT, SF, 2);   /* V2SF  */
+VECTOR_MODE (INT, SI, 2); /* V2SI  */
+
 /* Replacement for TImode that only is allowed in GPRs.  We also use PTImode
for quad memory atomic operations to force getting an even/odd register
combination.  */
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 83e4071..44025f6 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -265,7 +265,6 @@
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
-   UNSPEC_VSX_CVDPSXWS
UNSPEC_VSX_CVDPUXWS
UNSPEC_VSX_CVSPDP
UNSPEC_VSX_CVHPSP
@@ -273,10 +272,6 @@
UNSPEC_VSX_CVDPSPN
UNSPEC_VSX_CVSXWDP
UNSPEC_VSX_CVUXWDP
-   UNSPEC_VSX_CVSXDSP
-   UNSPEC_VSX_CVUXDSP
-   UNSPEC_VSX_CVSPSXDS
-   UNSPEC_VSX_CVSPUXDS
UNSPEC_VSX_FLOAT2
UNSPEC_VSX_UNS_FLOAT2
UNSPEC_VSX_FLOATE
@@ -2106,22 +2101,69 @@
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-(define_insn "vsx_xvcvspdp"
+(define_insn "vsx_xvcvspdp_be"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=v,?wa")
-   (unspec:V2DF [(match_operand:V4SF 1 "vsx_register_operand" "wa,wa")]
- UNSPEC_VSX_CVSPDP))]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
+ (float_extend:V2DF
+   (vec_select:V2SF (match_operand:V4SF 1 "vsx_register_operand" "wa,wa")
+			    (parallel [(const_int 0) (const_int 2)]))))]
+  "VECTOR_UNIT_VSX_P (V4SFmode) && BYTES_BIG_ENDIAN"
+  "xvcvspdp %x0,%x1"
+  [(set_attr "type" "vecdouble")])
+
+(define_insn "vsx_xvcvspdp_le"
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=v,?wa")
+ (float_extend:V2DF
+   (vec_select:V2SF (match_operand:V4SF 1 "vsx_register_operand" "wa,wa")
+			    (parallel [(const_int 1) (const_int 3)]))))]
+  "VECTOR_UNIT_VSX_P (V4SFmode) && !BYTES_BIG_ENDIAN"
   "xvcvspdp %x0,%x1"
   [(set_attr "type" "vecdouble")])
 
-(define_insn "vsx_xvcvdpsp"
+(define_expand "vsx_xvcvspdp"
+  [(match_operand:V2DF 0 "vsx_register_operand")
+   (match_operand:V4SF 1 "vsx_register_operand")]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+{
+  if (BYTES_BIG_ENDIAN)
+emit_insn (gen_vsx_xvcvspdp_be (operands[0], operands[1]));
+  else
+emit_insn (gen_vsx_xvcvspdp_le (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "vsx_xvcvdpsp_be"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,?wa")
-   (unspec:V4SF [(match_operand:V2DF 1 "vsx_register_operand" "v,wa")]
- UNSPEC_VSX_CVSPDP))]
-  "VECTOR_UNIT_VSX_P (V2DFmode)"
+ (float_truncate:V4SF
+   (vec_concat:V4DF (match_operand:V2DF 1 "vsx_register_operand" "v,wa")
+(vec_select:V2DF (match_dup 1)
+  (parallel 

Re: [PATCH] Improve debug info in ivopts optimized loops (PR debug/90231)

2019-10-23 Thread Bin.Cheng
On Tue, Oct 22, 2019 at 3:32 PM Jakub Jelinek  wrote:
>
> On Mon, Oct 21, 2019 at 01:24:30PM +0200, Jakub Jelinek wrote:
> > So I wonder if for correctness I don't need to add:
> >
> >   if (!use->iv->no_overflow
> >   && !cand->iv->no_overflow
> >   && !integer_pow2p (cstep))
> > return NULL_TREE;
> >
> > with some of the above as comment explaining why.
> >
> > On the other side, if cand->iv->no_overflow, couldn't we bypass the extra
> > precision test?
>
> Here are these two in patch form.
>
> 2019-10-22  Jakub Jelinek  
>
> PR debug/90231
> * tree-ssa-loop-ivopts.c (get_debug_computation_at): New function.
> (remove_unused_ivs): Use it instead of get_computation_at.  When
> choosing best candidate, only consider candidates where
> get_debug_computation_at actually returns non-NULL.
>
> --- gcc/tree-ssa-loop-ivopts.c.jj   2019-10-21 14:17:57.598198162 +0200
> +++ gcc/tree-ssa-loop-ivopts.c  2019-10-22 09:30:09.782238157 +0200
> @@ -4089,6 +4089,94 @@ get_computation_at (class loop *loop, gi
>return fold_convert (type, aff_combination_to_tree (&aff));
>  }
>
> +/* Like get_computation_at, but try harder, even if the computation
> +   is more expensive.  Intended for debug stmts.  */
> +
> +static tree
> +get_debug_computation_at (class loop *loop, gimple *at,
> + struct iv_use *use, struct iv_cand *cand)
> +{
> +  if (tree ret = get_computation_at (loop, at, use, cand))
> +return ret;
> +
> +  tree ubase = use->iv->base, ustep = use->iv->step;
> +  tree cbase = cand->iv->base, cstep = cand->iv->step;
> +  tree var;
> +  tree utype = TREE_TYPE (ubase), ctype = TREE_TYPE (cbase);
> +  widest_int rat;
> +
> +  /* We must have a precision to express the values of use.  */
> +  if (TYPE_PRECISION (utype) >= TYPE_PRECISION (ctype))
> +return NULL_TREE;
> +
> +  /* Try to handle the case that get_computation_at doesn't,
> + try to express
> + use = ubase + (var - cbase) / ratio.  */
> +  if (!constant_multiple_of (cstep, fold_convert (TREE_TYPE (cstep), ustep),
> +&rat))
> +return NULL_TREE;
> +
> +  bool neg_p = false;
> +  if (wi::neg_p (rat))
> +{
> +  if (TYPE_UNSIGNED (ctype))
> +   return NULL_TREE;
> +  neg_p = true;
> +  rat = wi::neg (rat);
> +}
> +
> +  /* If both IVs can wrap around and CAND doesn't have a power of two step,
> + it is unsafe.  Consider uint16_t CAND with step 9, when wrapping around,
> + the values will be ... 0xfff0, 0xfff9, 2, 11 ... and when use is say
> + uint8_t with step 3, those values divided by 3 cast to uint8_t will be
> + ... 0x50, 0x53, 0, 3 ... rather than expected 0x50, 0x53, 0x56, 0x59.  */
Interesting, so we can still get correct debug info for iter in such
special cases.
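
[Editor's note: the arithmetic in that comment checks out; a quick
standalone program to verify it (my own illustration, not part of the
patch):

#include <cstdint>
#include <cstdio>

// A uint16_t candidate IV stepping by 9 wraps as ... 0xfff0, 0xfff9,
// 0x0002, 0x000b ...; dividing by 3 and truncating to uint8_t gives
// 0x50, 0x53, 0x00, 0x03 rather than the expected 0x50, 0x53, 0x56, 0x59.
int main()
{
  uint16_t cand = 0xfff0;
  for (int i = 0; i < 4; ++i, cand += 9)
    std::printf ("cand=0x%04x use=0x%02x\n", cand, (uint8_t) (cand / 3));
  return 0;
}
]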

> +  if (!use->iv->no_overflow
> +  && !cand->iv->no_overflow
> +  && !integer_pow2p (cstep))
> +return NULL_TREE;
> +
> +  int bits = wi::exact_log2 (rat);
> +  if (bits == -1)
> +bits = wi::floor_log2 (rat) + 1;
> +  if (!cand->iv->no_overflow
> +  && TYPE_PRECISION (utype) + bits > TYPE_PRECISION (ctype))
> +return NULL_TREE;
The patch is fine for me.

Just for the record: in the future we might try to find (by recording
information in an early phase) the correct/bijective candidate when
computing the IV in debug info; then those checks would
be unnecessary.

Thanks,
bin
> +
> +  var = var_at_stmt (loop, cand, at);
> +
> +  if (POINTER_TYPE_P (ctype))
> +{
> +  ctype = unsigned_type_for (ctype);
> +  cbase = fold_convert (ctype, cbase);
> +  cstep = fold_convert (ctype, cstep);
> +  var = fold_convert (ctype, var);
> +}
> +
> +  ubase = unshare_expr (ubase);
> +  cbase = unshare_expr (cbase);
> +  if (stmt_after_increment (loop, cand, at))
> +var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var,
> +  unshare_expr (cstep));
> +
> +  var = fold_build2 (MINUS_EXPR, TREE_TYPE (var), var, cbase);
> +  var = fold_build2 (EXACT_DIV_EXPR, TREE_TYPE (var), var,
> +wide_int_to_tree (TREE_TYPE (var), rat));
> +  if (POINTER_TYPE_P (utype))
> +{
> +  var = fold_convert (sizetype, var);
> +  if (neg_p)
> +   var = fold_build1 (NEGATE_EXPR, sizetype, var);
> +  var = fold_build2 (POINTER_PLUS_EXPR, utype, ubase, var);
> +}
> +  else
> +{
> +  var = fold_convert (utype, var);
> +  var = fold_build2 (neg_p ? MINUS_EXPR : PLUS_EXPR, utype,
> +ubase, var);
> +}
> +  return var;
> +}
> +
>  /* Adjust the cost COST for being in loop setup rather than loop body.
> If we're optimizing for space, the loop setup overhead is constant;
> if we're optimizing for speed, amortize it over the per-iteration cost.
> @@ -7523,6 +7611,7 @@ remove_unused_ivs (struct ivopts_data *d
>   struct iv_use dummy_use;
>   struct iv_cand *best_cand = NULL, *cand;
>   unsigned

Re: PING*2 : Fwd: [PATCH][gcov-profile/91971]Profile directory concatenated with object file path

2019-10-23 Thread Martin Liška
On 10/21/19 5:32 PM, Qing Zhao wrote:
> Please let me know whether this patch is reasonable or not.

The patch is fine. Please add a PR entry to the ChangeLog and
install the patch.

Thanks,
Martin


Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-23 Thread Richard Biener
On Wed, Oct 23, 2019 at 11:11 AM Feng Xue OS
 wrote:
>
> Patch attached.

+  /* For PHI node that is not in loop header, its source operands should
+be defined inside the loop, which are seen as loop variant.  */
+  if (def_bb != loop->header || !skip_head)
+   return false;

so if we have

 for (;;)
  {
 if (x)
   a = ..;
 else
   a = ...;
 if (cond-to-split-on dependent on a)
...
  }

the above is too restrictive in case 'x' is semi-invariant as well, correct?

+ /* A new value comes from outside of loop.  */
+ if (!bb || !flow_bb_inside_loop_p (loop, bb))
+   return false;

but that means starting from the second iteration the value is invariant.

+ /* Don't consider redefinitions in excluded basic blocks.  */
+ if (!dominated_by_p (CDI_DOMINATORS, e->src, skip_head))
+   {
+ /* There are more than one source operands that can
+provide value to the SSA name, it is variant.  */
+ if (from)
+   return false;

they might be the same though, for PHIs with > 2 arguments.

In the cycle handling you are not recursing via stmt_semi_invariant_p
but only handling SSA name copies - any particular reason for that?

+static bool
+branch_removable_p (basic_block branch_bb)
+{
+  if (single_pred_p (branch_bb))
+return true;

I'm not sure what this function tests - at least the single_pred_p check
looks odd to me given the dominator checks later.  The single predecessor
could simply be a forwarder.  I wonder if you are looking for branches forming
an irreducible loop?  I think you can then check EDGE_IRREDUCIBLE_LOOP
or BB_IRREDUCIBLE_LOOP on the condition block (btw, I don't see
testcases covering the apparent special cases in the patch - referring to
existing ones via a comment often helps understanding the code).

+
+  return EDGE_SUCC (cond_bb, (unsigned) invar[1]);
+}

magic ensures that invar[1] is always the invariant edge?  Oh, it's a bool.
Ick.  I wonder if logic with int invariant_edge = -1; and the loop setting
it to either 0 or 1 would be easier to follow...

Note your stmt_semi_invariant_p check is exponential for a condition
like

   _1 = 1;
   _2 = _1 + _1;
   _3 = _2 + _2;
   if (_3 != param_4(D))

because you don't track ops you already proved semi-invariant.  We've
run into such situations repeatedly in SCEV analysis, so I doubt it can be
disregarded as irrelevant in practice.  A worklist approach could then
also get rid of the recursion.  You are already computing the stmts
forming the condition in compute_added_num_insns so another option
is to re-use that.
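
[Editor's note: a rough sketch of the suggested memoisation (hypothetical
helper names, not code from the patch): cache per-SSA-name results so a
chain like the one above is analyzed in linear time, and the recursion can
become a worklist.

static hash_map<tree, bool> *semi_invariant_cache;

static bool
cached_ssa_semi_invariant_p (tree name)
{
  if (bool *cached = semi_invariant_cache->get (name))
    return *cached;
  /* compute_ssa_semi_invariant_p stands in for the real analysis.  */
  bool result = compute_ssa_semi_invariant_p (name);
  semi_invariant_cache->put (name, result);
  return result;
}
]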

Btw, I wonder if we can simply re-use PARAM_MAX_PEELED_INSNS
instead of adding yet another param (it also happens to have the same
size).  Because we are "peeling" the loop.

+  edge invar_branch = get_cond_invariant_branch (loop, cond);
+
+  if (!invar_branch)
+return NULL;

extra vertical space is unwanted in such cases.

+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "In %s(), split loop %d at branch<%s>, BB %d\n",
+ current_function_name (), loop1->num,
+ true_invar ? "T" : "F", cond_bb->index);
+ print_gimple_stmt (dump_file, cond, 0, TDF_SLIM | TDF_VOPS);
+   }

can you please use sth like

  if (dump_enabled_p ())
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
 cond, "loop split on semi-invariant condition");

so -fopt-info-loop will show it?

+  /* Generate a bool type temporary to hold result of the condition.  */
+  tree tmp = make_ssa_name (boolean_type_node);
+  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
+  gimple *stmt = gimple_build_assign (tmp,
+ gimple_cond_code (cond),
+ gimple_cond_lhs (cond),
+ gimple_cond_rhs (cond));

shorter is

   gimple_seq stmts = NULL;
   tree tmp = gimple_build (&stmts, gimple_cond_code (cond),
  boolean_type_node,
gimple_cond_lhs (cond), gimple_cond_rhs (cond));
   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);

+  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+  gimple_cond_set_condition (cond, EQ_EXPR, tmp, boolean_true_node);

but I wonder what the point is of moving the condition computation into
a temporary here?  Why not just build the original condition again for break_cond?

In split_loop_on_cond you'll find the first semi-invariant condition to
split on, but we won't visit the split loop again (the same holds for the
original splitting, I guess).  Don't we eventually want to recurse on that?

Otherwise the patch looks reasonable.  Sorry for the many bits above and the
late real review from me...

Thanks,
Richard.


> Feng
>
> 
> From: Richard Biener 
> Sent: Wednesday, October 23, 2019 5:04 PM
> To: Feng Xue OS
> Cc: Michael Matz; Philipp

[committed][AArch64] Don't apply mode_for_int_vector to scalars

2019-10-23 Thread Richard Sandiford
aarch64_emit_approx_sqrt handles both vectors and scalars and was using
mode_for_int_vector even for the scalar case.  Although that happened
to work, it isn't how mode_for_int_vector is supposed to be used.

Tested on aarch64-linux-gnu and applied as r277311.

Richard


2019-10-23  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Use
int_mode_for_mode rather than mode_for_int_vector for scalars.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-10-22 08:46:57.363355908 +0100
+++ gcc/config/aarch64/aarch64.c2019-10-23 11:30:08.169740215 +0100
@@ -11828,7 +11828,9 @@ aarch64_emit_approx_sqrt (rtx dst, rtx s
 /* Caller assumes we cannot fail.  */
 gcc_assert (use_rsqrt_p (mode));
 
-  machine_mode mmsk = mode_for_int_vector (mode).require ();
+  machine_mode mmsk = (VECTOR_MODE_P (mode)
+  ? mode_for_int_vector (mode).require ()
+  : int_mode_for_mode (mode).require ());
   rtx xmsk = gen_reg_rtx (mmsk);
   if (!recp)
 /* When calculating the approximate square root, compare the


RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread Richard Sandiford
This patch is the first of a series that tries to remove two
assumptions:

(1) that all vectors involved in vectorisation must be the same size

(2) that there is only one vector mode for a given element mode and
number of elements

Relaxing (1) helps with targets that support multiple vector sizes or
that require the number of elements to stay the same.  E.g. if we're
vectorising code that operates on narrow and wide elements, and the
narrow elements use 64-bit vectors, then on AArch64 it would normally
be better to use 128-bit vectors rather than pairs of 64-bit vectors
for the wide elements.
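
A concrete illustration (my example, not from the series):

  void
  widen_add (const short *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      b[i] = a[i] + 1;
  }

If the 16-bit loads use 64-bit vectors (V4HI), relaxing (1) lets the
widened 32-bit additions use a single 128-bit V4SI per vector copy
instead of a pair of 64-bit V2SIs.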

Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
fixed-length code for SVE.  It also allows unpacked/half-size SVE
vectors to work with -msve-vector-bits=256.

The patch adds a new hook that targets can use to control how we
move from one vector mode to another.  The hook takes a starting vector
mode, a new element mode, and (optionally) a new number of elements.
The flexibility needed for (1) comes in when the number of elements
isn't specified.
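
To make the intended default concrete, here is a sketch (my
reconstruction of the behaviour documented below for targhooks.c; the
actual implementation may differ in details):

  opt_machine_mode
  default_vectorize_related_mode (machine_mode vector_mode,
                                  scalar_mode element_mode,
                                  poly_uint64 nunits)
  {
    machine_mode result_mode;
    /* When no element count is given, keep the size of VECTOR_MODE.  */
    if (known_eq (nunits, 0U)
        && !multiple_p (GET_MODE_SIZE (vector_mode),
                        GET_MODE_SIZE (element_mode), &nunits))
      return opt_machine_mode ();
    if (mode_for_vector (element_mode, nunits).exists (&result_mode)
        && VECTOR_MODE_P (result_mode)
        && targetm.vector_mode_supported_p (result_mode))
      return result_mode;
    return opt_machine_mode ();
  }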

All callers in this patch specify the number of elements, but a later
vectoriser patch doesn't.  I won't be posting the vectoriser patch
for a few days, hence the RFC/A tag.

Tested individually on aarch64-linux-gnu and as a series on
x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
look OK?

I'll post some follow-up patches too.

Richard


2019-10-23  Richard Sandiford  

gcc/
* target.def (related_mode): New hook.
* doc/tm.texi.in (TARGET_VECTORIZE_RELATED_MODE): New hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_vectorize_related_mode): Declare.
* targhooks.c (default_vectorize_related_mode): New function.
* machmode.h (related_vector_mode): Declare.
* stor-layout.c (related_vector_mode): New function.
* expmed.c (extract_bit_field_1): Use it instead of mode_for_vector.
* optabs-query.c (qimode_for_vec_perm): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Likewise.
(vectorizable_store, vectorizable_load): Likewise

Index: gcc/target.def
===
--- gcc/target.def  2019-09-30 17:20:57.370607986 +0100
+++ gcc/target.def  2019-10-23 11:33:01.568510253 +0100
@@ -1909,6 +1909,33 @@ for autovectorization.  The default impl
  (vector_sizes *sizes, bool all),
  default_autovectorize_vector_sizes)
 
+DEFHOOK
+(related_mode,
+ "If a piece of code is using vector mode @var{vector_mode} and also wants\n\
+to operate on elements of mode @var{element_mode}, return the vector mode\n\
+it should use for those elements.  If @var{nunits} is nonzero, ensure that\n\
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector\n\
+size pairs the most naturally with @var{vector_mode}.  Return an empty\n\
+@code{opt_machine_mode} if there is no supported vector mode with the\n\
+required properties.\n\
+\n\
+There is no prescribed way of handling the case in which @var{nunits}\n\
+is zero.  One common choice is to pick a vector mode with the same size\n\
+as @var{vector_mode}; this is the natural choice if the target has a\n\
+fixed vector size.  Another option is to choose a vector mode with the\n\
+same number of elements as @var{vector_mode}; this is the natural choice\n\
+if the target has a fixed number of elements.  Alternatively, the hook\n\
+might choose a middle ground, such as trying to keep the number of\n\
+elements as similar as possible while applying maximum and minimum\n\
+vector sizes.\n\
+\n\
+The default implementation uses @code{mode_for_vector} to find the\n\
+requested mode, returning a mode with the same size as @var{vector_mode}\n\
+when @var{nunits} is zero.  This is the correct behavior for most targets.",
+ opt_machine_mode,
+ (machine_mode vector_mode, scalar_mode element_mode, poly_uint64 nunits),
+ default_vectorize_related_mode)
+
 /* Function to get a target mode for a vector mask.  */
 DEFHOOK
 (get_mask_mode,
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  2019-09-30 17:20:57.350608130 +0100
+++ gcc/doc/tm.texi.in  2019-10-23 11:33:01.564510281 +0100
@@ -4181,6 +4181,8 @@ address;  but often a machine-dependent
 
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 
+@hook TARGET_VECTORIZE_RELATED_MODE
+
 @hook TARGET_VECTORIZE_GET_MASK_MODE
 
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2019-09-30 17:20:57.350608130 +0100
+++ gcc/doc/tm.texi 2019-10-23 11:33:01.560510309 +0100
@@ -6029,6 +6029,30 @@ The hook does not need to do anything if
 for autovectorization.  The default implementation does nothing.
 @end deftypefn
 
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_RELATED_MODE 
(machine_mode @var{vector_mode}, scalar_mode @va

Re: [Patch 0/X] [WIP][RFC][libsanitizer] Introduce HWASAN to GCC

2019-10-23 Thread Matthew Malcomson
Hi Martin,

I'm getting close to putting up a patch series that I believe could go 
in before stage1 close.

I still have to test sanitizing the kernel, and to track down a
bootstrap comparison difference in the code handling shadow-stack
cleanup during exception unwinding.

I just thought I'd answer these questions below to see if there's
anything extra I could do to make reviewing easier.

On 23/09/19 09:02, Martin Liška wrote:
> Hi.
> 
> As mentioned in the next email thread, there are a few main objectives
> that will help me make a proper patch review:
> 
> 1) Make first libsanitizer merge from trunk, it will remove the need
> of the backports that you made. Plus I will be able to apply the
> patchset on the current master.
Done
> 2) I would exclude the setjmp/longjmp - these should be upstreamed first
> in libsanitizer.

Will exclude in the patch series, upstreaming under progress 
(https://reviews.llvm.org/D69045)

> 3) I would like to see two HWASAN options that clearly separate the
> 2 supported modes: TBI without MTE and MTE. Here I would appreciate
> having a compiler farm machine with TBI which we can use for testing.

I went back and looked at clang to see that it uses 
`-fsanitize=hwaddress` and `-fsanitize=memtag`, which are completely 
different options.

I'm now doing the same, with the two sanitizers just using similar code 
paths.

In fact, I'm not going to have the MTE instrumentation ready by the end 
of stage1, so my aim is to just put the `-fsanitize=hwaddress` sanitizer 
in, but send some outline code to the mailing list to demonstrate how 
`-fsanitize=memtag` would fit in.


## w.r.t. a compiler farm machine with TBI

Any AArch64 machine has this feature.  However, in order to use the
sanitizer, the kernel needs to allow "tagged pointers" in syscalls.

The kernel has allowed these tagged pointers in syscalls (once it's been 
turned on with a relevant prctl) in mainline since 5.4-rc1 (i.e. the 
start of this month).
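
For reference, a minimal sketch (mine, not part of the patch series) of
the per-process opt-in on a 5.4+ kernel:

  #include <sys/prctl.h>

  #ifndef PR_SET_TAGGED_ADDR_CTRL
  # define PR_SET_TAGGED_ADDR_CTRL 55
  # define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
  #endif

  int
  enable_tagged_addr (void)
  {
    /* Allow the top byte of user pointers to be non-zero in syscalls.  */
    return prctl (PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0);
  }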

My testing has been on a virtual machine with a mainline kernel built 
from source.

Given that, I'm not sure how you want to proceed.
Could we set up a virtual machine on the compiler farm?


> 4) About the BUILTIN expansion: you provided a patch for couple of them. My 
> question
> is whether the list is complete?

The list of BUILTINs was nowhere near complete at the time I posted the 
RFC patches.

Since then I've added features and correspondingly added BUILTINs.

Now I believe I've added to sanitizer.def all the BUILTINs this
sanitizer will need.

> 5) I would appreciate the patch set being split into smaller logical parts, e.g.
> libsanitizer changes; option introduction; stack variable handling 
> (colour/uncolour/alignment);
> hwasan pass and other GIMPLE-related changes; RTL hooks, new RTL 
> instructions and expansion changes.
> 

Will do!

> Thank you,
> Martin
> 
> 



[PATCH] S/390: Use UNSPEC_GET_TP for thread pointer loads

2019-10-23 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.

gcc/ChangeLog:

2019-10-21  Ilya Leoshkevich  

* config/s390/s390.c (s390_get_thread_pointer): Use
gen_get_thread_pointer.
(s390_expand_split_stack_prologue): Likewise.
* config/s390/s390.md (UNSPEC_GET_TP): New UNSPEC.
(*get_tp_31): New 31-bit splitter for UNSPEC_GET_TP.
(*get_tp_64): New 64-bit splitter for UNSPEC_GET_TP.
(get_thread_pointer): Use UNSPEC_GET_TP, use
parameterized name.

gcc/testsuite/ChangeLog:

2019-10-21  Ilya Leoshkevich  

* gcc.target/s390/load-thread-pointer-once-2.c: New test.
---
 gcc/config/s390/s390.c|  5 ++-
 gcc/config/s390/s390.md   | 38 +--
 .../s390/load-thread-pointer-once-2.c | 14 +++
 3 files changed, 43 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/load-thread-pointer-once-2.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9fed7d3b99f..151b80da0b3 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5106,7 +5106,8 @@ s390_get_thread_pointer (void)
 {
   rtx tp = gen_reg_rtx (Pmode);
 
-  emit_move_insn (tp, gen_rtx_REG (Pmode, TP_REGNUM));
+  emit_insn (gen_get_thread_pointer (Pmode, tp));
+
   mark_reg_pointer (tp, BITS_PER_WORD);
 
   return tp;
@@ -11711,7 +11712,7 @@ s390_expand_split_stack_prologue (void)
   /* Get thread pointer.  r1 is the only register we can always destroy - r0
      could contain a static chain (and cannot be used to address memory
      anyway), r2-r6 can contain parameters, and r6-r15 are callee-saved.  */
-  emit_move_insn (r1, gen_rtx_REG (Pmode, TP_REGNUM));
+  emit_insn (gen_get_thread_pointer (Pmode, r1));
   /* Aim at __private_ss.  */
   guard = gen_rtx_MEM (Pmode, plus_constant (Pmode, r1, psso));
 
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 1e6439d5fd6..e3881d07f2b 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -105,6 +105,7 @@
; TLS support
UNSPEC_TLSLDM_NTPOFF
UNSPEC_TLS_LOAD
+   UNSPEC_GET_TP
 
; String Functions
UNSPEC_SRST
@@ -1860,23 +1861,35 @@
   *,*,yes")
 ])
 
-; Splitters for loading/storing TLS pointers from/to %a0:DI.
-; Do this only during split2, which runs after reload. At the point when split1
-; runs, some of %a0:DI occurrences might be nested inside other rtxes and thus
-; not matched. As a result, only some occurrences will be split, which will
-; prevent CSE. At the point when split2 runs, reload will have ensured that no
-; nested references exist.
+; Splitters for loading TLS pointer from UNSPEC_GET_TP.
+; UNSPEC_GET_TP is used instead of %a0:P, since the latter is a hard register,
+; and those are not handled by Partial Redundancy Elimination (gcse.c), which
+; results in generation of redundant thread pointer loads.
 
-(define_split
-  [(set (match_operand:DI 0 "register_operand" "")
-(match_operand:DI 1 "register_operand" ""))]
-  "TARGET_ZARCH && ACCESS_REG_P (operands[1]) && reload_completed"
+(define_insn_and_split "*get_tp_31"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (unspec:SI [(match_operand:SI 1 "register_operand" "t")]
+  UNSPEC_GET_TP))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 1))])
+
+(define_insn_and_split "*get_tp_64"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "t")]
+  UNSPEC_GET_TP))]
+  "TARGET_ZARCH"
+  "#"
+  "&& reload_completed"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 0) (ashift:DI (match_dup 0) (const_int 32)))
(set (strict_low_part (match_dup 2)) (match_dup 4))]
   "operands[2] = gen_lowpart (SImode, operands[0]);
s390_split_access_reg (operands[1], &operands[4], &operands[3]);")
 
+; Splitters for storing TLS pointer to %a0:DI.
+
 (define_split
   [(set (match_operand:DI 0 "register_operand" "")
 (match_operand:DI 1 "register_operand" ""))]
@@ -10520,8 +10533,9 @@
 ;;- Thread-local storage support.
 ;;
 
-(define_expand "get_thread_pointer"
-  [(set (match_operand:P 0 "nonimmediate_operand" "") (reg:P TP_REGNUM))]
+(define_expand "@get_thread_pointer"
+  [(set (match_operand:P 0 "nonimmediate_operand" "")
+   (unspec:P [(reg:P TP_REGNUM)] UNSPEC_GET_TP))]
   ""
   "")
 
diff --git a/gcc/testsuite/gcc.target/s390/load-thread-pointer-once-2.c 
b/gcc/testsuite/gcc.target/s390/load-thread-pointer-once-2.c
new file mode 100644
index 000..36b1ed8800f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/load-thread-pointer-once-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+extern void c(void *);
+
+void a(void)
+{
+  void *b = __builtin_thread_pointer();
+  if (b)
+c(b);
+}
+
+/* { dg-final { scan-assembler-times {\n\tear\t} 2 { target { lp64 } } } } */
+/* { dg-final { sc

Replace mode_for_int_vector with related_int_vector_mode

2019-10-23 Thread Richard Sandiford
mode_for_int_vector, like mode_for_vector, can sometimes return
an integer mode or an unsupported vector mode.  But no callers are
interested in that case; they only want supported vector modes.
This patch therefore replaces mode_for_int_vector with
related_int_vector_mode, which gives the target a chance to pick
its preferred vector mode for the given element mode and size.

Tested individually on aarch64-linux-gnu and as a series on
x86_64-linux-gnu.  OK to install?

Richard


2019-10-23  Richard Sandiford  

gcc/
* machmode.h (mode_for_int_vector): Delete.
(related_int_vector_mode): Declare.
* stor-layout.c (mode_for_int_vector): Delete.
(related_int_vector_mode): New function.
* optabs.c (expand_vec_perm_1): Use related_int_vector_mode
instead of mode_for_int_vector.
(expand_vec_perm_const): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Likewise.
(aarch64_evpc_sve_tbl): Likewise.
* config/s390/s390.c (s390_expand_vec_compare_cc): Likewise.
(s390_expand_vcond): Likewise.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2019-10-23 11:33:01.564510281 +0100
+++ gcc/machmode.h  2019-10-23 12:01:36.968336613 +0100
@@ -879,22 +879,9 @@ smallest_int_mode_for_size (poly_uint64
 extern opt_scalar_int_mode int_mode_for_mode (machine_mode);
 extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
 extern opt_machine_mode mode_for_vector (scalar_mode, poly_uint64);
-extern opt_machine_mode mode_for_int_vector (unsigned int, poly_uint64);
 extern opt_machine_mode related_vector_mode (machine_mode, scalar_mode,
 poly_uint64 = 0);
-
-/* Return the integer vector equivalent of MODE, if one exists.  In other
-   words, return the mode for an integer vector that has the same number
-   of bits as MODE and the same number of elements as MODE, with the
-   latter being 1 if MODE is scalar.  The returned mode can be either
-   an integer mode or a vector mode.  */
-
-inline opt_machine_mode
-mode_for_int_vector (machine_mode mode)
-{
-  return mode_for_int_vector (GET_MODE_UNIT_BITSIZE (mode),
- GET_MODE_NUNITS (mode));
-}
+extern opt_machine_mode related_int_vector_mode (machine_mode);
 
 /* A class for iterating through possible bitfield modes.  */
 class bit_field_mode_iterator
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2019-10-23 11:33:01.564510281 +0100
+++ gcc/stor-layout.c   2019-10-23 12:01:36.972336585 +0100
@@ -515,21 +515,6 @@ mode_for_vector (scalar_mode innermode,
   return opt_machine_mode ();
 }
 
-/* Return the mode for a vector that has NUNITS integer elements of
-   INT_BITS bits each, if such a mode exists.  The mode can be either
-   an integer mode or a vector mode.  */
-
-opt_machine_mode
-mode_for_int_vector (unsigned int int_bits, poly_uint64 nunits)
-{
-  scalar_int_mode int_mode;
-  machine_mode vec_mode;
-  if (int_mode_for_size (int_bits, 0).exists (&int_mode)
-  && mode_for_vector (int_mode, nunits).exists (&vec_mode))
-return vec_mode;
-  return opt_machine_mode ();
-}
-
 /* If a piece of code is using vector mode VECTOR_MODE and also wants
to operate on elements of mode ELEMENT_MODE, return the vector mode
it should use for those elements.  If NUNITS is nonzero, ensure that
@@ -550,6 +535,26 @@ related_vector_mode (machine_mode vector
   return targetm.vectorize.related_mode (vector_mode, element_mode, nunits);
 }
 
+/* If a piece of code is using vector mode VECTOR_MODE and also wants
+   to operate on integer vectors with the same element size and number
+   of elements, return the vector mode it should use.  Return an empty
+   opt_machine_mode if there is no supported vector mode with the
+   required properties.
+
+   Unlike mode_for_vector, any returned mode is guaranteed to satisfy
+   both VECTOR_MODE_P and targetm.vector_mode_supported_p.  */
+
+opt_machine_mode
+related_int_vector_mode (machine_mode vector_mode)
+{
+  gcc_assert (VECTOR_MODE_P (vector_mode));
+  scalar_int_mode int_mode;
+  if (int_mode_for_mode (GET_MODE_INNER (vector_mode)).exists (&int_mode))
+return related_vector_mode (vector_mode, int_mode,
+   GET_MODE_NUNITS (vector_mode));
+  return opt_machine_mode ();
+}
+
 /* Return the alignment of MODE. This will be bounded by 1 and
BIGGEST_ALIGNMENT.  */
 
Index: gcc/optabs.c
===
--- gcc/optabs.c2019-10-08 09:23:31.582531825 +0100
+++ gcc/optabs.c2019-10-23 12:01:36.972336585 +0100
@@ -5542,7 +5542,7 @@ expand_vec_perm_1 (enum insn_code icode,
   class expand_operand ops[4];
 
   gcc_assert (GET_MODE_CLASS (smode) == MODE_VECTOR_INT
- || mode_for_int_vector (tmode).require () == smode);
+ || re

Add build_truth_vector_type_for_mode

2019-10-23 Thread Richard Sandiford
Callers of vect_halve_mask_nunits and vect_double_mask_nunits
already know what mode the resulting vector type should have,
so we might as well create the vector type directly with that mode,
just like build_vector_type_for_mode lets us build normal vectors
with a known mode.  This avoids the current awkwardness of having
to recompute the mode starting from vec_info::vector_size, which
hard-codes the assumption that all vectors have to be the same size.

A later patch gets rid of build_truth_vector_type and
build_same_sized_truth_vector_type, so the net effect of the
series is to reduce the number of type functions by one.

Tested individually on aarch64-linux-gnu and as a series on
x86_64-linux-gnu.  OK to install?

Richard


2019-10-23  Richard Sandiford  

gcc/
* tree.h (build_truth_vector_type_for_mode): Declare.
* tree.c (build_truth_vector_type_for_mode): New function,
split out from...
(build_truth_vector_type): ...here.
(build_opaque_vector_type): Fix head comment.
* tree-vectorizer.h (supportable_narrowing_operation): Remove
vec_info parameter.
(vect_halve_mask_nunits): Replace vec_info parameter with the
mode of the new vector.
(vect_double_mask_nunits): Likewise.
* tree-vect-loop.c (vect_halve_mask_nunits): Likewise.
(vect_double_mask_nunits): Likewise.
* tree-vect-loop-manip.c: Include insn-config.h, rtl.h and recog.h.
(vect_maybe_permute_loop_masks): Remove vinfo parameter.  Update call
to vect_halve_mask_nunits, getting the required mode from the unpack
patterns.
(vect_set_loop_condition_masked): Update call accordingly.
* tree-vect-stmts.c (supportable_narrowing_operation): Remove vec_info
parameter and update call to vect_double_mask_nunits.
(vectorizable_conversion): Update call accordingly.
(simple_integer_narrowing): Likewise.  Remove vec_info parameter.
(vectorizable_call): Update call accordingly.
(supportable_widening_operation): Update call to
vect_halve_mask_nunits.

Index: gcc/tree.h
===
--- gcc/tree.h  2019-09-21 13:56:07.519944842 +0100
+++ gcc/tree.h  2019-10-23 12:07:54.505663970 +0100
@@ -4437,6 +4437,7 @@ extern tree build_reference_type_for_mod
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree, poly_int64);
+extern tree build_truth_vector_type_for_mode (poly_uint64, machine_mode);
 extern tree build_truth_vector_type (poly_uint64, poly_uint64);
 extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree, poly_int64);
Index: gcc/tree.c
===
--- gcc/tree.c  2019-10-20 13:58:01.679637360 +0100
+++ gcc/tree.c  2019-10-23 12:07:54.501663998 +0100
@@ -3,25 +3,35 @@ build_vector_type (tree innertype, poly_
   return make_vector_type (innertype, nunits, VOIDmode);
 }
 
+/* Build a truth vector with NUNITS units, giving it mode MASK_MODE.  */
+
+tree
+build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
+{
+  gcc_assert (mask_mode != BLKmode);
+
+  poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
+  unsigned HOST_WIDE_INT esize = vector_element_size (vsize, nunits);
+  tree bool_type = build_nonstandard_boolean_type (esize);
+
+  return make_vector_type (bool_type, nunits, mask_mode);
+}
+
 /* Build truth vector with specified length and number of units.  */
 
 tree
 build_truth_vector_type (poly_uint64 nunits, poly_uint64 vector_size)
 {
-  machine_mode mask_mode
-= targetm.vectorize.get_mask_mode (nunits, vector_size).else_blk ();
-
-  poly_uint64 vsize;
-  if (mask_mode == BLKmode)
-vsize = vector_size * BITS_PER_UNIT;
-  else
-vsize = GET_MODE_BITSIZE (mask_mode);
+  machine_mode mask_mode;
+  if (targetm.vectorize.get_mask_mode (nunits,
+  vector_size).exists (&mask_mode))
+return build_truth_vector_type_for_mode (nunits, mask_mode);
 
+  poly_uint64 vsize = vector_size * BITS_PER_UNIT;
   unsigned HOST_WIDE_INT esize = vector_element_size (vsize, nunits);
-
   tree bool_type = build_nonstandard_boolean_type (esize);
 
-  return make_vector_type (bool_type, nunits, mask_mode);
+  return make_vector_type (bool_type, nunits, BLKmode);
 }
 
 /* Returns a vector type corresponding to a comparison of VECTYPE.  */
@@ -11150,7 +11160,8 @@ build_same_sized_truth_vector_type (tree
   return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size);
 }
 
-/* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
+/* Like build_vector_type, but builds a variant type with TYPE_VECTOR_OPAQUE
+   set.  */
 
 tree
 build_opaque_vector_type (tree innertype, poly_int64 nunits)
Index: gcc/tree-vectorizer.h
=

Remove build_{same_sized_,}truth_vector_type

2019-10-23 Thread Richard Sandiford
build_same_sized_truth_vector_type was confusingly named, since for
SVE and AVX512 the returned vector isn't the same byte size (although
it does have the same number of elements).  What it really returns
is the "truth" vector type for a given data vector type.

The more general truth_type_for provides the same thing when passed
a vector and IMO has a more descriptive name, so this patch replaces
all uses of build_same_sized_truth_vector_type with that.  It does
the same for a call to build_truth_vector_type, leaving truth_type_for
itself as the only remaining caller.

It's then more natural to pass build_truth_vector_type the original
vector type rather than its size and nunits, especially since the
given size isn't the size of the returned vector.  This in turn allows
a future patch to simplify the interface of get_mask_mode.  Doing this
also fixes a bug in which truth_type_for would pass a size of zero for
BLKmode vector types.

Tested individually on aarch64-linux-gnu and as a series on
x86_64-linux-gnu.  OK to install?

Richard


2019-10-23  Richard Sandiford  

gcc/
* tree.h (build_truth_vector_type): Delete.
(build_same_sized_truth_vector_type): Likewise.
* tree.c (build_truth_vector_type): Rename to...
(build_truth_vector_type_for): ...this.  Make static and take
a vector type as argument.
(truth_type_for): Update accordingly.
(build_same_sized_truth_vector_type): Delete.
* tree-vect-generic.c (expand_vector_divmod): Use truth_type_for
instead of build_same_sized_truth_vector_type.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-patterns.c (build_mask_conversion): Likeise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* tree-vect-stmts.c (vect_get_vec_def_for_operand): Likewise.
(vect_build_gather_load_calls, vectorizable_call): Likewise.
(scan_store_can_perm_p, vectorizable_scan_store): Likewise.
(vectorizable_store, vectorizable_condition): Likewise.
(get_mask_type_for_scalar_type, get_same_sized_vectype): Likewise.
(vect_get_mask_type_for_stmt): Use truth_type_for instead of
build_truth_vector_type.
* config/rs6000/rs6000-call.c (fold_build_vec_cmp): Use truth_type_for
instead of build_same_sized_truth_vector_type.

gcc/c/
* c-typeck.c (build_conditional_expr): Use truth_type_for instead
of build_same_sized_truth_vector_type.
(build_vec_cmp): Likewise.

gcc/cp/
* call.c (build_conditional_expr_1): Use truth_type_for instead
of build_same_sized_truth_vector_type.
* typeck.c (build_vec_cmp): Likewise.

gcc/d/
* d-codegen.cc (build_boolop): Use truth_type_for instead of
build_same_sized_truth_vector_type.

Index: gcc/tree.h
===
--- gcc/tree.h  2019-10-23 12:07:54.505663970 +0100
+++ gcc/tree.h  2019-10-23 12:10:58.116366179 +0100
@@ -4438,8 +4438,6 @@ extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree, poly_int64);
 extern tree build_truth_vector_type_for_mode (poly_uint64, machine_mode);
-extern tree build_truth_vector_type (poly_uint64, poly_uint64);
-extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree, poly_int64);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree, bool = false);
Index: gcc/tree.c
===
--- gcc/tree.c  2019-10-23 12:07:54.501663998 +0100
+++ gcc/tree.c  2019-10-23 12:10:58.116366179 +0100
@@ -11127,11 +11127,16 @@ build_truth_vector_type_for_mode (poly_u
   return make_vector_type (bool_type, nunits, mask_mode);
 }
 
-/* Build truth vector with specified length and number of units.  */
+/* Build a vector type that holds one boolean result for each element of
+   vector type VECTYPE.  The public interface for this operation is
+   truth_type_for.  */
 
-tree
-build_truth_vector_type (poly_uint64 nunits, poly_uint64 vector_size)
+static tree
+build_truth_vector_type_for (tree vectype)
 {
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 vector_size = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
+
   machine_mode mask_mode;
   if (targetm.vectorize.get_mask_mode (nunits,
   vector_size).exists (&mask_mode))
@@ -11144,22 +11149,6 @@ build_truth_vector_type (poly_uint64 nun
   return make_vector_type (bool_type, nunits, BLKmode);
 }
 
-/* Returns a vector type corresponding to a comparison of VECTYPE.  */
-
-tree
-build_same_sized_truth_vector_type (tree vectype)
-{
-  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-return vectype;
-
-  poly_uint64 size = GET_MODE_SIZE (TYPE_MODE (vectype));
-
-  if (known_e

Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread Richard Biener
On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
 wrote:
>
> This patch is the first of a series that tries to remove two
> assumptions:
>
> (1) that all vectors involved in vectorisation must be the same size
>
> (2) that there is only one vector mode for a given element mode and
> number of elements
>
> Relaxing (1) helps with targets that support multiple vector sizes or
> that require the number of elements to stay the same.  E.g. if we're
> vectorising code that operates on narrow and wide elements, and the
> narrow elements use 64-bit vectors, then on AArch64 it would normally
> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> for the wide elements.
>
> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> vectors to work with -msve-vector-bits=256.
>
> The patch adds a new hook that targets can use to control how we
> move from one vector mode to another.  The hook takes a starting vector
> mode, a new element mode, and (optionally) a new number of elements.
> The flexibility needed for (1) comes in when the number of elements
> isn't specified.
>
> All callers in this patch specify the number of elements, but a later
> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> for a few days, hence the RFC/A tag.
>
> Tested individually on aarch64-linux-gnu and as a series on
> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> look OK?

In isolation the idea looks good but maybe a bit limited?  I see
how it works for the same-size case but if you consider x86
where we have SSE, AVX256 and AVX512 what would it return
for related_vector_mode (V4SImode, SImode, 0)?  Or is this
kind of query not intended (where the component modes match
but nunits is zero)?  How do you get from SVE fixed 128bit
to NEON fixed 128bit then?  Or is it just used to stay in the
same register set for different component modes?

As said, it looks good but I'd like to see the followups.

Note I delayed thinking about relaxing the single-vector-size
constraint in the vectorizer until after we're SLP only because
that looked easier to do there.  I also remember patches
relaxing this a bit from RISCV folks.

Thanks,
Richard.

> I'll post some follow-up patches too.
>
> Richard
>
>
> 2019-10-23  Richard Sandiford  
>
> gcc/
> * target.def (related_mode): New hook.
> * doc/tm.texi.in (TARGET_VECTORIZE_RELATED_MODE): New hook.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_vectorize_related_mode): Declare.
> * targhooks.c (default_vectorize_related_mode): New function.
> * machmode.h (related_vector_mode): Declare.
> * stor-layout.c (related_vector_mode): New function.
> * expmed.c (extract_bit_field_1): Use it instead of mode_for_vector.
> * optabs-query.c (qimode_for_vec_perm): Likewise.
> * tree-vect-stmts.c (get_group_load_store_type): Likewise.
> (vectorizable_store, vectorizable_load): Likewise
>
> Index: gcc/target.def
> ===
> --- gcc/target.def  2019-09-30 17:20:57.370607986 +0100
> +++ gcc/target.def  2019-10-23 11:33:01.568510253 +0100
> @@ -1909,6 +1909,33 @@ for autovectorization.  The default impl
>   (vector_sizes *sizes, bool all),
>   default_autovectorize_vector_sizes)
>
> +DEFHOOK
> +(related_mode,
> + "If a piece of code is using vector mode @var{vector_mode} and also wants\n\
> +to operate on elements of mode @var{element_mode}, return the vector mode\n\
> +it should use for those elements.  If @var{nunits} is nonzero, ensure that\n\
> +the mode has exactly @var{nunits} elements, otherwise pick whichever vector\n\
> +size pairs the most naturally with @var{vector_mode}.  Return an empty\n\
> +@code{opt_machine_mode} if there is no supported vector mode with the\n\
> +required properties.\n\
> +\n\
> +There is no prescribed way of handling the case in which @var{nunits}\n\
> +is zero.  One common choice is to pick a vector mode with the same size\n\
> +as @var{vector_mode}; this is the natural choice if the target has a\n\
> +fixed vector size.  Another option is to choose a vector mode with the\n\
> +same number of elements as @var{vector_mode}; this is the natural choice\n\
> +if the target has a fixed number of elements.  Alternatively, the hook\n\
> +might choose a middle ground, such as trying to keep the number of\n\
> +elements as similar as possible while applying maximum and minimum\n\
> +vector sizes.\n\
> +\n\
> +The default implementation uses @code{mode_for_vector} to find the\n\
> +requested mode, returning a mode with the same size as @var{vector_mode}\n\
> +when @var{nunits} is zero.  This is the correct behavior for most targets.",
> + opt_machine_mode,
> + (machine_mode vector_mode, scalar_mode element_mode, poly_uint64 nunits),
> + default_vectorize_related_mode)
> +
>  /* Function to get a

Pass the data vector mode to get_mask_mode

2019-10-23 Thread Richard Sandiford
This patch passes the data vector mode to get_mask_mode, rather than its
size and nunits.  This is a bit simpler and allows targets to distinguish
between modes that happen to have the same size and number of elements.
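
For illustration (my example, with hypothetical AArch64 behaviour):
with 128-bit SVE, VNx16QImode and V16QImode have the same size and the
same number of elements, yet want different mask modes, roughly:

  targetm.vectorize.get_mask_mode (VNx16QImode)  /* -> VNx16BImode */
  targetm.vectorize.get_mask_mode (V16QImode)    /* -> V16QImode   */

which the old (nunits, length) interface could not tell apart.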

Tested individually on aarch64-linux-gnu and as a series on
x86_64-linux-gnu.  OK to install?

Richard


2019-10-23  Richard Sandiford  

gcc/
* target.def (get_mask_mode): Take a vector mode itself as argument,
instead of properties about the vector mode.
* doc/tm.texi: Regenerate.
* targhooks.h (default_get_mask_mode): Update to reflect new
get_mode_mask interface.
* targhooks.c (default_get_mask_mode): Likewise.  Use
related_int_vector_mode.
* optabs-query.c (can_vec_mask_load_store_p): Update call
to get_mask_mode.
* tree-vect-stmts.c (check_load_store_masking): Likewise, checking
first that the original mode really is a vector.
* tree.c (build_truth_vector_type_for): Likewise.
* config/aarch64/aarch64.c (aarch64_get_mask_mode): Update for new
get_mode_mask interface.
(aarch64_expand_sve_vcond): Update call accordingly.
* config/gcn/gcn.c (gcn_vectorize_get_mask_mode): Update for new
get_mode_mask interface.
* config/i386/i386.c (ix86_get_mask_mode): Likewise.

Index: gcc/target.def
===
--- gcc/target.def  2019-10-23 11:33:01.568510253 +0100
+++ gcc/target.def  2019-10-23 12:13:54.099122100 +0100
@@ -1939,17 +1939,17 @@ when @var{nunits} is zero.  This is the
 /* Function to get a target mode for a vector mask.  */
 DEFHOOK
 (get_mask_mode,
- "A vector mask is a value that holds one boolean result for every element\n\
-in a vector.  This hook returns the machine mode that should be used to\n\
-represent such a mask when the vector in question is @var{length} bytes\n\
-long and contains @var{nunits} elements.  The hook returns an empty\n\
-@code{opt_machine_mode} if no such mode exists.\n\
+ "Return the mode to use for a vector mask that holds one boolean\n\
+result for each element of vector mode @var{mode}.  The returned mask mode\n\
+can be a vector of integers (class @code{MODE_VECTOR_INT}), a vector of\n\
+booleans (class @code{MODE_VECTOR_BOOL}) or a scalar integer (class\n\
+@code{MODE_INT}).  Return an empty @code{opt_machine_mode} if no such\n\
+mask mode exists.\n\
 \n\
-The default implementation returns the mode of an integer vector that\n\
-is @var{length} bytes long and that contains @var{nunits} elements,\n\
-if such a mode exists.",
+The default implementation returns a @code{MODE_VECTOR_INT} with the\n\
+same size and number of elements as @var{mode}, if such a mode exists.",
  opt_machine_mode,
- (poly_uint64 nunits, poly_uint64 length),
+ (machine_mode mode),
  default_get_mask_mode)
 
 /* Function to say whether a masked operation is expensive when the
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2019-10-23 11:33:01.560510309 +0100
+++ gcc/doc/tm.texi 2019-10-23 12:13:54.099122100 +0100
@@ -6053,16 +6053,16 @@ requested mode, returning a mode with th
 when @var{nunits} is zero.  This is the correct behavior for most targets.
 @end deftypefn
 
-@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE 
(poly_uint64 @var{nunits}, poly_uint64 @var{length})
-A vector mask is a value that holds one boolean result for every element
-in a vector.  This hook returns the machine mode that should be used to
-represent such a mask when the vector in question is @var{length} bytes
-long and contains @var{nunits} elements.  The hook returns an empty
-@code{opt_machine_mode} if no such mode exists.
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE 
(machine_mode @var{mode})
+Return the mode to use for a vector mask that holds one boolean
+result for each element of vector mode @var{mode}.  The returned mask mode
+can be a vector of integers (class @code{MODE_VECTOR_INT}), a vector of
+booleans (class @code{MODE_VECTOR_BOOL}) or a scalar integer (class
+@code{MODE_INT}).  Return an empty @code{opt_machine_mode} if no such
+mask mode exists.
 
-The default implementation returns the mode of an integer vector that
-is @var{length} bytes long and that contains @var{nunits} elements,
-if such a mode exists.
+The default implementation returns a @code{MODE_VECTOR_INT} with the
+same size and number of elements as @var{mode}, if such a mode exists.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE 
(unsigned @var{ifn})
Index: gcc/targhooks.h
===
--- gcc/targhooks.h 2019-10-23 11:33:01.568510253 +0100
+++ gcc/targhooks.h 2019-10-23 12:13:54.099122100 +0100
@@ -117,7 +117,7 @@ extern void default_autovectorize_vector
 extern opt_machine_mode default_vectoriz

[PATCH][OBVIOUS] Initialize a field in fibonacci_node.

2019-10-23 Thread Martin Liška
The patch fixes a cppcheck where we have potentially
uninitialized struct field.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-10-23  Martin Liska  

PR middle-end/81669
* fibonacci_heap.h (fibonacci_node::fibonacci_node):
Initialize m_data.
---
 gcc/fibonacci_heap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/fibonacci_heap.h b/gcc/fibonacci_heap.h
index 3bd0a9f8af1..2ff26270048 100644
--- a/gcc/fibonacci_heap.h
+++ b/gcc/fibonacci_heap.h
@@ -56,7 +56,7 @@ class fibonacci_node
 public:
   /* Default constructor.  */
   fibonacci_node (): m_parent (NULL), m_child (NULL), m_left (this),
-m_right (this), m_degree (0), m_mark (0)
+m_right (this), m_data (NULL), m_degree (0), m_mark (0)
   {
   }
 



Re: [PATCH] OpenACC reference count overhaul

2019-10-23 Thread Thomas Schwinge
Hi Julian!

On 2019-10-21T16:14:11+0200, I wrote:
> On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
>> This patch has been broken out of the patch supporting OpenACC 2.6 manual
>> deep copy last posted here:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html
>
> Thanks.

I meanwhile re-discovered that an earlier submission,
,
had included some documentation/rationale for:

> I haven't understood all the changes related to replacing
> 'dynamic_refcount' with 'virtual_refcount', getting rid of
> 'data_environ', the 'lookup_dev' rework, but I trust you got that right.
> In particular, these seem to remove special-case OpenACC code in favor of
> generic OMP code, which is good.

... these changes.  Please remember in the future to refer to such
existing documentation/rationale, or to include it again in any
re-submissions, thanks.


>> Tested with offloading to NVPTX, with good results

I noticed that when testing with
'-foffload=x86_64-intelmicemul-linux-gnu', the x86_64-pc-linux-gnu '-m32'
multilib (but not default '-m64', huh) then reproducibly regresses:

PASS: libgomp.c/target-link-1.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/target-link-1.c execution test

..., with an unhelpful message: "offload error: process on the device 0
unexpectedly exited with code 0".

So non-OpenACC code paths seem to be negatively affected in some way?

Hopefully that'll go away when backing out the 'VREFCOUNT_LINK_KEY'
etc. changes, as discussed elsewhere.  (I can easily test patches for
you, no need for you to set up Intel MIC (emulated) offloading testing.)


Regards
 Thomas




Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
>  wrote:
>>
>> This patch is the first of a series that tries to remove two
>> assumptions:
>>
>> (1) that all vectors involved in vectorisation must be the same size
>>
>> (2) that there is only one vector mode for a given element mode and
>> number of elements
>>
>> Relaxing (1) helps with targets that support multiple vector sizes or
>> that require the number of elements to stay the same.  E.g. if we're
>> vectorising code that operates on narrow and wide elements, and the
>> narrow elements use 64-bit vectors, then on AArch64 it would normally
>> be better to use 128-bit vectors rather than pairs of 64-bit vectors
>> for the wide elements.
>>
>> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
>> fixed-length code for SVE.  It also allows unpacked/half-size SVE
>> vectors to work with -msve-vector-bits=256.
>>
>> The patch adds a new hook that targets can use to control how we
>> move from one vector mode to another.  The hook takes a starting vector
>> mode, a new element mode, and (optionally) a new number of elements.
>> The flexibility needed for (1) comes in when the number of elements
>> isn't specified.
>>
>> All callers in this patch specify the number of elements, but a later
>> vectoriser patch doesn't.  I won't be posting the vectoriser patch
>> for a few days, hence the RFC/A tag.
>>
>> Tested individually on aarch64-linux-gnu and as a series on
>> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
>> look OK?
>
> In isolation the idea looks good but maybe a bit limited?  I see
> how it works for the same-size case but if you consider x86
> where we have SSE, AVX256 and AVX512 what would it return
> for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> kind of query not intended (where the component modes match
> but nunits is zero)?

In that case we'd normally get V4SImode back.  It's an allowed
combination, but not very useful.

> How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> it just used to stay in the same register set for different component
> modes?

Yeah, the idea is to use the original vector mode as essentially
a base architecture.

The follow-on patches replace vec_info::vector_size with
vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
with targetm.vectorize.autovectorize_vector_modes.  These are the
starting modes that would be passed to the hook in the nunits==0 case.

E.g. for Advanced SIMD on AArch64, it would make more sense for
related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
I think things would work in a similar way for the x86_64 vector archs.

For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
Advanced SIMD mode) to autovectorize_vector_modes, even though they
happen to be the same size for 128-bit SVE.  We can then compare
128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
that we consistently use all-SVE modes or all-Advanced SIMD modes
for each attempt.

The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:

- VNx16QImode (full vector)
- VNx8QImode (half vector)
- VNx4QImode (quarter vector)
- VNx2QImode (eighth vector)

and then pick the one with the lowest cost.  related_mode would
keep the number of units the same for nunits==0, within the limit
of the vector size.  E.g.:

- related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
- related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
- related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
- related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)

and:

- related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
- related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
- related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
- related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)

So when operating on multiple element sizes, the tradeoff is between
trying to make full use of the vector size (higher base nunits) vs.
trying to remove packs and unpacks between multiple vector copies
(lower base nunits).  The latter is useful because extending within
a vector is an in-lane rather than cross-lane operation and truncating
within a vector is a no-op.
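
A small example of the tradeoff (mine, not from the thread):

  void
  f (const char *restrict a, short *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      b[i] += a[i];
  }

With VNx16QImode as the base, each full QI vector must be unpacked into
two VNx8HImode halves before the addition; with VNx8QImode as the base,
each half-vector QI load extends in-lane to a single VNx8HImode,
trading vector utilisation for fewer permutes.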

With a couple of tweaks, we seem to do a good job of guessing which
version has the lowest cost, at least for the simple cases I've tried
so far.

Obviously there's going to be a bit of a compile-time cost
for SVE targets, but I think it's worth paying for.

> As said, it looks good but I'd like to see the followups.
>
> Note I delayed thinking about relaxing the single-vector-size
> constraint in the vectorizer until after we're SLP only because
> that looked more easily done there.  I also remember patches
> relaxing this a bit from RISCV folks.

That side seemed easier than I'd expected TBH, at least after the
mostly mechanical changes above.  The main m

Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread Richard Biener
On Wed, Oct 23, 2019 at 1:51 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> >  wrote:
> >>
> >> This patch is the first of a series that tries to remove two
> >> assumptions:
> >>
> >> (1) that all vectors involved in vectorisation must be the same size
> >>
> >> (2) that there is only one vector mode for a given element mode and
> >> number of elements
> >>
> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> that require the number of elements to stay the same.  E.g. if we're
> >> vectorising code that operates on narrow and wide elements, and the
> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> for the wide elements.
> >>
> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> >> vectors to work with -msve-vector-bits=256.
> >>
> >> The patch adds a new hook that targets can use to control how we
> >> move from one vector mode to another.  The hook takes a starting vector
> >> mode, a new element mode, and (optionally) a new number of elements.
> >> The flexibility needed for (1) comes in when the number of elements
> >> isn't specified.
> >>
> >> All callers in this patch specify the number of elements, but a later
> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> >> for a few days, hence the RFC/A tag.
> >>
> >> Tested individually on aarch64-linux-gnu and as a series on
> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> >> look OK?
> >
> > In isolation the idea looks good but maybe a bit limited?  I see
> > how it works for the same-size case but if you consider x86
> > where we have SSE, AVX256 and AVX512 what would it return
> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> > kind of query not intended (where the component modes match
> > but nunits is zero)?
>
> In that case we'd normally get V4SImode back.  It's an allowed
> combination, but not very useful.
>
> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> > it just used to stay in the same register set for different component
> > modes?
>
> Yeah, the idea is to use the original vector mode as essentially
> a base architecture.
>
> The follow-on patches replace vec_info::vector_size with
> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> with targetm.vectorize.autovectorize_vector_modes.  These are the
> starting modes that would be passed to the hook in the nunits==0 case.
>
> E.g. for Advanced SIMD on AArch64, it would make more sense for
> related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
> I think things would work in a similar way for the x86_64 vector archs.
>
> For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
> Advanced SIMD mode) to autovectorize_vector_modes, even though they
> happen to be the same size for 128-bit SVE.  We can then compare
> 128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
> that we consistently use all-SVE modes or all-Advanced SIMD modes
> for each attempt.
>
> The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:
>
> - VNx16QImode (full vector)
> - VNx8QImode (half vector)
> - VNx4QImode (quarter vector)
> - VNx2QImode (eighth vector)
>
> and then pick the one with the lowest cost.  related_mode would
> keep the number of units the same for nunits==0, within the limit
> of the vector size.  E.g.:
>
> - related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
> - related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
> - related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
> - related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)
>
> and:
>
> - related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)
>
> So when operating on multiple element sizes, the tradeoff is between
> trying to make full use of the vector size (higher base nunits) vs.
> trying to remove packs and unpacks between multiple vector copies
> (lower base nunits).  The latter is useful because extending within
> a vector is an in-lane rather than cross-lane operation and truncating
> within a vector is a no-op.
>
> With a couple of tweaks, we seem to do a good job of guessing which
> version has the lowest cost, at least for the simple cases I've tried
> so far.
>
> Obviously there's going to be a bit of a compile-time cost
> for SVE targets, but I think it's worth paying for.

I would guess that an immediate benefit could be seen with
basic-block vectorization, which simply fails when conversions
are involved.  x86_64 should no

Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-23 Thread Richard Biener
On Mon, Oct 21, 2019 at 10:04 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> Thanks for the pointers.
>
>
>
> On Fri, 11 Oct 2019 at 22:33, Richard Biener  
> wrote:
> >
> > On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > Hi Richard,
> > > Thanks for the review.
> > >
> > > On Wed, 2 Oct 2019 at 20:41, Richard Biener  
> > > wrote:
> > > >
> > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> > > > > passing assembler options specified with -Wa, to the link-time driver.
> > > > >
> > > > > The proposed solution only works for uniform -Wa options across all
> > > > > TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> > > > > would require either adjusting partitioning according to flags or
> > > > > emitting multiple object files  from a single LTRANS CU. We could
> > > > > consider this as a follow up.
> > > > >
> > > > > Bootstrapped and regression tests on  arm-linux-gcc. Is this OK for 
> > > > > trunk?
> > > >
> > > > While it works for your simple cases it is unlikely to work in practice 
> > > > since
> > > > your implementation needs the assembler options be present at the link
> > > > command line.  I agree that this might be the way for people to go when
> > > > they face the issue but then it needs to be documented somewhere
> > > > in the manual.
> > > >
> > > > That is, with COLLECT_AS_OPTION (why singular?  I'd expected
> > > > COLLECT_AS_OPTIONS) available to cc1 we could stream this string
> > > > to lto_options and re-materialize it at link time (and diagnose 
> > > > mismatches
> > > > even if we like).
> > > OK. I will try to implement this. So the idea is if we provide
> > > -Wa,options as part of the lto compile, this should be available
> > > during link time. Like in:
> > >
> > > arm-linux-gnueabihf-gcc -march=armv7-a -mthumb -O2 -flto
> > > -Wa,-mimplicit-it=always,-mthumb -c test.c
> > > arm-linux-gnueabihf-gcc  -flto  test.o
> > >
> > > I am not sure where should we stream this. Currently, cl_optimization
> > > has all the optimization flag provided for compiler and it is
> > > autogenerated and all the flags are integer values. Do you have any
> > > preference or example where this should be done.
> >
> > In lto_write_options, I'd simply append the contents of COLLECT_AS_OPTIONS
> > (with -Wa, prepended to each of them), then recover them in lto-wrapper
> > for each TU and pass them down to the LTRANS compiles (if they agree
> > for all TUs, otherwise I'd warn and drop them).
>
> Attached patch streams it and also make sure that the options are the
> same for all the TUs. Maybe it is a bit restrictive.
>
> What is the best place to document COLLECT_AS_OPTIONS? We don't seem
> to document COLLECT_GCC_OPTIONS anywhere.

Nowhere, it's an implementation detail then.

> Attached patch passes regression and also fixes the original ARM
> kernel build issue with tumb2.

Did you try this with multiple assembler options?  I see you stream
them as -Wa,-mfpu=xyz,-mthumb but then compare the whole
option strings so a mismatch with -Wa,-mthumb,-mfpu=xyz would be
diagnosed.  If there's a spec-induced -Wa option, do we get to see
that as well?  I can imagine -march=xyz enabling a -Wa option,
for example.

+ *collect_as = XNEWVEC (char, strlen (args_text) + 1);
+ strcpy (*collect_as, args_text);

there's strdup.  Btw, I'm not sure why you don't simply leave
the -Wa option in the merged options [individually] and match
them up but go the route of comparing strings and carrying that
along separately.  I think that would be much better.

Thanks and sorry for the delay.
Richard.

> Thanks,
> Kugan
> >
> > Richard.
> >
> > > Thanks,
> > > Kugan
> > >
> > >
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Kugan
> > > > >
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 2019-10-02  kugan.vivekanandarajah  
> > > > > 
> > > > >
> > > > > PR lto/78353
> > > > > * gcc.c (putenv_COLLECT_AS_OPTION): New to set COLLECT_AS_OPTION in 
> > > > > env.
> > > > > (driver::main): Call putenv_COLLECT_AS_OPTION.
> > > > > * lto-wrapper.c (run_gcc): use COLLECT_AS_OPTION from env.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > 2019-10-02  kugan.vivekanandarajah  
> > > > > 
> > > > >
> > > > > PR lto/78353
> > > > > * gcc.target/arm/pr78353-1.c: New test.
> > > > > * gcc.target/arm/pr78353-2.c: New test.


Re: GCC 9 backports

2019-10-23 Thread Martin Liška
On 9/2/19 10:56 AM, Martin Liška wrote:
> Hi.
> 
> There are 2 more patches that I've just tested.
> 
> Martin

Hi.

There are 2 more patches that I've just tested.

Martin
From e1299829fce26b60105e09e2c6e60d8b998a566b Mon Sep 17 00:00:00 2001
From: jakub 
Date: Fri, 27 Sep 2019 10:28:48 +
Subject: [PATCH 2/2] Backport r276178

gcc/ChangeLog:

	* tree-vectorizer.c (try_vectorize_loop_1): Add
	TODO_update_ssa_only_virtuals similarly to what slp pass does.

gcc/testsuite/ChangeLog:

2019-09-27  Jakub Jelinek  

	PR tree-optimization/91885
	* gcc.dg/pr91885.c (__int64_t): Change from long to long long.
	(__uint64_t): Change from unsigned long to unsigned long long.
---
 gcc/testsuite/gcc.dg/pr91885.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr91885.c b/gcc/testsuite/gcc.dg/pr91885.c
index 934e8d3e6c3..35be32be559 100644
--- a/gcc/testsuite/gcc.dg/pr91885.c
+++ b/gcc/testsuite/gcc.dg/pr91885.c
@@ -2,8 +2,8 @@
 /* { dg-options "-O3 -fprofile-generate" } */
 /* { dg-require-profiling "-fprofile-generate" } */
 
-typedef signed long int __int64_t;
-typedef unsigned long int __uint64_t;
+typedef signed long long int __int64_t;
+typedef unsigned long long int __uint64_t;
 typedef __int64_t int64_t;
 typedef __uint64_t uint64_t;
 inline void
-- 
2.23.0

From 2f9a827e2f8b675ae18a9e192a80855ac41f4aaa Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 26 Sep 2019 07:40:09 +
Subject: [PATCH 1/2] Backport r276141

gcc/ChangeLog:

2019-09-26  Martin Liska  

	PR tree-optimization/91885
	* tree-vectorizer.c (try_vectorize_loop_1):
	Add TODO_update_ssa_only_virtuals similarly to what slp
	pass does.

gcc/testsuite/ChangeLog:

2019-09-26  Martin Liska  

	PR tree-optimization/91885
	* gcc.dg/pr91885.c: New test.
---
 gcc/testsuite/gcc.dg/pr91885.c | 47 ++
 gcc/tree-vectorizer.c  |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr91885.c

diff --git a/gcc/testsuite/gcc.dg/pr91885.c b/gcc/testsuite/gcc.dg/pr91885.c
new file mode 100644
index 000..934e8d3e6c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr91885.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fprofile-generate" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+
+typedef signed long int __int64_t;
+typedef unsigned long int __uint64_t;
+typedef __int64_t int64_t;
+typedef __uint64_t uint64_t;
+inline void
+BLI_endian_switch_int64 (int64_t *val)
+{
+  uint64_t tval = *val;
+  *val = ((tval >> 56)) | ((tval << 40) & 0x00ff000000000000ll)
+	 | ((tval << 24) & 0x0000ff0000000000ll)
+	 | ((tval << 8) & 0x000000ff00000000ll)
+	 | ((tval >> 8) & 0x00000000ff000000ll)
+	 | ((tval >> 24) & 0x0000000000ff0000ll)
+	 | ((tval >> 40) & 0x000000000000ff00ll) | ((tval << 56));
+}
+typedef struct anim_index_entry
+{
+  unsigned long long seek_pos_dts;
+  unsigned long long pts;
+} anim_index_entry;
+extern struct anim_index_entry *
+MEM_callocN (int);
+struct anim_index
+{
+  int num_entries;
+  struct anim_index_entry *entries;
+};
+struct anim_index *
+IMB_indexer_open (const char *name)
+{
+  char header[13];
+  struct anim_index *idx;
+  int i;
+  idx->entries = MEM_callocN (8);
+  if (((1 == 0) != (header[8] == 'V')))
+{
+  for (i = 0; i < idx->num_entries; i++)
+	{
+	  BLI_endian_switch_int64 ((int64_t *) &idx->entries[i].seek_pos_dts);
+	  BLI_endian_switch_int64 ((int64_t *) &idx->entries[i].pts);
+	}
+}
+}
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index d27104933a9..d89ec3b7c76 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -941,7 +941,7 @@ try_vectorize_loop_1 (hash_table<simduid_to_vf *> *&simduid_to_vf_htab,
 	  fold_loop_internal_call (loop_vectorized_call,
    boolean_true_node);
 	  loop_vectorized_call = NULL;
-	  ret |= TODO_cleanup_cfg;
+	  ret |= TODO_cleanup_cfg | TODO_update_ssa_only_virtuals;
 	}
 	}
   /* If outer loop vectorization fails for LOOP_VECTORIZED guarded
-- 
2.23.0



Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Oct 23, 2019 at 1:51 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> This patch is the first of a series that tries to remove two
>> >> assumptions:
>> >>
>> >> (1) that all vectors involved in vectorisation must be the same size
>> >>
>> >> (2) that there is only one vector mode for a given element mode and
>> >> number of elements
>> >>
>> >> Relaxing (1) helps with targets that support multiple vector sizes or
>> >> that require the number of elements to stay the same.  E.g. if we're
>> >> vectorising code that operates on narrow and wide elements, and the
>> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
>> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
>> >> for the wide elements.
>> >>
>> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
>> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
>> >> vectors to work with -msve-vector-bits=256.
>> >>
>> >> The patch adds a new hook that targets can use to control how we
>> >> move from one vector mode to another.  The hook takes a starting vector
>> >> mode, a new element mode, and (optionally) a new number of elements.
>> >> The flexibility needed for (1) comes in when the number of elements
>> >> isn't specified.
>> >>
>> >> All callers in this patch specify the number of elements, but a later
>> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
>> >> for a few days, hence the RFC/A tag.
>> >>
>> >> Tested individually on aarch64-linux-gnu and as a series on
>> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
>> >> look OK?
>> >
>> > In isolation the idea looks good but maybe a bit limited?  I see
>> > how it works for the same-size case but if you consider x86
>> > where we have SSE, AVX256 and AVX512 what would it return
>> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
>> > kind of query not intended (where the component modes match
>> > but nunits is zero)?
>>
>> In that case we'd normally get V4SImode back.  It's an allowed
>> combination, but not very useful.
>>
>> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
>> > it just used to stay in the same register set for different component
>> > modes?
>>
>> Yeah, the idea is to use the original vector mode as essentially
>> a base architecture.
>>
>> The follow-on patches replace vec_info::vector_size with
>> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
>> with targetm.vectorize.autovectorize_vector_modes.  These are the
>> starting modes that would be passed to the hook in the nunits==0 case.
>>
>> E.g. for Advanced SIMD on AArch64, it would make more sense for
>> related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
>> I think things would work in a similar way for the x86_64 vector archs.
>>
>> For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
>> Advanced SIMD mode) to autovectorize_vector_modes, even though they
>> happen to be the same size for 128-bit SVE.  We can then compare
>> 128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
>> that we consistently use all-SVE modes or all-Advanced SIMD modes
>> for each attempt.
>>
>> The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:
>>
>> - VNx16QImode (full vector)
>> - VNx8QImode (half vector)
>> - VNx4QImode (quarter vector)
>> - VNx2QImode (eighth vector)
>>
>> and then pick the one with the lowest cost.  related_mode would
>> keep the number of units the same for nunits==0, within the limit
>> of the vector size.  E.g.:
>>
>> - related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
>> - related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
>> - related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
>> - related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)
>>
>> and:
>>
>> - related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
>> - related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
>> - related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
>> - related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)
>>
>> So when operating on multiple element sizes, the tradeoff is between
>> trying to make full use of the vector size (higher base nunits) vs.
>> trying to remove packs and unpacks between multiple vector copies
>> (lower base nunits).  The latter is useful because extending within
>> a vector is an in-lane rather than cross-lane operation and truncating
>> within a vector is a no-op.
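>>
>> As a toy, self-contained model of that nunits == 0 rule (not GCC code;
>> it only mirrors the "keep the lane count, capped by the full vector
>> size" arithmetic, with the vector size fixed at 128 bits as in the
>> examples above; all names are made up):
>>
>> #include <stdio.h>
>>
>> struct mode { int elem_bits, nunits; };
>>
>> static struct mode
>> related_mode (struct mode start, int new_elem_bits)
>> {
>>   int max_nunits = 128 / new_elem_bits;      /* full-vector limit */
>>   struct mode m = { new_elem_bits,
>>                     start.nunits < max_nunits ? start.nunits : max_nunits };
>>   return m;
>> }
>>
>> int main (void)
>> {
>>   struct mode vnx16qi = { 8, 16 }, vnx2qi = { 8, 2 };
>>   /* Prints "8 2": VNx16QImode -> VNx8HImode, VNx2QImode -> VNx2HImode.  */
>>   printf ("%d %d\n", related_mode (vnx16qi, 16).nunits,
>>           related_mode (vnx2qi, 16).nunits);
>>   return 0;
>> }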
>>
>> With a couple of tweaks, we seem to do a good job of guessing which
>> version has the lowest cost, at least for the simple cases I've tried
>> so far.
>>
>> Obviously there's going to be a bit of a compile-time cost
>> for SVE targets, but I think it's worth paying for.
>
> I would gue

[RFC C++ PATCH] OpenMP declare variant for C++

2019-10-23 Thread Jakub Jelinek
Hi!

The following patch attempts to implement the declare variant lookup.
As suggested by Jonathan and Jason, we have
#pragma omp declare variant(variant-func-id) clause new-line
function-declaration-or-definition
where variant-func-id is id-expression.  This is supposed to establish
a variant for the base function (declared or defined in the
function-declaration-or-definition), and the OpenMP spec says:
"The function variant is determined by base language standard name lookup rules 
([basic.lookup])
of variant-func-id with arguments that correspond to the argument types in the 
base function
declaration.
The variant-func-id and any expressions inside of the match clause are 
interpreted as if they
appeared at the scope of the trailing return type of the base function."
and
"The type of the function variant must be compatible with the type of the base 
function after the
implementation-defined transformation for its OpenMP context."
This implementation-defined transformation is only for simd (which the patch
doesn't handle yet, will need a target hook), otherwise the transformation
is a nop.
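
For reference, the surface syntax this parses looks like the following
(all identifiers made up; the pragma precedes the base function and names
the variant to call when the OpenMP context matches):

/* Hypothetical base/variant pair.  */
void variant_fn (int *p);

#pragma omp declare variant (variant_fn) match (construct={parallel})
void base_fn (int *p);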

So, what this patch does is: because during parsing we still don't have
the base function decl, build an attribute and stick into the attribute
everything we need later for the lookup as well as the context; and
later on, either in cp_finish_decl or finish_function or finish_struct,
finalize it by trying to construct a function call with dummy arguments
corresponding to the types.

What is unclear from the above wording is whether e.g. in
g++.dg/gomp/declare-variant-3.C void T::fn25(int) can be a variant of the
void S::fn26(int) base function; comptypes considers them different, but it
is unclear if C++ considers them to have different function types.  clang++
in their (also incomplete) implementation accepts these, unlike my patch,
provided the base this type is derived from the variant this type.  If it
was meant to be supported, we'd need to potentially adjust the this pointer,
which can be offset by a constant, but with virtual bases by an even more
difficult expression; plus I'm not sure if there is a way to call comptypes
such that it would accept it, or say try to build pointers to members from
the base (i.e. derived) class and compare those?

Handling of the pragmas when processing_template_decl still needs work, will
do that as a follow-up.

Does this look reasonable to you?

2019-10-23  Jakub Jelinek  

* cp-tree.h (omp_declare_variant_finalize, build_local_temp): Declare.
* decl.c (declare_simd_adjust_this): Add forward declaration.
(omp_declare_variant_finalize_one, omp_declare_variant_finalize): New
function.
(cp_finish_decl, finish_function): Call omp_declare_variant_finalize.
* parser.c (cp_finish_omp_declare_variant): Adjust parsing of the
variant id-expression and propagate enough information to
omp_declare_variant_finalize_one in the attribute so that it can
finalize it.
* class.c (finish_struct): Call omp_declare_variant_finalize.
* tree.c (build_local_temp): No longer static, remove forward
declaration.

* c-c++-common/gomp/declare-variant-2.c: Add a test with , before
match clause.
* c-c++-common/gomp/declare-variant-6.c: Expect diagnostics also from
C++ FE and adjust regexp so that it handles C++ pretty printing of
function names.
* g++.dg/gomp/declare-variant-1.C: New test.
* g++.dg/gomp/declare-variant-2.C: New test.
* g++.dg/gomp/declare-variant-3.C: New test.
* g++.dg/gomp/declare-variant-4.C: New test.
* g++.dg/gomp/declare-variant-5.C: New test.

--- gcc/cp/cp-tree.h.jj 2019-10-22 12:41:34.286018978 +0200
+++ gcc/cp/cp-tree.h	2019-10-23 12:21:48.075885842 +0200
@@ -6433,6 +6433,7 @@ extern tree groktypename  (cp_decl_spec
 extern tree start_decl (const cp_declarator *, 
cp_decl_specifier_seq *, int, tree, tree, tree *);
 extern void start_decl_1   (tree, bool);
 extern bool check_array_initializer(tree, tree, tree);
+extern void omp_declare_variant_finalize   (tree, tree);
 extern void cp_finish_decl (tree, tree, bool, tree, int);
 extern tree lookup_decomp_type (tree);
 extern void cp_maybe_mangle_decomp (tree, tree, unsigned int);
@@ -7290,6 +7291,7 @@ extern tree build_min_nt_call_vec (tree,
 extern tree build_min_non_dep_call_vec (tree, tree, vec<tree, va_gc> *);
 extern vec<tree, va_gc> *vec_copy_and_insert (vec<tree, va_gc> *, tree, unsigned);
 extern tree build_cplus_new(tree, tree, tsubst_flags_t);
+extern tree build_local_temp   (tree);
 extern tree build_aggr_init_expr   (tree, tree);
 extern tree get_target_expr(tree);
 extern tree get_target_expr_sfinae (tree, tsubst_flags_t);
--- gcc/cp/decl.c.jj	2019-10-22 16:52:23.416073416 +0200
+++ gcc/cp/d

[PATCH] Fix the easy part of PR65930

2019-10-23 Thread Richard Biener


The following enables vectorization of

int bar (unsigned int *x)
{
  int sum = 0;
  for (int i = 0; i < 32; ++i)
sum += x[i];
  return sum;
}

which is currently not done because the loop has a conversion
to unsigned int for 'sum' for doing the addition part of the
reduction.  That can now easily be relaxed after the recent
refactorings in reduction vectorization support.
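
Spelled out in source form, the now-handled shape is a reduction whose
latch computation is wrapped in sign-changing (value-preserving at the
bit level) conversions, roughly (a sketch of the lowered form, not an
actual dump):

int bar_lowered (unsigned int *x)
{
  int sum = 0;
  for (int i = 0; i < 32; ++i)
    {
      unsigned int tem = (unsigned int) sum;  /* sign change only */
      tem = tem + x[i];
      sum = (int) tem;                        /* sign change only */
    }
  return sum;
}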

There's more to actually fix PR65930 (and IIRC the case
occurring in x264), namely fixing the SLP reduction case.  I'm
working on that right now.

As you can see, the testsuite has a few instances of the above,
thus I refrained from adding another testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-23  Richard Biener  

PR tree-optimization/65930
* tree-vect-loop.c (check_reduction_path): Allow conversions
that only change the sign.
(vectorizable_reduction): Relax latch def stmts we handle further.

* gcc.dg/vect/vect-reduc-2char-big-array.c: Adjust.
* gcc.dg/vect/vect-reduc-2char.c: Likewise.
* gcc.dg/vect/vect-reduc-2short.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-2c.c: Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 277312)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -2695,7 +2695,11 @@ pop:
  if (gimple_assign_rhs2 (use_stmt) == op)
neg = ! neg;
}
-  if (*code == ERROR_MARK)
+  if (CONVERT_EXPR_CODE_P (use_code)
+ && tree_nop_conversion_p (TREE_TYPE (gimple_assign_lhs (use_stmt)),
+				    TREE_TYPE (gimple_assign_rhs1 (use_stmt))))
+	;
+  else if (*code == ERROR_MARK)
*code = use_code;
   else if (use_code != *code)
{
@@ -5692,19 +5696,6 @@ vectorizable_reduction (stmt_vec_info st
 which is defined by the loop-header-phi.  */
 
   gassign *stmt = as_a  (stmt_info->stmt);
-  switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
-{
-case GIMPLE_BINARY_RHS:
-case GIMPLE_TERNARY_RHS:
-  break;
-
-case GIMPLE_UNARY_RHS:
-case GIMPLE_SINGLE_RHS:
-  return false;
-
-default:
-  gcc_unreachable ();
-}
   enum tree_code code = gimple_assign_rhs_code (stmt);
   int op_type = TREE_CODE_LENGTH (code);
 
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-2char-big-array.c
===
--- gcc/testsuite/gcc.dg/vect/vect-reduc-2char-big-array.c  (revision 
277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-2char-big-array.c  (working copy)
@@ -62,4 +62,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-2char.c
===
--- gcc/testsuite/gcc.dg/vect/vect-reduc-2char.c	(revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-2char.c	(working copy)
@@ -46,4 +46,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-2short.c
===
--- gcc/testsuite/gcc.dg/vect/vect-reduc-2short.c   (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-2short.c   (working copy)
@@ -45,4 +45,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
===
--- gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  (revision 277312)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  (working copy)
@@ -12,12 +12,6 @@ signed char Y[N] __attribute__ ((__align
 
 /* char->short->short dot product.
    The dot-product pattern should be detected.
-   The reduction is currently not vectorized becaus of the signed->unsigned->signed
-   casts, since this patch:
-
-     2005-12-26  Kazu Hirata  
-
-	PR tree-optimization/25125
 
    When the dot-product is detected, the loop should be vectorized on vect_sdot_qi
    targets (targets that support dot-product of signed char).
@@ -60,5 +54,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 
"vect" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 
1 "vect" } } */
 

[Patch, Fortran] PR91863 - fix call to bind(C) with array descriptor

2019-10-23 Thread Tobias Burnus

With the trunk, there are three issues:

(a) With bind(C), the callee side handles deallocation with intent(out).

This should produce the code:
    if (cfi.0 != 0B)
  {
    __builtin_free (cfi.0);
    cfi.0 = 0B;
  }
This fails as cfi.0 (of type 'void*') is dereferenced and
'*cfi.0 = 0B' (i.e. an assignment of type 'void') causes the ICE.


(b) With that fixed, one gets:
sub (cfi.4);
_gfortran_cfi_desc_to_gfc_desc (&a, &cfi.4);
if (cfi.4 != 0B)
  __builtin_free (cfi.4);
... code using "a" ...
That also won't work as 'a.data' == 'cfi.4'; hence, one
accesses already freed memory.

I don't see that freeing the cfi memory makes sense at all;
as I didn't come up with a reason, I removed it for good.


Those two issues I have solved. The third issue is now PR fortran/92189:
(c) When allocating memory in a Fortran-written Bind(C) function, the
shape/bounds changes are not propagated back to Fortran.
Namely, "sub" lacks some _gfortran_gfc_desc_to_cfi_desc call at the end!

The issue pops up if you change 'dg-do compile' into 'dg-do run'. When
using a C-written function, that's a non-issue. Hence, it makes sense
to fix (a)+(b) of the bug separately.


OK for the trunk and GCC 9? (At least the ICE is a regression.)

Tobias


	PR fortran/91863
	* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Don't free data
	memory as that's done on the Fortran side.
	(gfc_conv_procedure_call): Handle void* pointers from
	gfc_conv_gfc_desc_to_cfi_desc.

	PR fortran/91863
	* gfortran.dg/bind-c-intent-out.f90: New.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 65238ff623d..7eba1bbd082 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -5206,7 +5206,6 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym)
   int attribute;
   int cfi_attribute;
   symbol_attribute attr = gfc_expr_attr (e);
-  stmtblock_t block;
 
   /* If this is a full array or a scalar, the allocatable and pointer
  attributes can be passed. Otherwise it is 'CFI_attribute_other'*/
@@ -5325,18 +5324,6 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, gfc_expr *e, gfc_symbol *fsym)
   /* The CFI descriptor is passed to the bind_C procedure.  */
   parmse->expr = cfi_desc_ptr;
 
-  /* Free the CFI descriptor.  */
-  gfc_init_block (&block);
-  cond = fold_build2_loc (input_location, NE_EXPR,
-			  logical_type_node, cfi_desc_ptr,
-			  build_int_cst (TREE_TYPE (cfi_desc_ptr), 0));
-  tmp = gfc_call_free (cfi_desc_ptr);
-  gfc_add_expr_to_block (&block, tmp);
-  tmp = build3_v (COND_EXPR, cond,
-		  gfc_finish_block (&block),
-		  build_empty_stmt (input_location));
-  gfc_prepend_expr_to_block (&parmse->post, tmp);
-
   /* Transfer values back to gfc descriptor.  */
   tmp = gfc_build_addr_expr (NULL_TREE, parmse->expr);
   tmp = build_call_expr_loc (input_location,
@@ -6250,8 +6237,14 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		  gfc_add_expr_to_block (&se->pre, tmp);
 		  }
 
-		  tmp = build_fold_indirect_ref_loc (input_location,
-		 parmse.expr);
+		  tmp = parmse.expr;
+		  /* With bind(C), the actual argument is replaced by a bind-C
+		 descriptor; in this case, the data component arrives here,
+		 which shall not be dereferenced, but still freed and
+		 nullified.  */
+		  if  (TREE_TYPE(tmp) != pvoid_type_node)
+		tmp = build_fold_indirect_ref_loc (input_location,
+		   parmse.expr);
 		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp)))
 		tmp = gfc_conv_descriptor_data_get (tmp);
 		  tmp = gfc_deallocate_with_status (tmp, NULL_TREE, NULL_TREE,
diff --git a/gcc/testsuite/gfortran.dg/bind-c-intent-out.f90 b/gcc/testsuite/gfortran.dg/bind-c-intent-out.f90
new file mode 100644
index 000..493e546d45d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bind-c-intent-out.f90
@@ -0,0 +1,41 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-original" }
+!
+! PR fortran/91863
+!
+! Contributed by G. Steinmetz
+!
+
+subroutine sub(x) bind(c)
+  implicit none (type, external)
+  integer, allocatable, intent(out) :: x(:)
+
+  allocate(x(3:5))
+  x(:) = [1, 2, 3]
+end subroutine sub
+
+
+program p
+  implicit none (type, external)
+  interface
+subroutine sub(x) bind(c)
+  integer, allocatable, intent(out) :: x(:)
+end
+  end interface
+  integer, allocatable :: a(:)
+
+  call sub(a)
+  if (.not.allocated(a)) stop 1
+  if (any(shape(a) /= [3])) stop 2
+  if (lbound(a,1) /= 3 .or. ubound(a,1) /= 5) stop 3
+  if (any(a /= [1, 2, 3])) stop 4
+end program p
+
+! "cfi" only appears in context of "a" -> bind-C descriptor
+! the intent(out) implies freeing in the callee (!), hence the "free"
+! It is the only 'free' as 'a' is part of the main program and, hence, implicitly has the SAVE attribute.
+! The  'cfi = 0' appears before the call due to the deallocate and when preparing the C descriptor
+
+! { dg-final { scan-tree-dump-times "__builtin_free" 1 "original" } }
+! { dg-final { scan-tree-dump-times 

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-23 Thread Richard Earnshaw (lists)

On 23/10/2019 09:28, Christophe Lyon wrote:

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in 
such a way that it will run the GCC testsuite after each of your 
patches, so I'll keep in mind not to report regressions (I've noticed 
several already).



I can perform a manual validation taking your 29 patches as a single
one and compare the results with those of the revision preceding the
one where you committed patch #1. Do you think it would be useful?



Christophe




I think if you can filter out any that are removed by later patches 
and then report against the patch that caused the regression itself 
then that would be the best.  But I realise that would be more work 
for you, so a round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also 
reported by the validation of r277179 (early split most DImode 
comparison operations).


When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)


That's strange.  The assembler code generated for that test is unchanged
from before the patch series, so I don't see how it can be anything but
a problem in the test itself.  What's more, I can't seem to reproduce
this myself.


Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r
is also unchanged, while the malloc_[un]lock functions are empty stubs
(not surprising as we aren't multi-threaded).


So the only thing that looks to have really changed is the linker
offsets (some of the library code has changed, but I don't think it's
really reached in practice, so it shouldn't be relevant).




I'm executing the tests using qemu-4.1.0 -cpu arm926.
The qemu traces show that code enters main, then _Znwj (operator new),
then _malloc_r.

The qemu traces end with:


What do you mean by 'end with'?  What's the failure mode of the test?  A 
crash, or the test exiting with a failure code?



IN: _malloc_r
0x00019224:  e3a00ffe  mov  r0, #0x3f8
0x00019228:  e3a0c07f  mov  ip, #0x7f
0x0001922c:  e3a0e07e  mov  lr, #0x7e
0x00019230:  eafffe41  b    #0x18b3c

R00=00049418 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=0060 R13=fffeb460 R14=00018b14 R15=00019224
PSR=2010 --C- A usr32

IN: _malloc_r
0x00018b3c:  e59f76f8  ldr  r7, [pc, #0x6f8]
0x00018b40:  e087  add  r0, r7, r0
0x00018b44:  e5903004  ldr  r3, [r0, #4]
0x00018b48:  e248  sub  r0, r0, #8
0x00018b4c:  e153  cmp  r0, r3
0x00018b50:  1a05  bne  #0x18b6c


But this block neither jumps to, nor falls through to 


R00=03f8 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b3c
PSR=2010 --C- A usr32
R00=00049c30 R01= R02=0554 R03=00049c30
R04= R05=0808 R06=00049418 R07=00049840
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b54
PSR=6010 -ZC- A usr32

IN: _malloc_r


...here.  So there's some trace missing by the looks of it; or some 
other problem.



0x00019120:  e1a02a0b  lsl  r2, fp, #0x14
0x00019124:  e1a02a22  lsr  r2, r2, #0x14
0x00019128:  e352  cmp  r2, #0
0x0001912c:  1afffee7  bne  #0x18cd0


and the same here.



R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001000 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00019120
PSR=6010 -ZC- A usr32

IN: _malloc_r
0x00019130:  e5974008  ldr  r4, [r7, #8]
0x00019134:  e0898008  add  r8, sb, r8
0x00019138:  e3888001  orr  r8, r8, #1
0x0001913c:  e5848004  str  r8, [r4, #4]
0x00019140:  ea14  b    #0x18d98

Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Eduard-Mihai Burtescu
On Tue, Oct 22, 2019, at 9:39 PM, Ian Lance Taylor wrote:
> I have to assume that C++ demangling is still quite a bit more common
> than Rust demangling, so it's troubling that it looks like we're going
> to do extra work for each symbol that starts with _ZN, which is not a
> particularly uncommon prefix for a C++ mangled name.  Is there some
> way we can quickly separate out Rust symbols?  Or should we try C++
> demangling first?
> 
> Ian
>

I definitely agree, I don't want to make demangling plain C++ symbols
significantly slower. The old code was also doing extra work, at least
in the AUTO_DEMANGLING mode, but less than the parse_ident
loop in this patch.

I've come up with an extra quick check that regular C++ symbols
won't pass most of the time and placed it before the parse_ident
loop, that should make it comparable with the old implementation,
and tests pass just fine with the extra check.
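
For reference, a shape-correct legacy symbol (the hash digits are
invented) and the new check in isolation:

#include <string.h>

/* _ZN + 3foo 3bar + 17h + 16 hex digits + E, e.g.
     _ZN3foo3bar17h0123456789abcdefE
   so the final path segment, "17h" plus the hash, is 19 chars.  */
static int
looks_like_legacy_rust (const char *mangled, size_t len)
{
  if (len == 0 || mangled[len - 1] != 'E')  /* must end in 'E' */
    return 0;
  len--;  /* drop the 'E', as rust_demangle_callback does */
  return len > 19 && strncmp (&mangled[len - 19], "17h", 3) == 0;
}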

The diff is below, but if you want me to send a combined patch,
or anything else for that matter, please let me know.

diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index da707dbab9b..4cb189c4019 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -384,6 +384,14 @@ rust_demangle_callback (const char *mangled, int options,
 return 0;
   rdm.sym_len--;
 
+  /* Legacy Rust symbols also always end with a path segment
+ that encodes a 16 hex digit hash, i.e. '17h[a-f0-9]{16}'.
+ This early check, before any parse_ident calls, should
+ quickly filter out most C++ symbols unrelated to Rust. */
+  if (!(rdm.sym_len > 19
+	&& !strncmp (&rdm.sym[rdm.sym_len - 19], "17h", 3)))
+    return 0;
+
   do
 {
   ident = parse_ident (&rdm);


[PATCH, GCC/ARM, 0/10] Add support for Armv8.1-M Mainline Security Extension

2019-10-23 Thread Mihail Ionescu

This is a patch series to implement support for the Armv8.1-M Mainline Security
Extensions. The specification can be found in:
https://developer.arm.com/docs/ddi0553/latest


Mihail Ionescu (10):
[PATCH, GCC/ARM, 1/10] Fix -mcmse check in libgcc
[PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline
[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions
[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM
[PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM
[PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function
[PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions
[PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function
[PATCH, GCC/ARM, 9/10] Call nscall function with blxns
[PATCH, GCC/ARM, 10/10] Enable -mcmse

Regards,
Mihail


all-patches.tar.gz
Description: application/gzip


[PATCH, GCC/ARM, 10/10] Enable -mcmse

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 10/10] Enable -mcmse

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable the
-mcmse option now that support for Armv8.1-M Security Extension is
complete.

=== Patch description ===

The patch is straightforward: it redefines ARMv8_1m_main as having the
same features as ARMv8m_main (and thus as having the cmse feature) with
the extra features represented by armv8_1m_main.  It also removes the
error for using -mcmse on Armv8.1-M Mainline.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm-cpus.in (ARMv8_1m_main): Redefine as an extension to
Armv8-M Mainline.
* config/arm/arm.c (arm_options_perform_arch_sanity_checks): Remove
error for using -mcmse when targeting Armv8.1-M Mainline.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
652f2a4be9388fd7a74f0ec4615a292fd1cfcd36..a845dd2f83a38519a1387515a2d4646761fb405f
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -259,10 +259,7 @@ define fgroup ARMv8_5a	ARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
-# Feature cmse is omitted to disable Security Extensions support while secure
-# code compiled by GCC does not preserve FP context as allowed by Armv8.1-M
-# Mainline.
-define fgroup ARMv8_1m_main ARMv7m armv8 armv8_1m_main
+define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main
 
 # Useful combinations.
 define fgroup VFPv2	vfpv2
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
cabcce8c8bd11c5ff3516c3102c0305b865b00cb..0f19b4eb4ec4fcca2df10e1b8e0b79d1a1e0a93d
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3742,9 +3742,6 @@ arm_options_perform_arch_sanity_checks (void)
   if (!arm_arch4 && arm_fp16_format != ARM_FP16_FORMAT_NONE)
 sorry ("__fp16 and no ldrh");
 
-  if (use_cmse && arm_arch8_1m_main)
-error ("ARMv8.1-M Mainline Security Extensions is unsupported");
-
   if (use_cmse && !arm_arch_cmse)
 error ("target CPU does not support ARMv8-M Security Extensions");
 



[PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 7/10] Clear all VFP regs inline in hardfloat nscall functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
inline instructions to save, clear and restore callee-saved VFP
registers before doing a call to a function with the cmse_nonsecure_call
attribute.

=== Patch description ===

The patch is fairly straightforward in its approach and consists of the
following 3 logical changes:
- abstract the number of floating-point registers to clear in
  max_fp_regno
- use max_fp_regno to decide how many registers to clear so that the
  same code works for Armv8-M and Armv8.1-M Mainline
- emit vpush and vpop instruction respectively before and after a
  nonsecure call

Note that as in the patch to clear GPRs inline, debug information has to
be disabled for VPUSH and VPOP due to VPOP adding a CFA adjustment note
for SP while R7 is sometimes used as the CFA.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm.c (vfp_emit_fstmd): Declare early.
(arm_emit_vfp_multi_reg_pop): Likewise.
(cmse_nonsecure_call_inline_register_clear): Abstract number of VFP
registers to clear in max_fp_regno.  Emit VPUSH and VPOP to save and
restore callee-saved VFP registers.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Add check for
VPUSH and VPOP and update expectation for VSCCLRM.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
c24996897eb21c641914326f7064a26bbb363411..bcc86d50a10f11d9672258442089a0aa5c450b2f
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -188,6 +188,8 @@ static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
 static void arm_emit_multi_reg_pop (unsigned long);
+static int vfp_emit_fstmd (int, int);
+static void arm_emit_vfp_multi_reg_pop (int, int, rtx);
 static int arm_arg_partial_bytes (cumulative_args_t,
  const function_arg_info &);
 static rtx arm_function_arg (cumulative_args_t, const function_arg_info &);
@@ -17834,8 +17836,10 @@ cmse_nonsecure_call_inline_register_clear (void)
  unsigned address_regnum, regno;
  unsigned max_int_regno =
clear_callee_saved ? IP_REGNUM : LAST_ARG_REGNUM;
+ unsigned max_fp_regno =
+   TARGET_HAVE_FPCTX_CMSE ? LAST_VFP_REGNUM : D7_VFP_REGNUM;
  unsigned maxregno =
-   TARGET_HARD_FLOAT_ABI ? D7_VFP_REGNUM : max_int_regno;
+   TARGET_HARD_FLOAT_ABI ? max_fp_regno : max_int_regno;
  auto_sbitmap to_clear_bitmap (maxregno + 1);
  rtx_insn *seq;
  rtx pat, call, unspec, clearing_reg, ip_reg, shift;
@@ -17883,7 +17887,7 @@ cmse_nonsecure_call_inline_register_clear (void)
 
  bitmap_clear (float_bitmap);
  bitmap_set_range (float_bitmap, FIRST_VFP_REGNUM,
-   D7_VFP_REGNUM - FIRST_VFP_REGNUM + 1);
+   max_fp_regno - FIRST_VFP_REGNUM + 1);
  bitmap_ior (to_clear_bitmap, to_clear_bitmap, float_bitmap);
}
 
@@ -17960,6 +17964,16 @@ cmse_nonsecure_call_inline_register_clear (void)
  /* Disable frame debug info in push because it needs to be
 disabled for pop (see below).  */
  RTX_FRAME_RELATED_P (push_insn) = 0;
+
+ /* Save VFP callee-saved registers.  */
+ if (TARGET_HARD_FLOAT_ABI)
+   {
+ vfp_emit_fstmd (D7_VFP_REGNUM + 1,
+ (max_fp_regno - D7_VFP_REGNUM) / 2);
+ /* Disable frame debug info in push because it needs to be
+disabled for vpop (see below).  */
+ RTX_FRAME_RELATED_P (get_last_insn ()) = 0;
+   }
}
 
  /* Clear caller-saved registers that leak before doing a non-secure
@@ -17974,9 +17988,25 @@ cmse_nonsecure_call_inline_register_clear (void)
 
  if (TARGET_HAVE_FPCTX_CMSE)
{
- rtx_insn *next, *pop_insn, *after = insn;
+ rtx_insn *next, *la

[PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 5/10] Clear VFP registers with VSCCLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling functions with the cmse_nonsecure_call attribute by using
VSCCLRM to do all the VFP register clearing as well as clearing the VPR
register.

=== Patch description ===

This patch adds a new pattern for the VSCCLRM instruction.
cmse_clear_registers () is then modified to use the new VSCCLRM
instruction when targeting Armv8.1-M Mainline, thus, making the Armv8-M
register clearing code specific to Armv8-M.

Since the VSCCLRM instruction mandates VPR in the register list, the
pattern is encoded with a parallel which only requires a volatile
VUNSPEC_VSCCLRM_VPR unspec constant modelling the VPR clearing. Other
expressions in the parallel are expected to be set expressions for
clearing the VFP registers.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm-protos.h (clear_operation_p): Adapt prototype.
* config/arm/arm.c (clear_operation_p): Extend to be able to check a
clear_vfp_multiple pattern based on a new vfp parameter.
(cmse_clear_registers): Generate VSCCLRM to clear VFP registers when
targeting Armv8.1-M Mainline.
(cmse_nonsecure_entry_clear_before_return): Clear VFP registers
unconditionally when targeting Armv8.1-M Mainline architecture.  Check
whether VFP registers are available before looking call_used_regs for a
VFP register.
* config/arm/predicates.md (clear_multiple_operation): Adapt to change
of prototype of clear_operation_p.
(clear_vfp_multiple_operation): New predicate.
* config/arm/unspecs.md (VUNSPEC_VSCCLRM_VPR): New volatile unspec.
* config/arm/vfp.md (clear_vfp_multiple): New define_insn.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/bitfield-1.c: Add check for VSCCLRM.
* gcc.target/arm/cmse/bitfield-2.c: Likewise.
* gcc.target/arm/cmse/bitfield-3.c: Likewise.
* gcc.target/arm/cmse/cmse-1.c: Likewise.
* gcc.target/arm/cmse/struct-1.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.

Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
1a948d2c97526ad7e67e8d4a610ac74cfdb13882..37a46982bbc1a8f17abe2fc76ba3cb7d65257c0d
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,7 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, 
HOST_WIDE_INT);
 extern int thumb1_legitimate_address_p (machine_mode, rtx, int);
 extern bool ldm_stm_operation_p (rtx, bool, machine_mode mode,
  bool, bool);
-extern bool clear_operation_p (rtx);
+extern bool clear_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int vfp3_const_double_rtx (rtx);
 extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f1f730cecff0fb3da7115ea1147dc8b9ab7076b7..5f3ce5c4605f609d1a0e31c0f697871266bdf835
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13499,8 +13499,9 @@ ldm_stm_operation_p (rtx op, bool load, machine_mode 
mode,
   return true;
 }
 
-/* Checks whether OP is a valid parallel pattern for a CLRM insn.  To be a
-   valid CLRM pattern, OP must have the following form:
+/* Checks whether OP is a valid parallel pattern for a CLRM (if VFP is false)
+   or VSCCLRM (otherwise) insn.  To be a valid CLRM pattern, OP must have the
+   following form:
 
[(set (reg:SI ) (const_int 0))
 (set (reg:SI ) (const_int 0))
@@ -13511,22 +13512,35 @@ ldm_stm_operation_p (rtx op, bool load, machine_mode 
mode,
 
Any number (including 0) of set expressions is valid, the volatile unspec

[PATCH, GCC/ARM, 1/10] Fix -mcmse check in libgcc

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 1/10] Fix -mcmse check in libgcc

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to fix the
check to determine whether -mcmse is supported by the host compiler.

=== Patch description ===

Code to detect whether cmse.c can be built with -mcmse checks the output
of the host GCC when invoked with -mcmse. However, an error from the
compiler does not prevent some minimal output, so this always holds true.

This does not affect currently supported architectures since the test is
guarded by __ARM_FEATURE_CMSE which is only defined for Armv8-M Baseline
and Mainline and these two architectures accept -mcmse.

However, in the intermediate patches adding support for Armv8.1-M
Mainline, support for Security Extensions is disabled until fully
implemented. This leads to libgcc/config/arm/cmse.c being built with
-mcmse due to the broken test which fails in the intermediate commits.

This patch instead changes the test to look at the return value of the
host gcc when invoked with -mcmse.


ChangeLog entry is as follows:

*** libgcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/t-arm: Check return value of gcc rather than lack of
output.

Testing: Bootstrapped and tested on arm-none-eabi.
Without this patch, GCC stops building after the second patch
of this series.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm
index 
274bf2a8ef33c5e8a8ee2b246aba92d30297abe1..f2b927f3686a8c0a8e37abfe2d7768f2050d4fb3
 100644
--- a/libgcc/config/arm/t-arm
+++ b/libgcc/config/arm/t-arm
@@ -3,7 +3,7 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi 
_thumb1_case_shi \
_thumb1_case_uhi _thumb1_case_si _speculation_barrier
 
 HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E - </dev/null))
-ifneq ($(HAVE_CMSE),)
+ifeq ($(shell $(gcc_compile_bare) -E -mcmse - </dev/null >/dev/null 2>/dev/null; echo $?),0)
 CMSE_OPTS:=-mcmse
 endif
 




[PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 8/10] Do lazy store & load inline when calling nscall function

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
lazy store and load instructions inline when calling a function with the
cmse_nonsecure_call attribute with the soft or softfp floating-point
ABI.

=== Patch description ===

This patch adds two new patterns for the VLSTM and VLLDM instructions.
cmse_nonsecure_call_inline_register_clear is then modified to
generate VLSTM and VLLDM respectively before and after calls to
functions with the cmse_nonsecure_call attribute in order to have lazy
saving, clearing and restoring of VFP registers. Since these
instructions do not do writeback of the base register, the stack is adjusted
prior the lazy store and after the lazy load with appropriate frame
debug notes to describe the effect on the CFA register.

As with CLRM, VSCCLRM and VSTR/VLDR, the instruction is modeled as an
unspecified operation to the memory pointed to by the base register.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm.c (arm_add_cfa_adjust_cfa_note): Declare early.
(cmse_nonsecure_call_inline_register_clear): Define new lazy_fpclear
variable as true when floating-point ABI is not hard.  Replace
check against TARGET_HARD_FLOAT_ABI by checks against lazy_fpclear.
Generate VLSTM and VLLDM instruction respectively before and
after a function call to cmse_nonsecure_call function.
* config/arm/unspecs.md (VUNSPEC_VLSTM): Define unspec.
(VUNSPEC_VLLDM): Likewise.
* config/arm/vfp.md (lazy_store_multiple_insn): New define_insn.
(lazy_load_multiple_insn): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Add check for VLSTM 
and
VLLDM.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
bcc86d50a10f11d9672258442089a0aa5c450b2f..b10f996c023e830ca24ff83fcbab335caf85d4cb
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -186,6 +186,7 @@ static int arm_register_move_cost (machine_mode, 
reg_class_t, reg_class_t);
 static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
+static void arm_add_cfa_adjust_cfa_note (rtx, int, rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
 static void arm_emit_multi_reg_pop (unsigned long);
 static int vfp_emit_fstmd (int, int);
@@ -17830,6 +17831,9 @@ cmse_nonsecure_call_inline_register_clear (void)
   FOR_BB_INSNS (bb, insn)
{
  bool clear_callee_saved = TARGET_HAVE_FPCTX_CMSE;
+ /* frame = VFP regs + FPSCR + VPR.  */
+ unsigned lazy_store_stack_frame_size =
+   (LAST_VFP_REGNUM - FIRST_VFP_REGNUM + 1 + 2) * UNITS_PER_WORD;
  unsigned long callee_saved_mask =
((1 << (LAST_HI_REGNUM + 1)) - 1)
& ~((1 << (LAST_ARG_REGNUM + 1)) - 1);
@@ -17847,7 +17851,7 @@ cmse_nonsecure_call_inline_register_clear (void)
  CUMULATIVE_ARGS args_so_far_v;
  cumulative_args_t args_so_far;
  tree arg_type, fntype;
- bool first_param = true;
+ bool first_param = true, lazy_fpclear = !TARGET_HARD_FLOAT_ABI;
  function_args_iterator args_iter;
  uint32_t padding_bits_to_clear[4] = {0U, 0U, 0U, 0U};
 
@@ -17881,7 +17885,7 @@ cmse_nonsecure_call_inline_register_clear (void)
 -mfloat-abi=hard.  For -mfloat-abi=softfp we will be using the
 lazy store and loads which clear both caller- and callee-saved
 registers.  */
- if (TARGET_HARD_FLOAT_ABI)
+ if (!lazy_fpclear)
{
  auto_sbitmap float_bitmap (maxregno + 1);
 
@@ -17965,8 +17969,23 @@ cmse_nonsecure_call_inline_register_clear (void)
 disabled for pop (see below).  */
  RTX_FRAME_RELATED_P (push_insn) = 0;
 
+ /* Lazy store multiple.  */
+ if (lazy_fpcl

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling functions with the cmse_nonsecure_call attribute by using
CLRM to do all the general-purpose register clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM instruction
when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandate APSR in the register
list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.
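
Roughly, the shape being validated is the following (a sketch of the
structure only; clrm_shape_p is a made-up name, this is not the actual
clear_operation_p, and it assumes the usual GCC rtl.h environment):

/* Every element of the parallel other than the volatile APSR unspec
   must be a (set (reg:SI rN) (const_int 0)).  */
static bool
clrm_shape_p (rtx op)
{
  for (int i = 0; i < XVECLEN (op, 0); i++)
    {
      rtx elt = XVECEXP (op, 0, i);
      if (GET_CODE (elt) == UNSPEC_VOLATILE)  /* the APSR clearing */
	continue;
      if (GET_CODE (elt) != SET
	  || !REG_P (SET_DEST (elt))
	  || SET_SRC (elt) != const0_rtx)
	return false;
    }
  return true;
}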

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm-protos.h (clear_operation_p): Declare.
* config/arm/arm.c (clear_operation_p): New function.
(cmse_clear_registers): Generate clear_multiple instruction pattern if
targeting Armv8.1-M Mainline or successor.
(output_return_instruction): Only output APSR register clearing if
Armv8.1-M Mainline instructions not available.
(thumb_exit): Likewise.
* config/arm/predicates.md (clear_multiple_operation): New predicate.
* config/arm/thumb2.md (clear_apsr): New define_insn.
(clear_multiple): Likewise.
* config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
* gcc.target/arm/cmse/bitfield-2.c: Likewise.
* gcc.target/arm/cmse/bitfield-3.c: Likewise.
* gcc.target/arm/cmse/struct-1.c: Likewise.
* gcc.target/arm/cmse/cmse-14.c: Likewise.
* gcc.target/arm/cmse/cmse-1.c: Likewise.  Restrict checks for Armv8-M
GPR clearing when CLRM is not available.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -77,6 +77,7 @@ extern int thumb_legitimate_offset_p (machine_mode, 
HOST_WIDE_INT);
 extern int thumb1_legitimate_address_p (machine_mode, rtx, int);
 extern boo

[PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 6/10] Clear GPRs inline when calling nscall function

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to generate
inline callee-saved register clearing when calling a function with the
cmse_nonsecure_call attribute with the ultimate goal of having the whole
call sequence inline.

=== Patch description ===

Besides changing the set of registers that needs to be cleared inline,
this patch also generates, inline, the push and pop needed to save and
restore callee-saved registers, since the callee is not trusted. To make the
code more future-proof, this (currently) Armv8.1-M specific behavior is
expressed in terms of clearing of callee-saved registers rather than
directly based on the targets.

The patch contains one subtlety:

Debug information is disabled for push and pop because the
REG_CFA_RESTORE notes used to describe popping of registers do not stack.
Instead, they just reset the debug state for the register to the one at
the beginning of the function, which is incorrect for a register that is
pushed twice (in prologue and before nonsecure call) and then popped for
the first time. In particular, this occasionally trips CFI note creation
code when there are two codepaths to the epilogue, one of which does not
go through the nonsecure call. Obviously this means that debugging
between the push and pop is not reliable.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm.c (arm_emit_multi_reg_pop): Declare early.
(cmse_nonsecure_call_clear_caller_saved): Rename into ...
(cmse_nonsecure_call_inline_register_clear): This.  Save and clear
callee-saved GPRs as well as clear ip register before doing a nonsecure
call then restore callee-saved GPRs after it when targeting
Armv8.1-M Mainline.
(arm_reorg): Adapt to function rename.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-1.c: Add check for PUSH and POP and update
CLRM check.
* gcc.target/arm/cmse/cmse-14.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/union-1.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
fca10801c87c5e635d573c0fbdc47a1ae229d0ef..12b4b42a66b0c5589690d9a2d8cf8e42712ca2c0
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -187,6 +187,7 @@ static int arm_memory_move_cost (machine_mode, reg_class_t, 
bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx_insn *emit_set_insn (rtx, rtx);
 static rtx emit_multi_reg_push (unsigned long, unsigned long);
+static void arm_emit_multi_reg_pop (unsigned long);
 static int arm_arg_partial_bytes (cumulative_args_t,
  const function_arg_info &);
 static rtx arm_function_arg (cumulative_args_t, const function_arg_info &);
@@ -17810,13 +17811,13 @@ cmse_clear_registers (sbitmap to_clear_bitmap, 
uint32_t *padding_bits_to_clear,
 }
 }
 
-/* Clears caller saved registers not used to pass arguments before a
-   cmse_nonsecure_ca

[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable
saving/restoring of the nonsecure FP context in functions with the
cmse_nonsecure_entry attribute.

=== Motivation ===

In Armv8-M Baseline and Mainline, the FP context is cleared on return from
nonsecure entry functions. This means the FP context might change when
calling a nonsecure entry function. This patch uses the new VLDR and
VSTR instructions available in Armv8.1-M Mainline to save/restore the FP
context when calling a nonsecure entry function from nonsecure code.

=== Patch description ===

This patch consists mainly of creating 2 new instruction patterns to
push and pop special FP registers via vldm and vstr and using them in
prologue and epilogue. The patterns are defined as push/pop with an
unspecified operation on the memory accessed, with an unspecified
constant indicating what special FP register is being saved/restored.

Other aspects of the patch include:
  * defining the set of special registers that can be saved/restored and
their name
  * reserving space in the stack frames for these push/pop
  * preventing return via pop
  * guarding the clearing of FPSCR so it is only done on target
architectures that lack the Armv8.1-M Mainline instructions.
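
To make the intent concrete, here is a sketch of the code shape the new
patterns should produce for an entry function (the mnemonics are assumed
from the Armv8.1-M architecture, not copied from the patch; compile with
-mcmse):

  /* Hypothetical entry function and the expected prologue/epilogue:

       foo:
         vstr  FPCXTNS, [sp, #-4]!   @ save nonsecure FP context
         ...                         @ function body
         vldr  FPCXTNS, [sp], #4     @ restore it before returning
  */
  __attribute__((cmse_nonsecure_entry)) int foo (int x)
  {
    return x + 1;
  }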

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm.c (fp_sysreg_names): Declare and define.
(use_return_insn): Also return false for Armv8.1-M Mainline.
(output_return_instruction): Skip FPSCR clearing if Armv8.1-M
Mainline instructions are available.
(arm_compute_frame_layout): Allocate space in frame for FPCXTNS
when targeting Armv8.1-M Mainline Security Extensions.
(arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M
Mainline entry function.
(cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if
targeting Armv8.1-M Mainline or successor.
(arm_expand_epilogue): Fix indentation of caller-saved register
clearing.  Restore FPCXTNS if this is an Armv8.1-M Mainline
entry function.
* config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro.
(FP_SYSREGS): Likewise.
(enum vfp_sysregs_encoding): Define enum.
(fp_sysreg_names): Declare.
* config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile unspec.
* config/arm/vfp.md (push_fpsysreg_insn): New define_insn.
(pop_fpsysreg_insn): Likewise.

*** gcc/testsuite/Changelog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/bitfield-1.c: add checks for VSTR and VLDR.
* gcc.target/arm/cmse/bitfield-2.c: Likewise.
* gcc.target/arm/cmse/bitfield-3.c: Likewise.
* gcc.target/arm/cmse/cmse-1.c: Likewise.
* gcc.target/arm/cmse/struct-1.c: Likewise.
* gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M Mainline tests
from mainline/8m subdirectory and new Armv8.1-M Mainline tests from
mainline/8_1m subdirectory.
* gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This.
* gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This.
* gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This.
* gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This.
* gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This.
* gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This.
* gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move and rename
into ...
* gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This.  Clean up
dg-skip-if directive for float ABI.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This.  Clean up
dg-skip-if directive for float ABI.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This.  Clean up
dg-skip-if directive for float ABI.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Move into ...
* gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-8.c: This.  Clean up
dg-skip-if directive for float ABI.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Move into ...
*

[PATCH, GCC/ARM, 9/10] Call nscall function with blxns

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 9/10] Call nscall function with blxns

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to call
functions with the cmse_nonsecure_call attribute directly using blxns
with no undue restriction on the register used for that.

=== Patch description ===

This change to use BLXNS to call a nonsecure function from secure code
directly (not using a libcall) is made in 2 steps:
- change nonsecure_call patterns to use blxns instead of calling
  __gnu_cmse_nonsecure_call
- loosen requirement for function address to allow any register when
  doing BLXNS.

The former is a straightforward check over whether instructions added in
Armv8.1-M Mainline are available, while the latter consists in making the
nonsecure call pattern accept any register by using match_operand and
changing the nonsecure_call_internal expander to not force r4 when
targeting Armv8.1-M Mainline.
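
A rough before/after sketch of the call sequence (illustrative only; the
exact register choices and clearing lists are determined by the other
patches in this series):

  /* Calling through a nonsecure function pointer held in r0:

     Armv8-M:    mov   r4, r0     @ address forced into r4
                 bl    __gnu_cmse_nonsecure_call
     Armv8.1-M:  clrm  {...}      @ clear potentially secret registers
                 blxns r0         @ any register may hold the address
  */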

The tricky bit is actually in the test update, specifically how to check
that register lists for CLRM have all registers except for the one
holding parameters (already done) and the one holding the address used
by BLXNS. This is achieved with 3 scan-assembler directives.

1) The first one lists all registers that can appear in CLRM but makes
   each of them optional.
   Property guaranteed: no wrong register is cleared and none appears
   twice in the register list.
2) The second directive checks that the CLRM is made of a fixed number
   of the right registers to be cleared. The number used is the number
   of registers that could contain a secret minus one (used to hold the
   address of the function to call).
   Property guaranteed: register list has the right number of registers.
   Cumulated property guaranteed: only registers with a potential secret
   are cleared and they are all listed but one.
3) The last directive checks that we cannot find a CLRM with a register
   in it that also appears in BLXNS. This is checked via the use of a
   back-reference on any of the allowed registers in CLRM, the
   back-reference enforcing that whatever register matches in CLRM must be
   the same in the BLXNS.
   Property guaranteed: register used for BLXNS is different from
   registers cleared in CLRM.

Some more care is needed for the gcc.target/arm/cmse/cmse-1.c testcase
because two CLRMs are generated. To ensure the third directive matches
the right CLRM to the BLXNS, a negative lookahead is used between the
CLRM register list and the BLXNS. Negative lookahead works by matching
the *position* where a given regular expression does not match. In this
case, since it comes after the CLRM register list, it requires that what
follows the register list does not contain another CLRM followed by a
BLXNS. This guarantees that the .*blxns after it only matches a blxns
without another CLRM before it.
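
As a toy illustration of the negative lookahead idea (using std::regex
ECMAScript syntax; the real testsuite directives are Tcl regexps with
full register lists, so treat this as an assumption-laden sketch):

  #include <cassert>
  #include <regex>

  int main ()
  {
    // Match "clrm r0 ... blxns" only when no other CLRM occurs between.
    std::regex re ("clrm r0((?!clrm)[\\s\\S])*blxns");
    assert ( std::regex_search ("clrm r0\nblxns r4", re));
    assert (!std::regex_search ("clrm r0\nclrm r1\nblxns r4", re));
  }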

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm.md (nonsecure_call_internal): Do not force memory
address in r4 when targeting Armv8.1-M Mainline.
(nonsecure_call_value_internal): Likewise.
* config/arm/thumb2.md (nonsecure_call_reg_thumb2): Make memory address
a register match_operand again.  Emit BLXNS when targeting
Armv8.1-M Mainline.
(nonsecure_call_value_reg_thumb2): Likewise.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-1.c: Add check for BLXNS when instructions
introduced in Armv8.1-M Mainline Security Extensions are available and
restrict checks for libcall to __gnu_cmse_nonsecure_call to Armv8-M
targets only.  Adapt CLRM check to verify register used for BLXNS is
not in the CLRM register list.
* gcc.target/arm/cmse/cmse-14.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise and adapt
check for LSB clearing bit to be using the same register as BLXNS when
targeting Armv8.1-M Mainline.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
* gcc.target/arm

[PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-10-23 Thread Mihail Ionescu
[PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to add
command-line support for that new architecture.

=== Patch description ===

Besides the expected enabling of the new value for the -march
command-line option (-march=armv8.1-m.main) and its extensions (see
below), this patch disables support of the Security Extensions for this
newly added architecture. This is done both by not including the cmse
bit in the architecture description and by throwing an error message
when the user requests Armv8.1-M Mainline Security Extensions. Note that
Armv8-M Baseline and Mainline Security Extensions are still enabled.

Only extensions for already supported instructions are implemented in
this patch. Other extensions (MVE integer and float) will be added in
separate patches. The following FPU configurations are allowed for
Armv8.1-M Mainline and are implemented in this patch (an example
invocation is shown after the list):
+ no FPU (+nofp)
+ single precision VFPv5 with FP16 (+fp)
+ double precision VFPv5 with FP16 (+fp.dp)
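
For instance, a build targeting double-precision FP might look like this
(the target triple prefix and source name are placeholders):

  arm-none-eabi-gcc -march=armv8.1-m.main+fp.dp -mfloat-abi=hard -c test.c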

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* config/arm/arm-cpus.in (armv8_1m_main): New feature.
(ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k,
ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve,
ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
(ARMv8_1m_main): New feature group.
(armv8.1-m.main): New architecture.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1m_main): Define and default initialize.
(arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
(arm_options_perform_arch_sanity_checks): Error out when targeting
Armv8.1-M Mainline Security Extensions.
* config/arm/arm.h (arm_arch8_1m_main): Declare.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu  
2019-10-23  Thomas Preud'homme  

* lib/target-supports.exp
(check_effective_target_arm_arch_v8_1m_main_ok): Define.
(add_options_for_arm_arch_v8_1m_main): Likewise.
(check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; testsuite
shows no regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 
f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a4be9388fd7a74f0ec4615a292fd1cfcd36
 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -126,6 +126,9 @@ define feature armv8_5
 # M-Profile security extensions.
 define feature cmse
 
+# Architecture rel 8.1-M.
+define feature armv8_1m_main
+
 # Floating point and Neon extensions.
 # VFPv1 is not supported in GCC.
 
@@ -223,21 +226,21 @@ define fgroup ALL_FPU_INTERNALvfpv2 vfpv3 vfpv4 fpv5 
fp16conv fp_dbl ALL_SIMD_I
 # -mfpu support.
 define fgroup ALL_FP   fp16 ALL_FPU_INTERNAL
 
-define fgroup ARMv4   armv4 notm
-define fgroup ARMv4t  ARMv4 thumb
-define fgroup ARMv5t  ARMv4t armv5t
-define fgroup ARMv5te ARMv5t armv5te
-define fgroup ARMv5tejARMv5te
-define fgroup ARMv6   ARMv5te armv6 be8
-define fgroup ARMv6j  ARMv6
-define fgroup ARMv6k  ARMv6 armv6k
-define fgroup ARMv6z  ARMv6
-define fgroup ARMv6kz ARMv6k quirk_armv6kz
-define fgroup ARMv6zk ARMv6k
-define fgroup ARMv6t2 ARMv6 thumb2
+define fgroup ARMv4 armv4 notm
+define fgroup ARMv4tARMv4 thumb
+define fgroup ARMv5tARMv4t armv5t
+define fgroup ARMv5te   ARMv5t armv5te
+define fgroup ARMv5tej  ARMv5te
+define fgroup ARMv6 ARMv5te armv6 be8
+define fgroup ARMv6jARMv6
+define fgroup ARMv6kARMv6 armv6k
+define fgroup ARMv6zARMv6
+define fgroup ARMv6kz   ARMv6k quirk_armv6kz
+define fgroup ARMv6zk   ARMv6k
+define fgroup ARMv6t2   ARMv6 thumb2
 # This is suspect.  ARMv6-m doesn't really pull in any useful features
 # from ARMv5* or ARMv6.
-define fgroup ARMv6m  armv4 thumb armv5t armv5te armv6 be8
+define fgroup ARMv6marmv4 thumb armv5t armv5te armv6 be8
 # This is suspect, the 'common' ARMv7 subset excludes the thumb2 'DSP' and
 # integer SIMD instructions that are in ARMv6T2.  */
 define fgroup ARMv7   ARMv6m thumb2 armv7
@@ -256,6 +259,10 @@ define fgroup ARMv8_5aARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
+# Feature cmse is omitted to disable Security Extensions support while secure
+# code compiled by GCC does not preserve FP context as allowed by Armv8

Reduce inline-heuristics-hint-percent (to fix exchange2 regression)

2019-10-23 Thread Jan Hubicka
Hi,
this patch reduces inline-heuristics-hint-percent so the inliner behaves
more similarly to what it did before I introduced this param.

Bootstrapped/regtested x86_64-linux, committed.
I plan to do more tuning on this parameter, but the value of 1600 was
actually a typo.

Index: ChangeLog
===
--- ChangeLog   (revision 277332)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2019-10-23  Jan Hubicka  
+
+   PR ipa/92074
+   * params.def (inline-heuristics-hint-percent): Set to 600.
+
 2019-10-23  Richard Biener  
 
PR tree-optimization/65930
Index: params.def
===
--- params.def  (revision 277330)
+++ params.def  (working copy)
@@ -105,7 +105,7 @@ DEFPARAM (PARAM_MAX_INLINE_INSNS_SMALL,
 DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
  "inline-heuristics-hint-percent",
  "The scale (in percents) applied to inline-insns-single and auto 
limits when heuristics hints that inlining is very profitable with -O3 and 
-Ofast.",
- 1600, 100, 100)
+ 600, 100, 100)
 
 DEFPARAM (PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2,
  "inline-heuristics-hint-percent-O2",


Re: Reduce inline-heuristics-hint-percent (to fix exchange2 regression)

2019-10-23 Thread Jeff Law
On 10/23/19 9:07 AM, Jan Hubicka wrote:
> Hi,
> this patch reduces inline-heuristics-hint-percent so the inliner behaves
> more similarly to what it did before I introduced this param.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> I plan to do more tuning on this parameter, but the value of 1600 was
> actually a typo.
> 
> Index: ChangeLog
> ===
> --- ChangeLog (revision 277332)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2019-10-23  Jan Hubicka  
> +
> + PR ipa/92074
> + * params.def (inline-heuristics-hint-percent): Set to 600.
Funny, I just contacted all the package owners that were failing because
of the inliner heuristic changes.  Given they were using the old
semantics, it's still good to encourage them to fix their code :-)


Jeff



Re: [PATCH] Fix algo constexpr tests in Debug mode

2019-10-23 Thread Jonathan Wakely

On 28/09/19 23:12 +0200, François Dumont wrote:

Here is what I just committed.

I tried to use the asm trick in _GLIBCXX_DEBUG_VERIFY_COND_AT but
didn't notice any improvement. So for now I kept my solution of just
having a call to a non-constexpr function produce the compiler error.


I fixed my patch to use __builtin_is_constant_evaluated rather than
std::is_constant_evaluated in __valid_range.


    * include/bits/stl_algobase.h (__memmove): Return _Tp*.
    (__memmove): Loop as long as __n is not 0.
    (__copy_move<>::__copy_m): Likewise.
    (__copy_move_backward<>::__copy_move_b): Likewise.
    * testsuite/25_algorithms/copy/constexpr.cc: Add check on copied 
values.

    * testsuite/25_algorithms/copy_backward/constexpr.cc: Likewise.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New.
    * testsuite/25_algorithms/copy_backward/constexpr_neg.cc: New.

    * include/debug/forward_list
(_Sequence_traits<__debug::forward_list<>>::_S_size): Returns __dp_sign
    distance when not empty.
    * include/debug/list
    (_Sequence_traits<__debug::list<>>::_S_size): Likewise.
    * include/debug/helper_functions.h (__dp_sign_max_size): New
    _Distance_precision enum entry.
    * include/debug/safe_iterator.h
    (__copy_move_a(_II, _II, const _Safe_iterator<>&)): Check for output
    iterator _M_can_advance as soon as input range distance precision is
    strictly higher than __dp_size.
    (__copy_move_a(const _Safe_iterator<>&, const _Safe_iterator<>&,
    const _Safe_iterator<>&)): Likewise.
    (__copy_move_backward_a(_II, _II, const _Safe_iterator<>&)): Likewise.
    (__copy_move_backward_a(const _Safe_iterator<>&,
    const _Safe_iterator<>&, const _Safe_iterator<>&)): Likewise.
    (__equal_aux(_II, _II, const _Safe_iterator<>&)): Likewise.
    (__equal_aux(const _Safe_iterator<>&,
    const _Safe_iterator<>&, const _Safe_iterator<>&)): Likewise.


I'm going to commit this small fix.


commit d78a141b86aca5a1265ec2df96428ef387492a1f
Author: Jonathan Wakely 
Date:   Wed Oct 23 16:19:28 2019 +0100

Only qualify function as constexpr for C++14 and later

This helper function is not a valid constexpr function in C++11, so
should only be marked constexpr for C++14 and later.
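
A minimal sketch of the constraint (hypothetical helper, not the actual
__valid_range body):

  #if __cplusplus >= 201402L
  # define MY14_CONSTEXPR constexpr   // mirrors _GLIBCXX14_CONSTEXPR
  #else
  # define MY14_CONSTEXPR
  #endif

  template<typename It>
  MY14_CONSTEXPR bool
  my_valid_range (It first, It last)
  {
    for (; first != last; ++first)    // a loop is ill-formed in a
      ;                               // C++11 constexpr function
    return true;
  }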

* include/debug/helper_functions.h (__valid_range): Change
_GLIBCXX_CONSTEXPR to _GLIBCXX14_CONSTEXPR.

diff --git a/libstdc++-v3/include/debug/helper_functions.h b/libstdc++-v3/include/debug/helper_functions.h
index 5a920bb9a6f..c3e7478f649 100644
--- a/libstdc++-v3/include/debug/helper_functions.h
+++ b/libstdc++-v3/include/debug/helper_functions.h
@@ -221,7 +221,7 @@ namespace __gnu_debug
 #endif
 
  template<typename _InputIterator>
-_GLIBCXX_CONSTEXPR
+_GLIBCXX14_CONSTEXPR
 inline bool
 __valid_range(_InputIterator __first, _InputIterator __last)
 {


Re: Fix PR90796

2019-10-23 Thread Michael Matz
Hello,

On Tue, 22 Oct 2019, Rainer Orth wrote:

> > testsuite/
> > * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.
> 
> this testcase now FAILs on 32-bit targets (seen on i386-pc-solaris2.11
> and sparc-sun-solaris2.11, also reports for i686-pc-linux-gnu and
> i586-unknown-freebsd11.2):
> 
> +FAIL: gcc.dg/unroll-and-jam.c scan-tree-dump-times unrolljam "applying 
> unroll and jam" 6

Hrmpf, I'll have a look :-/  Thanks for noticing.


Ciao,
Michael.


Re: [PATCH] Remove redundant std::allocator members for C++20

2019-10-23 Thread Jonathan Wakely

On 22/10/19 23:09 +0100, Jonathan Wakely wrote:

On 22/10/19 22:40 +0100, Jonathan Wakely wrote:

C++20 removes a number of std::allocator members that have correct
defaults provided by std::allocator_traits, so aren't needed.

Several extensions including __gnu_cxx::hash_map and tr1 containers are
no longer usable with std::allocator in C++20 mode. They need to be
updated to use __gnu_cxx::__alloc_traits in a follow-up patch.

Tested powerpc64le-linux, committed to trunk.


I forgot to add the [[deprecated]] attribute to the members in C++17
mode, I'll do that when I fix the extensions and TR1 containers to
work in C++20 mode again.


This fixes the extensions, but still no deprecated attribute on the
std::allocator members.

Committed to trunk.

commit ea034f3d691dcd45d4ae750b69717d2cf19df6d4
Author: Jonathan Wakely 
Date:   Wed Oct 23 00:16:25 2019 +0100

Adjust extension types to use allocator_traits

This makes these extensions work with types meeting the Cpp17Allocator
requirements as well as the C++98 Allocator requirements.

* include/backward/hash_set (hash_set): Use __alloc_traits.
* include/backward/hashtable.h (_Hashtable): Likewise.
* include/ext/alloc_traits.h (__alloc_traits::allocate): Add overload
taking a hint.
* include/ext/extptr_allocator.h (_ExtPtr_allocator::allocate): Ignore
hint.
* include/ext/slist (_Slist_base): Use __alloc_traits.
* include/tr1/hashtable.h (_Hashtable): Likewise.
* include/tr1/regex (match_results): Use vector::const_reference
instead of assuming the allocator defines it.
* testsuite/backward/hash_map/23528.cc: Use allocator_traits in C++11.
* testsuite/tr1/6_containers/unordered_map/capacity/29134-map.cc: Use
__gnu_test::max_size.
* testsuite/tr1/6_containers/unordered_multimap/capacity/
29134-multimap.cc: Likewise.
* testsuite/tr1/6_containers/unordered_multiset/capacity/
29134-multiset.cc: Likewise.
* testsuite/tr1/6_containers/unordered_set/capacity/29134-set.cc:
Likewise.

diff --git a/libstdc++-v3/include/backward/hash_set b/libstdc++-v3/include/backward/hash_set
index 1445aa61e11..7f743fdf3af 100644
--- a/libstdc++-v3/include/backward/hash_set
+++ b/libstdc++-v3/include/backward/hash_set
@@ -88,6 +88,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_class_requires3(_HashFcn, size_t, _Value, _UnaryFunctionConcept)
   __glibcxx_class_requires3(_EqualKey, _Value, _Value, _BinaryPredicateConcept)
 
+  typedef __alloc_traits<_Alloc> _Alloc_traits;
+
 private:
   typedef hashtable<_Value, _Value, _HashFcn, _Identity<_Value>,
 			_EqualKey, _Alloc> _Ht;
@@ -101,10 +103,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   
   typedef typename _Ht::size_type size_type;
   typedef typename _Ht::difference_type difference_type;
-  typedef typename _Alloc::pointer pointer;
-  typedef typename _Alloc::const_pointer const_pointer;
-  typedef typename _Alloc::reference reference;
-  typedef typename _Alloc::const_reference const_reference;
+  typedef typename _Alloc_traits::pointer pointer;
+  typedef typename _Alloc_traits::const_pointer const_pointer;
+  typedef typename _Alloc_traits::reference reference;
+  typedef typename _Alloc_traits::const_reference const_reference;
   
   typedef typename _Ht::const_iterator iterator;
   typedef typename _Ht::const_iterator const_iterator;
diff --git a/libstdc++-v3/include/backward/hashtable.h b/libstdc++-v3/include/backward/hashtable.h
index df6ad85191c..cfb9cf957d2 100644
--- a/libstdc++-v3/include/backward/hashtable.h
+++ b/libstdc++-v3/include/backward/hashtable.h
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include <ext/alloc_traits.h>
 #include 
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
@@ -280,14 +281,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef _Hashtable_node<_Val> _Node;
 
 public:
-  typedef typename _Alloc::template rebind<_Val>::other allocator_type;
+  typedef typename __gnu_cxx::__alloc_traits<_Alloc>::template
+	rebind<_Val>::other allocator_type;
+
   allocator_type
   get_allocator() const
   { return _M_node_allocator; }
 
 private:
-  typedef typename _Alloc::template rebind<_Node>::other _Node_Alloc;
-  typedef typename _Alloc::template rebind<_Node*>::other _Nodeptr_Alloc;
+  typedef __gnu_cxx::__alloc_traits<_Alloc> _Alloc_traits;
+  typedef typename _Alloc_traits::template rebind<_Node>::other
+	_Node_Alloc;
+  typedef typename _Alloc_traits::template rebind<_Node*>::other
+	_Nodeptr_Alloc;
   typedef std::vector<_Node*, _Nodeptr_Alloc> _Vector_type;
 
   _Node_Alloc _M_node_allocator;
@@ -608,7 +614,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__n->_M_next = 0;
 	__try
 	  {
-	this->get_allocator().construct(&__n->_M_val, __obj);
+	allocator_type _

[PATCH] Adjust pb_ds extensions to use allocator_traits

2019-10-23 Thread Jonathan Wakely

This fixes the pb_ds containers to support C++11-style allocators. I
consider the hours spent doing this completely wasted, but it was
necessary to keep the testsuite clean. I still want this code to go
away.

Tested powerpc64le-linux, committed to trunk.

These changes are largely useless, because most of them are simply
allowing 'reference' and 'const_reference' types to be obtained from an
allocator, and C++11 allocators don't define reference types (they just
use plain lvalue references). Pretending to support C++98 allocators
with user-defined reference types is a waste of time (especially as
several of the pb_ds types appear to use a static allocator object,
which means stateful allocators are not supported).

* include/ext/pb_ds/detail/bin_search_tree_/bin_search_tree_.hpp:
Use detail::rebind_traits.
* include/ext/pb_ds/detail/bin_search_tree_/node_iterators.hpp:
Likewise.
* include/ext/pb_ds/detail/bin_search_tree_/traits.hpp: Likewise.
* include/ext/pb_ds/detail/binary_heap_/binary_heap_.hpp: Likewise.
* include/ext/pb_ds/detail/binary_heap_/entry_cmp.hpp: Likewise.
* include/ext/pb_ds/detail/binary_heap_/entry_pred.hpp: Likewise.
* include/ext/pb_ds/detail/binary_heap_/point_const_iterator.hpp:
Likewise.
* include/ext/pb_ds/detail/binomial_heap_base_/binomial_heap_base_.hpp:
Likewise.
* include/ext/pb_ds/detail/branch_policy/branch_policy.hpp: Likewise.
* include/ext/pb_ds/detail/cc_hash_table_map_/cc_ht_map_.hpp: Likewise.
* include/ext/pb_ds/detail/cond_dealtor.hpp: Likewise.
* include/ext/pb_ds/detail/eq_fn/hash_eq_fn.hpp (has_eq_fn): Likewise.
* include/ext/pb_ds/detail/gp_hash_table_map_/gp_ht_map_.hpp: Likewise.
* include/ext/pb_ds/detail/hash_fn/ranged_hash_fn.hpp: Likewise.
* include/ext/pb_ds/detail/hash_fn/ranged_probe_fn.hpp: Likewise.
* include/ext/pb_ds/detail/left_child_next_sibling_heap_/
left_child_next_sibling_heap_.hpp: Likewise.
* include/ext/pb_ds/detail/left_child_next_sibling_heap_/node.hpp:
Likewise.
* include/ext/pb_ds/detail/left_child_next_sibling_heap_/
point_const_iterator.hpp: Likewise.
* include/ext/pb_ds/detail/list_update_map_/lu_map_.hpp: Likewise.
* include/ext/pb_ds/detail/ov_tree_map_/
constructors_destructor_fn_imps.hpp: Likewise.
* include/ext/pb_ds/detail/ov_tree_map_/node_iterators.hpp: Likewise.
* include/ext/pb_ds/detail/ov_tree_map_/ov_tree_map_.hpp: Likewise.
* include/ext/pb_ds/detail/pairing_heap_/pairing_heap_.hpp: Likewise.
* include/ext/pb_ds/detail/pat_trie_/pat_trie_.hpp: Likewise.
* include/ext/pb_ds/detail/pat_trie_/pat_trie_base.hpp: Likewise.
* include/ext/pb_ds/detail/rb_tree_map_/node.hpp: Likewise.
* include/ext/pb_ds/detail/rc_binomial_heap_/rc.hpp: Likewise.
* include/ext/pb_ds/detail/splay_tree_/node.hpp: Likewise.
* include/ext/pb_ds/detail/thin_heap_/thin_heap_.hpp: Likewise.
* include/ext/pb_ds/detail/trie_policy/sample_trie_access_traits.hpp:
Likewise.
* include/ext/pb_ds/detail/type_utils.hpp: Fix typo in comment.
* include/ext/pb_ds/detail/types_traits.hpp (stored_value): Add
bool parameter to control whether the hash value is stored.
(select_base_type): New class template and partial specialization.
(maybe_null_type): Likewise.
(rebind_traits): New class template.
(type_base): Remove four nearly identical specializations.
(type_dispatch): Remove.
(type_traits): Use select_base_type and maybe_null_type instead of
type_base to control differences between specializations.
* include/ext/pb_ds/list_update_policy.hpp: Use detail::rebind_traits.
* include/ext/pb_ds/priority_queue.hpp: Likewise.
* include/ext/pb_ds/tree_policy.hpp: Likewise.
* include/ext/pb_ds/trie_policy.hpp: Likewise.
commit 238cc9d024d2c1392ca9d695bb6e5eb9dab521fe
Author: Jonathan Wakely 
Date:   Wed Oct 23 11:52:17 2019 +0100

Adjust pb_ds extensions to use allocator_traits

These changes are largely useless, because most of them are simply
allowing 'reference' and 'const_reference' types to be obtained from an
allocator, and C++11 allocators don't define reference types (they just
use plain lvalue references). Pretending to support C++98 allocators
with user-defined reference types is a waste of time (especially as
several of the pb_ds types appear to use a static allocator object,
which means stateful allocators are not supported).

* include/ext/pb_ds/detail/bin_search_tree_/bin_search_tree_.hpp:
Use detail::rebind_traits.
* include/ext/pb_ds/detail/bin_search_tree_/node_iterators.hpp:
Likewise.
* include/ext/pb_ds/detail/bin_search_tree_/tr

[PATCH] Qualify type names in <ext/throw_allocator.h>

2019-10-23 Thread Jonathan Wakely

* include/ext/throw_allocator.h (throw_allocator_base): Qualify
size_t and ptrdiff_t.

Tested powerpc64le-linux, committed to trunk.

commit a1dcc5b28035e241ac766c7699559f06b88f786c
Author: Jonathan Wakely 
Date:   Wed Oct 23 15:23:11 2019 +0100

Qualify type names in <ext/throw_allocator.h>

* include/ext/throw_allocator.h (throw_allocator_base): Qualify
size_t and ptrdiff_t.

diff --git a/libstdc++-v3/include/ext/throw_allocator.h 
b/libstdc++-v3/include/ext/throw_allocator.h
index f5da751eb69..a4b9fbc0176 100644
--- a/libstdc++-v3/include/ext/throw_allocator.h
+++ b/libstdc++-v3/include/ext/throw_allocator.h
@@ -796,8 +796,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public annotate_base, public _Cond
 {
 public:
-  typedef size_t   size_type;
-  typedef ptrdiff_tdifference_type;
+  typedef std::size_t  size_type;
+  typedef std::ptrdiff_t   difference_type;
   typedef _Tp  value_type;
   typedef value_type*  pointer;
   typedef const value_type*const_pointer;


[PATCH] Replace C++14 feature used in C++11 test

2019-10-23 Thread Jonathan Wakely

* testsuite/20_util/bind/91371.cc: Fix test to compile as C++11.

Tested powerpc64le-linux, committed to trunk.

commit d51d04301d30e199c53a705777d263b3f596e86f
Author: Jonathan Wakely 
Date:   Wed Oct 23 16:49:55 2019 +0100

Replace C++14 feature used in C++11 test

* testsuite/20_util/bind/91371.cc: Fix test to compile as C++11.

diff --git a/libstdc++-v3/testsuite/20_util/bind/91371.cc 
b/libstdc++-v3/testsuite/20_util/bind/91371.cc
index 1c6f55e9ece..ad032ce1663 100644
--- a/libstdc++-v3/testsuite/20_util/bind/91371.cc
+++ b/libstdc++-v3/testsuite/20_util/bind/91371.cc
@@ -32,6 +32,6 @@ test01()
 
   static_assert(std::is_function::value, "");
   static_assert(std::is_function::value, "");
-  static_assert(std::is_pointer>::value, "");
-  static_assert(std::is_pointer>::value, "");
+  static_assert(std::is_pointer::type>::value, "");
+  static_assert(std::is_pointer::type>::value, "");
 }


Re: Reduce inline-heuristics-hint-percent (to fix exchange2 regression)

2019-10-23 Thread Jan Hubicka
> On 10/23/19 9:07 AM, Jan Hubicka wrote:
> > Hi,
> > this patch reduces inline-heuristics-hint-percent so the inliner behaves
> > more similarly to what it did before I introduced this param.
> > 
> > Bootstrapped/regtested x86_64-linux, committed.
> > I plan to do more tuning on this parameter, but the value of 1600 was
> > actually a typo.
> > 
> > Index: ChangeLog
> > ===
> > --- ChangeLog   (revision 277332)
> > +++ ChangeLog   (working copy)
> > @@ -1,3 +1,8 @@
> > +2019-10-23  Jan Hubicka  
> > +
> > +   PR ipa/92074
> > +   * params.def (inline-heuristics-hint-percent): Set to 600.
> Funny, I just contacted all the package owners that were failing because
> of the inliner heuristic changes.  Given they were using the old
> semantics, it's still good to encourage them to fix their code :-)

I think most of changes in warnings you see is due to enabling
-finline-functions at -O2.  This change affects -O3/-Ofast only and not
by that much. :)

Honza
> 
> 
> Jeff
> 


Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Alexander Monakov
On Wed, 23 Oct 2019, Eduard-Mihai Burtescu wrote:

> @@ -384,6 +384,14 @@ rust_demangle_callback (const char *mangled, int options,
>  return 0;
>rdm.sym_len--;
>  
> +  /* Legacy Rust symbols also always end with a path segment
> + that encodes a 16 hex digit hash, i.e. '17h[a-f0-9]{16}'.
> + This early check, before any parse_ident calls, should
> + quickly filter out most C++ symbols unrelated to Rust. */
> +  if (!(rdm.sym_len > 19
> +&& !strncmp (&rdm.sym[rdm.sym_len - 19], "17h", 3)))

This can be further optimized by using memcmp in place of strncmp, since from
the length check you know that you won't see the null terminator among the three
chars you're checking.

The compiler can expand memcmp(buf, "abc", 3) inline as two comparisons against
a 16-bit immediate and an 8-bit immediate.  It can't do the same for strncmp.
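
A sketch of the suggested form (the function name is hypothetical; it
relies on the sym_len > 19 check already being done, as above):

  #include <string.h>

  static int
  ends_with_legacy_hash_segment (const char *sym, size_t sym_len)
  {
    /* All three bytes are known to be readable, so memcmp is safe
       and can be expanded inline.  */
    return sym_len > 19 && memcmp (&sym[sym_len - 19], "17h", 3) == 0;
  }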

Alexander


Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Segher Boessenkool
On Wed, Oct 23, 2019 at 07:22:47PM +0300, Alexander Monakov wrote:
> On Wed, 23 Oct 2019, Eduard-Mihai Burtescu wrote:
> > @@ -384,6 +384,14 @@ rust_demangle_callback (const char *mangled, int 
> > options,
> >  return 0;
> >rdm.sym_len--;
> >  
> > +  /* Legacy Rust symbols also always end with a path segment
> > + that encodes a 16 hex digit hash, i.e. '17h[a-f0-9]{16}'.
> > + This early check, before any parse_ident calls, should
> > + quickly filter out most C++ symbols unrelated to Rust. */
> > +  if (!(rdm.sym_len > 19
> > +&& !strncmp (&rdm.sym[rdm.sym_len - 19], "17h", 3)))
> 
> This can be further optimized by using memcmp in place of strncmp, since from
> the length check you know that you won't see the null terminator among the 
> three
> chars you're checking.
> 
> The compiler can expand memcmp(buf, "abc", 3) inline as two comparisons 
> against
> a 16-bit immediate and an 8-bit immediate.  It can't do the same for strncmp.

The compiler does not currently do that, but it *could*.  Or why not?  The
compiler is always allowed to load 3 characters here, whether some string
has a NUL character earlier or not.


Segher


Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Jakub Jelinek
On Wed, Oct 23, 2019 at 11:37:26AM -0500, Segher Boessenkool wrote:
> On Wed, Oct 23, 2019 at 07:22:47PM +0300, Alexander Monakov wrote:
> > On Wed, 23 Oct 2019, Eduard-Mihai Burtescu wrote:
> > > @@ -384,6 +384,14 @@ rust_demangle_callback (const char *mangled, int 
> > > options,
> > >  return 0;
> > >rdm.sym_len--;
> > >  
> > > +  /* Legacy Rust symbols also always end with a path segment
> > > + that encodes a 16 hex digit hash, i.e. '17h[a-f0-9]{16}'.
> > > + This early check, before any parse_ident calls, should
> > > + quickly filter out most C++ symbols unrelated to Rust. */
> > > +  if (!(rdm.sym_len > 19
> > > +&& !strncmp (&rdm.sym[rdm.sym_len - 19], "17h", 3)))
> > 
> > This can be further optimized by using memcmp in place of strncmp, since 
> > from
> > the length check you know that you won't see the null terminator among the 
> > three
> > chars you're checking.
> > 
> > The compiler can expand memcmp(buf, "abc", 3) inline as two comparisons 
> > against
> > a 16-bit immediate and an 8-bit immediate.  It can't do the same for 
> > strncmp.
> 
> The compiler does not currently do that, but it *could*.  Or why not?  The
> compiler is always allowed to load 3 characters here, whether some string
> has a NUL character earlier or not.

It is valid to call strncmp (mmap(...)+page_size-1, "abc", 3); the
reading of the string should stop when 0 is seen.
Of course, it might be that there is a strlen call visible and the strlen
pass could figure out that rdm.sym_len contains the strlen, but maybe it
isn't visible or there is some call in between that might in theory
invalidate it.
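
A minimal sketch of that edge case (assumed setup, error checking
omitted): the string "a" ends one byte before an unmapped page, so an
implementation that blindly read all 3 bytes would fault, yet the call
is valid.

  #include <string.h>
  #include <unistd.h>
  #include <sys/mman.h>

  int main (void)
  {
    long pagesz = sysconf (_SC_PAGESIZE);
    /* Map two pages, then make the second one inaccessible.  */
    char *p = (char *) mmap (NULL, 2 * pagesz, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    mprotect (p + pagesz, pagesz, PROT_NONE);
    p[pagesz - 2] = 'a';
    p[pagesz - 1] = '\0';   /* the NUL is the last readable byte */
    return strncmp (p + pagesz - 2, "abc", 3);
  }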

Jakub



Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Segher Boessenkool
On Wed, Oct 23, 2019 at 06:46:14PM +0200, Jakub Jelinek wrote:
> On Wed, Oct 23, 2019 at 11:37:26AM -0500, Segher Boessenkool wrote:
> > On Wed, Oct 23, 2019 at 07:22:47PM +0300, Alexander Monakov wrote:
> > > On Wed, 23 Oct 2019, Eduard-Mihai Burtescu wrote:
> > > > @@ -384,6 +384,14 @@ rust_demangle_callback (const char *mangled, int 
> > > > options,
> > > >  return 0;
> > > >rdm.sym_len--;
> > > >  
> > > > +  /* Legacy Rust symbols also always end with a path segment
> > > > + that encodes a 16 hex digit hash, i.e. '17h[a-f0-9]{16}'.
> > > > + This early check, before any parse_ident calls, should
> > > > + quickly filter out most C++ symbols unrelated to Rust. */
> > > > +  if (!(rdm.sym_len > 19
> > > > +&& !strncmp (&rdm.sym[rdm.sym_len - 19], "17h", 3)))
> > > 
> > > This can be further optimized by using memcmp in place of strncmp, since 
> > > from
> > > the length check you know that you won't see the null terminator among 
> > > the three
> > > chars you're checking.
> > > 
> > > The compiler can expand memcmp(buf, "abc", 3) inline as two comparisons 
> > > against
> > > a 16-bit immediate and an 8-bit immediate.  It can't do the same for 
> > > strncmp.
> > 
> > The compiler does not currently do that, but it *could*.  Or why not?  The
> > compiler is always allowed to load 3 characters here, whether some string
> > has a NUL character earlier or not.
> 
> It is valid to call strncmp (mmap(...)+page_size-1, "abc", 3), the reading
> of the string should stop when 0 is seen.

Where does it say that, though?  I don't see where it prohibits reading
more characters (up to 3 here), and you can get much better code using
that.

I of course know that for e.g. strcmp or strlen we need to be careful of
page crossings; but this is strncmp, which has a size argument saying the
size of the array objects of its arguments!


Segher


Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Jakub Jelinek
On Wed, Oct 23, 2019 at 12:19:10PM -0500, Segher Boessenkool wrote:
> I of course know that for e.g. strcmp or strlen we need to be careful of
> page crossings; but this is strncmp, which has a size argument saying the
> size of the array objects of its arguments!

https://pubs.opengroup.org/onlinepubs/009695399/functions/strncmp.html
The strncmp() function shall compare not more than n bytes
(bytes that follow a null byte are not compared)
from the array pointed to by s1 to the array pointed to by s2.

In particular the second line.

Similarly C11 7.24.4.4:
The strncmp function compares not more than n characters (characters that 
follow a
null character are not compared) from the array pointed to by s1 to the array 
pointed to
by s2.

Similarly C99.

Jakub



[PATCH][MSP430] Use hardware multiply routine to perform HImode widening multiplication (mulhisi3)

2019-10-23 Thread Jozef Lawrynowicz
For MSP430 in some configurations, GCC will generate code for mulhisi3 by
inserting instructions to widen each 16-bit operand before calling a library
routine for mulsi3.
However, there exists a hardware multiply routine to perform this widening
multiplication, but it is only made use of at -O3 where it is inserted
inline into the program.

We can reduce code size and improve performance by always calling the mspabi
helper function to perform this widening multiplication when hardware multiply
is available. 
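
The operation in question, in C (on MSP430, int is 16 bits and long is
32 bits), is the HImode x HImode -> SImode widening multiply; with this
patch it becomes a single call to an __mspabi_mpy* helper instead of
widening both operands and performing a full 32x32 multiplication:

  long
  mul16x16 (int a, int b)
  {
    return (long) a * b;   /* mulhisi3 */
  }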

I also benchmarked the effect of using a library function for mulsidi3
but it resulted in slower execution both with and without hardware multiply
support. It also increased code size for a variety of programs.

Successfully regtested on trunk.

Additionally regtested msp430.exp at -O1, -O2, -O3 and -Os.
There are tests which check that each supported hardware multiply option
executes correctly, so running at these optimization levels verifies the changes
in this patch.

Ok for trunk?
From 695ae0e560396034bc1fc2e9d9e601ab7b3d901b Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 23 Oct 2019 13:19:45 +0100
Subject: [PATCH] MSP430: Use mspabi helper function to perform HImode widening
 multiplication

gcc/ChangeLog:

2019-10-23  Jozef Lawrynowicz  

	* config/msp430/msp430.c (msp430_expand_helper): Support expansion of
	calls to __mspabi_mpy* functions.
	* config/msp430/msp430.md (mulhisi3): New define_expand.
	(umulhisi3): New define_expand.
	(*mulhisi3_inline): Use old mulhisi3 define_insn.
	(*umulhisi3_inline): Use old umulhisi3 define_insn.
---
 gcc/config/msp430/msp430.c  | 69 ++---
 gcc/config/msp430/msp430.md | 46 +++--
 2 files changed, 101 insertions(+), 14 deletions(-)

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index cd394333983..8a5579f8bce 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -53,6 +53,7 @@
 
 
 static void msp430_compute_frame_info (void);
+static bool use_32bit_hwmult (void);
 
 
 
@@ -2710,7 +2711,7 @@ void
 msp430_expand_helper (rtx *operands, const char *helper_name,
 		  bool const_variants)
 {
-  rtx c, f;
+  rtx c, fusage, fsym;
   char *helper_const = NULL;
   int arg1 = 12;
   int arg2 = 13;
@@ -2719,8 +2720,14 @@ msp430_expand_helper (rtx *operands, const char *helper_name,
   machine_mode arg1mode = GET_MODE (operands[1]);
   machine_mode arg2mode = GET_MODE (operands[2]);
   int have_430x = msp430x ? 1 : 0;
+  int expand_mpy = strncmp (helper_name, "__mspabi_mpy",
+			sizeof ("__mspabi_mpy") - 1) == 0;
+  /* This function has been used incorrectly if CONST_VARIANTS is TRUE for a
+ hwmpy function.  */
+  gcc_assert (!(expand_mpy && const_variants));
 
-  if (CONST_INT_P (operands[2]))
+  /* Emit size-optimal insns for small shifts we can easily do inline.  */
+  if (CONST_INT_P (operands[2]) && !expand_mpy)
 {
   int i;
 
@@ -2737,6 +2744,10 @@ msp430_expand_helper (rtx *operands, const char *helper_name,
 	}
 }
 
+  if (arg1mode != VOIDmode && arg2mode != VOIDmode)
+/* Modes of arguments must be equal if not constants.  */
+gcc_assert (arg1mode == arg2mode);
+
   if (arg1mode == VOIDmode)
 arg1mode = arg0mode;
   if (arg2mode == VOIDmode)
@@ -2749,12 +2760,13 @@ msp430_expand_helper (rtx *operands, const char *helper_name,
 }
   else if (arg1mode == DImode)
 {
-  /* Shift value in R8:R11, shift amount in R12.  */
   arg1 = 8;
   arg1sz = 4;
   arg2 = 12;
 }
 
+  /* Use the "const_variant" of a shift library function if requested.
+ These are faster, but have larger code size.  */
   if (const_variants
   && CONST_INT_P (operands[2])
   && INTVAL (operands[2]) >= 1
@@ -2768,25 +2780,58 @@ msp430_expand_helper (rtx *operands, const char *helper_name,
 		(int) INTVAL (operands[2]));
 }
 
+  /* Setup the arguments to the helper function.  */
   emit_move_insn (gen_rtx_REG (arg1mode, arg1),
 		  operands[1]);
   if (!helper_const)
 emit_move_insn (gen_rtx_REG (arg2mode, arg2),
 		operands[2]);
 
-  c = gen_call_value_internal (gen_rtx_REG (arg0mode, 12),
-			   gen_rtx_SYMBOL_REF (VOIDmode, helper_const
-		   ? helper_const
-		   : helper_name),
-			   GEN_INT (0));
+  if (expand_mpy)
+{
+  if (msp430_use_f5_series_hwmult ())
+	fsym = gen_rtx_SYMBOL_REF (VOIDmode, concat (helper_name,
+		 "_f5hw", NULL));
+  else if (use_32bit_hwmult ())
+	{
+	  /* When the arguments are 16-bits, the 16-bit hardware multiplier is
+	 used.  */
+	  if (arg1mode == HImode)
+	fsym = gen_rtx_SYMBOL_REF (VOIDmode, concat (helper_name,
+			 "_hw", NULL));
+	  else
+	fsym = gen_rtx_SYMBOL_REF (VOIDmode, concat (helper_name,
+			 "_hw32", NULL));
+	}
+  /* 16-bit hardware multiply.  */
+  else if (msp430_has_hwmult ())
+	fsym = gen_rtx_SYMBOL_REF (VOIDmode, concat (helper_name,
+		 "_hw", NULL));
+  else
+	fsym = gen_rtx_SYMBOL

Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-23 Thread Segher Boessenkool
On Wed, Oct 23, 2019 at 07:28:48PM +0200, Jakub Jelinek wrote:
> On Wed, Oct 23, 2019 at 12:19:10PM -0500, Segher Boessenkool wrote:
> > I of course know that for e.g. strcmp or strlen we need to be careful of
> > page crossings; but this is strncmp, which has a size argument saying the
> > size of the array objects of its arguments!
> 
> https://pubs.opengroup.org/onlinepubs/009695399/functions/strncmp.html
> The strncmp() function shall compare not more than n bytes
> (bytes that follow a null byte are not compared)
> from the array pointed to by s1 to the array pointed to by s2.
> 
> In particular the second line.
> 
> Similarly C11 7.24.4.4:
> The strncmp function compares not more than n characters (characters that 
> follow a
> null character are not compared) from the array pointed to by s1 to the array 
> pointed to
> by s2.
> 
> Similarly C99.

Yes, and that does not say you cannot read more characters.  It also does
not say it compares the strings pointed to, it explicitly says characters
from the array pointed to.


Segher


[COMMITTED][MSP430] Cleanup code in hardware multiply library

2019-10-23 Thread Jozef Lawrynowicz
The libgcc hardware multiply library for MSP430 uses its own naming
scheme, which is similar to, but still different from, how TI names the
registers across the documentation for all its MSP430 devices.

Furthermore, 32-bit and f5series specific hwmult registers have their addresses
hard-coded into the assembly code which prepares the hardware multiply routines.

The attached patch standardizes the naming scheme to match how TI names the
registers. It also defines new symbols for the currently hard-coded 32-bit and
f5series hardware multiply registers, to improve readability, ease the effort in
implementing further hardware multiply support and help prevent bugs from
mis-typed addresses.

The patch also has a small fix to the syntax used in some assembly code. This
code doesn't appear to ever actually get run but is retained in case we need it
in the future.

Regtested and applied on trunk as obvious.
From 6f6e061fa292c7afd699294163a67e39732aedec Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Wed, 23 Oct 2019 16:52:47 +
Subject: [PATCH] 2019-10-23  Jozef Lawrynowicz  

	* config/msp430/lib2hw_mul.S: Fix wrong syntax in branch instruction.
	s/RESULT_LO/RESLO, s/RESULT_HI/RESHI, s/MPY_OP1/MPY,
	s/MPY_OP1_S/MPYS, s/MAC_OP1/MAC, s/MPY_OP2/OP2, s/MAC_OP2/OP2.
	Define symbols for 32-bit and f5series hardware multiply
	register addresses.
	Replace hard-coded register addresses with symbols.
	Fix "_mspabi*" typo.
	Fix whitespace.
	* config/msp430/lib2mul.c: Add comment.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277340 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgcc/ChangeLog  |  12 +++
 libgcc/config/msp430/lib2hw_mul.S | 170 ++
 libgcc/config/msp430/lib2mul.c|   3 +
 3 files changed, 118 insertions(+), 67 deletions(-)

diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index ed0e9006377..99199944652 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,15 @@
+2019-10-23  Jozef Lawrynowicz  
+
+	* config/msp430/lib2hw_mul.S: Fix wrong syntax in branch instruction.
+	s/RESULT_LO/RESLO, s/RESULT_HI/RESHI, s/MPY_OP1/MPY, 
+	s/MPY_OP1_S/MPYS, s/MAC_OP1/MAC, s/MPY_OP2/OP2, s/MAC_OP2/OP2.
+	Define symbols for 32-bit and f5series hardware multiply
+	register addresses.
+	Replace hard-coded register addresses with symbols.
+	Fix "_mspabi*" typo.
+	Fix whitespace.
+	* config/msp430/lib2mul.c: Add comment.
+
 2019-10-15  John David Anglin  
 
 	* config/pa/fptr.c (_dl_read_access_allowed): Change argument to
diff --git a/libgcc/config/msp430/lib2hw_mul.S b/libgcc/config/msp430/lib2hw_mul.S
index 1a0e6e84ee9..894c551cbf0 100644
--- a/libgcc/config/msp430/lib2hw_mul.S
+++ b/libgcc/config/msp430/lib2hw_mul.S
@@ -81,9 +81,9 @@
 	.type \gcc_name , @function
 \gcc_name:
 #ifdef __MSP430X_LARGE__
-	BRA	\eabi_soft_name
+	BRA	#\eabi_soft_name
 #else
-	BR	\eabi_soft_name
+	BR	#\eabi_soft_name
 #endif
 	.size \gcc_name , . - \gcc_name
 	.popsection
@@ -109,7 +109,7 @@
 	MOV.W	&\RESULT, r12		; Move result into return register
 .endm
 
-.macro mult1632 OP1, OP2, RESULT_LO, RESULT_HI
+.macro mult1632 OP1, OP2, RESLO, RESHI
 ;* * 16-bit hardware multiply with a 32-bit result:
 ;*	int32 = int16 * int16
 ;* 	uint32 = uint16 * uint16
@@ -127,11 +127,11 @@
 	
 	MOV.W	r12, &\OP1		; Load operand 1 into multiplier
 	MOV.W	r13, &\OP2		; Load operand 2 which triggers MPY
-	MOV.W	&\RESULT_LO, r12	; Move low result into return register
-	MOV.W	&\RESULT_HI, r13	; Move high result into return register
+	MOV.W	&\RESLO, r12		; Move low result into return register
+	MOV.W	&\RESHI, r13		; Move high result into return register
 .endm
 
-.macro mult32 OP1, OP2, MAC_OP1, MAC_OP2, RESULT_LO, RESULT_HI
+.macro mult32 OP1, OP2, MAC_OP1, MAC_OP2, RESLO, RESHI
 ;* * 32-bit hardware multiply with a 32-bit result using 16 multiply and accumulate:
 ;*	int32 = int32 * int32
 ;*  
@@ -149,16 +149,16 @@
 	MOV.W	r12, &\OP1		; Load operand 1 Low into multiplier
 	MOV.W	r14, &\OP2		; Load operand 2 Low which triggers MPY
 	MOV.W	r12, &\MAC_OP1		; Load operand 1 Low into mac
-	MOV.W   &\RESULT_LO, r12	; Low 16-bits of result ready for return
-	MOV.W   &\RESULT_HI, &\RESULT_LO; MOV intermediate mpy high into low
+	MOV.W   &\RESLO, r12		; Low 16-bits of result ready for return
+	MOV.W   &\RESHI, &\RESLO	; MOV intermediate mpy high into low
 	MOV.W	r15, &\MAC_OP2		; Load operand 2 High, trigger MAC
 	MOV.W	r13, &\MAC_OP1		; Load operand 1 High
 	MOV.W	r14, &\MAC_OP2		; Load operand 2 Lo, trigger MAC
-	MOV.W	&\RESULT_LO, r13; Upper 16-bits result ready for return
+	MOV.W	&\RESLO, r13		; Upper 16-bits result ready for return
 .endm
 
 
-.macro mult32_hw  OP1_LO  OP1_HI  OP2_LO  OP2_HI  RESULT_LO  RESULT_HI
+.macro mult32_hw  OP1_LO  OP1_HI  OP2_LO  OP2_HI  RESLO  RESHI
 ;* * 32-bit hardware multiply with a 32-bit result
 ;*	int32 = int32 * int32
 ;*  
@@ -177,8 +177,8 @@
 	MOV.W	r13, &\OP1_HI		; Load operand 1 High into multiplier
 	MOV.W	r14, &\OP2_LO		; Load operand 2 Low into multiplier
 	MO

[COMMITTED][MSP430] Fix incorrect determination of hardware multiply support

2019-10-23 Thread Jozef Lawrynowicz
Some areas of the MSP430 backend modify code generation based on whether the
target device has hardware multiply support. However comparisons of the form
"msp430_hwmult_type != MSP430_HWMULT_NONE" are invalid, since MSP430_HWMULT_AUTO
might be set (to infer hwmult support from the MCU specified with -mmcu), and
the target might still not have hardware multiply support.

This is causing hardware multiply instructions to be generated for 16-bit and
32-bit widening multiplication at -O3, when the target does not have hardware
multiply support. This results in incorrect execution.

This patch fixes that and replaces the msp430_no_hwmult() function with
msp430_has_hwmult(), since the former was only ever used as
"!msp430_no_hwmult()".

Regtested and applied on trunk as obvious.
From c5edfbaf16a73b91faa30a8b4ce9204f0ff02d3e Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Wed, 23 Oct 2019 16:55:44 +
Subject: [PATCH] 2019-10-23  Jozef Lawrynowicz  

	* config/msp430/msp430-protos.h (msp430_has_hwmult): New.
	* config/msp430/msp430.c (msp430_no_hwmult): Remove.
	(msp430_has_hwmult): New.
	(msp430_output_labelref):
	s/msp430_hwmult_type != MSP430_HWMULT_NONE/msp430_has_hwmult ()/
	* config/msp430/msp430.md (mulhisi3): Likewise.
	(umulhisi3): Likewise.
	(mulsidi3): Likewise.
	(umulsidi3): Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277341 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog | 12 
 gcc/config/msp430/msp430-protos.h |  1 +
 gcc/config/msp430/msp430.c| 22 --
 gcc/config/msp430/msp430.md   |  8 
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ad2cb01d49a..7dc6885399c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2019-10-23  Jozef Lawrynowicz  
+
+	* config/msp430/msp430-protos.h (msp430_has_hwmult): New.
+	* config/msp430/msp430.c (msp430_no_hwmult): Remove.
+	(msp430_has_hwmult): New.
+	(msp430_output_labelref):
+	s/msp430_hwmult_type != MSP430_HWMULT_NONE/msp430_has_hwmult ()/
+	* config/msp430/msp430.md (mulhisi3): Likewise.
+	(umulhisi3): Likewise.
+	(mulsidi3): Likewise.
+	(umulsidi3): Likewise.
+
 2019-10-23  Jan Hubicka  
 
 	PR ipa/92074
diff --git a/gcc/config/msp430/msp430-protos.h b/gcc/config/msp430/msp430-protos.h
index 37ca48297ac..98470ef647e 100644
--- a/gcc/config/msp430/msp430-protos.h
+++ b/gcc/config/msp430/msp430-protos.h
@@ -48,6 +48,7 @@ int msp430_split_addsi (rtx *);
 voidmsp430_start_function (FILE *, const char *, tree);
 rtx	msp430_subreg (machine_mode, rtx, machine_mode, int);
 bool	msp430_use_f5_series_hwmult (void);
+bool	msp430_has_hwmult (void);
 bool msp430_op_not_in_high_mem (rtx op);
 
 #endif /* GCC_MSP430_PROTOS_H */
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 31029395c3d..cd394333983 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -3097,20 +3097,22 @@ use_32bit_hwmult (void)
 /* Returns true if the current MCU does not have a
hardware multiplier of any kind.  */
 
-static bool
-msp430_no_hwmult (void)
+bool
+msp430_has_hwmult (void)
 {
   static const char * cached_match = NULL;
   static bool cached_result;
 
   if (msp430_hwmult_type == MSP430_HWMULT_NONE)
-return true;
+return false;
 
+  /* TRUE for any other explicit hwmult specified.  */
   if (msp430_hwmult_type != MSP430_HWMULT_AUTO)
-return false;
+return true;
 
+  /* Now handle -mhwmult=auto.  */
   if (target_mcu == NULL)
-return true;
+return false;
 
   if (target_mcu == cached_match)
 return cached_result;
@@ -3119,11 +3121,11 @@ msp430_no_hwmult (void)
 
   msp430_extract_mcu_data (target_mcu);
   if (extracted_mcu_data.name != NULL)
-return cached_result = extracted_mcu_data.hwmpy == 0;
+return cached_result = extracted_mcu_data.hwmpy != 0;
 
   /* If we do not recognise the MCU name, we assume that it does not support
  any kind of hardware multiply - this is the safest assumption to make.  */
-  return cached_result = true;
+  return cached_result = false;
 }
 
 /* This function does the same as the default, but it will replace GCC
@@ -3143,13 +3145,13 @@ msp430_output_labelref (FILE *file, const char *name)
 
   /* If we have been given a specific MCU name then we may be
  able to make use of its hardware multiply capabilities.  */
-  if (msp430_hwmult_type != MSP430_HWMULT_NONE)
+  if (msp430_has_hwmult ())
 {
   if (strcmp ("__mspabi_mpyi", name) == 0)
 	{
 	  if (msp430_use_f5_series_hwmult ())
 	name = "__mulhi2_f5";
-	  else if (! msp430_no_hwmult ())
+	  else
 	name = "__mulhi2";
 	}
   else if (strcmp ("__mspabi_mpyl", name) == 0)
@@ -3158,7 +3160,7 @@ msp430_output_labelref (FILE *file, const char *name)
 	name = "__mulsi2_f5";
 	  else if (use_32bit_hwmult ())
 	name = "__mulsi2_hw32";
-	  else if (! msp430_no_hwmult ())
+	  else
 	name = "__mulsi2";
 	}
 }
diff --git a/gcc/con

Re: Pass the data vector mode to get_mask_mode

2019-10-23 Thread Bernhard Reutner-Fischer
On 23 October 2019 13:16:19 CEST, Richard Sandiford  
wrote:

>+++ gcc/config/gcn/gcn.c   2019-10-23 12:13:54.091122156 +0100
>@@ -3786,8 +3786,7 @@ gcn_expand_builtin (tree exp, rtx target
>a vector.  */
> 
> opt_machine_mode
>-gcn_vectorize_get_mask_mode (poly_uint64 ARG_UNUSED (nunits),
>-   poly_uint64 ARG_UNUSED (length))
>+gcn_vectorize_get_mask_mode (nachine_mode)

nachine?

If that really compiles someone should fix that preexisting typo, I suppose. 
Didn't look though.
Cheers,

> {
>   /* GCN uses a DImode bit-mask.  */
>   return DImode;



C++ PATCH for c++/91548 - fix detecting modifying const objects for ARRAY_REF

2019-10-23 Thread Marek Polacek
This fixes a bogus "modifying a const object" error for an array that actually
isn't declared const.  The problem was how I handled ARRAY_REFs here; we
shouldn't look at the ARRAY_REF itself, but at the array it's accessing.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-10-23  Marek Polacek  

PR c++/91548 - fix detecting modifying const objects for ARRAY_REF.
* constexpr.c (cxx_eval_store_expression): Don't call
modifying_const_object_p for ARRAY_REF.

* g++.dg/cpp1y/constexpr-tracking-const15.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const16.C: New test.
* g++.dg/cpp1z/constexpr-tracking-const1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 11a1eaa0e82..6b4e854e35c 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -3910,10 +3910,6 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
tree t,
tree elt = TREE_OPERAND (probe, 1);
if (TREE_CODE (elt) == FIELD_DECL && DECL_MUTABLE_P (elt))
  mutable_p = true;
-   if (evaluated
-   && modifying_const_object_p (TREE_CODE (t), probe, mutable_p)
-   && const_object_being_modified == NULL_TREE)
- const_object_being_modified = probe;
if (TREE_CODE (probe) == ARRAY_REF)
  {
elt = eval_and_check_array_index (ctx, probe, false,
@@ -3921,6 +3917,15 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, 
tree t,
if (*non_constant_p)
  return t;
  }
+   /* We don't check modifying_const_object_p for ARRAY_REFs.  Given
+  "int a[10]", an ARRAY_REF "a[2]" can be "const int", even though
+  the array isn't const.  Instead, check "a" in the next iteration;
+  that will detect modifying "const int a[10]".  */
+   else if (evaluated
+&& modifying_const_object_p (TREE_CODE (t), probe,
+ mutable_p)
+&& const_object_being_modified == NULL_TREE)
+ const_object_being_modified = probe;
vec_safe_push (refs, elt);
vec_safe_push (refs, TREE_TYPE (probe));
probe = ob;
diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-tracking-const15.C 
gcc/testsuite/g++.dg/cpp1y/constexpr-tracking-const15.C
new file mode 100644
index 000..db1b2bb7ea6
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-tracking-const15.C
@@ -0,0 +1,21 @@
+// PR c++/91548 - fix detecting modifying const objects for ARRAY_REF.
+// { dg-do compile { target c++14 } }
+
+constexpr int& impl(const int (&array)[10], int index) {
+  return const_cast(array[index]);
+}
+
+struct A {
+  constexpr int& operator[](int i) { return impl(elems, i); }
+  int elems[10];
+};
+
+constexpr bool
+f()
+{
+  A arr = {};
+  arr[2] = true;
+  return false;
+}
+
+constexpr bool b = f();
diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-tracking-const16.C 
gcc/testsuite/g++.dg/cpp1y/constexpr-tracking-const16.C
new file mode 100644
index 000..5a5b92bc8cc
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-tracking-const16.C
@@ -0,0 +1,22 @@
+// PR c++/91548 - fix detecting modifying const objects for ARRAY_REF.
+// { dg-do compile { target c++14 } }
+
+constexpr int& impl(const int (&array)[10], int index) {
+  return const_cast(array[index]);
+}
+
+struct A {
+  constexpr int& operator[](int i) { return impl(elems, i); }
+  const int elems[10];
+};
+
+constexpr bool
+f()
+{
+  A arr = {};
+  arr[2] = 1; // { dg-error "modifying a const object" }
+  return false;
+}
+
+constexpr bool b = f(); // { dg-message "in .constexpr. expansion of " }
+// { dg-message "originally declared" "" { target *-*-* } .-1 }
diff --git gcc/testsuite/g++.dg/cpp1z/constexpr-tracking-const1.C 
gcc/testsuite/g++.dg/cpp1z/constexpr-tracking-const1.C
new file mode 100644
index 000..a3856b8e7ec
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-tracking-const1.C
@@ -0,0 +1,25 @@
+// PR c++/91548 - fix detecting modifying const objects for ARRAY_REF.
+// { dg-do compile { target c++17 } }
+
+using size_t = decltype(sizeof(0));
+
+template 
+constexpr T& impl(T const (&array)[N], size_t index) {
+return const_cast(array[index]);
+}
+
+template 
+struct my_array {
+constexpr T& operator[](size_t i) { return impl(elems, i); }
+constexpr T const& operator[](size_t i) const { return elems[i]; }
+T elems[N];
+};
+
+bool f(int i) {
+static constexpr auto table = []() {
+my_array arr = {};
+arr[2] = true;
+return arr;
+}();
+return table[i];
+}



[PATCH] PR fortran/92178 -- Re-order argument deallocation

2019-10-23 Thread Steve Kargl
The attached patch has been tested on x86_64-*-freebsd.  OK to commit?

2019-10-23  Steven G. Kargl  

PR fortran/92178
* trans-expr.c (gfc_conv_procedure_call): Evaluate args and then
deallocate actual args associated with intent(out) dummies.

2019-10-23  Steven G. Kargl  

PR fortran/92178
* gfortran.dg/pr92178.f90: New test.

Note, in gfc_conv_procedure_call() there are 3 blocks of
code that deal with the deallocation of actual arguments
associated with intent(out) dummy arguments.  The patch
affects the first and third blocks: the other actual
arguments must be evaluated before the intent(out)
deallocation happens, since an argument like "(a(1))" in
the test below would otherwise read already-freed memory.
The 2nd block, lines 6071-6111, concerns CLASS and
finalization.  I use neither, so have no idea what
Fortran requires.  More importantly, I have very little
understanding of gfortran's internal implementation for
CLASS and finalization.  Someone who cares about CLASS
and finalization will need to consider how to fix a
possible similar issue there.

-- 
Steve
Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c	(revision 277296)
+++ gcc/fortran/trans-expr.c	(working copy)
@@ -5405,6 +5405,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym
   gfc_component *comp = NULL;
   int arglen;
   unsigned int argc;
+  stmtblock_t dealloc_blk;
+  bool saw_dealloc = false;
 
   arglist = NULL;
   retargs = NULL;
@@ -5445,6 +5447,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym
 info = NULL;
 
   gfc_init_block (&post);
+  gfc_init_block (&dealloc_blk);
   gfc_init_interface_mapping (&mapping);
   if (!comp)
 {
@@ -5976,8 +5979,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym
 			}
 		  else
 			tmp = gfc_finish_block (&block);
-
-		  gfc_add_expr_to_block (&se->pre, tmp);
+		  saw_dealloc = true;
+		  gfc_add_expr_to_block (&dealloc_blk, tmp);
 		}
 
 		  if (fsym && (fsym->ts.type == BT_DERIVED
@@ -6265,7 +6268,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym
  void_type_node,
  gfc_conv_expr_present (e->symtree->n.sym),
    tmp, build_empty_stmt (input_location));
-		  gfc_add_expr_to_block (&se->pre, tmp);
+		  saw_dealloc = true; 
+		  gfc_add_expr_to_block (&dealloc_blk, tmp);
 		}
 	}
 	}
@@ -6636,6 +6640,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym
 
   vec_safe_push (arglist, parmse.expr);
 }
+  if (saw_dealloc)
+gfc_add_block_to_block (&se->pre, &dealloc_blk);
   gfc_finish_interface_mapping (&mapping, &se->pre, &se->post);
 
   if (comp)
Index: gcc/testsuite/gfortran.dg/pr92178.f90
===
--- gcc/testsuite/gfortran.dg/pr92178.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr92178.f90	(working copy)
@@ -0,0 +1,22 @@
+! { dg-do run }
+! Original code contributed by Vladimir Fuka
+! PR fortran/92178
+program foo
+
+   implicit none
+
+   integer, allocatable :: a(:)
+
+   allocate(a, source=[1])
+
+   call assign(a, (a(1)))
+  
+   if (allocated(a) .neqv. .false.) stop 1
+
+   contains
+  subroutine assign(a, b)
+ integer, allocatable, intent(out) :: a(:) 
+ integer :: b
+ if (b /= 1) stop 2
+  end subroutine
+end program


Re: PING*2 : Fwd: [PATCH][gcov-profile/91971]Profile directory concatenated with object file path

2019-10-23 Thread Qing Zhao
Thank you!

Just committed the change at:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=277344 


Qing
> On Oct 23, 2019, at 5:15 AM, Martin Liška  wrote:
> 
> On 10/21/19 5:32 PM, Qing Zhao wrote:
>> Please let me know whether this patch is reasonable or not.
> 
> The patch is fine. Please add PR entry to the ChangeLog and
> install the patch.
> 
> Thanks,
> Martin



Re: Make ipa-reference bitmaps dense

2019-10-23 Thread Jan Hubicka
Hi,
this is the variant of the patch I committed.  It additionally registers a
variable removal hook to be sure that we do not mix up the static variable
with some other decl allocated later.

Bootstrapped/regtested x86_64

Honza

2019-10-13  Jan Hubicka  

* ipa-reference.h (ipa_reference_var_uid): Move offline.
* ipa-reference.c (reference_vars_map_t): new type.
(ipa_reference_vars_map, ipa_reference_vars_uids): New static vars.
(ipa_reference_var_uid): Implement.
(varpool_node_hooks): New static var.
(varpool_removal_hook): New function.
(is_improper): Do not check bitmap for id==-1
(get_static_name): Update.
(ipa_init): Initialize new datastructures.
(analyze_function): Do not recompute ids.
(propagate): Free reference_vars_to_consider.
(stream_out_bitmap): Update.
(ipa_reference_read_optimization_summary): Update.

Index: ipa-reference.c
===
--- ipa-reference.c (revision 277330)
+++ ipa-reference.c (working copy)
@@ -93,9 +93,11 @@ typedef struct ipa_reference_vars_info_d
 
 /* This map contains all of the static variables that are
being considered by the compilation level alias analysis.  */
-typedef hash_map, tree>
-reference_vars_to_consider_t;
-static reference_vars_to_consider_t *reference_vars_to_consider;
+typedef hash_map reference_vars_map_t;
+static reference_vars_map_t *ipa_reference_vars_map;
+static int ipa_reference_vars_uids;
+static vec *reference_vars_to_consider;
+varpool_node_hook_list *varpool_node_hooks;
 
 /* Set of all interesting module statics.  A bit is set for every module
static we are considering.  This is added to the local info when asm
@@ -137,6 +139,31 @@ public:
 
 static ipa_ref_opt_summary_t *ipa_ref_opt_sum_summaries = NULL;
 
+/* Return ID used by ipa-reference bitmaps.  -1 if failed.  */
+int
+ipa_reference_var_uid (tree t)
+{
+  if (!ipa_reference_vars_map)
+return -1;
+  int *id = ipa_reference_vars_map->get
+(symtab_node::get (t)->ultimate_alias_target (NULL)->decl);
+  if (!id)
+return -1;
+  return *id;
+}
+
+/* Return ID used by ipa-reference bitmaps.  Create new entry if
+   T is not in the map.  Set EXISTED accordingly.  */
+int
+ipa_reference_var_get_or_insert_uid (tree t, bool *existed)
+{
+  int &id = ipa_reference_vars_map->get_or_insert
+(symtab_node::get (t)->ultimate_alias_target (NULL)->decl, existed);
+  if (!*existed)
+id = ipa_reference_vars_uids++;
+  return id;
+}
+
 /* Return the ipa_reference_vars structure starting from the cgraph NODE.  */
 static inline ipa_reference_vars_info_t
 get_reference_vars_info (struct cgraph_node *node)
@@ -257,7 +284,9 @@ is_improper (symtab_node *n, void *v ATT
 static inline bool
 is_proper_for_analysis (tree t)
 {
-  if (bitmap_bit_p (ignore_module_statics, ipa_reference_var_uid (t)))
+  int id = ipa_reference_var_uid (t);
+
+  if (id != -1 && bitmap_bit_p (ignore_module_statics, id))
 return false;
 
   if (symtab_node::get (t)
@@ -273,7 +302,7 @@ is_proper_for_analysis (tree t)
 static const char *
 get_static_name (int index)
 {
-  return fndecl_name (*reference_vars_to_consider->get (index));
+  return fndecl_name ((*reference_vars_to_consider)[index]);
 }
 
 /* Dump a set of static vars to FILE.  */
@@ -402,6 +431,16 @@ propagate_bits (ipa_reference_global_var
 }
 }
 
+/* Delete NODE from map.  */
+
+static void
+varpool_removal_hook (varpool_node *node, void *)
+{
+  int *id = ipa_reference_vars_map->get (node->decl)
+  if (id)
+ipa_reference_vars_map->remove (*id);
+}
+
 static bool ipa_init_p = false;
 
 /* The init routine for analyzing global static variable usage.  See
@@ -414,8 +453,19 @@ ipa_init (void)
 
   ipa_init_p = true;
 
-  if (dump_file)
-reference_vars_to_consider = new reference_vars_to_consider_t(251);
+  vec_alloc (reference_vars_to_consider, 10);
+
+
+  if (ipa_ref_opt_sum_summaries != NULL)
+{
+  delete ipa_ref_opt_sum_summaries;
+  ipa_ref_opt_sum_summaries = NULL;
+  delete ipa_reference_vars_map;
+}
+  ipa_reference_vars_map = new reference_vars_map_t(257);
+  varpool_node_hooks
+= symtab->add_varpool_removal_hook (varpool_removal_hook, NULL);
+  ipa_reference_vars_uids = 0;
 
   bitmap_obstack_initialize (&local_info_obstack);
   bitmap_obstack_initialize (&optimization_summary_obstack);
@@ -424,12 +474,6 @@ ipa_init (void)
 
   if (ipa_ref_var_info_summaries == NULL)
 ipa_ref_var_info_summaries = new ipa_ref_var_info_summary_t (symtab);
-
-  if (ipa_ref_opt_sum_summaries != NULL)
-{
-  delete ipa_ref_opt_sum_summaries;
-  ipa_ref_opt_sum_summaries = NULL;
-}
 }
 
 
@@ -464,6 +508,8 @@ analyze_function (struct cgraph_node *fn
   local = init_function_info (fn);
   for (i = 0; fn->iterate_reference (i, ref); i++)
 {
+  int id;
+  bool existed;
   if (!is_a  (ref->referred))
continue;
   var = ref-

Re: Make ipa-reference bitmaps dense

2019-10-23 Thread Jakub Jelinek
On Wed, Oct 23, 2019 at 08:20:12PM +0200, Jan Hubicka wrote:
> Hi,
> this is the variant of the patch I committed.  It additionally registers a
> variable removal hook to be sure that we do not mix up the static variable
> with some other decl allocated later.
> 
> Bootstrapped/regtested x86_64

This doesn't build.  Completely untested patch to unbreak it is below,
though not sure if it is enough.

--- gcc/ipa-reference.c.jj  2019-10-23 20:38:01.392850897 +0200
+++ gcc/ipa-reference.c 2019-10-23 20:56:17.006239699 +0200
@@ -436,9 +436,9 @@ propagate_bits (ipa_reference_global_var
 static void
 varpool_removal_hook (varpool_node *node, void *)
 {
-  int *id = ipa_reference_vars_map->get (node->decl)
+  int *id = ipa_reference_vars_map->get (node->decl);
   if (id)
-ipa_reference_vars_map->remove (*id);
+ipa_reference_vars_map->remove (node->decl);
 }
 
 static bool ipa_init_p = false;
@@ -455,7 +455,6 @@ ipa_init (void)
 
   vec_alloc (reference_vars_to_consider, 10);
 
-
   if (ipa_ref_opt_sum_summaries != NULL)
 {
   delete ipa_ref_opt_sum_summaries;
@@ -1051,7 +1050,6 @@ ipa_reference_write_optimization_summary
}
 }
 
-
   if (ltrans_statics_bitcount)
 for (i = 0; i < lto_symtab_encoder_size (encoder); i++)
   {
@@ -1291,7 +1289,7 @@ ipa_reference_c_finalize (void)
   ipa_ref_opt_sum_summaries = NULL;
   delete ipa_reference_vars_map;
   ipa_reference_vars_map = NULL;
-  symtab->remove_varpool_removal_hook (varpool_node_hooks)
+  symtab->remove_varpool_removal_hook (varpool_node_hooks);
 }
 
   if (ipa_init_p)


Jakub



Re: Make ipa-reference bitmaps dense

2019-10-23 Thread Jan Hubicka
> On Wed, Oct 23, 2019 at 08:20:12PM +0200, Jan Hubicka wrote:
> > Hi,
> > this is the variant of the patch I committed.  It additionally registers a
> > variable removal hook to be sure that we do not mix up the static variable
> > with some other decl allocated later.
> > 
> > Bootstrapped/regtested x86_64
> 
> This doesn't build.  Completely untested patch to unbreak it is below,
> though not sure if it is enough.

Sorry, I managed to patch one tree and build a different tree. I am
testing a similar fix and will commit it soon.

Honza
> 
> --- gcc/ipa-reference.c.jj2019-10-23 20:38:01.392850897 +0200
> +++ gcc/ipa-reference.c   2019-10-23 20:56:17.006239699 +0200
> @@ -436,9 +436,9 @@ propagate_bits (ipa_reference_global_var
>  static void
>  varpool_removal_hook (varpool_node *node, void *)
>  {
> -  int *id = ipa_reference_vars_map->get (node->decl)
> +  int *id = ipa_reference_vars_map->get (node->decl);
>if (id)
> -ipa_reference_vars_map->remove (*id);
> +ipa_reference_vars_map->remove (node->decl);
>  }
>  
>  static bool ipa_init_p = false;
> @@ -455,7 +455,6 @@ ipa_init (void)
>  
>vec_alloc (reference_vars_to_consider, 10);
>  
> -
>if (ipa_ref_opt_sum_summaries != NULL)
>  {
>delete ipa_ref_opt_sum_summaries;
> @@ -1051,7 +1050,6 @@ ipa_reference_write_optimization_summary
>   }
>  }
>  
> -
>if (ltrans_statics_bitcount)
>  for (i = 0; i < lto_symtab_encoder_size (encoder); i++)
>{
> @@ -1291,7 +1289,7 @@ ipa_reference_c_finalize (void)
>ipa_ref_opt_sum_summaries = NULL;
>delete ipa_reference_vars_map;
>ipa_reference_vars_map = NULL;
> -  symtab->remove_varpool_removal_hook (varpool_node_hooks)
> +  symtab->remove_varpool_removal_hook (varpool_node_hooks);
>  }
>  
>if (ipa_init_p)
> 
> 
>   Jakub
> 


[PATCH] PR c++/91369 Implement P0784R7 changes to allocation and construction

2019-10-23 Thread Jonathan Wakely

This patch is the first part of library support for constexpr
std::vector and std::string. This only includes the changes to
std::allocator, std::allocator_traits, std::construct_at,
std::destroy_at, std::destroy and std::destroy_n.

std::allocator::allocate and std::allocator::deallocate need to be
added so that they can be intercepted by the compiler during constant
evaluation. Outside of constant evaluation those new member functions
just forward to the existing implementation in the base class.

PR c++/91369 Implement P0784R7 changes to allocation and construction
* include/bits/alloc_traits.h: Include .
(allocator_traits::_S_allocate, allocator_traits::_S_construct)
(allocator_traits::_S_destroy, allocator_traits::_S_max_size)
(allocator_traits::_S_select, allocator_traits::allocate)
(allocator_traits::deallocate, allocator_traits::construct)
(allocator_traits::destroy, allocator_traits::max_size)
(allocator_traits::select_on_container_copy_construction)
(allocator_traits>): Add constexpr specifier for C++20.
(allocator_traits>::construct): Use construct_at.
(allocator_traits>::destroy): Use destroy_at.
(__alloc_on_copy, __alloc_on_move, __alloc_on_swap): Add constexpr
specifier.
(_Destroy(ForwardIterator, ForwardIterator, Alloc&))
(_Destroy(ForwardIterator, ForwardIterator, allocator&)): Move here
from .
* include/bits/allocator.h (allocator::~allocator): Remove for C++20.
(allocator::allocate, allocate::deallocate): Define for C++20 and up.
(operator==, operator!=): Add constexpr specifier for C++20.
* include/bits/stl_construct.h: Don't include .
(destroy_at): For C++20 add constexpr specifier and support for
destroying arrays.
(construct_at): Define new function for C++20.
(_Construct): Return result of placement new-expression. For C++11 and
up add constexpr. For C++20 dispatch to std::construct_at during
constant evaluation.
(_Destroy(pointer)): Add constexpr specifier. For C++20 dispatch to
std::destroy_at during constant evaluation.
(_Destroy_aux::__destroy, _Destroy_n_aux::__destroy_n): Add constexpr
specifier for C++20.
(_Destroy(ForwardIterator, ForwardIterator))
(_Destroy(ForwardIterator, Size)): Likewise. Do not elide trivial
destructors during constant evaluation.
(destroy, destroy_n): Add constexpr specifier for C++20.
(_Destroy(ForwardIterator, ForwardIterator, Alloc&))
(_Destroy(ForwardIterator, ForwardIterator, allocator&)): Move to
, to remove dependency on allocators.
* include/bits/stl_uninitialized.h: Include .
Include  instead of .
* include/ext/alloc_traits.h: Always include .
(__alloc_traits::construct, __alloc_traits::destroy)
(__alloc_traits::_S_select_on_copy, __alloc_traits::_S_on_swap): Add
constexpr specifier.
* include/ext/malloc_allocator.h  (operator==, operator!=): Add
constexpr specifier for C++20.
* include/ext/new_allocator.h (operator==, operator!=): Likewise.
* testsuite/20_util/headers/memory/synopsis.cc: Add constexpr.
* testsuite/20_util/scoped_allocator/69293_neg.cc: Ignore additional
errors due to constexpr function called after failed static_assert.
* testsuite/20_util/specialized_algorithms/construct_at/1.cc: New test.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Ignore additional errors due to constexpr function called after failed
static_assert.
* testsuite/23_containers/vector/cons/destructible_neg.cc: Likewise.

Tested x86_64-linux and powerpc64le-linux, for every -std=gnu++NN
mode.

Committed to trunk.

commit da6af26699bec59f9e151a6057c74c3e060c0a79
Author: Jonathan Wakely 
Date:   Tue Oct 22 23:05:11 2019 +0100

PR c++/91369 Implement P0784R7 changes to allocation and construction

This patch is the first part of library support for constexpr
std::vector and std::string. This only includes the changes to
std::allocator, std::allocator_traits, std::construct_at,
std::destroy_at, std::destroy and std::destroy_n.

std::allocator::allocate and std::allocator::deallocate need to be
added so that they can be intercepted by the compiler during constant
evaluation. Outside of constant evaluation those new member functions
just forward to the existing implementation in the base class.

PR c++/91369 Implement P0784R7 changes to allocation and 
construction
* include/bits/alloc_traits.h: Include .
(allocator_traits::_S_allocate, allocator_traits::_S_construct)
(allocator_traits::_S_destroy, allocator_traits::_S_max_size)
(allocator_traits::_S_select, allocator_traits::allocate)
(allocator_traits::

[PATCH] Make std::invoke usable in constant expressions

2019-10-23 Thread Jonathan Wakely

* include/std/functional (invoke): Add constexpr for C++20.
* include/std/version (__cpp_lib_constexpr_invoke): Define.
* testsuite/20_util/function_objects/invoke/constexpr.cc: New test.

This is an easy one, because I already made std::__invoke constexpr,
so all that's needed for C++20 is to add _GLIBCXX20_CONSTEXPR to the
public std::invoke function that calls std::__invoke.

Tested powerpc64le-linux, committed to trunk.


commit dae24283df4341aa66f455c8cee6f7935470f7b5
Author: Jonathan Wakely 
Date:   Wed Oct 23 17:40:42 2019 +0100

Make std::invoke usable in constant expressions

* include/std/functional (invoke): Add constexpr for C++20.
* include/std/version (__cpp_lib_constexpr_invoke): Define.
* testsuite/20_util/function_objects/invoke/constexpr.cc: New test.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 7ad29a1a335..113a13b4a37 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -72,19 +72,22 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-#if __cplusplus > 201402L
-# define __cpp_lib_invoke 201411
+#if __cplusplus >= 201703L
+# define __cpp_lib_invoke 201411L
+# if __cplusplus > 201703L
+#  define __cpp_lib_constexpr_invoke 201907L
+# endif
 
   /// Invoke a callable object.
   template
-inline invoke_result_t<_Callable, _Args...>
+inline _GLIBCXX20_CONSTEXPR invoke_result_t<_Callable, _Args...>
 invoke(_Callable&& __fn, _Args&&... __args)
 noexcept(is_nothrow_invocable_v<_Callable, _Args...>)
 {
   return std::__invoke(std::forward<_Callable>(__fn),
   std::forward<_Args>(__args)...);
 }
-#endif
+#endif // C++17
 
   template::value>
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 21cc28b3450..ccaedd090b0 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -158,6 +158,7 @@
 #endif
 #define __cpp_lib_constexpr 201711L
 #define __cpp_lib_constexpr_algorithms 201806L
+#define __cpp_lib_constexpr_invoke 201907L
 #if __cpp_impl_destroying_delete
 # define __cpp_lib_destroying_delete 201806L
 #endif
diff --git 
a/libstdc++-v3/testsuite/20_util/function_objects/invoke/constexpr.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/invoke/constexpr.cc
new file mode 100644
index 000..f65caa21936
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function_objects/invoke/constexpr.cc
@@ -0,0 +1,38 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+
+#ifndef __cpp_lib_constexpr_invoke
+# error "Feature test macro for constexpr invoke is missing"
+#elif __cpp_lib_constexpr_invoke < 201907L
+# error "Feature test macro for constexpr invoke has wrong value"
+#endif
+
+constexpr int inc(int i) { return i + 1; }
+constexpr auto inc_f = &inc;
+static_assert( std::invoke(inc_f, 2) == 3);
+
+struct Dec
+{
+  constexpr int operator()(int i) const { return i - 1; }
+};
+
+static_assert( std::invoke(Dec{}, 5) == 4 );


[PATCH] Add C++20 jthread type to <thread>

2019-10-23 Thread Thomas Rodgers




[PATCH] Add C++20 jthread type to <thread> (2nd attempt)

2019-10-23 Thread Thomas Rodgers
From 56b78956a003b91e538cd5c680d614fdaee9c9eb Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Wed, 23 Oct 2019 12:32:31 -0700
Subject: [PATCH] Add C++20 jthread type to <thread>

---
 libstdc++-v3/ChangeLog|   8 +
 libstdc++-v3/include/std/stop_token   |  14 ++
 libstdc++-v3/include/std/thread   | 125 +++
 .../testsuite/30_threads/jthread/1.cc |  27 +++
 .../testsuite/30_threads/jthread/2.cc |  27 +++
 .../testsuite/30_threads/jthread/jthread.cc   | 198 ++
 6 files changed, 399 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/30_threads/jthread/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/jthread/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/jthread/jthread.cc

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 970c5c2a018..523620da1c3 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,11 @@
+2019-10-23  Thomas Rodgers  
+
+	* include/std/stop_token (stop_token): Add operator==(), operator!=().
+	* include/std/thread: Add jthread type.
+	* testsuite/30_threads/jthread/1.cc: New test.
+	* testsuite/30_threads/jthread/2.cc: New test.
+	* testsuite/30_threads/jthread/jthread.cc: New test.
+
 2019-10-22  Thomas Rodgers  
 
	* include/Makefile.am: Add <stop_token> header.
diff --git a/libstdc++-v3/include/std/stop_token b/libstdc++-v3/include/std/stop_token
index b3655b85eae..04b9521d24e 100644
--- a/libstdc++-v3/include/std/stop_token
+++ b/libstdc++-v3/include/std/stop_token
@@ -87,6 +87,20 @@ namespace std _GLIBCXX_VISIBILITY(default)
   return stop_possible() && _M_state->_M_stop_requested();
 }
 
+[[nodiscard]]
+friend bool
+operator==(const stop_token& __a, const stop_token& __b)
+{
+  return __a._M_state == __b._M_state;
+}
+
+[[nodiscard]]
+friend bool
+operator!=(const stop_token& __a, const stop_token& __b)
+{
+  return __a._M_state != __b._M_state;
+}
+
   private:
 friend stop_source;
 template
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 90b4be6cd16..93afa766d18 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -39,6 +39,13 @@
 #include 
 #include 
 #include 
+
+#if __cplusplus > 201703L
+#define __cpp_lib_jthread 201907L
+#include 
+#include 
+#endif
+
 #include 
 #include 
 #include 
@@ -409,6 +416,124 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // @} group threads
 
+#ifdef __cpp_lib_jthread
+
+  class jthread
+  {
+  public:
+using id = std::thread::id;
+using native_handle_type = std::thread::native_handle_type;
+
+jthread() noexcept
+: _M_stop_source{ nostopstate_t{ } }
+{ }
+
+template, jthread>>>
+explicit
+jthread(_Callable&& __f, _Args&&... __args)
+  : _M_thread{[](stop_token __token, auto&& __cb, auto&&... __args)
+  {
+if constexpr(std::is_invocable_v<_Callable, stop_token, _Args...>)
+  {
+std::invoke(std::forward(__cb),
+std::move(__token),
+std::forward(__args)...);
+  }
+else
+  {
+std::invoke(std::forward(__cb),
+std::forward(__args)...);
+  }
+  },
+  _M_stop_source.get_token(),
+  std::forward<_Callable>(__f),
+  std::forward<_Args>(__args)...}
+{ }
+
+jthread(const jthread&) = delete;
+jthread(jthread&&) noexcept = default;
+
+~jthread()
+{
+  if (joinable())
+{
+  request_stop();
+  join();
+}
+}
+
+jthread&
+operator=(const jthread&) = delete;
+
+jthread&
+operator=(jthread&&) noexcept = default;
+
+void
+swap(jthread& __other) noexcept
+{
+  std::swap(_M_stop_source, __other._M_stop_source);
+  std::swap(_M_thread, __other._M_thread);
+}
+
+bool
+joinable() const noexcept
+{
+  return _M_thread.joinable();
+}
+
+void
+join()
+{
+  _M_thread.join();
+}
+
+void
+detach()
+{
+  _M_thread.detach();
+}
+
+id
+get_id() const noexcept
+{
+  return _M_thread.get_id();
+}
+
+native_handle_type
+native_handle()
+{
+  return _M_thread.native_handle();
+}
+
+static unsigned
+hardware_concurrency() noexcept
+{
+  return std::thread::hardware_concurrency();
+}
+
+[[nodiscard]] stop_source
+get_stop_source() noexcept
+{
+  return _M_stop_source;
+}
+
+[[nodiscard]] stop_token
+get_stop_token() const noexcept
+{
+  return _M_stop_source.get_token();
+}
+
+bool request_stop() noexcept
+{
+  return get_stop_source().request_stop();
+}
+
+  private:
+stop_source _

Re: [PATCH] Add support for C++2a stop_token

2019-10-23 Thread Thomas Rodgers

Thomas Rodgers writes:

Let's try this again.

From 23e1c9402cc15666d099fd61b58a0019181a9115 Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Tue, 22 Oct 2019 17:53:00 -0700
Subject: [PATCH] Add support for C++2a stop_token

  * include/Makefile.am: Add <stop_token> header.
  * include/Makefile.in: Regenerate.
	* include/std/stop_token: New file.
	* include/std/version (__cpp_lib_jthread): New value.
	* testsuite/30_threads/stop_token/1.cc: New test.
	* testsuite/30_threads/stop_token/2.cc: New test.
	* testsuite/30_threads/stop_token/stop_token.cc: New test.
---
 libstdc++-v3/ChangeLog|  10 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/std/stop_token   | 338 ++
 libstdc++-v3/include/std/version  |   1 +
 .../testsuite/30_threads/stop_token/1.cc  |  27 ++
 .../testsuite/30_threads/stop_token/2.cc  |  27 ++
 .../30_threads/stop_token/stop_token.cc   |  93 +
 8 files changed, 498 insertions(+)
 create mode 100644 libstdc++-v3/include/std/stop_token
 create mode 100644 libstdc++-v3/testsuite/30_threads/stop_token/1.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/stop_token/2.cc
 create mode 100644 libstdc++-v3/testsuite/30_threads/stop_token/stop_token.cc

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 2ea0fe4ec40..970c5c2a018 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,13 @@
+2019-10-22  Thomas Rodgers  
+
+	* include/Makefile.am: Add <stop_token> header.
+* include/Makefile.in: Regenerate.
+	* include/std/stop_token: New file.
+	* include/std/version (__cpp_lib_jthread): New value.
+	* testsuite/30_threads/stop_token/1.cc: New test.
+	* testsuite/30_threads/stop_token/2.cc: New test.
+	* testsuite/30_threads/stop_token/stop_token.cc: New test.
+
 2019-09-05  Jonathan Wakely  
 
 	* doc/xml/manual/status_cxx2020.xml: Update status for P0122R7 and
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index b8b786d9260..fb6777366bd 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -72,6 +72,7 @@ std_headers = \
 	${std_srcdir}/sstream \
 	${std_srcdir}/stack \
 	${std_srcdir}/stdexcept \
+	${std_srcdir}/stop_token \
 	${std_srcdir}/streambuf \
 	${std_srcdir}/string \
 	${std_srcdir}/string_view \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index cd1e9df5482..9b4ab670315 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -416,6 +416,7 @@ std_headers = \
 	${std_srcdir}/sstream \
 	${std_srcdir}/stack \
 	${std_srcdir}/stdexcept \
+	${std_srcdir}/stop_token \
 	${std_srcdir}/streambuf \
 	${std_srcdir}/string \
 	${std_srcdir}/string_view \
diff --git a/libstdc++-v3/include/std/stop_token b/libstdc++-v3/include/std/stop_token
new file mode 100644
index 000..b3655b85eae
--- /dev/null
+++ b/libstdc++-v3/include/std/stop_token
@@ -0,0 +1,338 @@
+// <stop_token> -*- C++ -*-
+
+// Copyright (C) 2008-2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file include/stop_token
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_STOP_TOKEN
+#define _GLIBCXX_STOP_TOKEN
+
+#include 
+#include 
+#include 
+#include 
+
+#define __cpp_lib_jthread 201907L
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+  _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  class stop_source;
+  template
+  class stop_callback;
+
+  struct nostopstate_t { explicit nostopstate_t() = default; };
+  inline constexpr nostopstate_t nostopstate{};
+
+  class stop_token {
+  public:
+stop_token() noexcept = default;
+
+stop_token(const stop_token& __other) noexcept
+  : _M_state(__other._M_state)
+{ }
+
+stop_token(stop_token&& __other) noexcept
+  : _M_state(std::move(__other._M_state))
+{ }
+
+~stop_token() = default;
+
+stop_token&
+operator=(const stop_token& __

Re: Order symbols before section copying in the lto streamer

2019-10-23 Thread Jan Hubicka
> Hi,
> this patch orders symbols where we copy sections to match the order
> of files in the command line.  This optimizes the streaming process since we
> are not opening and closing files randomly and also we read them more
> sequentially.  This saves some kernel time though I think more can be
> done if we avoid doing pair of mmap/unmap for every file section we
> read.
> 
> We also read files in random order in ipa-cp and during devirt.
> I guess also summary streaming can be refactored to stream all summaries
> for a given file instead of reading one summary from all files.
> 
> Bootstrapped/regtested x86_64-linux, plan to commit it this afternoon if
> there are no complains.
> 
> Honza
> 
>   * lto-common.c (lto_file_finalize): Add order attribute.
>   (lto_create_files_from_ids): Pass order.
>   (lto_file_read): Update call of lto_create_files_from_ids.
>   * lto-streamer-out.c (output_constructor): Push CTORS_OUT timevar.
>   (cmp_symbol_files): New.
>   (lto_output): Copy sections in file order.
>   * lto-streamer.h (lto_file_decl_data): Add field order.
Hi,
I have committed the patch but messed up testing, so it broke builds with
static libraries and checking enabled. This is fixed by this patch:

* lto-streamer-out.c (cmp_symbol_files): Watch for overflow.
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 277346)
+++ lto-streamer-out.c  (working copy)
@@ -2447,7 +2447,12 @@ cmp_symbol_files (const void *pn1, const
 
   /* Order within static library.  */
   if (n1->lto_file_data && n1->lto_file_data->id != n2->lto_file_data->id)
-return n1->lto_file_data->id - n2->lto_file_data->id;
+{
+  if (n1->lto_file_data->id > n2->lto_file_data->id)
+   return 1;
+  if (n1->lto_file_data->id < n2->lto_file_data->id)
+   return -1;
+}
 
   /* And finaly order by the definition order.  */
   return n1->order - n2->order;


Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-10-23 Thread Tobias Burnus

Hi Thomas,

Updated version attached. Changes:
* Use "true" instead of "openacc" for the OpenACC-only "copy()" clause 
(as not shared w/ OpenMP)

* Add some documentation to gimplify.c
* Use GOVD_FIRSTPRIVATE also for "kernel"

The patch survived bootstrapping + regtesting on my laptop (no 
offloading) and on a build server (with nvptx offloading).


On 10/18/19 3:26 PM, Thomas Schwinge wrote:
I'll be quick to note that I don't have any first-hand experience with 
Fortran common blocks. :-P 


To quote you from below: "So, please do study that closer. ;-P"

I also do not have first-hand experience (as I started with Fortran 95 + 
some of F2003), but common blocks were a nice idea of the early 1960s to 
provide access to global memory, avoiding passing all data as arguments 
(which also has stack issues). They have been replaced by derived types 
and variables declared at module level since Fortran 90. See 
https://j3-fortran.org/doc/year/18/18-007r1.pdf or 
https://web.stanford.edu/class/me200c/tutorial_77/13_common.html



On 10/18/19 3:26 PM, Thomas Schwinge wrote:

For OpenACC, gfortran already supports common blocks for 
device_resident/usedevice/cache/flush/link.
[…] [Of those, only "copy()" is also an OpenMP clause name.]

I'm confused: in […] "OpenMP doesn't have a copy clause, so I'd expect true 
here":


I concur – only "copyin" and "copyprivate" exist in OpenMP. (But thanks 
to "if (openacc)" no "openacc" is needed, either.)



I'll defer to your judgement there, but just one comment: I noticed 
that OpenACC 2.7 in 2.7. "Data Clauses" states that "For all clauses 
except 'deviceptr' and 'present', the list argument may include a 
Fortran_common block_ name enclosed within slashes, if that _common 
block_ name also appears in a 'declare' directive 'link' clause".


Are we already properly handling the aspect that requires that the 
"that _common block_ name also appears in a 'declare' directive 'link' 
clause"? 


I don't know neither the OpenACC spec nor the GCC implementation well 
enough to claim proper (!) handling. However, as stated above: 
device_resident/usedevice/cache/flush/link do support common block 
arguments.


Looking at the testsuite, link and device_resident are tested in 
gfortran.dg/goacc/declare-2.f95. (list.f95 and reduction.f95 also use 
come common blocks.) – And gfortran.dg/goacc/common-block-1.f90 has been 
added.




The libgomp execution test cases you're adding all state that "This test does not 
exercise ACC DECLARE", yet they supposedly already do work fine. Or am I 
understanding the OpenACC specification wrongly here?


You need to ask Cesar, who wrote the test case and that comment, why he 
added it.


The patch does not touch 'link'/'device_resident' clauses of 'declare', 
hence, I think he didn't see a reason to add a run-time test case for 
it. – That's independent from whether it is supported by the OpenACC 
spec and whether it is "properly" implemented in GCC/gfortran.



I'm certainly aware of (big) deficiencies in the OpenACC 'declare' handling so 
I guess my question here may be whether these test cases are valid after all?


Well, you are the OpenACC specialist – both spec wise and 
GCC-implementation wise. However, as the test cases are currently 
parsing-only test cases, I think they should be fine.




gcc/gimplify.c: oacc_default_clause contains some changes; there are 
additionally two lines which only differ for ORT_ACC – Hence, it is an 
OpenACC-only change!
The ME change is about privatizing common blocks (I haven't studied this part 
closer.)

So, please do study that closer.  ;-P

In
I raised some questions, got a bit of an answer, and in

asked further, didn't get an answer.

All the rationale from Cesar's original submission email should be
transferred into 'gcc/gimplify.c' as much as feasible, to make that
"voodoo code" better understandable.



I have now added some comments to the patch. I also changed GOVD_MAP to 
GOVD_FIRSTPRIVATE for "acc kernels" to match "acc parallel"; I think 
that makes sense in terms of what Cesar has written – but I am not 
completely sure about this.


Cross ref: The original email is 
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg00950.html ; the review 
starts here https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00250.html 
(same email as mid.mail-archive.com link above).


BTW: That patch – rediffed for OG9 and augmented by several other 
patches (including deviceptr) – was then submitted at 
https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01911.html and first 
reviewed at https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00176.html and 
then committed to OG9 at 
https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00051.html




Due to the wonders of GIT – when not requiring linear history – and due to 
rebasing with GCC9, it is also part of the OG9 commit 
ac6c90812344f4f4cfe4d2f5901c1a9d

[C++ PATCH] Implement P1286R2, Contra CWG1778

2019-10-23 Thread Jason Merrill
The C++11 requirement that an explicit exception-specification on a
defaulted function match the implicit one was found to be problematic for
std::atomic.  This paper, adopted in February, simply removes that
requirement: if an explicitly defaulted function has a different
exception-specification, that now works just like a user-written function:
either it isn't noexcept when it could be, or it is noexcept and will call
terminate if an exception is thrown.

Tested x86_64-pc-linux-gnu, applying to trunk.

* method.c (defaulted_late_check): Don't check explicit
exception-specification on defaulted function.
(after_nsdmi_defaulted_late_checks): Remove.
* parser.h (struct cp_unparsed_functions_entry): Remove classes.
* parser.c (unparsed_classes): Remove.
(push_unparsed_function_queues, cp_parser_class_specifier_1):
Adjust.
---
 gcc/cp/parser.h  |  4 --
 gcc/cp/method.c  | 69 +++-
 gcc/cp/parser.c  | 14 +
 gcc/testsuite/g++.dg/DRs/dr1778.C|  7 +++
 gcc/testsuite/g++.dg/cpp0x/defaulted23.C |  4 +-
 gcc/testsuite/g++.dg/cpp0x/defaulted43.C | 10 ++--
 6 files changed, 21 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/DRs/dr1778.C

diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
index 91b5916622d..200498281b5 100644
--- a/gcc/cp/parser.h
+++ b/gcc/cp/parser.h
@@ -163,10 +163,6 @@ struct GTY(()) cp_unparsed_functions_entry {
  FIELD_DECLs appear in this list in declaration order.  */
   vec *nsdmis;
 
-  /* Nested classes go in this vector, so that we can do some final
- processing after parsing any NSDMIs.  */
-  vec *classes;
-
   /* Functions with noexcept-specifiers that require post-processing.  */
   vec *noexcepts;
 };
diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 73a01147ff9..b613e5df871 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -2204,40 +2204,12 @@ defaulted_late_check (tree fn)
   return;
 }
 
-  /* 8.4.2/2: An explicitly-defaulted function (...) may have an explicit
- exception-specification only if it is compatible (15.4) with the 
- exception-specification on the implicit declaration.  If a function
- is explicitly defaulted on its first declaration, (...) it is
- implicitly considered to have the same exception-specification as if
- it had been implicitly declared.  */
-  maybe_instantiate_noexcept (fn);
-  tree fn_spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (fn));
-  if (!fn_spec)
-{
-  if (DECL_DEFAULTED_IN_CLASS_P (fn))
-   TREE_TYPE (fn) = build_exception_variant (TREE_TYPE (fn), eh_spec);
-}
-  else if (UNEVALUATED_NOEXCEPT_SPEC_P (fn_spec))
-/* Equivalent to the implicit spec.  */;
-  else if (DECL_DEFAULTED_IN_CLASS_P (fn)
-  && !CLASSTYPE_TEMPLATE_INSTANTIATION (ctx))
-/* We can't compare an explicit exception-specification on a
-   constructor defaulted in the class body to the implicit
-   exception-specification until after we've parsed any NSDMI; see
-   after_nsdmi_defaulted_late_checks.  */;
-  else
-{
-  tree eh_spec = get_defaulted_eh_spec (fn);
-  if (!comp_except_specs (fn_spec, eh_spec, ce_normal))
-   {
- if (DECL_DEFAULTED_IN_CLASS_P (fn))
-   DECL_DELETED_FN (fn) = true;
- else
-   error ("function %q+D defaulted on its redeclaration "
-  "with an exception-specification that differs from "
-  "the implicit exception-specification %qX", fn, eh_spec);
-   }
-}
+  /* If a function is explicitly defaulted on its first declaration without an
+ exception-specification, it is implicitly considered to have the same
+ exception-specification as if it had been implicitly declared.  */
+  if (!TYPE_RAISES_EXCEPTIONS (TREE_TYPE (fn))
+  && DECL_DEFAULTED_IN_CLASS_P (fn))
+TREE_TYPE (fn) = build_exception_variant (TREE_TYPE (fn), eh_spec);
 
   if (DECL_DEFAULTED_IN_CLASS_P (fn)
   && DECL_DECLARED_CONSTEXPR_P (implicit_fn))
@@ -2264,35 +2236,6 @@ defaulted_late_check (tree fn)
 }
 }
 
-/* OK, we've parsed the NSDMI for class T, now we can check any explicit
-   exception-specifications on functions defaulted in the class body.  */
-
-void
-after_nsdmi_defaulted_late_checks (tree t)
-{
-  if (uses_template_parms (t))
-return;
-  if (t == error_mark_node)
-return;
-  for (tree fn = TYPE_FIELDS (t); fn; fn = DECL_CHAIN (fn))
-if (!DECL_ARTIFICIAL (fn)
-   && DECL_DECLARES_FUNCTION_P (fn)
-   && DECL_DEFAULTED_IN_CLASS_P (fn))
-  {
-   tree fn_spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (fn));
-   if (UNEVALUATED_NOEXCEPT_SPEC_P (fn_spec))
- continue;
-
-   tree eh_spec = get_defaulted_eh_spec (fn);
-   if (eh_spec == error_mark_node)
- continue;
-
-   if (!comp_except_specs (TYPE_RAISES_EXCEPTIONS (TREE_TYPE (fn)),
-   eh_spe

Re: [PATCH] V6, #1 of 17: Use ADJUST_INSN_LENGTH for prefixed instructions

2019-10-23 Thread Michael Meissner
On Tue, Oct 22, 2019 at 05:27:19PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Oct 16, 2019 at 09:35:33AM -0400, Michael Meissner wrote:
> > This patch uses the target hook ADJUST_INSN_LENGTH to change the length of
> > instructions that contain prefixed memory/add instructions.
> 
> That made this amazingly hard to review.  But it might well be worth it,
> thankfully :-)
> 
> > There are 2 new insn attributes:
> > 
> > 1) num_insns: If non-zero, returns the number of machine instructions in an
> > insn.  This simplifies the calculations in rs6000_insn_cost.
> 
> This is great.
> 
> > 2) max_prefixed_insns: Returns the maximum number of prefixed instructions 
> > in
> > an insn.  Normally this is 1, but in the insns that load up 128-bit values 
> > into
> > GPRs, it will be 2.
> 
> This one, I am not so sure.

I wanted it to be simple, so in general it was just a constant.  Since the only
user of it has already checked that the insn is prefixed, I didn't think it
needed the prefixed test to set it to 0.

> > -  int n = get_attr_length (insn) / 4;
> > +  /* If the insn tells us how many insns there are, use that.  Otherwise 
> > use
> > + the length/4.  Adjust the insn length to remove the extra size that
> > + prefixed instructions take.  */
> 
> This should be temporary, until we have converted everything to use
> num_insns, right?

Well there were some 200+ places where length was set.

> > --- gcc/config/rs6000/rs6000.h  (revision 277017)
> > +++ gcc/config/rs6000/rs6000.h  (working copy)
> > @@ -1847,9 +1847,30 @@ extern scalar_int_mode rs6000_pmode;
> >  /* Adjust the length of an INSN.  LENGTH is the currently-computed length 
> > and
> > should be adjusted to reflect any required changes.  This macro is used 
> > when
> > there is some systematic length adjustment required that would be 
> > difficult
> > -   to express in the length attribute.  */
> > +   to express in the length attribute.
> >  
> > -/* #define ADJUST_INSN_LENGTH(X,LENGTH) */
> > +   In the PowerPC, we use this to adjust the length of an instruction if 
> > one or
> > +   more prefixed instructions are generated, using the attribute
> > +   num_prefixed_insns.  A prefixed instruction is 8 bytes instead of 4, 
> > but the
> > +   hardware requires that a prefied instruciton not cross a 64-byte 
> > boundary.
> 
> "prefixed instruction does not"

Thanks.

> > +   This means the compiler has to assume the length of the first prefixed
> > +   instruction is 12 bytes instead of 8 bytes.  Since the length is already
> > +   set for the non-prefixed instruction, we just need to update for the
> > +   difference.  */
> > +
> > +#define ADJUST_INSN_LENGTH(INSN,LENGTH)
> > \
> > +{  \
> > +  if (NONJUMP_INSN_P (INSN))   
> > \
> > +{  
> > \
> > +  rtx pattern = PATTERN (INSN);
> > \
> > +  if (GET_CODE (pattern) != USE && GET_CODE (pattern) != CLOBBER   
> > \
> > + && get_attr_prefixed (INSN) == PREFIXED_YES)  \
> > +   {   \
> > + int num_prefixed = get_attr_max_prefixed_insns (INSN);\
> > + (LENGTH) += 4 * (num_prefixed + 1);   \
> > +   }   \
> > +}  
> > \
> > +}
> 
> Please use a function, not a function-like macro.

Ok, I added rs6000_adjust_insn_length in rs6000.c.

> So this computes the *maximum* RTL instruction length, not considering how
> many of the machine insns in it need a prefix insn.  Can't we do better?
> Hrm, I guess in all cases that matter we will split early anyway.

Well before register allocation for the 128-bit types, you really can't say
what the precise length is, even if it is not prefixed.

And of course even after register allocation, it isn't precise, since the
length of a prefixed instruction is normally 8, but sometimes 12.  So we have
to use 12.

> 
> > +;; Return the number of real hardware instructions in a combined insn.  If 
> > it
> > +;; is 0, just use the length / 4.
> > +(define_attr "num_insns" "" (const_int 0))
> 
> So we could have the default value *be* length/4, not 0?

Only if you make sure that every place sets num_insns.  As the comment says,
until it is set everywhere, you run the risk of a deadly embrace.

> > +;; If an insn is prefixed, return the maximum number of prefixed 
> > instructions
> > +;; in the insn.  The macro ADJUST_INSN_LENGTH uses this number to adjust 
> > the
> > +;; insn length.
> > +(define_attr "max_prefixed_insns" "" (const_int 1))
> 
> "maximum number of prefixed machine instructions in the RTL instruction".
> 
> So

Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread Jim Wilson
On Wed, Oct 23, 2019 at 4:16 AM Richard Biener
 wrote:
> Note I delayed thinking about relaxing the single-vector-size
> constraint in the vectorizer until after we're SLP only because
> that looked more easily done there.  I also remember patches
> relaxing this a bit from RISCV folks.

Probably not from us RISC-V folks.  I don't know of anyone looking at
RISC-V vector support in gcc other than me, and I've only had time for
some initial exploration.

Jim


Re: [PATCH] V6, #1 of 17: Use ADJUST_INSN_LENGTH for prefixed instructions

2019-10-23 Thread Segher Boessenkool
On Wed, Oct 23, 2019 at 05:00:58PM -0400, Michael Meissner wrote:
> On Tue, Oct 22, 2019 at 05:27:19PM -0500, Segher Boessenkool wrote:
> > On Wed, Oct 16, 2019 at 09:35:33AM -0400, Michael Meissner wrote:
> > > -  int n = get_attr_length (insn) / 4;
> > > +  /* If the insn tells us how many insns there are, use that.  Otherwise 
> > > use
> > > + the length/4.  Adjust the insn length to remove the extra size that
> > > + prefixed instructions take.  */
> > 
> > This should be temporary, until we have converted everything to use
> > num_insns, right?
> 
> Well there were some 200+ places where length was set.

Yes, and I did volunteer to do this work, if needed / wanted.

> > Please use a function, not a function-like macro.
> 
> Ok, I added rs6000_adjust_insn_length in rs6000.c.

Thanks.

> > > +;; Return the number of real hardware instructions in a combined insn.  
> > > If it
> > > +;; is 0, just use the length / 4.
> > > +(define_attr "num_insns" "" (const_int 0))
> > 
> > So we could have the default value *be* length/4, not 0?
> 
> Only if you make sure that every place sets num_insns.  As the comment says,
> until it is set everywhere, you run the risk of a deadly embrace.

Sure :-)


Segher


PR92163

2019-10-23 Thread Prathamesh Kulkarni
Hi,
The attached patch tries to fix PR92163 by calling
gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup
(with -fnon-call-exceptions a dead store can have an EH edge, so
removing the store can leave a stale EH edge behind).
Does it look OK?

Thanks,
Prathamesh
2019-10-24  Prathamesh Kulkarni  

PR tree-optimization/92163
* tree-if-conv.c (ifcvt_local_dce): Call gimple_purge_dead_eh_edges
if eh cleanup is required.
* tree-ssa-dse.c (delete_dead_or_redundant_assignment): Change return 
type
to bool and return the return value of gsi_remove.
* tree-ssa-dse.h (delete_dead_or_redundant_assignment): Adjust 
prototype.

testsuite/
* gcc.dg/tree-ssa/pr92163.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
new file mode 100644
index 000..f64eaea6517
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fopenacc" } */
+
+void
+xr (int *k7)
+{
+  int qa;
+
+#pragma acc parallel
+#pragma acc loop vector
+  for (qa = 0; qa < 3; ++qa)
+if (qa % 2 != 0)
+  k7[qa] = 0;
+else
+  k7[qa] = 1;
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index df9046a3014..3e2769dd02d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2963,6 +2963,7 @@ ifcvt_local_dce (class loop *loop)
}
 }
   /* Delete dead statements.  */
+  bool do_eh_cleanup = false;
   gsi = gsi_start_bb (bb);
   while (!gsi_end_p (gsi))
 {
@@ -2975,7 +2976,7 @@ ifcvt_local_dce (class loop *loop)
 
  if (dse_classify_store (&write, stmt, false, NULL, NULL, latch_vdef)
  == DSE_STORE_DEAD)
-   delete_dead_or_redundant_assignment (&gsi, "dead");
+   do_eh_cleanup |= delete_dead_or_redundant_assignment (&gsi, "dead");
  else
gsi_next (&gsi);
  continue;
@@ -2994,6 +2995,9 @@ ifcvt_local_dce (class loop *loop)
   gsi_remove (&gsi, true);
   release_defs (stmt);
 }
+
+  if (do_eh_cleanup)
+gimple_purge_dead_eh_edges (bb);
 }
 
 /* If-convert LOOP when it is legal.  For the moment this pass has no
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 25cd4709b31..deec6c07c50 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -77,7 +77,6 @@ along with GCC; see the file COPYING3.  If not see
fact, they are the same transformation applied to different views of
the CFG.  */
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
 static void delete_dead_or_redundant_call (gimple_stmt_iterator *, const char *);
 
 /* Bitmap of blocks that have had EH statements cleaned.  We should
@@ -899,7 +898,7 @@ delete_dead_or_redundant_call (gimple_stmt_iterator *gsi, const char *type)
 
 /* Delete a dead store at GSI, which is a gimple assignment. */
 
-void
+bool
 delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type)
 {
   gimple *stmt = gsi_stmt (*gsi);
@@ -915,12 +914,14 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type
 
   /* Remove the dead store.  */
   basic_block bb = gimple_bb (stmt);
-  if (gsi_remove (gsi, true))
+  bool eh_cleanup_required = gsi_remove (gsi, true);
+  if (eh_cleanup_required && need_eh_cleanup)
 bitmap_set_bit (need_eh_cleanup, bb->index);
 
   /* And release any SSA_NAMEs set in this statement back to the
  SSA_NAME manager.  */
   release_defs (stmt);
+  return eh_cleanup_required;
 }
 
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
diff --git a/gcc/tree-ssa-dse.h b/gcc/tree-ssa-dse.h
index a5eccbd746d..80b6d9b2616 100644
--- a/gcc/tree-ssa-dse.h
+++ b/gcc/tree-ssa-dse.h
@@ -31,6 +31,6 @@ enum dse_store_status
 dse_store_status dse_classify_store (ao_ref *, gimple *, bool, sbitmap,
 bool * = NULL, tree = NULL);
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
+bool delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
 
 #endif   /* GCC_TREE_SSA_DSE_H  */
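
For reference, the shape of the fix as a standalone sketch (all names
hypothetical): each removal only reports whether it may have exposed dead
EH edges, the results are OR-ed into one flag, and the purge runs once per
basic block at the end, mirroring do_eh_cleanup above.

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for delete_dead_or_redundant_assignment: returns true when
   the removal may require EH cleanup.  */
static bool
remove_stmt (int stmt)
{
  return stmt % 3 == 0;   /* pretend every third stmt could throw */
}

int
main (void)
{
  bool do_eh_cleanup = false;
  for (int stmt = 0; stmt < 10; stmt++)
    do_eh_cleanup |= remove_stmt (stmt);
  if (do_eh_cleanup)
    puts ("purging dead EH edges for this block");
  return 0;
}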


Re: [PATCHv2] Change the library search path when using --with-advance-toolchain

2019-10-23 Thread Peter Bergner
On 10/5/19 12:20 PM, Segher Boessenkool wrote:
> On Fri, Oct 04, 2019 at 06:31:34PM -0300, Tulio Magno Quites Machado Filho wrote:
>> Remove all -L directories from LINK_OS_EXTRA_SPEC32 and
>> LINK_OS_EXTRA_SPEC64 so that user directories specified at
>> build time have higher preference over the advance toolchain libraries.
>>
>> Set MD_STARTFILE_PREFIX to $prefix/lib/ and MD_STARTFILE_PREFIX_1 to
>> $at/lib/ so that a compiler library has preference over the Advance
>> Toolchain libraries.
> 
> This is fine, approved for all branches.  Thank you!  And thanks to Mike
> for the testing.

I've committed the backports to the FSF 7, 8 and 9 branches now, after
clean bootstraps and regtesting.

Peter




Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-23 Thread H.J. Lu
On Wed, Oct 23, 2019 at 4:51 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> >  wrote:
> >>
> >> This patch is the first of a series that tries to remove two
> >> assumptions:
> >>
> >> (1) that all vectors involved in vectorisation must be the same size
> >>
> >> (2) that there is only one vector mode for a given element mode and
> >> number of elements
> >>
> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> that require the number of elements to stay the same.  E.g. if we're
> >> vectorising code that operates on narrow and wide elements, and the
> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> for the wide elements.
> >>
> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> >> vectors to work with -msve-vector-bits=256.
> >>
> >> The patch adds a new hook that targets can use to control how we
> >> move from one vector mode to another.  The hook takes a starting vector
> >> mode, a new element mode, and (optionally) a new number of elements.
> >> The flexibility needed for (1) comes in when the number of elements
> >> isn't specified.
> >>
> >> All callers in this patch specify the number of elements, but a later
> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> >> for a few days, hence the RFC/A tag.
> >>
> >> Tested individually on aarch64-linux-gnu and as a series on
> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> >> look OK?
> >
> > In isolation the idea looks good but maybe a bit limited?  I see
> > how it works for the same-size case but if you consider x86
> > where we have SSE, AVX256 and AVX512 what would it return
> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> > kind of query not intended (where the component modes match
> > but nunits is zero)?
>
> In that case we'd normally get V4SImode back.  It's an allowed
> combination, but not very useful.
>
> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> > it just used to stay in the same register set for different component
> > modes?
>
> Yeah, the idea is to use the original vector mode as essentially
> a base architecture.
>
> The follow-on patches replace vec_info::vector_size with
> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> with targetm.vectorize.autovectorize_vector_modes.  These are the
> starting modes that would be passed to the hook in the nunits==0 case.
>

For a target with multiple vector sizes,
targetm.vectorize.autovectorize_vector_sizes doesn't return the optimal
vector sizes for both known and unknown trip counts.  For a target with
128-bit and 256-bit vectors, 256-bit followed by 128-bit works well for a
known trip count, since the vectorizer knows the maximum usable vector
size.  But for an unknown trip count, we may want to use 128-bit vectors
when the 256-bit code path won't be used at run time but the 128-bit one
will.  At the moment, we can only use one set of vector sizes for both
cases.  Can the vectorizer support two sets of vector sizes, one for known
trip counts and the other for unknown trip counts?
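
A minimal loop that shows the issue (illustrative only): vectorized solely
with 256-bit vectors (eight ints per iteration), the vector body is
skipped entirely whenever n < 8 at run time, while a 128-bit body (four
ints) would still be entered for smaller counts.

void
scale (int *x, int n)
{
  for (int i = 0; i < n; i++)
    x[i] *= 3;
}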

H.J.


[C++ PATCH] Fix a C++17/20 regression on indirect function calls (PR c++/92201)

2019-10-23 Thread Jakub Jelinek
Hi!

For the middle end, pointer conversions are useless, which means the
gimplifier can change the type of the value being gimplified.
gimplify_call_expr is careful about this: it remembers the fnptrtype
early, before gimplification, and with

  else
    /* Remember the original function type.  */
    CALL_EXPR_FN (*expr_p) = build1 (NOP_EXPR, fnptrtype,
                                     CALL_EXPR_FN (*expr_p));

re-adds it at the end if needed.  My cp_gimplify_expr changes didn't do
that (and unfortunately they have already been backported to 9.x).

The following patch fixes it similarly to what gimplify_call_expr does.
I haven't changed gimplify_to_rvalue to do this itself, as it is quite
specific to CALL_EXPR_FN and there is just one caller right now; if more
are added, they most likely won't need such behavior, or it might even be
harmful, because the NOP_EXPR is then not is_gimple_val.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 9.3?

2019-10-23  Jakub Jelinek  

PR c++/92201
* cp-gimplify.c (cp_gimplify_expr): If gimplify_to_rvalue changes the
function pointer type, re-add cast to the original one.

* g++.dg/other/pr92201.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2019-10-18 00:16:09.905545389 +0200
+++ gcc/cp/cp-gimplify.c	2019-10-23 20:49:30.255410883 +0200
@@ -838,11 +838,17 @@ cp_gimplify_expr (tree *expr_p, gimple_s
  && CALL_EXPR_FN (*expr_p)
  && cp_get_callee_fndecl_nofold (*expr_p) == NULL_TREE)
{
+ tree fnptrtype = TREE_TYPE (CALL_EXPR_FN (*expr_p));
  enum gimplify_status t
= gimplify_to_rvalue (&CALL_EXPR_FN (*expr_p), pre_p, NULL,
  is_gimple_call_addr);
  if (t == GS_ERROR)
ret = GS_ERROR;
+ /* GIMPLE considers most pointer conversion useless, but for
+calls we actually care about the exact function pointer type.  */
+ else if (TREE_TYPE (CALL_EXPR_FN (*expr_p)) != fnptrtype)
+   CALL_EXPR_FN (*expr_p)
+ = build1 (NOP_EXPR, fnptrtype, CALL_EXPR_FN (*expr_p));
}
   if (!CALL_EXPR_FN (*expr_p))
/* Internal function call.  */;
--- gcc/testsuite/g++.dg/other/pr92201.C.jj	2019-10-23 20:59:41.375139038 +0200
+++ gcc/testsuite/g++.dg/other/pr92201.C	2019-10-23 20:58:33.068175381 +0200
@@ -0,0 +1,7 @@
+// PR c++/92201
+
+int
+foo (void (*p) ())
+{
+  return (*reinterpret_cast (p)) ();
+}

Jakub



[Committed] Update Fortran expression dumper for BT_BOZ

2019-10-23 Thread Steve Kargl
The attached and committed patch updates the gfortran
expression dumper to do something sensible with a
gfc_expr that has the BT_BOZ basic type.  That is, for

program a
   real :: x = real(b'10101')
   real :: y = real(o'')
   real :: z = real(z'abcd')
   print *, x, y, z
end program a

if one is in gdb (here, inside gfc_boz2real), one now gets:

(gdb) call debug(x)
b'10101' (BOZ 0)
 
(gdb) call debug(x)
o'' (BOZ 0)

(gdb) call debug(x)
z'abcd' (BOZ 0)

2019-10-23  Steven G. Kargl  

	* dump-parse-tree.c (show_expr): Add dumping of BT_BOZ constants.

-- 
Steve
Index: gcc/fortran/dump-parse-tree.c
===
--- gcc/fortran/dump-parse-tree.c	(revision 277296)
+++ gcc/fortran/dump-parse-tree.c	(working copy)
@@ -559,6 +559,16 @@ show_expr (gfc_expr *p)
 	  fputc (')', dumpfile);
 	  break;
 
+	case BT_BOZ:
+	  if (p->boz.rdx == 2)
+	fputs ("b'", dumpfile);
+	  else if (p->boz.rdx == 8)
+	fputs ("o'", dumpfile);
+	  else
+	fputs ("z'", dumpfile);
+	  fprintf (dumpfile, "%s'", p->boz.str);
+	  break;
+
 	case BT_HOLLERITH:
 	  fprintf (dumpfile, HOST_WIDE_INT_PRINT_DEC "H",
 		   p->representation.length);
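
As a standalone analogue (helper name hypothetical), the prefix selection
in the hunk above boils down to a radix-to-letter mapping:

#include <stdio.h>

/* Radix 2 dumps as b'...', radix 8 as o'...', anything else as z'...',
   mirroring the BT_BOZ case added to show_expr.  */
static char
boz_prefix (int rdx)
{
  if (rdx == 2)
    return 'b';
  if (rdx == 8)
    return 'o';
  return 'z';
}

int
main (void)
{
  printf ("%c'%s'\n", boz_prefix (2), "10101");
  printf ("%c'%s'\n", boz_prefix (16), "abcd");
  return 0;
}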


[C++ PATCH] 'std' identifier not needed

2019-10-23 Thread Nathan Sidwell
There's no need to retain the "std" identifier as a global tree -- we can
simply use {push,pop}_nested_namespace on the std_node we have there.
Also simplify the in-std-namespace predicate by checking against std_node.


applying to trunk.

nathan
--
Nathan Sidwell
2019-10-23  Nathan Sidwell  

	* cp-tree.h (CPTI_STD_IDENTIFIER): Delete.
	(std_identifier): Delete.
	(DECL_NAME_SPACE_STD_P): Compare against std_node.
	* decl.c (initialize_predefined_identifiers): 'std' is not needed.
	(cxx_init_decl_processing): Adjust creation of ::std.  Use
	{push,pop}_nested_namespace.
	(cxx_builtin_function): Use {push,pop}_nested_namespace.
	* except.c (init_exception_processing): Likewise.
	* rtti.c (init_rtti_processing): Likewise.

Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h	(revision 277357)
+++ gcc/cp/cp-tree.h	(working copy)
@@ -149,7 +149,6 @@ enum cp_tree_index
 CPTI_PFN_IDENTIFIER,
 CPTI_VPTR_IDENTIFIER,
 CPTI_GLOBAL_IDENTIFIER,
-CPTI_STD_IDENTIFIER,
 CPTI_ANON_IDENTIFIER,
 CPTI_AUTO_IDENTIFIER,
 CPTI_DECLTYPE_AUTO_IDENTIFIER,
@@ -289,7 +288,6 @@ extern GTY(()) tree cp_global_trees[CPTI
 #define vptr_identifier			cp_global_trees[CPTI_VPTR_IDENTIFIER]
 /* The name of the ::, std & anon namespaces.  */
 #define global_identifier		cp_global_trees[CPTI_GLOBAL_IDENTIFIER]
-#define std_identifier			cp_global_trees[CPTI_STD_IDENTIFIER]
 #define anon_identifier			cp_global_trees[CPTI_ANON_IDENTIFIER]
 /* auto and declspec(auto) identifiers.  */
 #define auto_identifier			cp_global_trees[CPTI_AUTO_IDENTIFIER]
@@ -3335,9 +3333,7 @@ struct GTY(()) lang_decl {
 
 /* Nonzero if NODE is the std namespace.  */
 #define DECL_NAMESPACE_STD_P(NODE)			\
-  (TREE_CODE (NODE) == NAMESPACE_DECL			\
-   && CP_DECL_CONTEXT (NODE) == global_namespace	\
-   && DECL_NAME (NODE) == std_identifier)
+  ((NODE) == std_node)
 
 /* In a TREE_LIST in an attribute list, indicates that the attribute
must be applied at instantiation time.  */
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c	(revision 277357)
+++ gcc/cp/decl.c	(working copy)
@@ -4178,7 +4178,6 @@ initialize_predefined_identifiers (void)
 {"_vptr", &vptr_identifier, cik_normal},
 {"__vtt_parm", &vtt_parm_identifier, cik_normal},
 {"::", &global_identifier, cik_normal},
-{"std", &std_identifier, cik_normal},
   /* The demangler expects anonymous namespaces to be called
 	 something starting with '_GLOBAL__N_'.  It no longer needs
 	 to be unique to the TU.  */
@@ -4262,7 +4261,7 @@ cxx_init_decl_processing (void)
   current_lang_name = lang_name_c;
 
   /* Create the `std' namespace.  */
-  push_namespace (std_identifier);
+  push_namespace (get_identifier ("std"));
   std_node = current_namespace;
   pop_namespace ();
 
@@ -4392,14 +4391,14 @@ cxx_init_decl_processing (void)
 	tree bad_alloc_type_node;
 	tree bad_alloc_decl;
 
-	push_namespace (std_identifier);
+	push_nested_namespace (std_node);
 	bad_alloc_id = get_identifier ("bad_alloc");
 	bad_alloc_type_node = make_class_type (RECORD_TYPE);
 	TYPE_CONTEXT (bad_alloc_type_node) = current_namespace;
 	bad_alloc_decl
 	  = create_implicit_typedef (bad_alloc_id, bad_alloc_type_node);
 	DECL_CONTEXT (bad_alloc_decl) = current_namespace;
-	pop_namespace ();
+	pop_nested_namespace (std_node);
 
 	new_eh_spec
 	  = add_exception_specifier (NULL_TREE, bad_alloc_type_node, -1);
@@ -4451,11 +4450,11 @@ cxx_init_decl_processing (void)
 
 if (aligned_new_threshold)
   {
-	push_namespace (std_identifier);
+	push_nested_namespace (std_node);
 	tree align_id = get_identifier ("align_val_t");
 	align_type_node = start_enum (align_id, NULL_TREE, size_type_node,
   NULL_TREE, /*scoped*/true, NULL);
-	pop_namespace ();
+	pop_nested_namespace (std_node);
 
 	/* operator new (size_t, align_val_t); */
 	newtype = build_function_type_list (ptr_type_node, size_type_node,
@@ -4663,10 +4662,10 @@ cxx_builtin_function (tree decl)
 {
   tree std_decl = copy_decl (decl);
 
-  push_namespace (std_identifier);
+  push_nested_namespace (std_node);
   DECL_CONTEXT (std_decl) = FROB_CONTEXT (std_node);
   pushdecl (std_decl);
-  pop_namespace ();
+  pop_nested_namespace (std_node);
 }
 
   DECL_CONTEXT (decl) = FROB_CONTEXT (current_namespace);
Index: gcc/cp/except.c
===
--- gcc/cp/except.c	(revision 277357)
+++ gcc/cp/except.c	(working copy)
@@ -51,14 +51,14 @@ init_exception_processing (void)
   tree tmp;
 
   /* void std::terminate (); */
-  push_namespace (std_identifier);
+  push_nested_namespace (std_node);
   tmp = build_function_type_list (void_type_node, NULL_TREE);
   terminate_fn = build_cp_library_fn_ptr ("terminate", tmp,
 	   ECF_NOTHROW | ECF_NORETURN
 	   | ECF_COLD);
   gcc_checking_assert (TREE_THIS_VOLATILE (terminate_fn)
 		   && TREE_NOTH

Re: [SVE] PR91272

2019-10-23 Thread Prathamesh Kulkarni
On Tue, 22 Oct 2019 at 13:12, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index acdd90784dc..dfd33b142ed 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -10016,25 +10016,26 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> >/* See whether another part of the vectorized code applies a loop
> >mask to the condition, or to its inverse.  */
> >
> > +  vec_loop_masks *masks = NULL;
> >if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> >   {
> > -   scalar_cond_masked_key cond (cond_expr, ncopies);
> > -   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > - {
> > -   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > -   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, 
> > j);
> > - }
> > +   if (reduction_type == EXTRACT_LAST_REDUCTION)
> > + masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > else
> >   {
> > -   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > -   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > + masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +   else
> >   {
> > -   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > -   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> > -   vectype, j);
> > -   cond_code = cond.code;
> > -   swap_cond_operands = true;
> > +   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > +   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > + {
> > +   masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +   cond_code = cond.code;
> > +   swap_cond_operands = true;
> > + }
> >   }
> >   }
> >   }
> > @@ -10116,6 +10117,13 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> >vec_then_clause = vec_oprnds2[i];
> >vec_else_clause = vec_oprnds3[i];
> >
> > +  if (masks)
> > + {
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
> > + }
> > +
>
> I don't think we need an extra "if" here.  "loop_mask" only feeds
> the later "if (loop_mask)" block, so we might as well change that
> later "if" to "if (masks)" and make the "loop_mask" variable local
> to the "if" body.
>
> > if (swap_cond_operands)
> >   std::swap (vec_then_clause, vec_else_clause);
> >
> > @@ -10194,23 +10202,6 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> > vec_compare = tmp;
> >   }
> >
> > -   tree tmp2 = make_ssa_name (vec_cmp_type);
> > -   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
> > - vec_compare, loop_mask);
> > -   vect_finish_stmt_generation (stmt_info, g, gsi);
> > -   vec_compare = tmp2;
> > - }
> > -
> > -   if (reduction_type == EXTRACT_LAST_REDUCTION)
> > - {
> > -   if (!is_gimple_val (vec_compare))
> > - {
> > -   tree vec_compare_name = make_ssa_name (vec_cmp_type);
> > -   gassign *new_stmt = gimple_build_assign (vec_compare_name,
> > -vec_compare);
> > -   vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > -   vec_compare = vec_compare_name;
> > - }
>
> This form is simpler than:
>
>   if (COMPARISON_CLASS_P (vec_compare))
> {
>   tree tmp = make_ssa_name (vec_cmp_type);
>   tree op0 = TREE_OPERAND (vec_compare, 0);
>   tree op1 = TREE_OPERAND (vec_compare, 1);
>   gassign *g = gimple_build_assign (tmp,
> TREE_CODE (vec_compare),
> op0, op1);
>   vect_finish_stmt_generation (stmt_info, g, gsi);
>   vec_compare = tmp;
> }
>
> so I think it'd be better to keep the EXTRACT_LAST_REDUCTION version.
Does the attached version look OK?

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
index d4f

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-23 Thread luoxhu
Hi,


On 2019/10/17 16:23, Feng Xue OS wrote:
> IPA does not allow constant propagation on parameter that is used to control
> function recursion.
> 
> recur_fn (i)
> {
>if ( !terminate_recursion (i))
>  {
>...
>recur_fn (i + 1);
>...
>  }
>...
> }
> 
> This patch is composed to enable multi-versioning for self-recursive functions,
> and the number of versioned copies is limited by a specified option.
> 
> Feng
> ---
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 045072e02ec..6255a766e4d 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -229,7 +229,9 @@ public:
> inline bool set_contains_variable ();
> bool add_value (valtype newval, cgraph_edge *cs,
> ipcp_value *src_val = NULL,
> -   int src_idx = 0, HOST_WIDE_INT offset = -1);
> +   int src_idx = 0, HOST_WIDE_INT offset = -1,
> +   ipcp_value **val_pos_p = NULL,
> +   bool unlimited = false);
> void print (FILE * f, bool dump_sources, bool dump_benefits);
>   };
>   
> @@ -1579,22 +1581,37 @@ allocate_and_init_ipcp_value 
> (ipa_polymorphic_call_context source)
>   /* Try to add NEWVAL to LAT, potentially creating a new ipcp_value for it.  
> CS,
>  SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
>  meaning.  OFFSET -1 means the source is scalar and not a part of an
> -   aggregate.  */
> +   aggregate.  If non-NULL, VAL_POS_P specifies position in value list,
> +   after which newly created ipcp_value will be inserted, and it is also
> +   used to record address of the added ipcp_value before function returns.
> +   UNLIMITED means whether value count should not exceed the limit given
> +   by PARAM_IPA_CP_VALUE_LIST_SIZE.  */
>   
>   template 
>   bool
>   ipcp_lattice::add_value (valtype newval, cgraph_edge *cs,
> ipcp_value *src_val,
> -   int src_idx, HOST_WIDE_INT offset)
> +   int src_idx, HOST_WIDE_INT offset,
> +   ipcp_value **val_pos_p,
> +   bool unlimited)
>   {
> ipcp_value *val;
>   
> +  if (val_pos_p)
> +{
> +  for (val = values; val && val != *val_pos_p; val = val->next);
> +  gcc_checking_assert (val);
> +}
> +
> if (bottom)
>   return false;
>   
> for (val = values; val; val = val->next)
>   if (values_equal_for_ipcp_p (val->value, newval))
> {
> + if (val_pos_p)
> +   *val_pos_p = val;
> +
>   if (ipa_edge_within_scc (cs))
> {
>   ipcp_value_source *s;
> @@ -1609,7 +1626,8 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>   return false;
> }
>   
> -  if (values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
> +  if (!unlimited
> +  && values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
>   {
> /* We can only free sources, not the values themselves, because 
> sources
>of other values in this SCC might point to them.   */
> @@ -1623,6 +1641,9 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>   }
>   }
>   
> +  if (val_pos_p)
> + *val_pos_p = NULL;
> +
> values = NULL;
> return set_to_bottom ();
>   }
> @@ -1630,8 +1651,54 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
> values_count++;
> val = allocate_and_init_ipcp_value (newval);
> val->add_source (cs, src_val, src_idx, offset);
> -  val->next = values;
> -  values = val;
> +  if (val_pos_p)
> +{
> +  val->next = (*val_pos_p)->next;
> +  (*val_pos_p)->next = val;
> +  *val_pos_p = val;
> +}
> +  else
> +{
> +  val->next = values;
> +  values = val;
> +}
> +
> +  return true;
> +}
> +
> +/* Return true if an ipcp_value VAL originated from a parameter value of
> +   a self-feeding recursive function by applying a non-passthrough arithmetic
> +   transformation.  */
> +
> +static bool
> +self_recursively_generated_p (ipcp_value *val)
> +{
> +  class ipa_node_params *info = NULL;
> +
> +  for (ipcp_value_source *src = val->sources; src; src = src->next)
> +{
> +  cgraph_edge *cs = src->cs;
> +
> +  if (!src->val || cs->caller != cs->callee->function_symbol ()
> +   || src->val == val)
> + return false;
> +
> +  if (!info)
> + info = IPA_NODE_REF (cs->caller);
> +
> +  class ipcp_param_lattices *plats = ipa_get_parm_lattices (info,
> + src->index);
> +  ipcp_lattice *src_lat = src->offset == -1 ? &plats->itself
> +   : plats->aggs;

Thanks for the patch.
After rebasing this patch on your previous ipa-cp by-ref arithmetic patch,
this function doesn't handle the by-ref case below (also, some conflicts
need fixing):

foo (int * p) { ...  return foo(*(&p) + 1); }

It will cause a value explosion.

Secondly, the self_recu

[PATCH] Fall back to SLP reduction discovery when reduction group fails

2019-10-23 Thread Richard Biener


This helps save some IVs (though I guess situations where it matches in
practice are scarce).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-10-24  Richard Biener  

* tree-vect-slp.c (vect_analyze_slp): When reduction group
SLP discovery fails try to handle the reduction as part
of SLP reduction discovery.

* gcc.dg/vect/slp-reduc-9.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 277330)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2271,14 +2271,18 @@ vect_analyze_slp (vec_info *vinfo, unsig
  {
/* Dissolve reduction chain group.  */
stmt_vec_info vinfo = first_element;
+   stmt_vec_info last = NULL;
while (vinfo)
  {
stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (vinfo);
REDUC_GROUP_FIRST_ELEMENT (vinfo) = NULL;
REDUC_GROUP_NEXT_ELEMENT (vinfo) = NULL;
+   last = vinfo;
vinfo = next;
  }
STMT_VINFO_DEF_TYPE (first_element) = vect_internal_def;
+   /* It can be still vectorized as part of an SLP reduction.  */
+   loop_vinfo->reductions.safe_push (last);
  }
}
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-9.c
===
--- gcc/testsuite/gcc.dg/vect/slp-reduc-9.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-9.c (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int_mult } */
+
+int
+bar (int *x, int a, int b, int n)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  int sum1 = 0;
+  int sum2 = 0;
+  for (int i = 0; i < n; ++i)
+{
+  /* Reduction chain vectorization fails here because of the
+ different operations but we can still vectorize both
+reductions as SLP reductions, saving IVs.  */
+  sum1 += x[2*i] - a;
+  sum1 += x[2*i+1] * b;
+  sum2 += x[2*i] - b;
+  sum2 += x[2*i+1] * a;
+}
+  return sum1 + sum2;
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */


Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-23 Thread Feng Xue OS
Hi,

Actually, this patch is not the final one. Since the previous cp-by-ref
patch is still on its way to trunk, I tailored this one to handle only
by-value recursion, so that it is decoupled from the previous patch and
can be reviewed independently. I have attached the final patch; you can
give it a try.

The CP propagation stage might generate values outside of the recursion
ranges, but the cloning stage will not make a duplicate for an invalid
recursion value, so we do not need extra code for recursion range
analysis.
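
A compilable rendition (names hypothetical) of the recur_fn sketch quoted
above: with self-recursion versioning, recur_fn (0) can be expanded into
specialized clones for i = 0..3, since each clone calls the next with a
known constant.

static int
work (int i)
{
  return i * i;
}

static int
recur_fn (int i)
{
  if (i >= 4)                   /* terminate_recursion (i) */
    return 0;
  return work (i) + recur_fn (i + 1);
}

int
main (void)
{
  return recur_fn (0);          /* 0 + 1 + 4 + 9 = 14 */
}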

Feng


From: luoxhu 
Sent: Thursday, October 24, 2019 1:44 PM
To: Feng Xue OS; gcc-patches@gcc.gnu.org; Jan Hubicka; Martin Jambor
Subject: Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

Hi,


On 2019/10/17 16:23, Feng Xue OS wrote:
> IPA does not allow constant propagation on parameter that is used to control
> function recursion.
>
> recur_fn (i)
> {
>if ( !terminate_recursion (i))
>  {
>...
>recur_fn (i + 1);
>...
>  }
>...
> }
>
> This patch is composed to enable multi-versioning for self-recursive functions,
> and the number of versioned copies is limited by a specified option.
>
> Feng
> ---
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 045072e02ec..6255a766e4d 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -229,7 +229,9 @@ public:
> inline bool set_contains_variable ();
> bool add_value (valtype newval, cgraph_edge *cs,
> ipcp_value *src_val = NULL,
> -   int src_idx = 0, HOST_WIDE_INT offset = -1);
> +   int src_idx = 0, HOST_WIDE_INT offset = -1,
> +   ipcp_value **val_pos_p = NULL,
> +   bool unlimited = false);
> void print (FILE * f, bool dump_sources, bool dump_benefits);
>   };
>
> @@ -1579,22 +1581,37 @@ allocate_and_init_ipcp_value 
> (ipa_polymorphic_call_context source)
>   /* Try to add NEWVAL to LAT, potentially creating a new ipcp_value for it.  
> CS,
>  SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
>  meaning.  OFFSET -1 means the source is scalar and not a part of an
> -   aggregate.  */
> +   aggregate.  If non-NULL, VAL_POS_P specifies position in value list,
> +   after which newly created ipcp_value will be inserted, and it is also
> +   used to record address of the added ipcp_value before function returns.
> +   UNLIMITED means whether value count should not exceed the limit given
> +   by PARAM_IPA_CP_VALUE_LIST_SIZE.  */
>
>   template 
>   bool
>   ipcp_lattice::add_value (valtype newval, cgraph_edge *cs,
> ipcp_value *src_val,
> -   int src_idx, HOST_WIDE_INT offset)
> +   int src_idx, HOST_WIDE_INT offset,
> +   ipcp_value **val_pos_p,
> +   bool unlimited)
>   {
> ipcp_value *val;
>
> +  if (val_pos_p)
> +{
> +  for (val = values; val && val != *val_pos_p; val = val->next);
> +  gcc_checking_assert (val);
> +}
> +
> if (bottom)
>   return false;
>
> for (val = values; val; val = val->next)
>   if (values_equal_for_ipcp_p (val->value, newval))
> {
> + if (val_pos_p)
> +   *val_pos_p = val;
> +
>   if (ipa_edge_within_scc (cs))
> {
>   ipcp_value_source *s;
> @@ -1609,7 +1626,8 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>   return false;
> }
>
> -  if (values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
> +  if (!unlimited
> +  && values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
>   {
> /* We can only free sources, not the values themselves, because 
> sources
>of other values in this SCC might point to them.   */
> @@ -1623,6 +1641,9 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>   }
>   }
>
> +  if (val_pos_p)
> + *val_pos_p = NULL;
> +
> values = NULL;
> return set_to_bottom ();
>   }
> @@ -1630,8 +1651,54 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
> values_count++;
> val = allocate_and_init_ipcp_value (newval);
> val->add_source (cs, src_val, src_idx, offset);
> -  val->next = values;
> -  values = val;
> +  if (val_pos_p)
> +{
> +  val->next = (*val_pos_p)->next;
> +  (*val_pos_p)->next = val;
> +  *val_pos_p = val;
> +}
> +  else
> +{
> +  val->next = values;
> +  values = val;
> +}
> +
> +  return true;
> +}
> +
> +/* Return true if an ipcp_value VAL originated from a parameter value of
> +   a self-feeding recursive function by applying a non-passthrough arithmetic
> +   transformation.  */
> +
> +static bool
> +self_recursively_generated_p (ipcp_value *val)
> +{
> +  class ipa_node_params *info = NULL;
> +
> +  for (ipcp_value_source *src = val->sources; src; src = src->next)
> +{
> +  cgraph_edge *cs = src->cs;
> +
> +  if 

Re: [PATCH] S/390: Use UNSPEC_GET_TP for thread pointer loads

2019-10-23 Thread Andreas Krebbel
On 23.10.19 13:02, Ilya Leoshkevich wrote:
> Boostrapped and regtested on s390x-redhat-linux.
> 
> gcc/ChangeLog:
> 
> 2019-10-21  Ilya Leoshkevich  
> 
>   * config/s390/s390.c (s390_get_thread_pointer): Use
>   gen_get_thread_pointer.
>   (s390_expand_split_stack_prologue): Likewise.
>   * config/s390/s390.md (UNSPEC_GET_TP): New UNSPEC.
>   (*get_tp_31): New 31-bit splitter for UNSPEC_GET_TP.
>   (*get_tp_64): New 64-bit splitter for UNSPEC_GET_TP.
>   (get_thread_pointer): Use UNSPEC_GET_TP, use
>   parameterized name.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-21  Ilya Leoshkevich  
> 
>   * gcc.target/s390/load-thread-pointer-once-2.c: New test.

Ok. Thanks!

Andreas