Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-17 Thread JeanHeyd Meneide
... And I am very tired and forgot to attach the patch. Again. Sorry...!

On Fri, Oct 18, 2019 at 1:54 AM JeanHeyd Meneide
 wrote:
>
> Dear Jason,
>
> On Thu, Oct 17, 2019 at 3:51 PM Jason Merrill  wrote:
> >  > FAIL: g++.dg/cpp0x/gen-attrs-67.C  -std=c++11  (test for errors, line 8)
> >  > FAIL: g++.dg/cpp1z/feat-cxx1z.C  -std=gnu++17 (test for excess errors)
> >  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11 (test for excess errors)
> >  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 12)
> >  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 13)
> >  > FAIL: g++.dg/cpp2a/feat-cxx2a.C   (test for excess errors)
>
>  Sorry about that! I implemented a bit of a better warning to
> cover gen-attrs-67, and bumped the feature test macro value checks in
> the feat tests. The rest should be fine now too.
>
>  Let me know if anything else seems off!
>
> Best Wishes,
> JeanHeyd Meneide
>
> ===
>
> 2019-10-17  JeanHeyd Meneide  
>
> gcc/
>
> Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
> * escaped_string.h (escaped_string): New header.
> * tree.c (escaped_string): Remove escaped_string class.
>
> gcc/c-family
>
> Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
> * c-lex.c (c_common_has_attribute): Update attribute value.
>
> gcc/cp/
>
> Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
> * tree.c (handle_nodiscard_attribute): Added C++2a nodiscard
> string message.
> (std_attribute_table): Increase nodiscard argument handling
> max_length from 0 to 1.
> * parser.c (cp_parser_check_std_attribute): Add requirement that
> nodiscard only be seen once in attribute-list.
> (cp_parser_std_attribute): Check that empty parenthesis lists are
> not specified for attributes that have max_length > 0 (e.g.
> [[attr()]]).
> * cvt.c (maybe_warn_nodiscard): Add nodiscard message to output,
> if applicable.
> (convert_to_void): Allow constructors to be nodiscard-able (fixes
> paper-as-DR p1771).
>
> gcc/testsuite/g++.dg/cpp0x
>
> Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
> * gen-attrs-67.C: Test new error message for empty-parenthesis-list.
>
> gcc/testsuite/g++.dg/cpp2a
>
> Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
> * nodiscard-construct.C: New test.
> * nodiscard-once.C: New test.
> * nodiscard-reason-nonstring.C: New test.
> * nodiscard-reason-only-one.C: New test.
> * nodiscard-reason.C: New test.
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
  else if (is_attribute_p ("deprecated", attr_name))
result = 201309;
  else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
  else if (is_attribute_p ("no_unique_address", attr_name)
   || is_attribute_p ("likely", attr_name)
   || is_attribute_p ("unlikely", attr_name))
result = 201803;
+ else if (is_attribute_p ("nodiscard", attr_name))
+   result = 201907;
  if (result)
attr_name = NULL_TREE;
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 364af72e68d..4df5fc49048 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,22 +1027,45 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
+  tree attr;
   if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+  && (attr = lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn))))
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = (has_msg ?
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>%s"));
+  const char* raw_msg = (has_msg ? static_cast<const char *>(msg) : "");
  

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-17 Thread JeanHeyd Meneide
Dear Jason,

On Thu, Oct 17, 2019 at 3:51 PM Jason Merrill  wrote:
>  > FAIL: g++.dg/cpp0x/gen-attrs-67.C  -std=c++11  (test for errors, line 8)
>  > FAIL: g++.dg/cpp1z/feat-cxx1z.C  -std=gnu++17 (test for excess errors)
>  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11 (test for excess errors)
>  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 12)
>  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 13)
>  > FAIL: g++.dg/cpp2a/feat-cxx2a.C   (test for excess errors)

 Sorry about that! I implemented a bit of a better warning to
cover gen-attrs-67, and bumped the feature test macro value checks in
the feat tests. The rest should be fine now too.

 Let me know if anything else seems off!

Best Wishes,
JeanHeyd Meneide

===

2019-10-17  JeanHeyd Meneide  

gcc/

Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
* escaped_string.h (escaped_string): New header.
* tree.c (escaped_string): Remove escaped_string class.

gcc/c-family

Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
* c-lex.c (c_common_has_attribute): Update attribute value.

gcc/cp/

Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
* tree.c (handle_nodiscard_attribute): Added C++2a nodiscard
string message.
(std_attribute_table): Increase nodiscard argument handling
max_length from 0 to 1.
* parser.c (cp_parser_check_std_attribute): Add requirement that
nodiscard only be seen once in attribute-list.
(cp_parser_std_attribute): Check that empty parenthesis lists are
not specified for attributes that have max_length > 0 (e.g.
[[attr()]]).
* cvt.c (maybe_warn_nodiscard): Add nodiscard message to output,
if applicable.
(convert_to_void): Allow constructors to be nodiscard-able (fixes
paper-as-DR p1771).

gcc/testsuite/g++.dg/cpp0x

Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
* gen-attrs-67.C: Test new error message for empty-parenthesis-list.

gcc/testsuite/g++.dg/cpp2a

Implements p1301 [[nodiscard("should have a reason")]] + p1771 DR
* nodiscard-construct.C: New test.
* nodiscard-once.C: New test.
* nodiscard-reason-nonstring.C: New test.
* nodiscard-reason-only-one.C: New test.
* nodiscard-reason.C: New test.
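
For readers skimming the thread, here is a minimal usage sketch of what
p1301 enables (an illustration only: the declarations and messages below
are made up, not taken from the patch or its testcases, and the diagnostic
wording is paraphrased from the format strings in the cvt.c changes
elsewhere in this thread):

#if defined(__has_cpp_attribute) && __has_cpp_attribute(nodiscard) >= 201907L
// nodiscard with a reason string is available; 201907 matches the value
// installed by the c-lex.c hunk in this patch.
#endif

[[nodiscard("check the error code")]] int open_file (const char *path);

struct [[nodiscard("must not be dropped")]] error_code { int value; };
error_code make_error ();

void
use ()
{
  open_file ("a.txt");  // warning: ignoring return value of 'open_file',
                        // declared with attribute 'nodiscard': 'check the error code'
  make_error ();        // analogous warning naming the return type and its reason
}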


[Patch][Demangler] Fix for complex values

2019-10-17 Thread Miguel Saldivar
This is a small fix for Bug 67299, where the symbol `Z1fCf` now becomes
`f(float complex)` instead of `f(floatcomplex )`.
I thought this would be the preferred way of printing, because both
`llvm-cxxfilt` and `cpp_filt` print the demangled name in this fashion.

Thanks,
Miguel Saldivar

From 4ca98c0749bae1389594b31ee7f6ef575aafcd8f Mon Sep 17 00:00:00 2001
From: Miguel Saldivar 
Date: Thu, 17 Oct 2019 16:36:19 -0700
Subject: [PATCH][Demangler] Small fix for complex values

gcc/libiberty/
* cp-demangle.c (d_print_mod): Add a space before printing `complex`
and `imaginary`, as opposed to after.

gcc/libiberty/
* testsuite/demangle-expected: Adjust test.
---
 libiberty/ChangeLog   | 5 +
 libiberty/cp-demangle.c   | 4 ++--
 libiberty/testsuite/demangle-expected | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 97d9767c2ea..62d5527b95b 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,8 @@
+2019-10-17  Miguel Saldivar  
+ * cp-demangle.c (d_print_mod): Add a space before printing `complex`
+ and `imaginary`, as opposed to after.
+ * testsuite/demangle-expected: Adjust test.
+
 2019-10-03  Eduard-Mihai Burtescu  

  * rust-demangle.c (looks_like_rust): Remove.
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index aa78c86dd44..bd4dfb785a9 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -5977,10 +5977,10 @@ d_print_mod (struct d_print_info *dpi, int options,
   d_append_string (dpi, "&&");
   return;
 case DEMANGLE_COMPONENT_COMPLEX:
-  d_append_string (dpi, "complex ");
+  d_append_string (dpi, " complex");
   return;
 case DEMANGLE_COMPONENT_IMAGINARY:
-  d_append_string (dpi, "imaginary ");
+  d_append_string (dpi, " imaginary");
   return;
 case DEMANGLE_COMPONENT_PTRMEM_TYPE:
   if (d_last_char (dpi) != '(')
diff --git a/libiberty/testsuite/demangle-expected
b/libiberty/testsuite/demangle-expected
index f21ed00e559..43f003655b2 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -1278,7 +1278,7 @@ int& int_if_addable(A*)
 #
 --format=gnu-v3
 _Z3bazIiEvP1AIXszcl3foocvT__ELCf_
-void baz(A*)
+void baz(A*)
 #
 --format=gnu-v3
 _Z3fooI1FEN1XIXszdtcl1PclcvT__EEE5arrayEE4TypeEv
--
2.23.0
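
For anyone who wants to check the demangler behaviour locally, here is a
rough sketch using libiberty's cplus_demangle (an illustration, not part of
the patch; it assumes the conventional "_Z" prefix on the symbol from the
bug report and that libiberty's demangle.h is available to link against):

#include <cstdio>
#include <cstdlib>
#include "demangle.h"   /* libiberty; link with -liberty */

int
main ()
{
  const char *sym = "_Z1fCf";
  char *dem = cplus_demangle (sym, DMGL_PARAMS | DMGL_ANSI);
  /* Before this patch the output was "f(floatcomplex )"; with the patch
     applied it is "f(float complex)".  */
  std::printf ("%s -> %s\n", sym, dem ? dem : "(demangling failed)");
  std::free (dem);
  return 0;
}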


Re: [SVE] PR86753

2019-10-17 Thread Prathamesh Kulkarni
On Wed, 16 Oct 2019 at 04:19, Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
> >>  wrote:
> >> >
> >> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
> >> >  wrote:
> >> > >
> >> > > Leaving the main review to Richard, just some comments...
> >> > >
> >> > > Prathamesh Kulkarni  writes:
> >> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info 
> >> > > > *vinfo,
> >> > > >
> >> > > > When STMT_INFO is vectorized as a nested cycle, for_reduction is 
> >> > > > true.
> >> > > >
> >> > > > +   For COND_EXPR if T comes from masked load, and is 
> >> > > > conditional
> >> > > > +   on C, we apply loop mask to result of vector comparison, if it's 
> >> > > > present.
> >> > > > +   Similarly for E, if it is conditional on !C.
> >> > > > +
> >> > > > Return true if STMT_INFO is vectorizable in this way.  */
> >> > > >
> >> > > >  bool
> >> > >
> >> > > I think this is a bit misleading.  But IMO it'd be better not to have
> >> > > a comment here and just rely on the one in the main function body.
> >> > > This optimisation isn't really changing the vectorisation strategy,
> >> > > and the comment could easily get forgotten if things change in future.
> >> > >
> >> > > > [...]
> >> > > > @@ -,6 +10006,35 @@ vectorizable_condition (stmt_vec_info 
> >> > > > stmt_info, gimple_stmt_iterator *gsi,
> >> > > >/* Handle cond expr.  */
> >> > > >for (j = 0; j < ncopies; j++)
> >> > > >  {
> >> > > > +  tree loop_mask = NULL_TREE;
> >> > > > +  bool swap_cond_operands = false;
> >> > > > +
> >> > > > +  /* Look up if there is a loop mask associated with the
> >> > > > +  scalar cond, or it's inverse.  */
> >> > >
> >> > > Maybe:
> >> > >
> >> > >See whether another part of the vectorized code applies a loop
> >> > >mask to the condition, or to its inverse.
> >> > >
> >> > > > +
> >> > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> >> > > > + {
> >> > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> >> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> >> > > > + {
> >> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> >> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> >> > > > vectype, j);
> >> > > > + }
> >> > > > +   else
> >> > > > + {
> >> > > > +   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> >> > > > +   cond.code = invert_tree_comparison (cond.code, 
> >> > > > honor_nans);
> >> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> >> > > > + {
> >> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> >> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> >> > > > +   vectype, j);
> >> > > > +   cond_code = cond.code;
> >> > > > +   swap_cond_operands = true;
> >> > > > + }
> >> > > > + }
> >> > > > + }
> >> > > > +
> >> > > >stmt_vec_info new_stmt_info = NULL;
> >> > > >if (j == 0)
> >> > > >   {
> >> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info 
> >> > > > stmt_info, gimple_stmt_iterator *gsi,
> >> > > >   }
> >> > > >   }
> >> > > >   }
> >> > > > +
> >> > > > +   /* If loop mask is present, then AND it with
> >> > >
> >> > > Maybe "If we decided to apply a loop mask, ..."
> >> > >
> >> > > > +  result of vec comparison, so later passes (fre4)
> >> > >
> >> > > Probably better not to name the pass -- could easily change in future.
> >> > >
> >> > > > +  will reuse the same condition used in masked load.
> >> > >
> >> > > Could be a masked store, or potentially other things too.
> >> > > So maybe just "will reuse the masked condition"?
> >> > >
> >> > > > +
> >> > > > +  For example:
> >> > > > +  for (int i = 0; i < 100; ++i)
> >> > > > +x[i] = y[i] ? z[i] : 10;
> >> > > > +
> >> > > > +  results in following optimized GIMPLE:
> >> > > > +
> >> > > > +  mask__35.8_43 = vect__4.7_41 != { 0, ... };
> >> > > > +  vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> >> > > > +  _19 = [base: z_12(D), index: ivtmp_56, step: 4, 
> >> > > > offset: 0B];
> >> > > > +  vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> >> > > > +  vect_iftmp.12_52 = VEC_COND_EXPR  >> > > > +vect_iftmp.11_47, { 10, 
> >> > > > ... }>;
> >> > > > +
> >> > > > +  instead of recomputing vec != { 0, ... } in vec_cond_expr 
> >> > > >  */
> >> > >
> >> > > That's true, but gives the impression that avoiding the vec != { 0, 
> >> > > ... }
> >> > > is the main goal, whereas 

Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-17 Thread Feng Xue OS
> I noticed a similar issue when analyzing SPEC: the self-recursive function
> is not versioned.  I posted my observations in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92074.

> Generally, this could be implemented well by your patch, but I am
> wondering whether it would be OK to convert the recursive function to a
> non-recursive one in an independent pass after ipa-cp and ipa-sra, instead
> of reusing the ipa-cp framework?
> The reason is that sometimes the argument is passed by reference, and
> ipa-sra runs after ipa-cp, so this kind of optimization may not be done in
> WPA.  What's your idea about this, please?  Thanks.

Function versioning is done in ipa-cp; there is nothing special about
recursive functions, so adding a dedicated pass for recursion seems
redundant.

We might not need to resort to ipa-sra to resolve the concern you mentioned.
The original ipa-cp already supports a simple kind of propagation on by-ref
arguments, which must be defined by a constant.  For an extended form such as
*arg = *param OP constant, I've created a tracker, PR91682, and also composed
a patch: https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01189.html.

Feng
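
To make the discussion concrete, here is a hypothetical example (not taken
from the patch or from PR92074/PR91682) of the kind of self-recursive
function the multi-versioning targets: once the recursion is versioned,
each clone sees a constant value of the controlling parameter and its body
can be simplified accordingly.

static int data[8];

static int
recur_fn (int i)
{
  if (i >= 8)                    /* terminate_recursion (i) */
    return 0;
  data[i] = i * 2;               /* simplifies when i is a known constant */
  return data[i] + recur_fn (i + 1);
}

int
entry ()
{
  return recur_fn (0);           /* constant seed for the whole chain */
}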


Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-17 Thread luoxhu
Hi Feng,

On 2019/10/17 16:23, Feng Xue OS wrote:
> IPA does not allow constant propagation on a parameter that is used to
> control function recursion.
> 
> recur_fn (i)
> {
>if ( !terminate_recursion (i))
>  {
>...
>recur_fn (i + 1);
>...
>  }
>...
> }
> 
> This patch is composed to enable multi-versioning for self-recursive
> functions, and the number of versioned copies is limited by a specified
> option.

I noticed a similar issue when analyzing SPEC: the self-recursive function
is not versioned.  I posted my observations in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92074.

Generally, this could be implemented well by your patch, but I am
wondering whether it would be OK to convert the recursive function to a
non-recursive one in an independent pass after ipa-cp and ipa-sra, instead
of reusing the ipa-cp framework?
The reason is that sometimes the argument is passed by reference, and
ipa-sra runs after ipa-cp, so this kind of optimization may not be done in
WPA.  What's your idea about this, please?  Thanks.


Thanks
Xiong Hu

> 
> Feng
> ---
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 045072e02ec..6255a766e4d 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -229,7 +229,9 @@ public:
> inline bool set_contains_variable ();
> bool add_value (valtype newval, cgraph_edge *cs,
> ipcp_value *src_val = NULL,
> -   int src_idx = 0, HOST_WIDE_INT offset = -1);
> +   int src_idx = 0, HOST_WIDE_INT offset = -1,
> +   ipcp_value **val_pos_p = NULL,
> +   bool unlimited = false);
> void print (FILE * f, bool dump_sources, bool dump_benefits);
>   };
>   
> @@ -1579,22 +1581,37 @@ allocate_and_init_ipcp_value 
> (ipa_polymorphic_call_context source)
>   /* Try to add NEWVAL to LAT, potentially creating a new ipcp_value for it.  
> CS,
>  SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
>  meaning.  OFFSET -1 means the source is scalar and not a part of an
> -   aggregate.  */
> +   aggregate.  If non-NULL, VAL_POS_P specifies position in value list,
> +   after which newly created ipcp_value will be inserted, and it is also
> +   used to record address of the added ipcp_value before function returns.
> +   UNLIMITED means whether value count should not exceed the limit given
> +   by PARAM_IPA_CP_VALUE_LIST_SIZE.  */
>   
>   template 
>   bool
>   ipcp_lattice::add_value (valtype newval, cgraph_edge *cs,
> ipcp_value *src_val,
> -   int src_idx, HOST_WIDE_INT offset)
> +   int src_idx, HOST_WIDE_INT offset,
> +   ipcp_value **val_pos_p,
> +   bool unlimited)
>   {
> ipcp_value *val;
>   
> +  if (val_pos_p)
> +{
> +  for (val = values; val && val != *val_pos_p; val = val->next);
> +  gcc_checking_assert (val);
> +}
> +
> if (bottom)
>   return false;
>   
> for (val = values; val; val = val->next)
>   if (values_equal_for_ipcp_p (val->value, newval))
> {
> + if (val_pos_p)
> +   *val_pos_p = val;
> +
>   if (ipa_edge_within_scc (cs))
> {
>   ipcp_value_source *s;
> @@ -1609,7 +1626,8 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>   return false;
> }
>   
> -  if (values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
> +  if (!unlimited
> +  && values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
>   {
> /* We can only free sources, not the values themselves, because 
> sources
>of other values in this SCC might point to them.   */
> @@ -1623,6 +1641,9 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>   }
>   }
>   
> +  if (val_pos_p)
> + *val_pos_p = NULL;
> +
> values = NULL;
> return set_to_bottom ();
>   }
> @@ -1630,8 +1651,54 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
> values_count++;
> val = allocate_and_init_ipcp_value (newval);
> val->add_source (cs, src_val, src_idx, offset);
> -  val->next = values;
> -  values = val;
> +  if (val_pos_p)
> +{
> +  val->next = (*val_pos_p)->next;
> +  (*val_pos_p)->next = val;
> +  *val_pos_p = val;
> +}
> +  else
> +{
> +  val->next = values;
> +  values = val;
> +}
> +
> +  return true;
> +}
> +
> +/* Return true if an ipcp_value VAL originated from the parameter value of
> +   a self-feeding recursive function by applying a non-passthrough
> +   arithmetic transformation.  */
> +
> +static bool
> +self_recursively_generated_p (ipcp_value *val)
> +{
> +  class ipa_node_params *info = NULL;
> +
> +  for (ipcp_value_source *src = val->sources; src; src = src->next)
> +{
> +  cgraph_edge *cs = src->cs;
> +
> +  if (!src->val || cs->caller != cs->callee->function_symbol ()
> +   || src->val == val)
> + return false;
> +
> + 

[SVE] PR91272

2019-10-17 Thread Prathamesh Kulkarni
Hi,
The attached patch tries to fix PR91272.
Does it look OK ?

With patch, I see following failures for aarch64-sve.exp:
FAIL: gcc.target/aarch64/sve/clastb_1.c -march=armv8.2-a+sve
scan-assembler \\tclastb\\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\\.s
FAIL: gcc.target/aarch64/sve/clastb_2.c -march=armv8.2-a+sve
scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.s
FAIL: gcc.target/aarch64/sve/clastb_3.c -march=armv8.2-a+sve
scan-assembler \\tclastb\\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\\.b
FAIL: gcc.target/aarch64/sve/clastb_5.c -march=armv8.2-a+sve
scan-assembler \\tclastb\\tx[0-9]+, p[0-7], x[0-9]+, z[0-9]+\\.d

For instance, in clastb_1.c, it now emits:
clastb  s1, p1, s1, z0.s
while using a fully predicated loop.
Should I adjust the tests ?

Thanks,
Prathamesh
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
index d4f9b0b6a94..6e69b264e9b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #define N 32
 
@@ -17,4 +17,5 @@ condition_reduction (int *a, int min_v)
   return last;
 }
 
+/* { dg-final { scan-tree-dump "using a fully-masked loop." "vect" } } */
 /* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\.s} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_2.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_2.c
index 2c49bd3b0f0..d1a743972a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_2.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #include 
 
@@ -23,4 +23,5 @@ condition_reduction (TYPE *a, TYPE min_v)
   return last;
 }
 
+/* { dg-final { scan-tree-dump "using a fully-masked loop." "vect" } } */
 /* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\.s} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_3.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_3.c
index 35344f446c6..71e85c03cc0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_3.c
@@ -1,8 +1,9 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #define TYPE uint8_t
 
 #include "clastb_2.c"
 
+/* { dg-final { scan-tree-dump "using a fully-masked loop." "vect" } } */
 /* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\.b} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_4.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_4.c
index ce58abd6161..b4db170ea06 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_4.c
@@ -1,8 +1,9 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #define TYPE int16_t
 
 #include "clastb_2.c"
 
+/* { dg-final { scan-tree-dump "using a fully-masked loop." "vect" } } */
 /* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\.h} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_5.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_5.c
index 2b9783d6627..878d9f60913 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_5.c
@@ -1,8 +1,9 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #define TYPE uint64_t
 
 #include "clastb_2.c"
 
+/* { dg-final { scan-tree-dump "using a fully-masked loop." "vect" } } */
 /* { dg-final { scan-assembler {\tclastb\tx[0-9]+, p[0-7], x[0-9]+, z[0-9]+\.d} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_6.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_6.c
index c47d303f730..38632a21be1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_6.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #define N 32
 
@@ -21,4 +21,5 @@ condition_reduction (TYPE *a, TYPE min_v)
   return last;
 }
 
+/* { dg-final { scan-tree-dump "using a fully-masked loop." "vect" } } */
 /* { dg-final { scan-assembler 

Re: [PATCH] Fix objsz ICE (PR tree-optimization/92056)

2019-10-17 Thread Martin Sebor

On 10/17/19 1:00 AM, Jakub Jelinek wrote:

Hi!

The following bug was introduced when cond_expr_object_size was added
in 2007.  We want to treat a COND_EXPR like a PHI with 2 arguments.
A PHI is handled in a loop that breaks if the lhs value is unknown and
otherwise does the "if (TREE_CODE (arg) == SSA_NAME) merge_object_sizes
else expr_object_size" sequence, which is also used in places that handle
just a single operand (with the lhs value first initialized to the
opposite of unknown).  The problem is that expr_object_size asserts that
the lhs value is not unknown at the start.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


I'm not sure the other change (r277134) is the right way to fix
the problem with the missing initialization.  It was introduced
with the merger of the sprintf pass.  The latter still calls
init_object_sizes in get_destination_size.  I think the call
should be moved from there into the new combined sprintf/strlen
printf_strlen_execute function that also calls fini_object_sizes,
and the one from determine_min_objsize should be removed.  I can
take care of it unless you think it needs to stay the way it is
now for some reason.

Martin



2019-10-17  Jakub Jelinek  

PR tree-optimization/92056
* tree-object-size.c (cond_expr_object_size): Return early if then_
processing resulted in unknown size.

* gcc.c-torture/compile/pr92056.c: New test.

--- gcc/tree-object-size.c.jj   2019-10-05 09:35:14.895967464 +0200
+++ gcc/tree-object-size.c  2019-10-16 15:34:11.414769994 +0200
@@ -903,6 +903,9 @@ cond_expr_object_size (struct object_siz
else
  expr_object_size (osi, var, then_);
  
+  if (object_sizes[object_size_type][varno] == unknown[object_size_type])
+    return reexamine;
+
if (TREE_CODE (else_) == SSA_NAME)
  reexamine |= merge_object_sizes (osi, var, else_, 0);
else
--- gcc/testsuite/gcc.c-torture/compile/pr92056.c.jj2019-10-16 
15:42:56.042848440 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr92056.c   2019-10-16 
15:42:41.595066602 +0200
@@ -0,0 +1,18 @@
+/* PR tree-optimization/92056 */
+
+const char *d;
+
+void
+foo (int c, char *e, const char *a, const char *b)
+{
+  switch (c)
+{
+case 33:
+  for (;; d++)
+if (__builtin_strcmp (b ? : "", d))
+  return;
+  break;
+case 4:
+  __builtin_sprintf (e, a);
+}
+}

Jakub
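
As a user-level illustration of the merging that cond_expr_object_size
performs (a sketch, not the PR testcase; the names below are made up):
the minimum object size of a conditional is the smaller of its two arms,
and the fix above simply stops before the second arm once the first has
already made the result unknown.

#include <cstddef>

char small_buf[4];
char big_buf[16];

std::size_t
min_size (int cond)
{
  char *p = cond ? small_buf : big_buf;
  /* Mode 2 asks for the minimum object size; with optimization enabled
     this folds to 4, the size of the smaller arm.  */
  return __builtin_object_size (p, 2);
}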





[PATCH][Demangler] Small fix for complex values

2019-10-17 Thread Miguel Saldivar
This is a small fix for Bug 67299, where the symbol `Z1fCf` becomes `f(float
complex)` rather than `f(floatcomplex )`.  I thought this would be the
preferred way of printing, because both `llvm-cxxfilt` and `cpp_filt` print
the former.

Thanks,
Miguel Saldivar


Re: [PATCH v3, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-17 Thread Segher Boessenkool
Hi Kelvin,

On Wed, Oct 09, 2019 at 03:28:45PM -0500, Kelvin Nilsen wrote:
> This new pass scans existing rtl expressions and replaces them with rtl 
> expressions that favor selection of the D-form instructions in contexts for 
> which the D-form instructions are preferred.  The new pass runs after the RTL 
> loop optimizations since loop unrolling often introduces opportunities for 
> beneficial replacements of X-form addressing instructions.
> 
> For each of the new tests, multiple X-form instructions are replaced with 
> D-form instructions, some addi instructions are replaced with add 
> instructions, and some addi instructions are eliminated.  The typical 
> improvement for the included tests is a decrease of 4.28% to 12.12% in the 
> number of instructions executed on each iteration of the loop.  The 
> optimization has not shown measurable improvement on specmark tests, 
> presumably because the typical loops that are benefited by this optimization 
> are memory bounded and this optimization does not eliminate memory loads or 
> stores.  However, it is anticipated that multi-threaded workloads and 
> measurements of total power and cooling costs for heavy server workloads 
> would benefit.

My first question is, why did ivopts choose the suboptimal solution?
_Did_ it, or did something later mess things up?

This new pass can help us investigate that.  It certainly sounds like we
could do better earlier already.

I think it is a good design to make fixes late in the pass pipeline, *but*
we should try to make good choices earlier, too -- the "late tweaks" should
be just that, tweaks; 4%-12% is a bit much.

(It's not that super late here; but still, why does it help so much?)

> 2. Improved comments and added discussion of computational complexity.

It's really not good at all to have anything that is quadratic in the size
of the program, or the size of a function, or the size of a basic block:
there always show up real programs which then take approximately infinitely
long to compile.

If there are good arguments why some parameter can not be bigger than 100
in reality, or maybe even 1000, it is different of course; but things like
number of function, number of basic blocks, or number of instructions (per
function or bb or loop) are not naturally limited.

> 5. Refactored the code to divide into smaller functions and provide more 
> descriptive commentary.

Many thanks for this :-)

> +   This pass replaces the above-matched sequences with:
> +
> +   Ai: derived_pointer = array_base + offset
> +   *(derived_pointer)
> +
> +   Aij: leave these alone.  expect that subsequent optimization deletes
> +this code as it may become dead (since we don't use the
> +indexing expression following our code transformations.)
> +
> +   Ai:
> +   *(derived_pointer + constant_i)
> + (where constant_i equals sum of constant (n,j) for all n from 1
> +  to i paired with all j from 1 to Kn,

So if I understand this correctly, if the code is

  x0 = [base+8]
  x1 = [base]
  x2 = [base+16]

this pass will change it to

  p = base+8
  x0 = [p]
  x1 = [p-8]
  x2 = [p+8]

Should it always pick the first access as the new base pointer?  Should it
use the lowest offset instead?

(Maybe the code does something more advanced than picking the first; not
clear from this comment though).

> +class indexing_web_entry: public web_entry_base
> +{
> + public:
> +  rtx_insn *insn;/* Pointer to the insn */
> +  basic_block bb;/* Pointer to the enclosing basic block */

The rest of the fields have the comment before the declaration.  I would
just lose the comments here though: if something called "insn" is not an
insn, or something called "bb" is not a block, ... :-)

> +  /* A unique sequence number is assigned to each instruction for the
> + purpose of simplifying domination tests.  Within each basic
> + block, sequence numbers areassigned in strictly increasing order.
> + Thus, for any two instructions known to reside in the same basic
> + block, the instruction with a lower insn_sequence_no is kknown
> + to dominate the instruction with a higher insn_sequence_no.  */
> +  unsigned int insn_sequence_no;

Many existing passes call this "luid" (for "local unique id").

(Typos: "are assigned", "known").

> +  /* If this insn is relevant, it is a load or store with a memory
> + address that is comprised of a base pointer (e.g. the address of
> + an array or array slice) and an index expression (e.g. an index
> + within the array).  The original_base_use and original_index_use
> + fields represent the numbers of the instructions that define the
> + base and index values which are summed together with a constant
> + value to determine the value of this instruction's memory
> + address.  */
> +  unsigned int original_base_use;
> +  unsigned int original_index_use;

I wonder how you determine what is base and what is index?

(I'll review the rest 

Re: [PATCH] Fix constexpr-dtor3.C FAIL on arm

2019-10-17 Thread Jakub Jelinek
On Wed, Oct 16, 2019 at 04:36:07PM -0400, Jason Merrill wrote:
> > As for CLEANUP_STMT, I've tried it (the second patch), but it didn't change
> > anything, the diagnostics was still
> > constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
> > constexpr-dtor3.C:16:24:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
> > constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
> >  5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error 
> > "inline assembly is not a constant expression" }
> >|  ^~~
> > constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed 
> > in a ‘constexpr’ function in C++2a
> > as without that change.
> 
> That's because the patch changes EXPR_LOCATION for evaluation of the
> CLEANUP_BODY, but it should be for evaluation of CLEANUP_EXPR instead.

Indeed, that works too.  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2019-10-18  Jakub Jelinek  

* constexpr.c (cxx_eval_constant_expression) <case CLEANUP_STMT>:
Temporarily change input_location to CLEANUP_STMT location.

* g++.dg/cpp2a/constexpr-dtor3.C: Expect in 'constexpr' expansion of
message on the line with variable declaration.
* g++.dg/ext/constexpr-attr-cleanup1.C: Likewise.

--- gcc/cp/constexpr.c.jj   2019-10-17 00:15:50.126726231 +0200
+++ gcc/cp/constexpr.c  2019-10-17 11:21:34.400062565 +0200
@@ -4984,14 +4984,20 @@ cxx_eval_constant_expression (const cons
  non_constant_p, overflow_p,
  jump_target);
if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
- /* Also evaluate the cleanup.  If we weren't skipping at the
-start of the CLEANUP_BODY, change jump_target temporarily
-to _jump_target, so that even a return or break or
-continue in the body doesn't skip the cleanup.  */
- cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), true,
-   non_constant_p, overflow_p,
-   jump_target ? _jump_target
-   : NULL);
+ {
+   location_t loc = input_location;
+   if (EXPR_HAS_LOCATION (t))
+ input_location = EXPR_LOCATION (t);
+   /* Also evaluate the cleanup.  If we weren't skipping at the
+  start of the CLEANUP_BODY, change jump_target temporarily
+  to _jump_target, so that even a return or break or
+  continue in the body doesn't skip the cleanup.  */
+   cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), true,
+ non_constant_p, overflow_p,
+ jump_target ? _jump_target
+ : NULL);
+   input_location = loc;
+ }
   }
   break;
 
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C.jj 2019-10-17 
00:15:49.425736657 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor3.C2019-10-17 
11:20:13.977290046 +0200
@@ -149,7 +149,7 @@ constexpr int x3 = f3 ();
 constexpr int
 f4 ()
 {
-  W7 w13 = 5;
+  W7 w13 = 5;  // { dg-message "in 'constexpr' expansion of" }
   return 0;
 }
 
--- gcc/testsuite/g++.dg/ext/constexpr-attr-cleanup1.C.jj   2019-10-03 
00:32:15.604526950 +0200
+++ gcc/testsuite/g++.dg/ext/constexpr-attr-cleanup1.C  2019-10-18 
00:18:50.248166117 +0200
@@ -15,7 +15,7 @@ cleanup2 (int *x)
 constexpr bool
 foo ()
 {
-  int a __attribute__((cleanup (cleanup))) = 1;
+  int a __attribute__((cleanup (cleanup))) = 1;// { dg-message "in 
'constexpr' expansion of" }
   return true;
 }
 


Jakub


Re: [PATCH] RISC-V: Include more registers in SIBCALL_REGS.

2019-10-17 Thread Andrew Burgess
* Jim Wilson  [2019-10-17 14:55:34 -0700]:

> On Thu, Oct 17, 2019 at 7:09 AM Andrew Burgess
>  wrote:
> > I'm still working on part 2, I'm hoping to have a revised patch posted
> > by Monday next week.
> 
> I started looking at the part 2 patch also.  I noticed a problem where
> the NOTE_INSN_EPILOGUE_BEGIN can occur before the
> NOTE_INSN_PROLOGUE_END due to branch and basic block optimizations, I
> get an ICE in this case due to deref of a null pointer.  This is
> pretty easy to fix, I just added a check to
> riscv_remove_unneeded_save_restore_calls in the loop that searches for
> NOTE_INSN_PROLOGUE_END, and if I find NOTE_INSN_EPILOGUE_BEGIN first I
> exit without doing anything.  A more complex fix would be to try to
> follow the CFG instead of scanning forward from the prologue end note
> to find the epilogue begin note.  I also noticed a case where the
> __riscv_save_0 call was optimized away but the __riscv_restore_0 call
> was not.  In this case the function has two epilogues, and only one of
> the two epilogues was optimized.  We could just check for more than
> one epilogue and return without doing any work as the simple fix.  Or
> a more complex fix is to try to handle more than one epilogue.
> 
> Given the kinds of problems I'm seeing, I think there should be an
> option to control this optimization, so people can turn it off if
> necessary.  I think it is OK if this is on by default if it passes the
> gcc testsuite.
> 
> Since you are looking at this, I can look at something else for a few
> days.  I was just trying to get this off of my backlog.

It's entirely up to you.  I believe I have a version testing now that
addresses all of the issues you mentioned above.  I haven't added a
switch for it, but can do if you want.

My expectation was that I would post a version on Monday that had zero
regressions when run with -msave-restore forced on (as you described
in a previous mail).

I'll continue to work on this and post on Monday unless you drop a
revision earlier.

Thanks,

Andrew


Re: [PATCH] RISC-V: Include more registers in SIBCALL_REGS.

2019-10-17 Thread Jim Wilson
On Thu, Oct 17, 2019 at 7:09 AM Andrew Burgess
 wrote:
> I'm still working on part 2, I'm hoping to have a revised patch posted
> by Monday next week.

I started looking at the part 2 patch also.  I noticed a problem where
the NOTE_INSN_EPILOGUE_BEGIN can occur before the
NOTE_INSN_PROLOGUE_END due to branch and basic block optimizations, I
get an ICE in this case due to deref of a null pointer.  This is
pretty easy to fix, I just added a check to
riscv_remove_unneeded_save_restore_calls in the loop that searches for
NOTE_INSN_PROLOGUE_END, and if I find NOTE_INSN_EPILOGUE_BEGIN first I
exit without doing anything.  A more complex fix would be to try to
follow the CFG instead of scanning forward from the prologue end note
to find the epilogue begin note.  I also noticed a case where the
__riscv_save_0 call was optimized away but the __riscv_restore_0 call
was not.  In this case the function has two epilogues, and only one of
the two epilogues was optimized.  We could just check for more than
one epilogue and return without doing any work as the simple fix.  Or
a more complex fix is to try to handle more than one epilogue.

Given the kinds of problems I'm seeing, I think there should be an
option to control this optimization, so people can turn it off if
necessary.  I think it is OK if this is on by default if it passes the
gcc testsuite.

Since you are looking at this, I can look at something else for a few
days.  I was just trying to get this off of my backlog.

Jim


Re: [patch,testsuite]: Fix some fallout for small targets.

2019-10-17 Thread Jeff Law
On 10/17/19 9:32 AM, Georg-Johann Lay wrote:
> Hi, this fixes some FAILs for small targets, fixed or skipped by
> size32plus, double64[plus] etc.
> 
> Ok to apply?
> 
> Johann
> 
> Fix some fallout for small targets.
> 
> * gcc.c-torture/execute/20190820-1.c:
> Add dg-require-effective-target int32plus.
> * gcc.c-torture/execute/pr85331.c
> Add dg-require-effective-target double64plus.
> * gcc.dg/pow-sqrt-1.c: Same.
> * gcc.dg/pow-sqrt-2.c: Same.
> * gcc.dg/pow-sqrt-3.c: Same.
> * gcc.c-torture/execute/20190901-1.c: Same.
> * gcc.c-torture/execute/user-printf.c [avr]: Skip.
> * gcc.c-torture/execute/fprintf-2.c [avr]: Skip.
> * gcc.c-torture/execute/printf-2.c [avr]: Skip.
> * gcc.dg/Wlarger-than3.c [avr]: Skip.
> * gcc.c-torture/execute/ieee/20041213-1.c (sqrt)
> [avr,double=float]: Provide custom prototype.
> * gcc.dg/pr36017.c: Same.
> * gcc.c-torture/execute/pr90025.c: Use 32-bit int.
> * gcc.dg/complex-7.c: Add dg-require-effective-target double64.
> * gcc.dg/loop-versioning-1.c:
> Add dg-require-effective-target size32plus.
> * gcc.dg/loop-versioning-2.c: Same.
OK
jeff


Re: [patch,avr]: PR86040: Fix missing reset of RAMPZ after ELPM.

2019-10-17 Thread Jeff Law
On 10/17/19 5:26 AM, Georg-Johann Lay wrote:
> Hi, for families avrxmega5/7 after ELPM the reset of RAMPZ to
> zero was missing in some situations due to a shortcut-return in
> avr_out_lpm which bypassed that reset.
> 
> Ok to apply and backport?
> 
> Johann
> 
> PR target/86040
> * config/avr/avr.c (avr_out_lpm): Do not shortcut-return.
OK for applying and backporting.
jeff


Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-17 Thread Jason Merrill

On 10/17/19 3:12 PM, JeanHeyd Meneide wrote:

 * tree.c (handle_nodiscard_attribute): Implements p1301
   [[nodiscard("should have a reason")]].


This should be a heading at the top rather than specific to this 
file/function.


It looks like you didn't run the testsuite, I'm seeing several new failures:

> FAIL: g++.dg/cpp0x/gen-attrs-67.C  -std=c++11  (test for errors, line 8)
> FAIL: g++.dg/cpp1z/feat-cxx1z.C  -std=gnu++17 (test for excess errors)
> FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11 (test for excess errors)
> FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 12)
> FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 13)
> FAIL: g++.dg/cpp2a/feat-cxx2a.C   (test for excess errors)

Jason


[PATCH] * .gitattributes: Avoid {} in filename pattern.

2019-10-17 Thread Jason Merrill
Brace-expansion is a bash feature, not part of glob(7).

Applying to trunk.
---
 .gitattributes | 6 +-
 ChangeLog  | 4 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/.gitattributes b/.gitattributes
index b38d7f1b43b..183fdcaaa9a 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1 +1,5 @@
-*.{c,C,cc,h} whitespace=indent-with-non-tab,space-before-tab,trailing-space
+# Add indent-with-non-tab to the default git whitespace warnings.
+# Note that this file doesn't support bash-style brace expansion.
+
+*.[cCh] whitespace=indent-with-non-tab,space-before-tab,trailing-space
+*.cc whitespace=indent-with-non-tab,space-before-tab,trailing-space
diff --git a/ChangeLog b/ChangeLog
index 90413f57284..5487226c989 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2019-10-17  Jason Merrill  
+
+   * .gitattributes: Avoid {} in filename pattern.
+
 2019-10-08  Thomas Schwinge  
 
* MAINTAINERS: Add back Trevor Smigiel; move into Write After

base-commit: f0899489a4299d6270437f8be0d7989d91c44c88
-- 
2.18.1



Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-17 Thread JeanHeyd Meneide
2019-10-17  JeanHeyd Meneide  

gcc/

* escaped_string.h (escaped_string): New header.
* tree.c (escaped_string): Remove escaped_string class.

gcc/c-family

* c-lex.c (c_common_has_attribute): Update attribute value.

gcc/cp/

* tree.c (handle_nodiscard_attribute): Implements p1301
  [[nodiscard("should have a reason")]].
  (handle_nodiscard_attribute) Added C++2a nodiscard string message.
  (std_attribute_table) Increase nodiscard argument handling
max_length from 0
  to 1.
* parser.c (cp_parser_check_std_attribute): Add requirement
that nodiscard
  only be seen once in attribute-list.
* cvt.c (maybe_warn_nodiscard): Add nodiscard message to
output, if applicable.
  (convert_to_void): Allow constructors to be nodiscard-able
(fixes paper-as-DR
  p1771).

gcc/testsuite/g++.dg/cpp2a

* nodiscard-construct.C: New test.
* nodiscard-once.C: New test.
* nodiscard-reason-nonstring.C: New test.
* nodiscard-reason-only-one.C: New test.
* nodiscard-reason.C: New test.
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
  else if (is_attribute_p ("deprecated", attr_name))
result = 201309;
  else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
  else if (is_attribute_p ("no_unique_address", attr_name)
   || is_attribute_p ("likely", attr_name)
   || is_attribute_p ("unlikely", attr_name))
result = 201803;
+ else if (is_attribute_p ("nodiscard", attr_name))
+   result = 201907;
  if (result)
attr_name = NULL_TREE;
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 364af72e68d..f2d2ba6cafb 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,22 +1027,45 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
+  tree attr;
   if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+  && (attr = lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn))))
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = (has_msg ?
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>%s"));
+  const char* raw_msg = (has_msg ? static_cast<const char *>(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute nodiscard", fn))
+ format, fn, raw_msg))
inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
+  && (attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES 
(rettype
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = has_msg ?
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %<nodiscard%>%s");
+  const char* raw_msg = (has_msg ? static_cast<const char *>(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring returned value of type %qT, "
- "declared with attribute nodiscard", rettype))
+ format, rettype, raw_msg))
{
  if (fn)
inform (DECL_SOURCE_LOCATION (fn),
@@ -1180,7 +1204,7 @@ convert_to_void (tree expr, impl_conv_void implicit, 
tsubst_flags_t complain)
 instantiations be affected by an ABI property that is, or at
 least 

Re: Type representation in CTF and DWARF

2019-10-17 Thread Nick Alcock
On 17 Oct 2019, Richard Biener verbalised:

> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock  wrote:
>>
>> On 11 Oct 2019, Indu Bhagat stated:
>> > Compile with -g -gdwarf-like-ctf and use dwz -o   
>> > (using
>> > dwz compiled from the master branch) on the generated binaries:
>> >
>> > (coreutils-0.22)
>> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>> > ls           30616           | 1136              | 21098          | 26240               | 0.62
>> > pwd          10734           | 788               | 10433          | 13929               | 0.83
>> > groups       10706           | 811               | 10249          | 13378               | 0.80
>> >
>> > (emacs-26.3)
>> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>> > emacs-26.3.1 674657          | 6402              | 273963         | 273910              | 0.33
>
> Btw, for a fair comparison you have to remove all DW_TAG_subroutine
> children as well since CTF doesn't represent scopes or local variables
> at all (nor types only used by locals). It seems CTF only represents
> function entry points.

Good point: I'll have to hack up a DWARF trimmer to do this comparison
properly, I think. (Though CTF does represent global variables,
including file-scope statics.)

In most cases local types etc are a fairly small contributor to the
total volume -- but macros can contribute a lot in some codebases. (The
Linux kernel's READ_ONCE macro is one I've personally been bitten by in
the past, with a new local struct in every use. GCC doesn't deduplicate
any of those so the resulting bloat from tens of thousands of instances
of this identical structure is quite incredible...)
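
For readers puzzled by the ratio column in the quoted tables, it is simply
the uncompressed CTF size divided by a weighted sum of the DWARF section
sizes; here is a quick sketch of the arithmetic for the "ls" row (numbers
taken from the table above; the 0.5 weight on .debug_str is the original
poster's choice, not something derived here):

#include <cstdio>

int
main ()
{
  const double d1 = 30616;   /* .debug_info   */
  const double d2 = 1136;    /* .debug_abbrev */
  const double d4 = 21098;   /* .debug_str    */
  const double ctf = 26240;  /* .ctf, uncompressed */

  std::printf ("%.2f\n", ctf / (d1 + d2 + 0.5 * d4));  /* prints 0.62 */
  return 0;
}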


Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2019 at 02:38:32PM -0400, Jason Merrill wrote:
> On 10/16/19 5:36 PM, JeanHeyd Meneide wrote:
> > Thanks, Jason! I fixed those last things and I put the changelog below
> > in the e-mail. I'll figure out how to write a good changelog in a
> > commit message on the command line soon. :D
> 
> In the e-mail like this is fine, thanks.
> 
> > 2019-10-16  JeanHeyd Meneide 

Also, two spaces before < rather than one.

> > 
> > gcc/
> > 
> >  * escaped_string.h: New. Refactored out of tree.c to make more

And here just New file. or New header. or New., no need to give reasons.

> >  broadly available (e.g. to parser.c, cvt.c).
> >  * tree.c: remove escaped_string class

Jakub


Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-17 Thread Jason Merrill

On 10/16/19 5:36 PM, JeanHeyd Meneide wrote:

Thanks, Jason! I fixed those last things and I put the changelog below
in the e-mail. I'll figure out how to write a good changelog in a
commit message on the command line soon. :D


In the e-mail like this is fine, thanks.


2019-10-16  JeanHeyd Meneide 

gcc/

 * escaped_string.h: New. Refactored out of tree.c to make more
 broadly available (e.g. to parser.c, cvt.c).
 * tree.c: remove escaped_string class


Descriptions should be capitalized and end with a period, e.g. "Remove 
escaped_string class."



gcc/c-family
 * c-lex.c - update attribute value
gcc/cp/
 * tree.c: Implement p1301 - nodiscard("should have a reason"))
 Added C++2a nodiscard string message handling.
 Increase nodiscard argument handling max_length from 0
 to 1.
 * parser.c: add requirement that nodiscard only be seen
 once in attribute-list
 * cvt.c: add nodiscard message to output, if applicable


These changes all need to mention what definition they are changing, e.g.

* tree.c (handle_nodiscard_attribute): Handle C++2a nodiscard string 
message.

(std_attribute_table): Increase nodiscard max_length to 1.

Jason



Re: [PATCH] LRA: side_effects_p stmts' output is not invariant (PR89721)

2019-10-17 Thread Vladimir Makarov

On 10/17/19 2:09 PM, Segher Boessenkool wrote:

On Fri, Mar 15, 2019 at 05:14:48PM -0500, Segher Boessenkool wrote:

On Fri, Mar 15, 2019 at 04:25:01PM -0400, Vladimir Makarov wrote:

On 2019-03-15 2:30 p.m., Segher Boessenkool wrote:

PR89721 shows LRA treating an unspec_volatile's result as invariant,
which of course isn't correct.  This patch fixes it.

Segher, thank you for fixing this.  The patch is ok to commit.

Thanks, done.  Is this okay for backports to 8 and 7 as well?  After a
while of course.

I lost track of this.  Is it okay to backport to 8 and 7?



Yes, sure it is ok for 8 and 7 too.

Thank you, Segher.





Re: [PATCH] LRA: side_effects_p stmts' output is not invariant (PR89721)

2019-10-17 Thread Segher Boessenkool
On Fri, Mar 15, 2019 at 05:14:48PM -0500, Segher Boessenkool wrote:
> On Fri, Mar 15, 2019 at 04:25:01PM -0400, Vladimir Makarov wrote:
> > On 2019-03-15 2:30 p.m., Segher Boessenkool wrote:
> > >PR89721 shows LRA treating an unspec_volatile's result as invariant,
> > >which of course isn't correct.  This patch fixes it.
> > Segher, thank you for fixing this.  The patch is ok to commit.
> 
> Thanks, done.  Is this okay for backports to 8 and 7 as well?  After a
> while of course.

I lost track of this.  Is it okay to backport to 8 and 7?


Segher


Re: Type representation in CTF and DWARF

2019-10-17 Thread Richard Biener
On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock  wrote:
>
> On 11 Oct 2019, Indu Bhagat stated:
> > Compile with -g -gdwarf-like-ctf and use dwz -o   (using
> > dwz compiled from the master branch) on the generated binaries:
> >
> > (coreutils-0.22)
> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > ls           30616           | 1136              | 21098          | 26240               | 0.62
> > pwd          10734           | 788               | 10433          | 13929               | 0.83
> > groups       10706           | 811               | 10249          | 13378               | 0.80
> >
> > (emacs-26.3)
> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > emacs-26.3.1 674657          | 6402              | 273963         | 273910              | 0.33

Btw, for a fair comparison you have to remove all DW_TAG_subroutine
children as well since
CTF doesn't represent scopes or local variables at all (nor types only
used by locals).  It seems
CTF only represents function entry points.

> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:
>
> .text: 3317489
> DWARF: 8589254
> Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
> .ctf section size: 213839
>
> Note that this is not only in the absence of CTF strtab sharing with the
> ELF dynstrtab, but also using a less effective compressor: currently we
> use gzip, but I expect to transition to lzma iff available at binutils
> build time (which it usually is), perhaps as an option (on by default)
> to allow interoperability with binutils that don't have lzma available.
> Obviously better compressors will save even more space.
>
> It may help that CTF is designed for good compressibility: we try to
> minimize the number of unique symbols if we can do so without impairing
> other properties, e.g. by avoiding encoding IDs of objects when we can
> instead rely on the consumer to compute them at read time by walking
> through the relevant data structures and counting.
>
> A few benchamrks indicate that compression by default also saves time
> both at compression and decompression time.
>
> (Within a week I should be able to repeat this with an ld capable of CTF
> deduplication rather than kludging it with a deduplicator meant for a
> quite different job. I expect the sizes above to improve. In fact if
> they *don't* improve I will take this as strong evidence that my
> deduplicator is buggy.)
>
>
> FWIW, here's my Emacs (26.1.50) sizes, again with no strtab sharing, but
> with deduplication: it's bigger than I'd like at around 10% of .text
> size, but still much less than 1% of binary size (my goal is 1--2% of
> .text, but Emacs is a nice tricky case, like Gtk, with lots of big types
> and structures with long member names):
>
> section  size  addr
> .interp28   4194872
> .note.ABI-tag  32   4194900
> .note.gnu.build-id 36   4194932
> .gnu.hash 628   4194968
> .dynsym 24432   4195600
> .dynstr 16934   4220032
> .gnu.version 2036   4236966
> .gnu.version_r704   4239008
> .rela.data.rel.ro  72   4239712
> .rela.data168   4239784
> .rela.got  48   4239952
> .rela,bss 336   424
> .rela.plt   23448   4240336
> .init  23   4263784
> .plt15648   4263808
> .text 1912622   4279456
> .fini   9   6192080
> .rodata165416   6192096
> .eh_frame_hdr   36196   6357512
> .eh_frame  210976   6393712
> .init_array 8   6609328
> .fini_array 8   6609336
> .data.rel.ro 4569   6609344
> .dynamic 1104   6613920
> .got   16   6615024
> .got.plt 7840   6615040
> .data 3276077   6622880
> ,bss 34153472   9899008
> .comment   26 0
> .gnu_debuglink 24 0
> .comment   26 0
> .debug_aranges   1536 0
> .debug_info   3912261 0
> .debug_abbrev   38821 0
> .debug_line408063 0
> .debug_str 117631 0
> .debug_loc 954538 0
> .debug_ranges  149590 0
> .ctf   213839 0
> .ctf (uncompressed)713264 0
>
> (obviously, manually edited a bit, size -A doesn't produce the last line
> on its own!)

Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-17 Thread Jason Merrill

On 10/17/19 11:30 AM, Paolo Carlini wrote:

.. hi again.

We have an issue with this idea which I didn't notice earlier today: 
there are some libstdc++ *_neg testcases which rely on permerrors being 
emitted for code in library headers. For example 
20_util/ratio/cons/cons_overflow_neg.cc relies on a permerror for the 
"overflow in constant expression" in constexpr.c. Or, 
20_util/variant/visit_neg.cc relies on the permerror for "invalid 
conversion from .. to .." emitted by convert_like_real.


Something seems a little fishy here, but I'm not sure which way we want 
to go: I don't think we want to audit, right here and right now, all the 
permerrors in the front end which could potentially be involved in this 
kind of issue. Even if we, say, promote the above two to plain errors, 
that seems brittle; we have many permerrors which in principle should be 
real errors.


Hmm, true.  So your patch from yesterday is OK.

Jason



Re: [PATCH] Communicate lto-wrapper and ld through a file

2019-10-17 Thread Giuliano Belinassi
Hi

On 10/17, Richard Biener wrote:
> On Wed, Oct 16, 2019 at 7:46 PM Giuliano Belinassi
>  wrote:
> >
> > Hi,
> >
> > Previously, the lto-wrapper communicates with ld by creating a pipe from
> > lto-wrapper's stdout to ld's stdin. This patch uses a temporary file for
> > this communication, releasing stdout to be used for debugging.
> 
> I'm not sure it is a good idea on its own.

The issue is that, when debugging a gcc python plugin in LTO mode using pdb,
the pdb output will be redirected to ld. As a consequence, no messages from pdb
will be displayed and LTO will crash, as we discussed on IRC.

Sure, one can use another python debugger such as wdb, as suggested on IRC, but
it seems really fragile that the python stdout gets mixed with the lto1 stdout.

The point of this patch is to fix that: relying on stdout for the communication
between lto-wrapper and ld seems fragile.

> Then you have to consider that
> the lto-plugin is used to drive different GCC versions (and thus lto-wrappers)
> and you are breaking compatibility with older versions which makes it
> really not an option.

I didn't know about that, but I could address that considering that there are
two cases:

(1) lto-plugin  is outdated
(2) lto-wrapper is outdated

In (1), we have no issue at all: lto-wrapper will be launched without
--to_ld_list and will therefore fall back to stdout, even if lto-wrapper is
patched.  If it is not patched, it won't receive this parameter at all, so
everything will work as before.

In (2), we have the problem that the outdated lto-wrapper won't understand
the --to_ld_list parameter and will crash. Therefore I could patch
lto-wrapper to have the following behaviour:

lto-wrapper will accept an extra parameter, say `--support-file`. If lto-plugin
launches lto-wrapper with this argument and it returns success, then it is
known that lto-wrapper will accept the temporary file used to hand the file
names to ld. If it fails, then it is known that lto-wrapper is unpatched and
lto-plugin should proceed the old way, as sketched below.
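
A rough illustration of that probe, written as a standalone C++ sketch rather
than as the plugin's actual libiberty/pex-based code; the `--support-file` and
`--to_ld_list` option names are the ones proposed above and therefore
hypothetical, not an existing interface (a POSIX shell is also assumed):

#include <cstdlib>
#include <string>

/* Probe whether the lto-wrapper we are about to launch understands the
   proposed temporary-file handshake.  An unpatched lto-wrapper should
   reject the unknown option and exit with a nonzero status; a patched
   one would accept it and exit successfully.  */
static bool
wrapper_supports_file_output (const std::string &wrapper)
{
  std::string probe = wrapper + " --support-file > /dev/null 2>&1";
  return std::system (probe.c_str ()) == 0;
}

int
main ()
{
  const std::string wrapper = "lto-wrapper";   /* placeholder path */

  if (wrapper_supports_file_output (wrapper))
    {
      /* New scheme: pass a temporary file via the (hypothetical)
	 --to_ld_list option and read the object-file list back from it.  */
    }
  else
    {
      /* Old scheme: capture lto-wrapper's stdout, exactly as lto-plugin
	 does today.  */
    }
  return 0;
}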

However, I would like to know whether this is a suitable solution, or whether
it won't be accepted for other reasons.

> 
> There's stderr for debugging...

One could do `sys.stdout = sys.stderr` in the python code, but it doesn't seem
to be a nice solution...

Giuliano.

> 
> > I've run a full testsuite and bootstrapped LTO in a linux x86_64, and found
> > no issues so far. Do I need to write a testcase for this feature?
> >
> > Giuliano.
> >
> > gcc/ChangeLog
> > 2019-10-16  Giuliano Belinassi  
> >
> > * lto-wrapper.c (STATIC_LEN): New macro.
> > (to_ld): New.
> > (find_crtofftable): Print to file to_ld.
> > (find_ld_list_file): New.
> > (main): Check if to_ld is valid or is stdout.
> >
> > gcc/libiberty
> > 2019-10-16  Giuliano Belinassi  
> >
> > * pex-unix.c (pex_unix_exec_child): check for PEX_KEEP_STD_IO flag.
> > (to_ld): New.
> >
> > gcc/include
> > 2019-10-16  Giuliano Belinassi  
> >
> > * libiberty.h (PEX_KEEP_STD_IO): New macro.
> >
> > gcc/lto-plugin
> > 2019-10-16  Giuliano Belinassi  
> >
> > * lto-plugin.c (exec_lto_wrapper): Replace pipe from stdout to temporary
> > file, and pass its name in argv.
> >


Re: Type representation in CTF and DWARF

2019-10-17 Thread Nick Alcock
On 11 Oct 2019, Indu Bhagat stated:
> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> ls                     30616 |              1136 |          21098 |               26240 | 0.62
> pwd                    10734 |               788 |          10433 |               13929 | 0.83
> groups                 10706 |               811 |          10249 |               13378 | 0.80
>
> (emacs-26.3)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33

A side note here: the sizes given above are uncompressed sizes, but in
the real world CTF is almost always compressed: the threshold for
compression is in theory customizable but at the moment is hardwired at
4KiB-uncompressed in the linker. I usually see compression ratios of
roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
/usr/lib/libgtk-3.so.0.2404.3, and got these sizes:

.text: 3317489
DWARF: 8589254
Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
.ctf section size: 213839

Note that this is not only in the absence of CTF strtab sharing with the
ELF dynstrtab, but also using a less effective compressor: currently we
use gzip, but I expect to transition to lzma iff available at binutils
build time (which it usually is), perhaps as an option (on by default)
to allow interoperability with binutils that don't have lzma available.
Obviously better compressors will save even more space.

It may help that CTF is designed for good compressibility: we try to
minimize the number of unique symbols if we can do so without impairing
other properties, e.g. by avoiding encoding IDs of objects when we can
instead rely on the consumer to compute them at read time by walking
through the relevant data structures and counting.
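
To make that concrete, here is a small, purely illustrative C++ sketch (this is
not CTF's actual on-disk format): the producer emits type records with no
explicit IDs, and the consumer reconstructs identical IDs simply by counting
records as it walks the stream in order.

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

/* Illustrative record: no "id" field is ever serialized; the ID is
   implied by the record's position in the stream.  */
struct type_record
{
  std::string name;
  std::uint32_t size;
};

int
main ()
{
  /* What a producer would write, in order, with no IDs stored.  */
  std::vector<type_record> stream
    = { { "int", 4 }, { "long", 8 }, { "struct foo", 24 } };

  /* The consumer assigns IDs by walking the stream and counting, so
     the IDs never have to be encoded in the file at all.  */
  std::unordered_map<std::string, std::uint32_t> id_of;
  std::uint32_t next_id = 1;		/* 0 reserved for "no type".  */
  for (const type_record &rec : stream)
    id_of[rec.name] = next_id++;

  for (const auto &entry : id_of)
    std::cout << entry.first << " -> id " << entry.second << '\n';
  return 0;
}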

A few benchmarks indicate that compression by default also saves time,
both at compression and at decompression time.

(Within a week I should be able to repeat this with an ld capable of CTF
deduplication rather than kludging it with a deduplicator meant for a
quite different job. I expect the sizes above to improve. In fact if
they *don't* improve I will take this as strong evidence that my
deduplicator is buggy.)


FWIW, here's my Emacs (26.1.50) sizes, again with no strtab sharing, but
with deduplication: it's bigger than I'd like at around 10% of .text
size, but still much less than 1% of binary size (my goal is 1--2% of
.text, but Emacs is a nice tricky case, like Gtk, with lots of big types
and structures with long member names):

section  size  addr
.interp28   4194872
.note.ABI-tag  32   4194900
.note.gnu.build-id 36   4194932
.gnu.hash 628   4194968
.dynsym 24432   4195600
.dynstr 16934   4220032
.gnu.version 2036   4236966
.gnu.version_r704   4239008
.rela.data.rel.ro  72   4239712
.rela.data168   4239784
.rela.got  48   4239952
.rela,bss 336   424
.rela.plt   23448   4240336
.init  23   4263784
.plt15648   4263808
.text 1912622   4279456
.fini   9   6192080
.rodata165416   6192096
.eh_frame_hdr   36196   6357512
.eh_frame  210976   6393712
.init_array 8   6609328
.fini_array 8   6609336
.data.rel.ro 4569   6609344
.dynamic 1104   6613920
.got   16   6615024
.got.plt 7840   6615040
.data 3276077   6622880
,bss 34153472   9899008
.comment   26 0
.gnu_debuglink 24 0
.comment   26 0
.debug_aranges   1536 0
.debug_info   3912261 0
.debug_abbrev   38821 0
.debug_line408063 0
.debug_str 117631 0
.debug_loc 954538 0
.debug_ranges  149590 0
.ctf   213839 0
.ctf (uncompressed)713264 0

(obviously, manually edited a bit, size -A doesn't produce the last line
on its own!)

(I'm not sure what the hell is going on with the weirdly-named ,bss
section. Probably something to do with unexec().)


Re: [Patch, fortran] PR fortran/92142 - CFI_setpointer corrupts descriptor

2019-10-17 Thread Tobias Burnus

Hi,

+  fprintf (stderr, "CFI_setpointer: Result is NULL.\n");
…

+ return CFI_INVALID_DESCRIPTOR;
+! { dg-do run }
+! { dg-additional-options "-fbounds-check" }
+! { dg-additional-sources ISO_Fortran_binding_15.c }



If you generate output on stdout/stderr like in this case, I think it makes 
sense to also check for this output using "{dg-output …}".


Otherwise, it looks okay at a glance – but I defer the proper review to 
either someone else or to later.


Another question would be: is it always guaranteed that 
result->attribute is set? I am asking because, to the untrained eye, it 
resembles the code at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92027


And there, the result attribute is unset – that might be a bug in the C 
code of the test itself, or in libgomp. But it doesn't hurt to think 
quickly about whether that can be an issue here as well.


Cheers,

Tobias



[PATCH] Cleanup more redundant stuff in reduction vect

2019-10-17 Thread Richard Biener


Bootstrapped and tested on x86_64-unknwon-linux-gnu, applied.

Richard.

2019-10-17  Richard Biener  

* tree-vectorizer.h (_stmt_vec_info::cond_reduc_code): Remove.
(STMT_VINFO_VEC_COND_REDUC_CODE): Likewise.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Do not
initialize STMT_VINFO_VEC_COND_REDUC_CODE.
* tree-vect-loop.c (vect_is_simple_reduction): Set
STMT_VINFO_REDUC_CODE.
(vectorizable_reduction): Remove dead and redundant code, use
STMT_VINFO_REDUC_CODE instead of STMT_VINFO_VEC_COND_REDUC_CODE.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 6cad0fd08e1..84175362490 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2872,6 +2872,7 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
   if (check_reduction_path (vect_location, loop, phi, latch_def, ,
path))
 {
+  STMT_VINFO_REDUC_CODE (phi_info) = code;
   if (code == COND_EXPR && !nested_in_vect_loop)
STMT_VINFO_REDUC_TYPE (phi_info) = COND_REDUCTION;
 
@@ -5550,17 +5551,13 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
   tree vectype_in = NULL_TREE;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum tree_code code;
-  int op_type;
-  enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
+  enum vect_def_type cond_reduc_dt = vect_unknown_def_type;
   stmt_vec_info cond_stmt_vinfo = NULL;
   tree scalar_type;
   int i;
   int ncopies;
   bool single_defuse_cycle = false;
-  tree ops[3];
-  enum vect_def_type dts[3];
-  bool nested_cycle = false, found_nested_cycle_def = false;
+  bool nested_cycle = false;
   bool double_reduc = false;
   int vec_num;
   tree tem;
@@ -5667,25 +5664,10 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
 which is defined by the loop-header-phi.  */
 
   gassign *stmt = as_a  (stmt_info->stmt);
-
-  /* Flatten RHS.  */
   switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
 {
 case GIMPLE_BINARY_RHS:
-  code = gimple_assign_rhs_code (stmt);
-  op_type = TREE_CODE_LENGTH (code);
-  gcc_assert (op_type == binary_op);
-  ops[0] = gimple_assign_rhs1 (stmt);
-  ops[1] = gimple_assign_rhs2 (stmt);
-  break;
-
 case GIMPLE_TERNARY_RHS:
-  code = gimple_assign_rhs_code (stmt);
-  op_type = TREE_CODE_LENGTH (code);
-  gcc_assert (op_type == ternary_op);
-  ops[0] = gimple_assign_rhs1 (stmt);
-  ops[1] = gimple_assign_rhs2 (stmt);
-  ops[2] = gimple_assign_rhs3 (stmt);
   break;
 
 case GIMPLE_UNARY_RHS:
@@ -5695,9 +5677,8 @@ vectorizable_reduction (stmt_vec_info stmt_info, slp_tree 
slp_node,
 default:
   gcc_unreachable ();
 }
-
-  if (code == COND_EXPR && slp_node)
-return false;
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  int op_type = TREE_CODE_LENGTH (code);
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -5721,12 +5702,14 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
   int reduc_index = -1;
   for (i = 0; i < op_type; i++)
 {
+  tree op = gimple_op (stmt, i + 1);
   /* The condition of COND_EXPR is checked in vectorizable_condition().  */
   if (i == 0 && code == COND_EXPR)
 continue;
 
   stmt_vec_info def_stmt_info;
-  if (!vect_is_simple_use (ops[i], loop_vinfo, [i], ,
+  enum vect_def_type dt;
+  if (!vect_is_simple_use (op, loop_vinfo, , ,
   _stmt_info))
{
  if (dump_enabled_p ())
@@ -5734,36 +5717,25 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
 "use not simple.\n");
  return false;
}
-  dt = dts[i];
-  if (dt == vect_reduction_def
- && ops[i] == reduc_def)
+  if ((dt == vect_reduction_def || dt == vect_nested_cycle)
+ && op == reduc_def)
{
  reduc_index = i;
  continue;
}
-  else if (tem)
-   {
- /* To properly compute ncopies we are interested in the widest
-input type in case we're looking at a widening accumulation.  */
- if (!vectype_in
- || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
- < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (tem)
-   vectype_in = tem;
-   }
 
-  if (dt != vect_internal_def
- && dt != vect_external_def
- && dt != vect_constant_def
- && dt != vect_induction_def
-  && !(dt == vect_nested_cycle && nested_cycle))
+  /* There should be only one cycle def in the stmt, the one
+ leading to reduc_def.  */
+  if (VECTORIZABLE_CYCLE_DEF (dt))
return false;
 
-  if (dt == vect_nested_cycle
- && ops[i] == reduc_def)
-   {
- 

Re: [arm] PR target/89400 fix thumb1 unaligned access expansion

2019-10-17 Thread Richard Earnshaw (lists)

On 03/05/2019 14:47, Richard Earnshaw (lists) wrote:


Armv6 has support for unaligned accesses to memory.  However, the
thumb1 code patterns were trying to use the 32-bit code constraints.
One failure mode from this was that the patterns are designed to be
compatible with conditional execution and this was then causing an
assert in the compiler.

The unaligned_loadhis pattern is only used for expanding extv, which
in turn is only enabled for systems supporting thumb2.  Given that
there is no simple expansion for a thumb1 sign-extending load (the
instruction has no immediate offset form and requires two registers in
the address) it seems simpler to just disable this for thumb1.

Fixed thusly:

PR target/89400
* config/arm/arm.md (unaligned_loadsi): Add variant for thumb1.
Restrict 'all' variant to 32-bit configurations.
(unaligned_loadhiu): Likewise.
(unaligned_storehi): Likewise.
(unaligned_storesi): Likewise.
(unaligned_loadhis): Disable when compiling for thumb1.



I've now backported this to the gcc-7, -8 and -9 branches.  The patch is 
identical for -8 and -9 but needs some minor context tweaks for gcc-7, 
so I'm attaching the copy that was applied there.


R.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0839f9ddf11..057b25deb4e 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4486,62 +4486,78 @@ (define_expand "extv_regsi"
 ; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
 
 (define_insn "unaligned_loadsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
-	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
+  [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m,Uw,m")]
 		   UNSPEC_UNALIGNED_LOAD))]
   "unaligned_access"
-  "ldr%?\t%0, %1\t@ unaligned"
-  [(set_attr "arch" "t2,any")
-   (set_attr "length" "2,4")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no")
+  "@
+   ldr\t%0, %1\t@ unaligned
+   ldr%?\t%0, %1\t@ unaligned
+   ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t1,t2,32")
+   (set_attr "length" "2,2,4")
+   (set_attr "predicable" "no,yes,yes")
+   (set_attr "predicable_short_it" "no,yes,no")
(set_attr "type" "load1")])
 
+;; The 16-bit Thumb1 variant of ldrsh requires two registers in the
+;; address (there's no immediate format).  That's tricky to support
+;; here and we don't really need this pattern for that case, so only
+;; enable for 32-bit ISAs.
 (define_insn "unaligned_loadhis"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extend:SI
 	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uh")]
 		 UNSPEC_UNALIGNED_LOAD)))]
-  "unaligned_access"
+  "unaligned_access && TARGET_32BIT"
   "ldrsh%?\t%0, %1\t@ unaligned"
   [(set_attr "predicable" "yes")
(set_attr "type" "load_byte")])
 
 (define_insn "unaligned_loadhiu"
-  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+  [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
 	(zero_extend:SI
-	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "m,Uw,m")]
 		 UNSPEC_UNALIGNED_LOAD)))]
   "unaligned_access"
-  "ldrh%?\t%0, %1\t@ unaligned"
-  [(set_attr "arch" "t2,any")
-   (set_attr "length" "2,4")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no")
+  "@
+   ldrh\t%0, %1\t@ unaligned
+   ldrh%?\t%0, %1\t@ unaligned
+   ldrh%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t1,t2,32")
+   (set_attr "length" "2,2,4")
+   (set_attr "predicable" "no,yes,yes")
+   (set_attr "predicable_short_it" "no,yes,no")
(set_attr "type" "load_byte")])
 
 (define_insn "unaligned_storesi"
-  [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
-	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
+  [(set (match_operand:SI 0 "memory_operand" "=m,Uw,m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,l,r")]
 		   UNSPEC_UNALIGNED_STORE))]
   "unaligned_access"
-  "str%?\t%1, %0\t@ unaligned"
-  [(set_attr "arch" "t2,any")
-   (set_attr "length" "2,4")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "yes,no")
+  "@
+   str\t%1, %0\t@ unaligned
+   str%?\t%1, %0\t@ unaligned
+   str%?\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t1,t2,32")
+   (set_attr "length" "2,2,4")
+   (set_attr "predicable" "no,yes,yes")
+   (set_attr "predicable_short_it" "no,yes,no")
(set_attr "type" "store1")])
 
 (define_insn "unaligned_storehi"
-  [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
-	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
+  [(set (match_operand:HI 0 "memory_operand" "=m,Uw,m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,l,r")]
 		   UNSPEC_UNALIGNED_STORE))]
   "unaligned_access"
-  "strh%?\t%1, %0\t@ unaligned"
-  [(set_attr "arch" "t2,any")
-   (set_attr "length" "2,4")
-   (set_attr "predicable" "yes")
-   (set_attr 

[Patch, fortran] PR fortran/92142 - CFI_setpointer corrupts descriptor

2019-10-17 Thread José Rui Faustino de Sousa

Hi all!

Proposed patch to solve the handling of the attribute value in the 
descriptor.


Patch tested only on x86_64-pc-linux-gnu.

CFI_setpointer does not check if it is setting a pointer and will set 
any type of object to the target.


CFI_setpointer will also change the pointer attribute of the pointer to 
whatever is the target's attribute corrupting the descriptor.


Thank you very much.

Best regards,
José Rui

2019-10-17  José Rui Faustino de Sousa  

 PR fortran/92142
 * ISO_Fortran_binding.c (CFI_setpointer): Add check to verify if the
 object being set (result) is really a pointer. Remove two instances
 where the result attribute value is overwritten.

2019-10-17  José Rui Faustino de Sousa  

 PR fortran/92142
 * ISO_Fortran_binding_15.f90: New test.
 * ISO_Fortran_binding_15.c: Additional source.

Index: libgfortran/runtime/ISO_Fortran_binding.c
===
--- libgfortran/runtime/ISO_Fortran_binding.c   (revision 276937)
+++ libgfortran/runtime/ISO_Fortran_binding.c   (working copy)
@@ -795,13 +795,21 @@
 int CFI_setpointer (CFI_cdesc_t *result, CFI_cdesc_t *source,
const CFI_index_t lower_bounds[])
 {
-  /* Result must not be NULL. */
-  if (unlikely (compile_options.bounds_check) && result == NULL)
+  /* Result must not be NULL and must be a Fortran pointer. */
+  if (unlikely (compile_options.bounds_check))
 {
-  fprintf (stderr, "CFI_setpointer: Result is NULL.\n");
-  return CFI_INVALID_DESCRIPTOR;
+  if (result == NULL)
+   {
+ fprintf (stderr, "CFI_setpointer: Result is NULL.\n");
+ return CFI_INVALID_DESCRIPTOR;
+   }
+
+  if (result->attribute != CFI_attribute_pointer)
+   {
+ fprintf (stderr, "CFI_setpointer: Result is not a Fortran 
pointer.\n");
+ return CFI_INVALID_ATTRIBUTE;
+   }
 }
-
   /* If source is NULL, the result is a C Descriptor that describes a
* disassociated pointer. */
   if (source == NULL)
@@ -808,7 +816,6 @@
 {
   result->base_addr = NULL;
   result->version  = CFI_VERSION;
-  result->attribute = CFI_attribute_pointer;
 }
   else
 {
@@ -852,7 +859,6 @@

   /* Assign components to result. */
   result->version = source->version;
-  result->attribute = source->attribute;

   /* Dimension information. */
   for (int i = 0; i < source->rank; i++)
Index: gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c
===
--- gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c  (nonexistent)
+++ gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.c  (working copy)
@@ -0,0 +1,41 @@
+/* Test the fix of .  */
+
+/* #include "../../../libgfortran/ISO_Fortran_binding.h" */
+#include "ISO_Fortran_binding.h"
+
+#include 
+
+int c_setpointer(CFI_cdesc_t *);
+
+int c_setpointer(CFI_cdesc_t *ip)
+{
+  CFI_cdesc_t *yp = NULL;
+  void *auxp = ip->base_addr;
+  int ierr;
+  int status;
+
+  /* Setting up the pointer */
+  ierr = 1;
+  yp = malloc(sizeof(*ip));
+  if (yp == NULL) return ierr;
+  status = CFI_establish(yp, NULL, CFI_attribute_pointer, ip->type, 
ip->elem_len, ip->rank, NULL);

+  if (status != CFI_SUCCESS) return ierr;
+  if (yp->attribute != CFI_attribute_pointer) return ierr;
+  /* Set the pointer to ip */
+  ierr = 2;
+  status = CFI_setpointer(yp, ip, NULL);
+  if (status != CFI_SUCCESS) return ierr;
+  if (yp->attribute != CFI_attribute_pointer) return ierr;
+  /* Set the pointer to NULL */
+  ierr = 3;
+  status = CFI_setpointer(yp, NULL, NULL);
+  if (status != CFI_SUCCESS) return ierr;
+  if (yp->attribute != CFI_attribute_pointer) return ierr;
+  /* "Set" the ip variable to yp (should not be possible) */
+  ierr = 4;
+  status = CFI_setpointer(ip, yp, NULL);
+  if (status != CFI_INVALID_ATTRIBUTE) return ierr;
+  if (ip->attribute != CFI_attribute_other) return ierr;
+  if (ip->base_addr != auxp) return ierr;
+  return 0;
+}
Index: gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.f90
===
--- gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.f90(nonexistent)
+++ gcc/testsuite/gfortran.dg/ISO_Fortran_binding_15.f90(working copy)
@@ -0,0 +1,22 @@
+! { dg-do run }
+! { dg-additional-options "-fbounds-check" }
+! { dg-additional-sources ISO_Fortran_binding_15.c }
+!
+!
+  use, intrinsic :: iso_c_binding, only: c_int
+
+  implicit none
+
+  interface
+function c_setpointer(ip) result(ierr) bind(c)
+  use, intrinsic :: iso_c_binding, only: c_int
+  type(*), dimension(..), target :: ip
+  integer(c_int) :: ierr
+end function c_setpointer
+  end interface
+
+  integer(c_int) :: it = 1
+
+  if (c_setpointer(it) /= 0) stop 1
+
+end




[PING] [WIP PATCH] add object access attributes (PR 83859)

2019-10-17 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01690.html

Other than the suggestions I got for optimization (for GCC 11)
and additional buffer overflow detection for [static] arrays,
is there any feedback on the patch itself?  Jeff?

Martin

On 9/29/19 1:51 PM, Martin Sebor wrote:

-Wstringop-overflow detects a subset of past-the-end read and write
accesses by built-in functions such as memcpy and strcpy.  It relies
on the functions' effects, knowledge of which is hardwired into
GCC.  Although it's possible for users to create wrappers for their
own functions to detect similar problems, it's quite cumbersome and
so only lightly used outside system libraries like Glibc.  Even Glibc
only checks for buffer overflow and not for reading past the end.

PR 83859 asks to expose the same checking that GCC does natively for
built-in calls via a function attribute that associates a pointer
argument with the size argument, such as:

   __attribute__((buffer_size (1, 2))) void
   f (char* dst, size_t dstsize);

The attached patch is my initial stab at providing this feature by
introducing three new attributes:

   * read_only (ptr-argno, size-argno)
   * write_only (ptr-argno, size-argno)
   * read_write (ptr-argno, size-argno)

As requested, the attributes associate a pointer parameter to
a function with a size parameter.  In addition, they also specify
how the function accesses the object the pointer points to: either
it only reads from it, or it only writes to it, or it does both.
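
For illustration, here is roughly how the proposed attributes might be spelled
on ordinary declarations.  The attribute names and the (pointer-argno,
size-argno) argument order are the ones proposed above; the syntax that finally
lands in GCC could well differ, so treat this as a sketch rather than
documentation:

#include <cstddef>

/* Only reads from the buffer: passing an uninitialized object by
   reference here could be diagnosed with -Wuninitialized.  */
__attribute__ ((read_only (1, 2)))
int sum_bytes (const unsigned char *buf, std::size_t n);

/* Only writes to the buffer: calls with a too-small destination could
   be diagnosed much like the built-in string functions are today.  */
__attribute__ ((write_only (1, 2)))
void fill_buffer (char *dst, std::size_t dstsize);

/* Both reads and writes the object it is given.  */
__attribute__ ((read_write (1, 2)))
void scramble (unsigned char *buf, std::size_t n);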

Besides enabling the same buffer overflow detection as for built-in
string functions, they also let GCC issue -Wuninitialized warnings
for uninitialized objects passed to read-only functions by reference,
and -Wunused-but-set warnings for objects passed to write-only
functions that are otherwise unused (PR 80806).  The -Wuninitialized
part is done. The -Wunused-but-set detection is implemented only in
the C FE and not yet in C++.

Besides the diagnostic improvements above the attributes also open
up optimization opportunities such as DCE.  I'm still working on this
and so it's not yet part of the initial patch.

I plan to finish the patch for GCC 10 but I don't expect to have
the time to start taking advantage of the attributes for optimization
until GCC 11.

Besides regression testing on x86_64-linux, I also tested the patch
by compiling Binutils/GDB, Glibc, and the Linux kernel with it.  It
found no new problems but caused a handful of -Wunused-but-set-variable
false positives due to an outstanding bug in the C front-end introduced
by the patch that I still need to fix.

Martin




[AArch64][SVE2] Fix for r277110 (BSL variants)

2019-10-17 Thread Yuliang Wang
Hi,

SVE2 vectorization for BSL and NBSL fails when the element type is unsigned 
8/16-bit.

The operands are being converted implicitly to corresponding signed types, 
which the mid-end fold pattern does not take into account; this patch augments 
the pattern with type conversion checks in order to rectify the above problem.

#define TYPE uint{8,16}_t

void
foo (TYPE *a, TYPE *b, TYPE *c, TYPE *d, int n)
{
  for (int i = 0; i < n; i++)
a[i] = OP (b[i], c[i], d[i]);
}

BSL:

 // #define OP(x,y,z) (((x) & (z)) | ((y) & ~(z)))

  before    and z1.d, z2.d, z1.d
            bic z0.d, z0.d, z2.d
            orr z0.d, z0.d, z1.d
  ...
  after     bsl z0.d, z0.d, z1.d, z2.d

NBSL:

  // #define OP(x,y,z) ~(((x) & (z)) | ((y) & ~(z)))

  before    and z1.d, z2.d, z1.d
            bic z0.d, z0.d, z2.d
            orr z0.d, z0.d, z1.d
            not z0.{b,h}, p1/m, z0.{b,h}
  ...
  after     nbsl z0.d, z0.d, z1.d, z2.d

The GIMPLE output for BSL shows where the implicit conversions are inserted:

_1 = b[i];
_2 = d[i];
_3 = _1 & _2;
_4 = (signed short) _3;
_5 = c[i];
_6 = (signed short) _5;
_7 = d[i];
_8 = (signed short) _7;
_9 = ~_8;
_10 = _6 & _9;
_11 = _4 | _10;
_12 = (short unsigned int) _11;
a[i] = _12;

In contrast, for 32/64-bit types (regardless of signedness):

_1 = b[i];
_2 = d[i];
_3 = _1 & _2;
_4 = c[i];
_5 = d[i];
_6 = ~_5;
_7 = _4 & _6;
_8 = _3 | _7;
_9 = ~_8;
a[i] = _9;

Built and tested on aarch64-none-elf.

Regards,
Yuliang Wang


gcc/ChangeLog:

2019-10-17  Yuliang Wang  

* match.pd (/* (x & ~m) | (y & m) -> ... */): Modified fold pattern.
* genmatch.c (convert3): New convert operation to support the above.

gcc/testsuite/ChangeLog:

2019-10-17  Yuliang Wang  

* gcc.target/aarch64/sve2/bitsel_1.c: Add testing for unsigned types.
* gcc.target/aarch64/sve2/bitsel_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_3.c: As above.
* gcc.target/aarch64/sve2/bitsel_4.c: As above.
* gcc.target/aarch64/sve2/eor3_1.c: As above.


diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 
7db1f135840e09e794e2921859fa8e9b7fa8..ce87ae33e0b3c06f4d1fde8d8e74bf2210ee7a5a
 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -227,6 +227,7 @@ enum tree_code {
 CONVERT0,
 CONVERT1,
 CONVERT2,
+CONVERT3,
 VIEW_CONVERT0,
 VIEW_CONVERT1,
 VIEW_CONVERT2,
@@ -1176,6 +1177,7 @@ lower_opt_convert (operand *o)
 = { CONVERT0, CONVERT_EXPR,
CONVERT1, CONVERT_EXPR,
CONVERT2, CONVERT_EXPR,
+   CONVERT3, CONVERT_EXPR,
VIEW_CONVERT0, VIEW_CONVERT_EXPR,
VIEW_CONVERT1, VIEW_CONVERT_EXPR,
VIEW_CONVERT2, VIEW_CONVERT_EXPR };
@@ -4145,8 +4147,8 @@ parser::record_operlist (location_t loc, user_id *p)
 }
 }
 
-/* Parse the operator ID, special-casing convert?, convert1? and
-   convert2?  */
+/* Parse the operator ID, special-casing convert?, convert1?, convert2? and
+   convert3?  */
 
 id_base *
 parser::parse_operation ()
@@ -4167,6 +4169,8 @@ parser::parse_operation ()
;
   else if (strcmp (id, "convert2") == 0)
;
+  else if (strcmp (id, "convert3") == 0)
+   ;
   else if (strcmp (id, "view_convert") == 0)
id = "view_convert0";
   else if (strcmp (id, "view_convert1") == 0)
@@ -4183,6 +4187,7 @@ parser::parse_operation ()
 }
   else if (strcmp (id, "convert1") == 0
   || strcmp (id, "convert2") == 0
+  || strcmp (id, "convert3") == 0
   || strcmp (id, "view_convert1") == 0
   || strcmp (id, "view_convert2") == 0)
 fatal_at (id_tok, "expected '?' after conditional operator");
@@ -4723,9 +4728,9 @@ parser::parse_for (location_t)
  id_base *idb = get_operator (oper, true);
  if (idb == NULL)
fatal_at (token, "no such operator '%s'", oper);
- if (*idb == CONVERT0 || *idb == CONVERT1 || *idb == CONVERT2
- || *idb == VIEW_CONVERT0 || *idb == VIEW_CONVERT1
- || *idb == VIEW_CONVERT2)
+ if (*idb == CONVERT0 || *idb == VIEW_CONVERT0
+ || *idb == CONVERT1 || *idb == CONVERT2|| *idb == CONVERT3
+ || *idb == VIEW_CONVERT1 || *idb == VIEW_CONVERT2)
fatal_at (token, "conditional operators cannot be used inside for");
 
  if (arity == -1)
@@ -5136,6 +5141,7 @@ main (int argc, char **argv)
 add_operator (CONVERT0, "convert0", "tcc_unary", 1);
 add_operator (CONVERT1, "convert1", "tcc_unary", 1);
 add_operator (CONVERT2, "convert2", "tcc_unary", 1);
+add_operator (CONVERT3, "convert3", "tcc_unary", 1);
 add_operator (VIEW_CONVERT0, "view_convert0", "tcc_unary", 1);
 add_operator (VIEW_CONVERT1, "view_convert1", "tcc_unary", 1);
 add_operator (VIEW_CONVERT2, "view_convert2", "tcc_unary", 1);
diff --git a/gcc/match.pd b/gcc/match.pd
index 
e3ac06c8ef5b893bd344734095b11047a43f98b8..0aa065c2941dd79477434fd3b6691c9a9b68d20c
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1461,8 +1461,13 @@ 

Re: [PATCH] [x86] Add detection of Icelake Client and Server

2019-10-17 Thread Thiago Macieira
On Wednesday, 16 October 2019 23:51:30 PDT Uros Bizjak wrote:
> > gcc/ChangeLog:
> > * config/i386/driver-i386.c (host_detect_local_cpu): Handle
> > 
> >   icelake-client and icelake-server.
> > 
> > * testsuite/gcc.target/i386/builtin_target.c 
(check_intel_cpu_model):
> >   Verify icelakes are detected correctly.
> > 
> > libgcc/ChangeLog:
> > * config/i386/cpuinfo.c (get_intel_cpu): Handle icelake-client
> > 
> >   and icelake-server.
> 
> Please also state how you bootstrapped and tested the patch.

I didn't personally, but libgcc thus built was tested by a colleague who has 
access to engineering samples of some of those machines, and they confirmed 
that icelake is properly detected. We didn't test gcc's -march=native detection.

The numbers come from the Linux kernel header:
https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/intel-family.h#L77-L81
which in turn come from the Intel SDM, vol. 4. The May 2019 edition has the 
most up-to-date values.

LLVM's compiler-rt equivalent:
https://github.com/llvm-mirror/compiler-rt/commit/
787bbab3e844b25bd3f8f282c6d3c8b3ad892fb4
https://github.com/llvm-mirror/compiler-rt/commit/
7a65a376f3ae2d770797eb87b7556a3689a6177a

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products





[PATCH] Process new C++17 and C++20 headers with Doxygen

2019-10-17 Thread Jonathan Wakely

This won't do anything by default, because __cplusplus is set to 201402L
when Doxygen runs. If/when that changes, these headers should be
processed.

* doc/doxygen/user.cfg.in (INPUT): Add new C++17 and C++20 headers.

Committed to trunk.


commit 14b2576d8c0ea76ccbb4cb90a1d30aea4993a03d
Author: redi 
Date:   Thu Oct 17 15:40:04 2019 +

Process new C++17 and C++20 headers with Doxygen

This won't do anything by default, because __cplusplus is set to 201402L
when Doxygen runs. If/when that changes, these headers should be
processed.

* doc/doxygen/user.cfg.in (INPUT): Add new C++17 and C++20 headers.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277121 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in 
b/libstdc++-v3/doc/doxygen/user.cfg.in
index dc493998a1a..3c0295d99a5 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -793,14 +793,19 @@ INPUT  = 
@srcdir@/doc/doxygen/doxygroups.cc \
  @srcdir@/libsupc++/new \
  @srcdir@/libsupc++/typeinfo \
  include/algorithm \
+ include/any \
  include/array \
  include/atomic \
+ include/bit \
  include/bitset \
+ include/charconv \
  include/chrono \
- include/complex \
  include/codecvt \
+ include/complex \
+ include/concepts \
  include/condition_variable \
  include/deque \
+ include/filesystem \
  include/forward_list \
  include/fstream \
  include/functional \
@@ -816,8 +821,11 @@ INPUT  = 
@srcdir@/doc/doxygen/doxygroups.cc \
  include/locale \
  include/map \
  include/memory \
+ include/memory_resource \
  include/mutex \
  include/numeric \
+ include/numbers \
+ include/optional \
  include/ostream \
  include/queue \
  include/random \
@@ -826,11 +834,13 @@ INPUT  = 
@srcdir@/doc/doxygen/doxygroups.cc \
  include/scoped_allocator \
  include/set \
  include/shared_mutex \
+ include/span \
  include/sstream \
  include/stack \
  include/stdexcept \
  include/streambuf \
  include/string \
+ include/string_view \
  include/system_error \
  include/thread \
  include/tuple \
@@ -840,6 +850,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc 
\
  include/unordered_set \
  include/utility \
  include/valarray \
+ include/variant \
  include/vector \
  include/cassert \
  include/ccomplex \


[PATCH] Define [range.cmp] comparisons for C++20

2019-10-17 Thread Jonathan Wakely

Define std::identity, std::ranges::equal_to, std::ranges::not_equal_to,
std::ranges::greater, std::ranges::less, std::ranges::greater_equal and
std::ranges::less_equal.
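
An illustrative use of the new function objects (this assumes a compiler and
library running in C++20 mode that already ship this header):

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int
main ()
{
  std::vector<int> v{ 3, 1, 2 };

  // ranges::less is constrained, but otherwise behaves like the
  // transparent std::less<void>.
  std::sort (v.begin (), v.end (), std::ranges::less{});

  // std::identity simply forwards its argument; it is the default
  // projection for the C++20 range algorithms.
  for (int i : v)
    std::cout << std::identity{} (i) << ' ';
  std::cout << '\n';

  // ranges::equal_to only accepts arguments that are comparable.
  std::cout << std::boolalpha << std::ranges::equal_to{} (2, 2) << '\n';
  return 0;
}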

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/range_cmp.h: New header for C++20 function objects.
* include/std/functional: Include new header.
* testsuite/20_util/function_objects/identity/1.cc: New test.
* testsuite/20_util/function_objects/range.cmp/equal_to.cc: New test.
* testsuite/20_util/function_objects/range.cmp/greater.cc: New test.
* testsuite/20_util/function_objects/range.cmp/greater_equal.cc: New
test.
* testsuite/20_util/function_objects/range.cmp/less.cc: New test.
* testsuite/20_util/function_objects/range.cmp/less_equal.cc: New test.
* testsuite/20_util/function_objects/range.cmp/not_equal_to.cc: New
test.

Tested powerpc64le-linux, committed to trunk.


commit b948d3f92d7bbe4d53237cb20ff40a15fa123988
Author: Jonathan Wakely 
Date:   Thu Oct 17 15:20:38 2019 +0100

Define [range.cmp] comparisons for C++20

Define std::identity, std::ranges::equal_to, std::ranges::not_equal_to,
std::ranges::greater, std::ranges::less, std::ranges::greater_equal and
std::ranges::less_equal.

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/range_cmp.h: New header for C++20 function objects.
* include/std/functional: Include new header.
* testsuite/20_util/function_objects/identity/1.cc: New test.
* testsuite/20_util/function_objects/range.cmp/equal_to.cc: New 
test.
* testsuite/20_util/function_objects/range.cmp/greater.cc: New test.
* testsuite/20_util/function_objects/range.cmp/greater_equal.cc: New
test.
* testsuite/20_util/function_objects/range.cmp/less.cc: New test.
* testsuite/20_util/function_objects/range.cmp/less_equal.cc: New 
test.
* testsuite/20_util/function_objects/range.cmp/not_equal_to.cc: New
test.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 35ee3cfcd34..9ff12f10fb1 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -152,6 +152,7 @@ bits_headers = \
${bits_srcdir}/random.h \
${bits_srcdir}/random.tcc \
${bits_srcdir}/range_access.h \
+   ${bits_srcdir}/range_cmp.h \
${bits_srcdir}/refwrap.h \
${bits_srcdir}/regex.h \
${bits_srcdir}/regex.tcc \
diff --git a/libstdc++-v3/include/bits/range_cmp.h 
b/libstdc++-v3/include/bits/range_cmp.h
new file mode 100644
index 000..3e5bb8847ab
--- /dev/null
+++ b/libstdc++-v3/include/bits/range_cmp.h
@@ -0,0 +1,179 @@
+// Concept-constrained comparison implementations -*- C++ -*-
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/ranges_function.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{functional}
+ */
+
+#ifndef _RANGE_CMP_H
+#define _RANGE_CMP_H 1
+
+#if __cplusplus > 201703L
+# include 
+# include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  struct __is_transparent; // not defined
+
+  // Define std::identity here so that  and 
+  // don't need to include  to get it.
+
+  /// [func.identity] The identity function.
+  struct identity
+  {
+template
+  constexpr _Tp&&
+  operator()(_Tp&& __t) const noexcept
+  { return std::forward<_Tp>(__t); }
+
+using is_transparent = __is_transparent;
+  };
+
+namespace ranges
+{
+  namespace __detail
+  {
+// BUILTIN-PTR-CMP(T, ==, U)
+template
+  concept __eq_builtin_ptr_cmp
+   = convertible_to<_Tp, const volatile void*>
+ && convertible_to<_Up, const volatile void*>
+   

[patch,testsuite]: Fix some fallout for small targets.

2019-10-17 Thread Georg-Johann Lay
Hi, this fixes some FAILs for small targets; the affected tests are fixed or 
skipped by requiring size32plus, double64[plus], etc. effective targets.


Ok to apply?

Johann

Fix some fallout for small targets.

* gcc.c-torture/execute/20190820-1.c:
Add dg-require-effective-target int32plus.
* gcc.c-torture/execute/pr85331.c
Add dg-require-effective-target double64plus.
* gcc.dg/pow-sqrt-1.c: Same.
* gcc.dg/pow-sqrt-2.c: Same.
* gcc.dg/pow-sqrt-3.c: Same.
* gcc.c-torture/execute/20190901-1.c: Same.
* gcc.c-torture/execute/user-printf.c [avr]: Skip.
* gcc.c-torture/execute/fprintf-2.c [avr]: Skip.
* gcc.c-torture/execute/printf-2.c [avr]: Skip.
* gcc.dg/Wlarger-than3.c [avr]: Skip.
* gcc.c-torture/execute/ieee/20041213-1.c (sqrt)
[avr,double=float]: Provide custom prototype.
* gcc.dg/pr36017.c: Same.
* gcc.c-torture/execute/pr90025.c: Use 32-bit int.
* gcc.dg/complex-7.c: Add dg-require-effective-target double64.
* gcc.dg/loop-versioning-1.c:
Add dg-require-effective-target size32plus.
* gcc.dg/loop-versioning-2.c: Same.
Index: gcc.c-torture/execute/20190820-1.c
===
--- gcc.c-torture/execute/20190820-1.c	(revision 277097)
+++ gcc.c-torture/execute/20190820-1.c	(working copy)
@@ -1,5 +1,6 @@
 /* PR rtl-optimization/91347 */
 /* Reported by John David Anglin  */
+/* { dg-require-effective-target int32plus } */
 
 typedef unsigned short __u16;
 typedef __signed__ int __s32;
Index: gcc.c-torture/execute/20190901-1.c
===
--- gcc.c-torture/execute/20190901-1.c	(revision 277097)
+++ gcc.c-torture/execute/20190901-1.c	(working copy)
@@ -1,7 +1,12 @@
 /* PR target/91472 */
 /* Reported by John Paul Adrian Glaubitz  */
+/* { dg-require-effective-target double64plus } */
 
+#if __SIZEOF_INT__ >= 4
 typedef unsigned int gmp_uint_least32_t;
+#else
+typedef __UINT_LEAST32_TYPE__ gmp_uint_least32_t;
+#endif
 
 union ieee_double_extract
 {
Index: gcc.c-torture/execute/fprintf-2.c
===
--- gcc.c-torture/execute/fprintf-2.c	(revision 277097)
+++ gcc.c-torture/execute/fprintf-2.c	(working copy)
@@ -1,6 +1,7 @@
 /* Verify that calls to fprintf don't get eliminated even if their
result on success can be computed at compile time (they can fail).
The calls can still be transformed into those of other functions.
+   { dg-skip-if "requires io" { avr-*-* } }
{ dg-skip-if "requires io" { freestanding } } */
 
 #include 
Index: gcc.c-torture/execute/ieee/20041213-1.c
===
--- gcc.c-torture/execute/ieee/20041213-1.c	(revision 277097)
+++ gcc.c-torture/execute/ieee/20041213-1.c	(working copy)
@@ -1,4 +1,8 @@
+#if defined (__AVR__) && (__SIZEOF_DOUBLE__ == __SIZEOF_FLOAT__)
+extern double sqrt (double) __asm ("sqrtf");
+#else
 extern double sqrt (double);
+#endif
 extern void abort (void);
 int once;
 
Index: gcc.c-torture/execute/pr85331.c
===
--- gcc.c-torture/execute/pr85331.c	(revision 277097)
+++ gcc.c-torture/execute/pr85331.c	(working copy)
@@ -1,4 +1,5 @@
 /* PR tree-optimization/85331 */
+/* { dg-require-effective-target double64plus } */
 
 typedef double V __attribute__((vector_size (2 * sizeof (double;
 typedef long long W __attribute__((vector_size (2 * sizeof (long long;
Index: gcc.c-torture/execute/pr90025.c
===
--- gcc.c-torture/execute/pr90025.c	(revision 277097)
+++ gcc.c-torture/execute/pr90025.c	(working copy)
@@ -13,10 +13,10 @@ bar (char *p)
 }
 
 __attribute__((noipa)) void
-foo (unsigned int x)
+foo (__UINT32_TYPE__ x)
 {
   char s[32] = { 'f', 'o', 'o', 'b', 'a', 'r', 0 };
-  ((unsigned int *) s)[2] = __builtin_bswap32 (x);
+  ((__UINT32_TYPE__ *) s)[2] = __builtin_bswap32 (x);
   bar (s);
 }
 
Index: gcc.c-torture/execute/printf-2.c
===
--- gcc.c-torture/execute/printf-2.c	(revision 277097)
+++ gcc.c-torture/execute/printf-2.c	(working copy)
@@ -2,6 +2,7 @@
result on success can be computed at compile time (they can fail).
The calls can still be transformed into those of other functions.
{ dg-require-effective-target unwrapped }
+   { dg-skip-if "requires io" { avr-*-* } }
{ dg-skip-if "requires io" { freestanding } } */
 
 #include 
Index: gcc.c-torture/execute/user-printf.c
===
--- gcc.c-torture/execute/user-printf.c	(revision 277097)
+++ gcc.c-torture/execute/user-printf.c	(working copy)
@@ -2,6 +2,7 @@
don't get eliminated even if their result on success can be computed at
compile time (they can fail).
{ dg-require-effective-target unwrapped }
+   

Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-17 Thread Paolo Carlini

.. hi again.

We have an issue with this idea which I didn't notice earlier today: 
there are some libstdc++ *_neg testcases which rely on permerrors being 
emitted for code in library headers. For example 
20_util/ratio/cons/cons_overflow_neg.cc relies on a permerror for the 
"overflow in constant expression" in constexpr.c. Or, 
20_util/variant/visit_neg.cc relies on the permerror for "invalid 
conversion from .. to .." emitted by convert_like_real.


Something seems a little fishy here, but I'm not sure which way we want 
to go: I don't think we want to audit, right here and right now, all the 
permerrors in the front end which could potentially be involved in this 
kind of issue. Even if we, say, promote the above two to plain errors, 
that seems brittle; we have many permerrors which in principle should be 
real errors.


Paolo.



[PATCH] fix constrained auto parsing issue

2019-10-17 Thread Andrew Sutton
This fixes a parsing bug with constrained placeholders used as the
first parameter of a constructor.
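
The patch itself is attached below as binary data, but the construct in
question is presumably of this general shape (an illustrative guess, not the
actual reduced testcase):

#include <type_traits>

template<typename T>
concept Integral = std::is_integral_v<T>;

struct Wrapper
{
  // A constrained placeholder ("Integral auto") used as the type of
  // the first constructor parameter; parsing this declarator is what
  // the fix is about.
  Wrapper (Integral auto value) : stored (static_cast<long> (value)) { }
  long stored;
};

int
main ()
{
  Wrapper w (42);
  return w.stored == 42 ? 0 : 1;
}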

Andrew Sutton


0001-Fix-a-bug-with-type-constraints-in-constructors.patch
Description: Binary data


[PATCH] PR libstdc++/92124 fix incorrect container move assignment

2019-10-17 Thread Jonathan Wakely

The container requirements say that for move assignment "All existing
elements of [the target] are either move assigned or destroyed". Some of
our containers currently use __make_move_if_noexcept which makes the
move depend on whether the element type is nothrow move constructible.
This is incorrect, because the standard says we must move assign, not
move or copy depending on the move constructor.

Use make_move_iterator instead so that we move unconditionally. This
ensures existing elements won't be copy assigned.
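
A minimal, self-contained sketch of the distinction the fix relies on, using
only the public facilities (std::move_if_noexcept and std::make_move_iterator)
rather than the internal __make_move_if_noexcept_iterator:

#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

struct Widget
{
  Widget () = default;
  Widget (const Widget &) { }		// copyable
  Widget (Widget &&) { }		// move ctor is NOT noexcept
  Widget &operator= (const Widget &)
  { std::cout << "copy-assign\n"; return *this; }
  Widget &operator= (Widget &&)
  { std::cout << "move-assign\n"; return *this; }
};

int
main ()
{
  std::vector<Widget> src (2), dst (2);

  // move_if_noexcept yields a const lvalue because Widget's move
  // constructor is not noexcept, so the target element is copy assigned...
  dst[0] = std::move_if_noexcept (src[0]);	// prints "copy-assign"

  // ...whereas a move_iterator always yields an rvalue, so the target
  // element is move assigned, which is what the container requirements
  // demand for the existing elements of a move-assigned container.
  auto it = std::make_move_iterator (src.begin ());
  dst[1] = *it;					// prints "move-assign"
  return 0;
}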

PR libstdc++/92124
* include/bits/forward_list.h
(_M_move_assign(forward_list&&, false_type)): Do not use
__make_move_if_noexcept, instead move unconditionally.
* include/bits/stl_deque.h (_M_move_assign2(deque&&, false_type)):
Likewise.
* include/bits/stl_list.h (_M_move_assign(list&&, false_type)):
Likewise.
* include/bits/stl_vector.h (_M_move_assign(vector&&, false_type)):
Likewise.
* testsuite/23_containers/vector/92124.cc: New test.

Tested x86_64-linux, committed to trunk.

commit 66f4aa35299bb8e967aa54930b27815cf8161693
Author: Jonathan Wakely 
Date:   Thu Oct 17 14:27:53 2019 +0100

PR libstdc++/92124 fix incorrect container move assignment

The container requirements say that for move assignment "All existing
elements of [the target] are either move assigned or destroyed". Some of
our containers currently use __make_move_if_noexcept which makes the
move depend on whether the element type is nothrow move constructible.
This is incorrect, because the standard says we must move assign, not
move or copy depending on the move constructor.

Use make_move_iterator instead so that we move unconditionally. This
ensures existing elements won't be copy assigned.

PR libstdc++/92124
* include/bits/forward_list.h
(_M_move_assign(forward_list&&, false_type)): Do not use
__make_move_if_noexcept, instead move unconditionally.
* include/bits/stl_deque.h (_M_move_assign2(deque&&, false_type)):
Likewise.
* include/bits/stl_list.h (_M_move_assign(list&&, false_type)):
Likewise.
* include/bits/stl_vector.h (_M_move_assign(vector&&, false_type)):
Likewise.
* testsuite/23_containers/vector/92124.cc: New test.

diff --git a/libstdc++-v3/include/bits/forward_list.h 
b/libstdc++-v3/include/bits/forward_list.h
index e686283a432..cab2ae788a7 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -1336,8 +1336,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
else
  // The rvalue's allocator cannot be moved, or is not equal,
  // so we need to individually move each element.
- this->assign(std::__make_move_if_noexcept_iterator(__list.begin()),
-  std::__make_move_if_noexcept_iterator(__list.end()));
+ this->assign(std::make_move_iterator(__list.begin()),
+  std::make_move_iterator(__list.end()));
   }
 
   // Called by assign(_InputIterator, _InputIterator) if _Tp is
diff --git a/libstdc++-v3/include/bits/stl_deque.h 
b/libstdc++-v3/include/bits/stl_deque.h
index ac76d681ff0..50491e76ff5 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -2256,8 +2256,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  {
// The rvalue's allocator cannot be moved and is not equal,
// so we need to individually move each element.
-   _M_assign_aux(std::__make_move_if_noexcept_iterator(__x.begin()),
- std::__make_move_if_noexcept_iterator(__x.end()),
+   _M_assign_aux(std::make_move_iterator(__x.begin()),
+ std::make_move_iterator(__x.end()),
  std::random_access_iterator_tag());
__x.clear();
  }
diff --git a/libstdc++-v3/include/bits/stl_list.h 
b/libstdc++-v3/include/bits/stl_list.h
index 701982538df..328a79851a8 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -1957,8 +1957,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
else
  // The rvalue's allocator cannot be moved, or is not equal,
  // so we need to individually move each element.
- _M_assign_dispatch(std::__make_move_if_noexcept_iterator(__x.begin()),
-std::__make_move_if_noexcept_iterator(__x.end()),
+ _M_assign_dispatch(std::make_move_iterator(__x.begin()),
+std::make_move_iterator(__x.end()),
 __false_type{});
   }
 #endif
diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index d33e589498a..ff08b266692 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ 

Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-17 Thread Paolo Carlini

Hi,

On 17/10/19 05:15, Jason Merrill wrote:

On 10/16/19 11:59 AM, Paolo Carlini wrote:
... the below, slightly extended patch: 1- Makes sure the 
in_system_header_at calls surviving in decl.c get the same location 
used for the corresponding diagnostic
Hmm, we probably want to change permerror to respect 
warn_system_headers like warning and pedwarn.


I see. Certainly it enables cleaning up those two remaining usages in 
decl.c.


Which is the right place to implement this? The beginning of 
diagnostic_impl figures out whether a given permerror boils down to an 
actual error or a warning: if we use in_system_header_at at that level, 
as part of the permissive_error_kind macro, then the warning is handled 
like any other warning, and thus by default suppressed in system headers, 
etc. Makes sense? Tested x86_64-linux.


Thanks, Paolo.



Index: cp/decl.c
===
--- cp/decl.c   (revision 277097)
+++ cp/decl.c   (working copy)
@@ -4933,10 +4933,9 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
  "multiple types in one declaration");
   else if (declspecs->redefined_builtin_type)
 {
-  if (!in_system_header_at (input_location))
-   permerror (declspecs->locations[ds_redefined_builtin_type_spec],
-  "redeclaration of C++ built-in type %qT",
-  declspecs->redefined_builtin_type);
+  permerror (declspecs->locations[ds_redefined_builtin_type_spec],
+"redeclaration of C++ built-in type %qT",
+declspecs->redefined_builtin_type);
   return NULL_TREE;
 }
 
@@ -4984,7 +4983,8 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
 --end example]  */
   if (saw_typedef)
{
- error ("missing type-name in typedef-declaration");
+ error_at (declspecs->locations[ds_typedef],
+   "missing type-name in typedef-declaration");
  return NULL_TREE;
}
   /* Anonymous unions are objects, so they can have specifiers.  */;
@@ -9328,7 +9328,6 @@ grokfndecl (tree ctype,
}
  /* 17.6.3.3.5  */
  if (suffix[0] != '_'
- && !in_system_header_at (location)
  && !current_function_decl && !(friendp && !funcdef_flag))
warning_at (location, OPT_Wliteral_suffix,
"literal operator suffixes not preceded by %<_%>"
@@ -10036,8 +10035,6 @@ compute_array_index_type_loc (location_t name_loc,
   indicated by the state of complain), so that
   another substitution can be found.  */
return error_mark_node;
- else if (in_system_header_at (input_location))
-   /* Allow them in system headers because glibc uses them.  */;
  else if (name)
pedwarn (loc, OPT_Wpedantic,
 "ISO C++ forbids zero-size array %qD", name);
@@ -11004,7 +11001,7 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (type_was_error_mark_node)
/* We've already issued an error, don't complain more.  */;
-  else if (in_system_header_at (input_location) || flag_ms_extensions)
+  else if (flag_ms_extensions)
/* Allow it, sigh.  */;
   else if (! is_main)
permerror (id_loc, "ISO C++ forbids declaration of %qs with no type",
@@ -11037,7 +11034,7 @@ grokdeclarator (const cp_declarator *declarator,
}
   /* Don't pedwarn if the alternate "__intN__" form has been used instead
 of "__intN".  */
-  else if (!int_n_alt && pedantic && ! in_system_header_at 
(input_location))
+  else if (!int_n_alt && pedantic)
pedwarn (declspecs->locations[ds_type_spec], OPT_Wpedantic,
 "ISO C++ does not support %<__int%d%> for %qs",
 int_n_data[declspecs->int_n_idx].bitsize, name);
@@ -12695,10 +12692,7 @@ grokdeclarator (const cp_declarator *declarator,
else
  {
/* Array is a flexible member.  */
-   if (in_system_header_at (input_location))
- /* Do not warn on flexible array members in system
-headers because glibc uses them.  */;
-   else if (name)
+   if (name)
  pedwarn (id_loc, OPT_Wpedantic,
   "ISO C++ forbids flexible array member %qs", name);
else
Index: cp/error.c
===
--- cp/error.c  (revision 277097)
+++ cp/error.c  (working copy)
@@ -4317,10 +4317,7 @@ cp_printer (pretty_printer *pp, text_info *text, c
 void
 maybe_warn_cpp0x (cpp0x_warn_str str)
 {
-  if ((cxx_dialect == cxx98) && !in_system_header_at (input_location))
-/* We really want to suppress this warning in system headers,
-   because libstdc++ uses variadic templates even when we aren't
-   in C++0x mode. */
+  if (cxx_dialect == cxx98)
 switch (str)
   {
 

[wwwdocs] Improve markup/nicer formatting for GIT instructions.

2019-10-17 Thread Gerald Pfeifer
Committed.

My first git push to the new wwwdocs repository ;-)  Thank you,
Joseph and everyone else who helped!

Gerald


>From 6df815817051ed1defc13eda2cbadc089da6d646 Mon Sep 17 00:00:00 2001
From: Gerald Pfeifer 
Date: Thu, 17 Oct 2019 10:19:42 +0200
Subject: [PATCH] Improve markup/nicer formatting for GIT instructions.

---
 htdocs/about.html | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/htdocs/about.html b/htdocs/about.html
index 019b6fbd..a812a7f9 100644
--- a/htdocs/about.html
+++ b/htdocs/about.html
@@ -53,10 +53,10 @@ a higher chance of being implemented soon. ;-)
 Assuming you have both git 
 and SSH installed, you can check out the web pages as follows:
 
-
- git clone 
git+ssh://username@gcc.gnu.org/git/gcc-wwwdocs.git
- where username is your user name at gcc.gnu.org
-
+
+git clone 
git+ssh://username@gcc.gnu.org/git/gcc-wwwdocs.git
+where username is your user name at gcc.gnu.org
+
 
 For anonymous access, use
 git://gcc.gnu.org/git/gcc-wwwdocs.git instead.
-- 
2.23.0



Re: [PATCH] RISC-V: Include more registers in SIBCALL_REGS.

2019-10-17 Thread Andrew Burgess
* Jim Wilson  [2019-10-16 14:04:45 -0700]:

> This finishes the part 1 of 2 patch submitted by Andrew Burgess on Aug 19.
> This adds the argument registers but not t0 (aka x5) to SIBCALL_REGS.  It
> also adds the missing riscv_regno_to_class change.
> 
> Tested with cross riscv32-elf and riscv64-linux toolchain build and check.
> There were no regressions.  I see about a 0.01% code size reduction for the
> C and libstdc++ libraries.
> 
> Committed.

Thanks for doing this Jim.

I'm still working on part 2, I'm hoping to have a revised patch posted
by Monday next week.

Thanks again,
Andrew



> 
> Jim
> 
>   gcc/
>   * config/riscv/riscv.h (REG_CLASS_CONTENTS): Add argument passing
>   regs to SIBCALL_REGS.
>   * config/riscv/riscv.c (riscv_regno_to_class): Change argument
>   passing regs to SIBCALL_REGS.
> ---
>  gcc/config/riscv/riscv.c | 6 +++---
>  gcc/config/riscv/riscv.h | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index b8a8778b92c..77a3ad94aa8 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -256,9 +256,9 @@ enum riscv_microarchitecture_type riscv_microarchitecture;
>  const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
>GR_REGS,   GR_REGS,GR_REGS,GR_REGS,
>GR_REGS,   GR_REGS,SIBCALL_REGS,   SIBCALL_REGS,
> -  JALR_REGS, JALR_REGS,  JALR_REGS,  JALR_REGS,
> -  JALR_REGS, JALR_REGS,  JALR_REGS,  JALR_REGS,
> -  JALR_REGS, JALR_REGS,  JALR_REGS,  JALR_REGS,
> +  JALR_REGS, JALR_REGS,  SIBCALL_REGS,   SIBCALL_REGS,
> +  SIBCALL_REGS,  SIBCALL_REGS,   SIBCALL_REGS,   SIBCALL_REGS,
> +  SIBCALL_REGS,  SIBCALL_REGS,   JALR_REGS,  JALR_REGS,
>JALR_REGS, JALR_REGS,  JALR_REGS,  JALR_REGS,
>JALR_REGS, JALR_REGS,  JALR_REGS,  JALR_REGS,
>SIBCALL_REGS,  SIBCALL_REGS,   SIBCALL_REGS,   SIBCALL_REGS,
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5fc9be8edbf..246494663f6 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -400,7 +400,7 @@ enum reg_class
>  #define REG_CLASS_CONTENTS   \
>  {\
>   { 0x00000000, 0x00000000, 0x00000000 }, /* NO_REGS */	\
> -  { 0xf00000c0, 0x00000000, 0x00000000 }, /* SIBCALL_REGS */	\
> +  { 0xf003fcc0, 0x00000000, 0x00000000 }, /* SIBCALL_REGS */	\
>   { 0xffffffc0, 0x00000000, 0x00000000 }, /* JALR_REGS */	\
>   { 0xffffffff, 0x00000000, 0x00000000 }, /* GR_REGS */	\
>   { 0x00000000, 0xffffffff, 0x00000000 }, /* FP_REGS */	\
> -- 
> 2.17.1
> 
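
For readers decoding the REG_CLASS_CONTENTS change above, here is a small
illustrative sketch (assuming the usual RISC-V numbering, where bit N of
the first mask word stands for hard register xN and x10-x17 are the
argument registers a0-a7; the REG_BITS helper is mine, not GCC's):

  /* Sketch only: how the SIBCALL_REGS masks decompose.  */
  #define REG_BITS(lo, hi) (((0xffffffffu >> (31 - (hi))) >> (lo)) << (lo))

  static const unsigned old_sibcall
    = REG_BITS (6, 7) | REG_BITS (28, 31);                     /* t1-t2, t3-t6 */
  static const unsigned new_sibcall
    = REG_BITS (6, 7) | REG_BITS (10, 17) | REG_BITS (28, 31); /* plus a0-a7 */
  /* old_sibcall == 0xf00000c0, new_sibcall == 0xf003fcc0.  */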


[PATCH] Simplify vect_is_simple_reduction

2019-10-17 Thread Richard Biener


Now we can solely rely on check_reduction_path.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-17  Richard Biener  

* tree-vect-loop.c (check_reduction_path): Compute reduction
operation here.
(vect_is_simple_reduction): Remove special-case of single-stmt
reduction path detection.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0ad32cec2e2..734c8d60e86 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2565,11 +2565,12 @@ needs_fold_left_reduction_p (tree type, tree_code code)
 }
 
 /* Return true if the reduction PHI in LOOP with latch arg LOOP_ARG and
-   reduction operation CODE has a handled computation expression.  */
+   has a handled computation expression.  Store the main reduction
+   operation in *CODE.  */
 
 static bool
 check_reduction_path (dump_user_location_t loc, loop_p loop, gphi *phi,
- tree loop_arg, enum tree_code code,
+ tree loop_arg, enum tree_code *code,
  vec<std::pair<ssa_op_iter, use_operand_p> > &path)
 {
   auto_bitmap visited;
@@ -2639,6 +2640,7 @@ pop:
   /* Check whether the reduction path detected is valid.  */
   bool fail = path.length () == 0;
   bool neg = false;
+  *code = ERROR_MARK;
   for (unsigned i = 1; i < path.length (); ++i)
 {
   gimple *use_stmt = USE_STMT (path[i].second);
@@ -2655,23 +2657,23 @@ pop:
  fail = true;
  break;
}
-  if (gimple_assign_rhs_code (use_stmt) != code)
+  enum tree_code use_code = gimple_assign_rhs_code (use_stmt);
+  if (use_code == MINUS_EXPR)
{
- if (code == PLUS_EXPR
- && gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
-   {
- /* Track whether we negate the reduction value each iteration.  */
- if (gimple_assign_rhs2 (use_stmt) == op)
-   neg = ! neg;
-   }
- else
-   {
- fail = true;
- break;
-   }
+ use_code = PLUS_EXPR;
+ /* Track whether we negate the reduction value each iteration.  */
+ if (gimple_assign_rhs2 (use_stmt) == op)
+   neg = ! neg;
+   }
+  if (*code == ERROR_MARK)
+   *code = use_code;
+  else if (use_code != *code)
+   {
+ fail = true;
+ break;
}
 }
-  return ! fail && ! neg;
+  return ! fail && ! neg && *code != ERROR_MARK;
 }
 
 bool
@@ -2679,7 +2681,9 @@ check_reduction_path (dump_user_location_t loc, loop_p 
loop, gphi *phi,
  tree loop_arg, enum tree_code code)
 {
   auto_vec<std::pair<ssa_op_iter, use_operand_p> > path;
-  return check_reduction_path (loc, loop, phi, loop_arg, code, path);
+  enum tree_code code_;
+  return (check_reduction_path (loc, loop, phi, loop_arg, &code_, path)
+ && code_ == code);
 }
 
 
@@ -2862,86 +2866,18 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
   return NULL;
 }
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
-  if (!def_stmt)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"reduction: unhandled reduction operation: %G",
-def_stmt_info->stmt);
-  return NULL;
-}
-  enum tree_code code = gimple_assign_rhs_code (def_stmt);
-
-  /* We can handle "res -= x[i]", which is non-associative by
- simply rewriting this into "res += -x[i]".  Avoid changing
- gimple instruction for the first simple tests and only do this
- if we're allowed to change code at all.  */
-  if (code == MINUS_EXPR && gimple_assign_rhs2 (def_stmt) != phi_name)
-code = PLUS_EXPR;
-
-  tree op1, op2;
-  if (code == COND_EXPR)
-{
-  if (! nested_in_vect_loop)
-   STMT_VINFO_REDUC_TYPE (phi_info) = COND_REDUCTION;
-  op1 = gimple_assign_rhs2 (def_stmt);
-  op2 = gimple_assign_rhs3 (def_stmt);
-}
-  else if (get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS)
-{
-  op1 = gimple_assign_rhs1 (def_stmt);
-  op2 = gimple_assign_rhs2 (def_stmt);
-}
-  else
-{
-  if (dump_enabled_p ())
-   report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-   "reduction: not handled operation: ");
-  return NULL;
-}
-
-  if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) != SSA_NAME)
-{
-  if (dump_enabled_p ())
-   report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-   "reduction: both uses not ssa_names: ");
-
-  return NULL;
-}
-
-  /* Reduction is safe. We're dealing with one of the following:
- 1) integer arithmetic and no trapv
- 2) floating point arithmetic, and special flags permit this optimization
- 3) nested cycle (i.e., outer loop vectorization).  */
-
-  /* Check for the simple case that one def is the reduction def,
- defined by the PHI node.  */
-  stmt_vec_info def1_info = loop_info->lookup_def (op1);
-  stmt_vec_info def2_info = loop_info->lookup_def (op2);
-  
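
As context for the MINUS_EXPR handling in check_reduction_path above, here
is a minimal example of the kind of loop being classified (illustrative
only, not part of the patch):

  /* res -= x[i] is recognised as a PLUS_EXPR reduction of -x[i]; when the
     reduction value is the second operand instead (res = x[i] - res), the
     detection code also tracks the per-iteration negation in NEG.  */
  double
  reduc (const double *x, int n)
  {
    double res = 0.0;
    for (int i = 0; i < n; ++i)
      res -= x[i];
    return res;
  }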

Re: [PATCH][wwwdocs] Purge CVS from gccmission.html

2019-10-17 Thread Kyrill Tkachov



On 10/15/19 5:29 PM, Gerald Pfeifer wrote:

On Mon, 14 Oct 2019, Kyrill Tkachov wrote:

Surely would be fine with me.

I see, thanks. Here's a proposed patch then.

My previous mail was meant to pre-approve your patch. ;-)


Right, I meant to take it as such, but got distracted with the committing :)

Committed now.

Kyrill



Yes, this is okay.

Thanks,
Gerald


[arm] Add default FPU for Marvell-pj4

2019-10-17 Thread Richard Earnshaw (lists)
According to GAS, the Marvell PJ4 CPU has a VFPv3-D16 floating point 
unit, but GCC's CPU configuration tables omit this, meaning that
-mfpu=auto will not correctly select the FPU.  This patch fixes that by
adding the +fp option to the architecture specification for this device.


* config/arm/arm-cpus.in (marvell-pj4): Add +fp to the architecture.

Committed to trunk.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index f8a3b3db67a..50379a0a10a 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1150,7 +1150,7 @@ end cpu cortex-m3
 
 begin cpu marvell-pj4
  tune flags LDSCHED
- architecture armv7-a+mp+sec
+ architecture armv7-a+mp+sec+fp
  costs marvell_pj4
 end cpu marvell-pj4
 


Re: [Patch, fortran] PR91926 - assumed rank optional

2019-10-17 Thread Tobias Burnus

See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92027
for a tracking bug – I also just added some analysis.

Tobias

PS: A better patch submission, with the actual patch attached, would 
have been nice. Please re-post the committed patch – and the new patch, 
which fixes the fallout. – Thanks!


On 10/9/19 12:26 PM, Paul Richard Thomas wrote:

Hi Christophe,

Thanks for flagging this up - I am back at base on Saturday and will
take it up then.

Regards

Paul

On Wed, 9 Oct 2019 at 11:13, Christophe Lyon  wrote:

Hi,


On Sat, 5 Oct 2019 at 20:31, Paul Richard Thomas 
 wrote:

I must apologise not posting this before committing. I left for a
vacation this morning and I thought that this problem and the one
posted by Gilles were best fixed before departing. The patch only
touches the new ISO_Fortran binding feature and so I thought that I
would be safe to do this.

It was fully regtested and only applies to trunk.

Paul

Author: pault
Date: Sat Oct  5 08:17:55 2019
New Revision: 276624

URL: https://gcc.gnu.org/viewcvs?rev=276624&root=gcc&view=rev
Log:
2019-10-05  Paul Thomas  

 PR fortran/91926
 * trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Correct the
 assignment of the attribute field to account correctly for an
 assumed shape dummy. Assign separately to the gfc and cfi
 descriptors since the attribute can be different. Add branch to
 correctly handle missing optional dummies.

2019-10-05  Paul Thomas  

 PR fortran/91926
 * gfortran.dg/ISO_Fortran_binding_13.f90 : New test.
 * gfortran.dg/ISO_Fortran_binding_13.c : Additional source.
 * gfortran.dg/ISO_Fortran_binding_14.f90 : New test.

2019-10-05  Paul Thomas  

 PR fortran/91926
 * runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Do not
 modify the bounds and offset for CFI_other.

Added:
 trunk/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_13.c
 trunk/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_13.f90
 trunk/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_14.f90
Modified:
 trunk/gcc/fortran/ChangeLog
 trunk/gcc/fortran/trans-expr.c
 trunk/gcc/testsuite/ChangeLog
 trunk/libgfortran/ChangeLog
 trunk/libgfortran/runtime/ISO_Fortran_binding.c



Since this was committed (r276624), I have noticed regressions on 
arm-linux-gnueabihf:
FAIL: gfortran.dg/ISO_Fortran_binding_11.f90   -O3 -g  execution test
I've seen other reports on gcc-testresults too.

Christophe





Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-17 Thread Richard Sandiford
Yuliang Wang  writes:
> Thanks very much, updated.
>
> Regards,
> Yuliang
>
>
> gcc/ChangeLog:
>
> 2019-10-17  Yuliang Wang  
>
>   * config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3)
>   (aarch64_sve2_nor, aarch64_sve2_nand)
>   (aarch64_sve2_bsl, aarch64_sve2_nbsl)
>   (aarch64_sve2_bsl1n, aarch64_sve2_bsl2n):
>   New combine patterns.
>   * config/aarch64/iterators.md (BSL_DUP): New int iterator for the above.
>   (bsl_1st, bsl_2nd, bsl_dup, bsl_mov): Attributes for the above.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-17  Yuliang Wang  
>
>   * gcc.target/aarch64/sve2/eor3_1.c: New test.
>   * gcc.target/aarch64/sve2/nlogic_1.c: As above.
>   * gcc.target/aarch64/sve2/nlogic_2.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_1.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_2.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_3.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_4.c: As above.

Thanks, applied as r277110.

Richard


Re: [PATCH 3/3] Implementation of -Wclobbered on tree-ssa

2019-10-17 Thread Alexander Monakov
On Tue, 8 Oct 2019, Alexander Monakov wrote:

[massive snip]

> So in my opinion our CFG is good enough, the real issues with -Wclobbered 
> false
> positives are not due to phi nodes but other effects.
> 
> If you agree: what would be the next steps?

Hello,

may I ping this discussion?  I apologize for letting my cranky mood that day 
leak
too badly into the message.

Alexander


Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-17 Thread Segher Boessenkool
On Wed, Oct 16, 2019 at 11:44:37PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> >> If someone wants to add a new canonical form then the ports should of
> >> course adapt, but until then I think the patch is doing the right thing.
> >
> > We used to generate this, until GCC 5.  There aren't many ports that have
> > adapted yet.
> 
> The patch has testcases, so this won't be a silent failure for SVE2
> if things change again in future.

Sure.  But I am saying the current behaviour should *not* be canonical
(and it never was); it would make more sense to have the more sensible
behaviour we had for the many years (or decades?) before as canonical.

> >> > If the mask is not a constant, we really shouldn't generate a totally
> >> > different form.  The xor-and-xor form is very hard to handle, too.
> >> >
> >> > Expand currently generates this, because gimple thinks this is simpler.
> >> > I think this should be fixed.
> >> 
> >> But the constant form is effectively folding away the NOT.
> >> Without it the equivalent rtl uses 4 operations rather than 3:
> >> 
> >>   (ior (and A C) (and B (not C)))
> >
> > RTL canonicalisation rules are not based around number of ops.
> 
> Not to the exclusion of all else, sure.

No, not *at all*.  As a side-effect this is sometimes the case, of course,
perhaps more often than not; but it is not a rule.

> But my point was that there
> are reasons why forcing the (ior ...) form for non-constants might not
> be a strict improvement.

Sure.  But we either have canonical forms, which might be a bit awkward
sometimes, or we have to handle two (or three, or sometimes many more)
different forms everywhere.  This hurts especially in the MDs.  Like,
aarch now has this xor-and-xor pattern, but it is not canonical; some
parts in GCC might generate the and/ior thing for example, or instead
use some zero_extract thing (which in combine will then also try the
and/ior thing).

> > For example, we do (and (not A) (not B)) rather than (not (ior (A B)) .
> 
> Right, hence my complaint about this the other day on IRC. :-)
> I hadn't noticed until then that gimple had a different rule.

I think I missed that, sorry.

> > Instead, there are other rules (like here: push "not"s inward,
> > which can be applied locally with the wanted result).
> 
> Sure.  But I think it's common ground that there's no existing
> rtl rule that applies naturally to (xor (and (xor A B) C) B),
> where there's no (not ...) to push down.

Yes.  The documentation says
  @cindex @code{xor}, canonicalization of
  @item
  The only possible RTL expressions involving both bitwise exclusive-or
  and bitwise negation are @code{(xor:@var{m} @var{x} @var{y})}
  and @code{(not:@var{m} (xor:@var{m} @var{x} @var{y}))}.
and that is all it says about xor (and it is under-defined, anyway; surely
  (set (reg)
       (xor (mult (not (reg))
                  (reg))
            (reg)))
(it's hard to come up with a less silly example) is valid RTL as well!)

Because in the
  ((A^B) & C) ^ B
we have B twice it can be expressed in quite different ways.  I prefer
  (A&C) | (B&~C)
(which is the disjunctive normal form for this) (using ior instead of xor
is common, but that is not a documented canonicalisation either); this is
nice because C has a different role (different than A and B) here, and it
is the same shape as you get for C a constant, and the expression is
nicely symmetrical too, and the expression tree is less deep (ignoring
the inversion ;-) )
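
To make the equivalence concrete, here is a small brute-force check over
single bits (an illustrative sketch):

  #include <assert.h>

  int
  main (void)
  {
    for (unsigned v = 0; v < 8; ++v)
      {
        unsigned a = v & 1, b = (v >> 1) & 1, c = (v >> 2) & 1;
        /* ((A^B) & C) ^ B  ==  (A & C) | (B & ~C)  for single bits.  */
        assert ((((a ^ b) & c) ^ b) == ((a & c) | (b & ~c & 1u)));
      }
    return 0;
  }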

In general, the RTL code does not handle xor very well, even aside from
all of the canonicalisation issues.  So maybe we should just not use xor
much?

> >> As you say, it's no accident that we get this form, it's something
> >> that match.pd specifically chose.  And I think there should be a
> >> strong justification for having an RTL canonical form that reverses
> >> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
> >> to be able to undo the gimple transforms in all cases.
> >
> > Canonical RTL is different in many ways, already.
> 
> Sure, wasn't claiming otherwise.  But most of the rtl canonicalisation
> rules predate gimple by some distance, so while the individual choices
> are certainly deliberate, the differences weren't necessarily planned
> as differences.

The RTL and Gimple rules have very different goals.

> Whereas here we're talking about adding a new rtl rule
> with the full knowledge that it's the opposite of the equivalent gimple
> rule.

And also with the knowledge that it is what existing target code still
expects!  (The aarch64 code is the first using the "gimple" form for this
as far as I know).

> If we're going to move in one direction, it seems better to move
> towards making the rules more consistent rather than towards deliberately
> making them (even) less consistent.

I don't think this would matter at all.

> > "Not as powerful", I have no idea what you mean, btw.  RTL is much closer
> > to the real machine, so is a lot 

[committed][vect] Be consistent in versioning threshold use

2019-10-17 Thread Andre Vieira (lists)

Hi,

This piece of code was pre-approved by richi.

Retested by bootstrapping and regression testing on x86_64 (AVX512) and 
aarch64.


Committed in revision r277105.

gcc/ChangeLog:
2019-10-17  Andre Vieira  

* tree-vect-loop.c (vect_analyze_loop_2): Use same condition to decide
when to use versioning threshold.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b00d68b9de938148c24e088024b2e01bdceb2e81..72b80f46b1a9fa0bc8392809c286b5fac9a74451 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2155,6 +2155,7 @@ start_over:
   if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
 {
   poly_uint64 niters_th = 0;
+  unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
 
   if (!vect_use_loop_mask_for_alignment_p (loop_vinfo))
 	{
@@ -2175,6 +2176,14 @@ start_over:
   /* One additional iteration because of peeling for gap.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
 	niters_th += 1;
+
+  /*  Use the same condition as vect_transform_loop to decide when to use
+	  the cost to determine a versioning threshold.  */
+  if (th >= vect_vf_for_cost (loop_vinfo)
+	  && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+	  && ordered_p (th, niters_th))
+	niters_th = ordered_max (poly_uint64 (th), niters_th);
+
   LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo) = niters_th;
 }
 

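As a worked illustration of the threshold merge above (numbers invented for
the example): with a cost-model threshold th of 12 iterations and a
correctness threshold niters_th of 7, the stored versioning threshold
becomes ordered_max (12, 7) = 12; when the iteration count is known at
compile time the cost threshold is ignored here, matching what
vect_transform_loop already does.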

Re: [patch,avr] Fix avr build broken by r276985.

2019-10-17 Thread Richard Biener
On Thu, 17 Oct 2019, Georg-Johann Lay wrote:

> Am 10/17/19 um 1:22 PM schrieb Eric Botcazou:
> >> r276985 broke avr because it removed PARAM_ALLOW_STORE_DATA_RACES from
> >> --params.  The patch fixes that by using flag_store_data_races = 1 instead.
> > 
> > Note that you'll unconditionally override the user, unlike the original
> > code.
> 
> You're right.  What about this one?

LGTM.

> Johann
> 
> 
>   Fix breakage introduced by r276985.
>   * config/avr/avr.c (avr_option_override): Remove set of
>   PARAM_ALLOW_STORE_DATA_RACES.
>   * common/config/avr/avr-common.c (avr_option_optimization_table)
>   [OPT_LEVELS_ALL]: Turn on -fallow-store-data-races.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[committed][vect] Outline code into new function: determine_peel_for_niter

2019-10-17 Thread Andre Vieira (lists)

Hi,

This piece of code was pre-approved by richi.

Retested by bootstrapping and regression testing on x86_64 (AVX512) and 
aarch64.


Committed in revision r277103.

gcc/ChangeLog:
2019-10-17  Andre Vieira  

* tree-vect-loop.c (determine_peel_for_niter): New function containing
outlined code from ...
(vect_analyze_loop_2): ... here.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 39125aa4fc6a17657bde76921c80980f18621600..b00d68b9de938148c24e088024b2e01bdceb2e81 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1809,6 +1809,57 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
 }
 }
 
+
+/* Decides whether we need to create an epilogue loop to handle
+   remaining scalar iterations and sets PEELING_FOR_NITERS accordingly.  */
+
+void
+determine_peel_for_niter (loop_vec_info loop_vinfo)
+{
+  LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
+
+  unsigned HOST_WIDE_INT const_vf;
+  HOST_WIDE_INT max_niter
+= likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+
+  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
+  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
+th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
+	  (loop_vinfo));
+
+  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+/* The main loop handles all iterations.  */
+LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
+  else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+	   && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+{
+  /* Work out the (constant) number of iterations that need to be
+	 peeled for reasons other than niters.  */
+  unsigned int peel_niter = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+	peel_niter += 1;
+  if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_niter,
+		   LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+	LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
+}
+  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+	   /* ??? When peeling for gaps but not alignment, we could
+	  try to check whether the (variable) niters is known to be
+	  VF * N + 1.  That's something of a niche case though.  */
+	   || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+	   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
+	   || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
+		< (unsigned) exact_log2 (const_vf))
+	   /* In case of versioning, check if the maximum number of
+		  iterations is greater than th.  If they are identical,
+		  the epilogue is unnecessary.  */
+	   && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
+		   || ((unsigned HOST_WIDE_INT) max_niter
+		   > (th / const_vf) * const_vf))))
+    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
+}
+
+
 /* Function vect_analyze_loop_2.
 
Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -1936,7 +1987,6 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, unsigned *n_stmts)
   vect_compute_single_scalar_iteration_cost (loop_vinfo);
 
   poly_uint64 saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  unsigned th;
 
   /* Check the SLP opportunities in the loop, analyze and build SLP trees.  */
   ok = vect_analyze_slp (loop_vinfo, *n_stmts);
@@ -1976,9 +2026,6 @@ start_over:
 		   LOOP_VINFO_INT_NITERS (loop_vinfo));
 }
 
-  HOST_WIDE_INT max_niter
-= likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
-
   /* Analyze the alignment of the data-refs in the loop.
  Fail if a data reference is found that cannot be vectorized.  */
 
@@ -2082,42 +2129,7 @@ start_over:
 return opt_result::failure_at (vect_location,
    "Loop costings not worthwhile.\n");
 
-  /* Decide whether we need to create an epilogue loop to handle
- remaining scalar iterations.  */
-  th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-
-  unsigned HOST_WIDE_INT const_vf;
-  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-/* The main loop handles all iterations.  */
-LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
-  else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-	   && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
-{
-  /* Work out the (constant) number of iterations that need to be
-	 peeled for reasons other than niters.  */
-  unsigned int peel_niter = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-	peel_niter += 1;
-  if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_niter,
-		   LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
-	LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
-}
-  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
-	   /* ??? When peeling for gaps but not alignment, we could
-	  try to check whether the (variable) niters is known to be
-	  VF * N + 1.  That's something of a niche case though.  */
-	   || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-	   || 
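
To illustrate the peeling decision in determine_peel_for_niter (a worked
example, not taken from the patch): with a compile-time iteration count of
10, a vectorization factor of 4 and one iteration peeled for gaps,
peel_niter is 1 and 10 - 1 = 9 is not a multiple of 4, so
PEELING_FOR_NITER is set and an epilogue loop will be generated; with 9
iterations instead, 9 - 1 = 8 is a multiple of 4 and no epilogue is needed.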

[C++ PATCH] builtin fn creation

2019-10-17 Thread Nathan Sidwell
We're rather lax about setting DECL_CONTEXT correctly on several classes 
of builtins.  This is affecting modules.  But in cleaning it up I 
noticed cxx_builtin_function & cxx_builtin_function_ext_scope
* are doing very similar work (they're identical except for a boolean 
value).
* through a worker that repeats work (builtin_function_1 repeats its 
analysis on the ::std version).
* and the latter incorrectly pushes things intended for ::std into ::, 
(because it uses pushdecl_top_level (decl))


The bug I encountered was that the version pushed to :: had a NULL 
context (rather than FROB_CONTEXT (global_namespace), which is the 
translation unit decl).


This patch
* implements cxx_builtin_function_ext_scope in terms of 
cxx_builtin_function, by wrapping that call in a push_nested_namespace 
(global_namespace).

* folds builtin_function_1 into cxx_builtin_function
* applies the transforms to the incoming decl before cloning it
* uses copy_decl, a c++ decl-specific copy_node, which correctly 
duplicates the lang-specific data

* uses IDENTIFIER_LENGTH rather than strlen
* set DECL_CONTEXT to FROB_CONTEXT (current_namespace).

applying to trunk.

nathan
--
Nathan Sidwell
2019-10-17  Nathan Sidwell  

	* decl.c (builtin_function_1): Merge into ...
	(cxx_builtin_function): ... here.  Nadger the decl before maybe
	copying it.  Set the context.
	(cxx_builtin_function_ext_scope): Push to top level, then call
	cxx_builtin_function.

Index: decl.c
===
--- decl.c	(revision 277098)
+++ decl.c	(working copy)
@@ -72,5 +72,4 @@ static tree grokvardecl (tree, tree, tre
 static void check_static_variable_definition (tree, tree);
 static void record_unknown_type (tree, const char *);
-static tree builtin_function_1 (tree, tree, bool);
 static int member_function_or_else (tree, tree, enum overload_flags);
 static tree local_variable_p_walkfn (tree *, int *, void *);
@@ -4630,10 +4629,11 @@ cp_make_fname_decl (location_t loc, tree
 }
 
-static tree
-builtin_function_1 (tree decl, tree context, bool is_global)
-{
-  tree  id = DECL_NAME (decl);
-  const char *name = IDENTIFIER_POINTER (id);
+/* Install DECL as a builtin function at current (global) scope.
+   Return the new decl (if we found an existing version).  Also
+   installs it into ::std, if it's not '_*'.  */
 
+tree
+cxx_builtin_function (tree decl)
+{
   retrofit_lang_decl (decl);
 
@@ -4645,45 +4645,33 @@ builtin_function_1 (tree decl, tree cont
   DECL_VISIBILITY_SPECIFIED (decl) = 1;
 
-  DECL_CONTEXT (decl) = context;
-
-  /* A function in the user's namespace should have an explicit
- declaration before it is used.  Mark the built-in function as
- anticipated but not actually declared.  */
+  tree id = DECL_NAME (decl);
+  const char *name = IDENTIFIER_POINTER (id);
   if (name[0] != '_' || name[1] != '_')
+/* In the user's namespace, it must be declared before use.  */
+DECL_ANTICIPATED (decl) = 1;
+  else if (IDENTIFIER_LENGTH (id) > strlen ("___chk")
+	   && 0 != strncmp (name + 2, "builtin_", strlen ("builtin_"))
+	   && 0 == memcmp (name + IDENTIFIER_LENGTH (id) - strlen ("_chk"),
+			   "_chk", strlen ("_chk") + 1))
+/* Treat __*_chk fortification functions as anticipated as well,
+   unless they are __builtin_*_chk.  */
 DECL_ANTICIPATED (decl) = 1;
-  else if (strncmp (name + 2, "builtin_", strlen ("builtin_")) != 0)
-{
-  size_t len = strlen (name);
-
-  /* Treat __*_chk fortification functions as anticipated as well,
-	 unless they are __builtin_*.  */
-  if (len > strlen ("___chk")
-	  && memcmp (name + len - strlen ("_chk"),
-		 "_chk", strlen ("_chk") + 1) == 0)
-	DECL_ANTICIPATED (decl) = 1;
-}
-
-  if (is_global)
-return pushdecl_top_level (decl);
-  else
-return pushdecl (decl);
-}
 
-tree
-cxx_builtin_function (tree decl)
-{
-  tree  id = DECL_NAME (decl);
-  const char *name = IDENTIFIER_POINTER (id);
   /* All builtins that don't begin with an '_' should additionally
  go in the 'std' namespace.  */
   if (name[0] != '_')
 {
-  tree decl2 = copy_node(decl);
+  tree std_decl = copy_decl (decl);
+
   push_namespace (std_identifier);
-  builtin_function_1 (decl2, std_node, false);
+  DECL_CONTEXT (std_decl) = FROB_CONTEXT (std_node);
+  pushdecl (std_decl);
   pop_namespace ();
 }
 
-  return builtin_function_1 (decl, NULL_TREE, false);
+  DECL_CONTEXT (decl) = FROB_CONTEXT (current_namespace);
+  decl = pushdecl (decl);
+
+  return decl;
 }
 
@@ -4697,18 +4685,9 @@ tree
 cxx_builtin_function_ext_scope (tree decl)
 {
+  push_nested_namespace (global_namespace);
+  decl = cxx_builtin_function (decl);
+  pop_nested_namespace (global_namespace);
 
-  tree  id = DECL_NAME (decl);
-  const char *name = IDENTIFIER_POINTER (id);
-  /* All builtins that don't begin with an '_' should additionally
- go in the 'std' namespace.  */
-  if (name[0] != '_')
-{
-   

[committed][vect] Refactor versioning threshold

2019-10-17 Thread Andre Vieira (lists)

Hi,

This piece of code was pre-approved by richi.

Retested by bootstrapping and regression testing on x86_64 (AVX512) and 
aarch64.


Committed in revision r277101.

gcc/ChangeLog:
2019-10-17  Andre Vieira  

* tree-vect-loop.c (vect_transform_loop): Move code from here...
* tree-vect-loop-manip.c (vect_loop_versioning): ... to here.
* tree-vectorizer.h (vect_loop_versioning): Remove unused 
parameters.
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 5c25441c70a271f04730486e513437fffa75b7e3..a2902267c62889a63af09d121a631e6d8c6f69d5 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2966,9 +2966,7 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
*COND_EXPR_STMT_LIST.  */
 
 class loop *
-vect_loop_versioning (loop_vec_info loop_vinfo,
-		  unsigned int th, bool check_profitability,
-		  poly_uint64 versioning_threshold)
+vect_loop_versioning (loop_vec_info loop_vinfo)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop;
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
@@ -2988,10 +2986,15 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo);
   bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
+  poly_uint64 versioning_threshold
+= LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
   tree version_simd_if_cond
 = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
+  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
 
-  if (check_profitability)
+  if (th >= vect_vf_for_cost (loop_vinfo)
+  && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && !ordered_p (th, versioning_threshold))
 cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
 			 build_int_cst (TREE_TYPE (scalar_loop_iters),
 	th - 1));
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index cb95ea36298886955cfd789c75f09242e02e98d1..39125aa4fc6a17657bde76921c80980f18621600 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8166,18 +8166,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 
   if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
 {
-  poly_uint64 versioning_threshold
-	= LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
-  if (check_profitability
-	  && ordered_p (poly_uint64 (th), versioning_threshold))
-	{
-	  versioning_threshold = ordered_max (poly_uint64 (th),
-	  versioning_threshold);
-	  check_profitability = false;
-	}
   class loop *sloop
-	= vect_loop_versioning (loop_vinfo, th, check_profitability,
-versioning_threshold);
+	= vect_loop_versioning (loop_vinfo);
   sloop->force_vectorize = false;
   check_profitability = false;
 }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 291304fe95e13d8123877d4ff41c6d9fe8d60bb6..bdb6b87c7b2d61302c33b071f737ecea41c06d33 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1514,8 +1514,7 @@ extern void vect_set_loop_condition (class loop *, loop_vec_info,
 extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
 		 class loop *, edge);
-class loop *vect_loop_versioning (loop_vec_info, unsigned int, bool,
-   poly_uint64);
+class loop *vect_loop_versioning (loop_vec_info);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
  tree *, tree *, tree *, int, bool, bool);
 extern void vect_prepare_for_masked_peels (loop_vec_info);


[ARM,RFC] Avoid test failures when forcing incompatible options

2019-10-17 Thread Christophe Lyon
Hi,

While making experiments with -mpure-code, I ran the testsuite with
that flag, and noticed that several tests failed because they were
forcing an architecture version incompatible with this option.

I've just realized that I've come up with 2 different types of fixes:
(1) add a dg-skip-if (like eg. in gcc.target/arm/pr40657-1.c):
/* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } {
"-mpure-code" } } */
or in gcc.target/arm/pr52006.c:
/* { dg-skip-if "-mpure-code and -fPIC incompatible" { *-*-* } {
"-mpure-code" } } */

(2) in other cases I added:
/* { dg-require-effective-target arm_arch_v8a_ok } */
in gcc.target/arm/attr-crypto.c and ftest-armv7a-arm.c (and similar)

I have a patch that mixes both ways, but I'm wondering whether there's
a preferred approach?

(1) means we should in theory do this for every option that is not
supported by all cpu/arch/... versions, and that we'd potentially want
to run the tests with

(2) means that the tests are skipped if one uses a runtestflag like
-mcpu=cortex-XX that is incompatible with arm_arch_YY_ok, even if one
doesn't for -mpure-code, but where the -march option safely overrides
the -mcpu.

For instance, if one forces -mcpu=cortex-m4, attr-crypto.c is compiled with
-mcpu=cortex-m4 -march=armv8-a which generates a warning, but compiles OK.
With my patch, it would be UNSUPPORTED because the effective-target
would fail on the warning.

If one forces -mcpu=cortex-m4 -mpure-code, then attr-crypto.c is
compiled with -mcpu=cortex-m4 -mpure-code -march=armv8-a, resulting in
a failure without my patch because -mpure-code conflicts with
-march=armv8-a. With my patch, the test is UNSUPPORTED.

Thoughts?

Christophe


[PATCH] Move the rest of the validity checks from vect_is_simple_reduction

2019-10-17 Thread Richard Biener


to vectorizable_reduction, that is.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

* tree-vect-loop.c (needs_fold_left_reduction_p): Export.
(vect_is_simple_reduction): Move all validity checks ...
(vectorizable_reduction): ... here.  Compute whether we
need a fold-left reduction here.
* tree-vect-patterns.c (vect_reassociating_reduction_p): Merge
both overloads, check needs_fold_left_reduction_p directly.
* tree-vectorizer.h (needs_fold_left_reduction_p): Declare.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0530d6643b4..791c17ab0ea 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2536,7 +2536,7 @@ report_vect_op (dump_flags_t msg_type, gimple *stmt, 
const char *msg)
on type TYPE.  NEED_WRAPPING_INTEGRAL_OVERFLOW is true if integer
overflow must wrap.  */
 
-static bool
+bool
 needs_fold_left_reduction_p (tree type, tree_code code)
 {
   /* CHECKME: check for !flag_finite_math_only too?  */
@@ -2888,13 +2888,6 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
   op1 = gimple_assign_rhs2 (def_stmt);
   op2 = gimple_assign_rhs3 (def_stmt);
 }
-  else if (!commutative_tree_code (code) || !associative_tree_code (code))
-{
-  if (dump_enabled_p ())
-   report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-   "reduction: not commutative/associative: ");
-  return NULL;
-}
   else if (get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS)
 {
   op1 = gimple_assign_rhs1 (def_stmt);
@@ -2917,18 +2910,6 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
   return NULL;
 }
 
-  /* Check whether it's ok to change the order of the computation.
- Generally, when vectorizing a reduction we change the order of the
- computation.  This may change the behavior of the program in some
- cases, so we need to check that this is ok.  One exception is when
- vectorizing an outer-loop: the inner-loop is executed sequentially,
- and therefore vectorizing reductions in the inner-loop during
- outer-loop vectorization is safe.  */
-  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
-  if (STMT_VINFO_REDUC_TYPE (phi_info) == TREE_CODE_REDUCTION
-  && needs_fold_left_reduction_p (type, code))
-STMT_VINFO_REDUC_TYPE (phi_info) = FOLD_LEFT_REDUCTION;
-
   /* Reduction is safe. We're dealing with one of the following:
  1) integer arithmetic and no trapv
  2) floating point arithmetic, and special flags permit this optimization
@@ -5633,7 +5614,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, slp_tree 
slp_node,
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   enum tree_code code;
-  internal_fn reduc_fn;
   int op_type;
   enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
   stmt_vec_info cond_stmt_vinfo = NULL;
@@ -5872,19 +5852,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
  operation in the reduction meta.  */
   STMT_VINFO_REDUC_IDX (reduc_info) = reduc_index;
 
-  /* When vectorizing a reduction chain w/o SLP the reduction PHI is not
- directy used in stmt.  */
-  if (reduc_index == -1)
-{
-  if (STMT_VINFO_REDUC_TYPE (phi_info) == FOLD_LEFT_REDUCTION)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"in-order reduction chain without SLP.\n");
- return false;
-   }
-}
-
   if (!(reduc_index == -1
|| dts[reduc_index] == vect_reduction_def
|| dts[reduc_index] == vect_nested_cycle
@@ -6047,17 +6014,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
   double_reduc = true;
 }
 
-  vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
-  if ((double_reduc || reduction_type != TREE_CODE_REDUCTION)
-  && ncopies > 1)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"multiple types in double reduction or condition "
-"reduction.\n");
-  return false;
-}
-
   /* 4.2. Check support for the epilog operation.
 
   If STMT represents a reduction pattern, then the type of the
@@ -6093,38 +6049,75 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
   (and also the same tree-code) when generating the epilog code and
   when generating the code inside the loop.  */
 
-  enum tree_code orig_code;
-  if (orig_stmt_info
-  && (reduction_type == TREE_CODE_REDUCTION
- || reduction_type == FOLD_LEFT_REDUCTION))
+  vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
+  enum tree_code orig_code = ERROR_MARK;
+  if (reduction_type == CONST_COND_REDUCTION
+  || reduction_type == 
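
For context, needs_fold_left_reduction_p (now consulted in
vectorizable_reduction) flags reductions like the one below, where
floating-point semantics forbid reassociation unless something like
-ffast-math/-fassociative-math is in effect (an illustrative sketch, not
part of the patch):

  /* Without reassociation this sum must be accumulated in source order,
     so the vectorizer treats it as a fold-left (in-order) reduction.  */
  float
  sum (const float *x, int n)
  {
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
      s += x[i];
    return s;
  }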

Re: [patch,avr] Fix avr build broken by r276985.

2019-10-17 Thread Georg-Johann Lay

Am 10/17/19 um 1:22 PM schrieb Eric Botcazou:

r276985 broke avr because it removed PARAM_ALLOW_STORE_DATA_RACES from
--params.  The patch fixes that by using flag_store_data_races = 1 instead.


Note that you'll unconditionally override the user, unlike the original code.


You're right.  What about this one?

Johann


Fix breakage introduced by r276985.
* config/avr/avr.c (avr_option_override): Remove set of
PARAM_ALLOW_STORE_DATA_RACES.
* common/config/avr/avr-common.c (avr_option_optimization_table)
[OPT_LEVELS_ALL]: Turn on -fallow-store-data-races.

Index: common/config/avr/avr-common.c
===
--- common/config/avr/avr-common.c	(revision 277097)
+++ common/config/avr/avr-common.c	(working copy)
@@ -38,6 +38,11 @@ static const struct default_options avr_
 { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },
+/* Allow optimizer to introduce store data races. This used to be the
+   default -- it was changed because bigger targets did not see any
+   performance decrease. For the AVR though, disallowing data races
+   introduces additional code in LIM and increases reg pressure.  */
+{ OPT_LEVELS_ALL, OPT_fallow_store_data_races, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 277097)
+++ config/avr/avr.c	(working copy)
@@ -741,15 +741,6 @@ avr_option_override (void)
   if (avr_strict_X)
 flag_caller_saves = 0;
 
-  /* Allow optimizer to introduce store data races. This used to be the
- default - it was changed because bigger targets did not see any
- performance decrease. For the AVR though, disallowing data races
- introduces additional code in LIM and increases reg pressure.  */
-
-  maybe_set_param_value (PARAM_ALLOW_STORE_DATA_RACES, 1,
- global_options.x_param_values,
- global_options_set.x_param_values);
-
   /* Unwind tables currently require a frame pointer for correctness,
  see toplev.c:process_options().  */
 


[COMMITTED][ARM,testsuite] Fix typo in arm_arch_v8a_ok effective target

2019-10-17 Thread Christophe Lyon
Hi,

The arm_arch_v8a_ok effective-target lacks a closing bracket in these
tests, resulting in it being ignored.

2019-10-17  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/vseleqdf.c: Add missing closing bracket.
* gcc.target/arm/vseleqsf.c: Likewise.
* gcc.target/arm/vselgedf.c: Likewise.
* gcc.target/arm/vselgesf.c: Likewise.
* gcc.target/arm/vselgtdf.c: Likewise.
* gcc.target/arm/vselgtsf.c: Likewise.
* gcc.target/arm/vselledf.c: Likewise.
* gcc.target/arm/vsellesf.c: Likewise.
* gcc.target/arm/vselltdf.c: Likewise.
* gcc.target/arm/vselltsf.c: Likewise.
* gcc.target/arm/vselnedf.c: Likewise.
* gcc.target/arm/vselnesf.c: Likewise.
* gcc.target/arm/vselvcdf.c: Likewise.
* gcc.target/arm/vselvcsf.c: Likewise.
* gcc.target/arm/vselvsdf.c: Likewise.
* gcc.target/arm/vselvssf.c: Likewise.

I've committed it as obvious.

Christophe
commit 46b34d2567ba517a5f215d0817607843f1747d39
Author: Christophe Lyon 
Date:   Thu Oct 17 11:12:56 2019 +0200

[ARM,testsuite] Fix typo in arm_arch_v8a_ok effective target.

The arm_arch_v8a_ok effective-target lacks a closing bracket in these
tests, resulting in it being ignored.

2019-10-17  Christophe Lyon  

	gcc/testsuite/
	* gcc.target/arm/vseleqdf.c: Add missing closing bracket.
	* gcc.target/arm/vseleqsf.c: Likewise.
	* gcc.target/arm/vselgedf.c: Likewise.
	* gcc.target/arm/vselgesf.c: Likewise.
	* gcc.target/arm/vselgtdf.c: Likewise.
	* gcc.target/arm/vselgtsf.c: Likewise.
	* gcc.target/arm/vselledf.c: Likewise.
	* gcc.target/arm/vsellesf.c: Likewise.
	* gcc.target/arm/vselltdf.c: Likewise.
	* gcc.target/arm/vselltsf.c: Likewise.
	* gcc.target/arm/vselnedf.c: Likewise.
	* gcc.target/arm/vselnesf.c: Likewise.
	* gcc.target/arm/vselvcdf.c: Likewise.
	* gcc.target/arm/vselvcsf.c: Likewise.
	* gcc.target/arm/vselvsdf.c: Likewise.
	* gcc.target/arm/vselvssf.c: Likewise.

Change-Id: Ib1f684f2cbba963e55226e8c79aad66ba60d61fe

diff --git a/gcc/testsuite/gcc.target/arm/vseleqdf.c b/gcc/testsuite/gcc.target/arm/vseleqdf.c
index 64d5784..8a43335 100644
--- a/gcc/testsuite/gcc.target/arm/vseleqdf.c
+++ b/gcc/testsuite/gcc.target/arm/vseleqdf.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-require-effective-target arm_v8_vfp_ok } */
 /* { dg-options "-O2 -mcpu=cortex-a57" } */
 /* { dg-add-options arm_v8_vfp } */
diff --git a/gcc/testsuite/gcc.target/arm/vseleqsf.c b/gcc/testsuite/gcc.target/arm/vseleqsf.c
index b052704..fc46318 100644
--- a/gcc/testsuite/gcc.target/arm/vseleqsf.c
+++ b/gcc/testsuite/gcc.target/arm/vseleqsf.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-require-effective-target arm_v8_vfp_ok } */
 /* { dg-options "-O2 -mcpu=cortex-a57" } */
 /* { dg-add-options arm_v8_vfp } */
diff --git a/gcc/testsuite/gcc.target/arm/vselgedf.c b/gcc/testsuite/gcc.target/arm/vselgedf.c
index e10508f..9a74edd 100644
--- a/gcc/testsuite/gcc.target/arm/vselgedf.c
+++ b/gcc/testsuite/gcc.target/arm/vselgedf.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-require-effective-target arm_v8_vfp_ok } */
 /* { dg-options "-O2 -mcpu=cortex-a57" } */
 /* { dg-add-options arm_v8_vfp } */
diff --git a/gcc/testsuite/gcc.target/arm/vselgesf.c b/gcc/testsuite/gcc.target/arm/vselgesf.c
index 645cf5d..5f10954 100644
--- a/gcc/testsuite/gcc.target/arm/vselgesf.c
+++ b/gcc/testsuite/gcc.target/arm/vselgesf.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-require-effective-target arm_v8_vfp_ok } */
 /* { dg-options "-O2 -mcpu=cortex-a57" } */
 /* { dg-add-options arm_v8_vfp } */
diff --git a/gcc/testsuite/gcc.target/arm/vselgtdf.c b/gcc/testsuite/gcc.target/arm/vselgtdf.c
index 741b9a8..7ceaa34 100644
--- a/gcc/testsuite/gcc.target/arm/vselgtdf.c
+++ b/gcc/testsuite/gcc.target/arm/vselgtdf.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
 /* { dg-require-effective-target arm_v8_vfp_ok } */
 /* { dg-options "-O2 -mcpu=cortex-a57" } */
 /* { dg-add-options arm_v8_vfp } */
diff --git a/gcc/testsuite/gcc.target/arm/vselgtsf.c b/gcc/testsuite/gcc.target/arm/vselgtsf.c
index 3042c5b..9062ba2 100644
--- a/gcc/testsuite/gcc.target/arm/vselgtsf.c
+++ b/gcc/testsuite/gcc.target/arm/vselgtsf.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v8a_ok */
+/* { dg-require-effective-target 

Re: [patch,avr] Fix avr build broken by r276985.

2019-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2019 at 01:22:54PM +0200, Eric Botcazou wrote:
> > r276985 broke avr because it removed PARAM_ALLOW_STORE_DATA_RACES from
> > --params.  The patch fixes that by using flag_store_data_races = 1 instead.
> 
> Note that you'll unconditionally override the user, unlike the original code.

Yeah, better make that
  if (!global_options_set.x_flag_store_data_races)
flag_store_data_races = 1;

Jakub


[patch,avr]: PR86040: Fix missing reset of RAMPZ after ELPM.

2019-10-17 Thread Georg-Johann Lay

Hi, for families avrxmega5/7 after ELPM the reset of RAMPZ to
zero was missing in some situations due to a shortcut-return in
avr_out_lpm which bypassed that reset.

Ok to apply and backport?

Johann

PR target/86040
* config/avr/avr.c (avr_out_lpm): Do not shortcut-return.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 276704)
+++ config/avr/avr.c	(working copy)
@@ -3780,13 +3780,14 @@ avr_out_lpm (rtx_insn *insn, rtx *op, in
   gcc_unreachable();
 
 case 1:
-  return avr_asm_len ("%4lpm %0,%a2", xop, plen, 1);
+  avr_asm_len ("%4lpm %0,%a2", xop, plen, 1);
+  break;
 
 case 2:
   if (REGNO (dest) == REG_Z)
-return avr_asm_len ("%4lpm %5,%a2+" CR_TAB
-"%4lpm %B0,%a2" CR_TAB
-"mov %A0,%5", xop, plen, 3);
+avr_asm_len ("%4lpm %5,%a2+" CR_TAB
+ "%4lpm %B0,%a2" CR_TAB
+ "mov %A0,%5", xop, plen, 3);
   else
 {
   avr_asm_len ("%4lpm %A0,%a2+" CR_TAB
@@ -3815,9 +3816,9 @@ avr_out_lpm (rtx_insn *insn, rtx *op, in
"%4lpm %B0,%a2+", xop, plen, 2);
 
   if (REGNO (dest) == REG_Z - 2)
-return avr_asm_len ("%4lpm %5,%a2+" CR_TAB
-"%4lpm %C0,%a2" CR_TAB
-"mov %D0,%5", xop, plen, 3);
+avr_asm_len ("%4lpm %5,%a2+" CR_TAB
+ "%4lpm %C0,%a2" CR_TAB
+ "mov %D0,%5", xop, plen, 3);
   else
 {
   avr_asm_len ("%4lpm %C0,%a2+" CR_TAB


Re: [patch,avr] Fix avr build broken by r276985.

2019-10-17 Thread Eric Botcazou
> r276985 broke avr because it removed PARAM_ALLOW_STORE_DATA_RACES from
> --params.  The patch fixes that by using flag_store_data_races = 1 instead.

Note that you'll unconditionally override the user, unlike the original code.

-- 
Eric Botcazou


Re: [patch,avr] Fix avr build broken by r276985.

2019-10-17 Thread Richard Biener
On Thu, 17 Oct 2019, Georg-Johann Lay wrote:

> Hi,
> 
> r276985 broke avr because it removed PARAM_ALLOW_STORE_DATA_RACES from
> --params.  The patch fixes that by using flag_store_data_races = 1 instead.
> 
> Ok to apply?

OK and sorry for the breakage.

Richard.


[patch,avr] Fix avr build broken by r276985.

2019-10-17 Thread Georg-Johann Lay

Hi,

r276985 broke avr because it removed PARAM_ALLOW_STORE_DATA_RACES from 
--params.  The patch fixes that by using flag_store_data_races = 1 instead.


Ok to apply?

Johann

* config/avr/avr.c (avr_option_override): Fix broken build
introduced by r276985.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 277097)
+++ config/avr/avr.c	(working copy)
@@ -746,9 +746,7 @@ avr_option_override (void)
  performance decrease. For the AVR though, disallowing data races
  introduces additional code in LIM and increases reg pressure.  */
 
-  maybe_set_param_value (PARAM_ALLOW_STORE_DATA_RACES, 1,
- global_options.x_param_values,
- global_options_set.x_param_values);
+  flag_store_data_races = 1;
 
   /* Unwind tables currently require a frame pointer for correctness,
  see toplev.c:process_options().  */


RE: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-17 Thread Yuliang Wang
Thanks very much, updated.

Regards,
Yuliang


gcc/ChangeLog:

2019-10-17  Yuliang Wang  

* config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3)
(aarch64_sve2_nor, aarch64_sve2_nand)
(aarch64_sve2_bsl, aarch64_sve2_nbsl)
(aarch64_sve2_bsl1n, aarch64_sve2_bsl2n):
New combine patterns.
* config/aarch64/iterators.md (BSL_DUP): New int iterator for the above.
(bsl_1st, bsl_2nd, bsl_dup, bsl_mov): Attributes for the above.

gcc/testsuite/ChangeLog:

2019-10-17  Yuliang Wang  

* gcc.target/aarch64/sve2/eor3_1.c: New test.
* gcc.target/aarch64/sve2/nlogic_1.c: As above.
* gcc.target/aarch64/sve2/nlogic_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_1.c: As above.
* gcc.target/aarch64/sve2/bitsel_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_3.c: As above.
* gcc.target/aarch64/sve2/bitsel_4.c: As above.


diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 
b018f5b0bc9b51edf831e2571f0f5a9af2210829..1158a76c49adc329d72a9eb9dbe6bf6f380f92c6
 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -142,3 +142,188 @@
   }
 )
 
+;; Unpredicated 3-way exclusive OR.
+(define_insn "*aarch64_sve2_eor3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, ?")
+   (xor:SVE_I
+ (xor:SVE_I
+   (match_operand:SVE_I 1 "register_operand" "0, w, w, w")
+   (match_operand:SVE_I 2 "register_operand" "w, 0, w, w"))
+ (match_operand:SVE_I 3 "register_operand" "w, w, 0, w")))]
+  "TARGET_SVE2"
+  "@
+  eor3\t%0.d, %0.d, %2.d, %3.d
+  eor3\t%0.d, %0.d, %1.d, %3.d
+  eor3\t%0.d, %0.d, %1.d, %2.d
+  movprfx\t%0, %1\;eor3\t%0.d, %0.d, %2.d, %3.d"
+  [(set_attr "movprfx" "*,*,*,yes")]
+)
+
+;; Use NBSL for vector NOR.
+(define_insn_and_rewrite "*aarch64_sve2_nor"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
+ [(match_operand 3)
+  (and:SVE_I
+(not:SVE_I
+  (match_operand:SVE_I 1 "register_operand" "%0, w"))
+(not:SVE_I
+  (match_operand:SVE_I 2 "register_operand" "w, w")))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE2"
+  "@
+  nbsl\t%0.d, %0.d, %2.d, %0.d
+  movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %0.d"
+  "&& !CONSTANT_P (operands[3])"
+  {
+operands[3] = CONSTM1_RTX (mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Use NBSL for vector NAND.
+(define_insn_and_rewrite "*aarch64_sve2_nand"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
+ [(match_operand 3)
+  (ior:SVE_I
+(not:SVE_I
+  (match_operand:SVE_I 1 "register_operand" "%0, w"))
+(not:SVE_I
+  (match_operand:SVE_I 2 "register_operand" "w, w")))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE2"
+  "@
+  nbsl\t%0.d, %0.d, %2.d, %2.d
+  movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %2.d"
+  "&& !CONSTANT_P (operands[3])"
+  {
+operands[3] = CONSTM1_RTX (mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Unpredicated bitwise select.
+;; (op3 ? bsl_mov : bsl_dup) == (((bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)
+(define_insn "*aarch64_sve2_bsl"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (xor:SVE_I
+ (and:SVE_I
+   (xor:SVE_I
+ (match_operand:SVE_I 1 "register_operand" ", w")
+ (match_operand:SVE_I 2 "register_operand" ", w"))
+   (match_operand:SVE_I 3 "register_operand" "w, w"))
+ (match_dup BSL_DUP)))]
+  "TARGET_SVE2"
+  "@
+  bsl\t%0.d, %0.d, %.d, %3.d
+  movprfx\t%0, %\;bsl\t%0.d, %0.d, %.d, %3.d"
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Unpredicated bitwise inverted select.
+;; (~(op3 ? bsl_mov : bsl_dup)) == (~(((bsl_mov ^ bsl_dup) & op3) ^ bsl_dup))
+(define_insn_and_rewrite "*aarch64_sve2_nbsl"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
+ [(match_operand 4)
+  (not:SVE_I
+(xor:SVE_I
+  (and:SVE_I
+(xor:SVE_I
+  (match_operand:SVE_I 1 "register_operand" ", w")
+  (match_operand:SVE_I 2 "register_operand" ", w"))
+(match_operand:SVE_I 3 "register_operand" "w, w"))
+  (match_dup BSL_DUP)))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE2"
+  "@
+  nbsl\t%0.d, %0.d, %.d, %3.d
+  movprfx\t%0, %\;nbsl\t%0.d, %0.d, %.d, %3.d"
+  "&& !CONSTANT_P (operands[4])"
+  {
+operands[4] = CONSTM1_RTX (mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Unpredicated bitwise select with inverted first operand.
+;; (op3 ? ~bsl_mov : bsl_dup) == ((~(bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)
+(define_insn_and_rewrite "*aarch64_sve2_bsl1n"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (xor:SVE_I
+ (and:SVE_I
+   (unspec:SVE_I
+ [(match_operand 4)
+  (not:SVE_I
+

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-17 Thread Richard Sandiford
Yuliang Wang  writes:
> Hi Richard,
>
> Thanks for the suggestions, updated.
>
> Regards,
> Yuliang
>
>
> gcc/ChangeLog:
>
> 2019-10-17  Yuliang Wang  
>
>   * config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3)
>   (aarch64_sve2_nor, aarch64_sve2_nand)
>   (aarch64_sve2_bsl, aarch64_sve2_nbsl)
>   (aarch64_sve2_bsl1n, aarch64_sve2_bsl2n):
>   New combine patterns.
>   * config/aarch64/iterators.md (BSL_DUP): New int iterator for the above.
>   (bsl_1st, bsl_2nd, bsl_dup, bsl_mov): Attributes for the above.
>   * config/aarch64/aarch64.h (AARCH64_ISA_SVE2_SHA3): New ISA flag macro.
>   (TARGET_SVE2_SHA3): New CPU target.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-17  Yuliang Wang  
>
>   * gcc.target/aarch64/sve2/eor3_1.c: New test.
>   * gcc.target/aarch64/sve2/eor3_2.c: As above.
>   * gcc.target/aarch64/sve2/nlogic_1.c: As above.
>   * gcc.target/aarch64/sve2/nlogic_2.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_1.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_2.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_3.c: As above.
>   * gcc.target/aarch64/sve2/bitsel_4.c: As above.
>
>
> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
> b/gcc/config/aarch64/aarch64-sve2.md
> index 
> b018f5b0bc9b51edf831e2571f0f5a9af2210829..08d5214a3debb9e9a0796da0af3009ed3ff55774
>  100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -142,3 +142,189 @@
>}
>  )
>  
> +;; Unpredicated 3-way exclusive OR.
> +(define_insn "*aarch64_sve2_eor3"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, ?")
> + (xor:SVE_I
> +   (xor:SVE_I
> + (match_operand:SVE_I 1 "register_operand" "0, w, w, w")
> + (match_operand:SVE_I 2 "register_operand" "w, 0, w, w"))
> +   (match_operand:SVE_I 3 "register_operand" "w, w, 0, w")))]
> +  "TARGET_SVE2_SHA3"

EOR3 is part of base SVE2, it doesn't require the SHA3 extension.

> +;; Unpredicated bitwise select.
> +;; N.B. non-canonical equivalent form due to expand pass.

Think it would be better to drop this line (and similarly for
the patterns below).  The form isn't non-canonical -- there just
isn't a defined canonical form here. :-)  It is the expected form
as things stand.

> +;; (op3 ? bsl_mov : bsl_dup) == (((bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)
> +(define_insn "*aarch64_sve2_bsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (xor:SVE_I
> +   (and:SVE_I
> + (xor:SVE_I
> +   (match_operand:SVE_I 1 "register_operand" ", w")
> +   (match_operand:SVE_I 2 "register_operand" ", w"))
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_DUP)))]
> +  "TARGET_SVE2"
> +  "@
> +  bsl\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;bsl\t%0.d, %0.d, %.d, %3.d"
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise inverted select.
> +;; N.B. non-canonical equivalent form.
> +;; (~(op3 ? bsl_mov : bsl_dup)) == (~(((bsl_mov ^ bsl_dup) & op3) ^ bsl_dup))
> +(define_insn_and_rewrite "*aarch64_sve2_nbsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (unspec:SVE_I
> +   [(match_operand 4)
> +(not:SVE_I
> +  (xor:SVE_I
> +(and:SVE_I
> +  (xor:SVE_I
> +(match_operand:SVE_I 1 "register_operand" ", w")
> +(match_operand:SVE_I 2 "register_operand" ", w"))
> +  (match_operand:SVE_I 3 "register_operand" "w, w"))
> +(match_dup BSL_DUP)))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;nbsl\t%0.d, %0.d, %.d, %3.d"
> +  "&& !CONSTANT_P (operands[4])"
> +  {
> +operands[4] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise select with inverted first operand.
> +;; N.B. non-canonical equivalent form.
> +;; (op3 ? ~bsl_mov : bsl_dup) == (((~bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)

That's true, but I think:

;; (op3 ? ~bsl_mov : bsl_dup) == ((~(bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)

is clearer, to match the rtl.

> +(define_insn_and_rewrite "*aarch64_sve2_bsl1n"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (xor:SVE_I
> +   (and:SVE_I
> + (unspec:SVE_I
> +   [(match_operand 4)
> +(not:SVE_I
> +  (xor:SVE_I
> +(match_operand:SVE_I 1 "register_operand" ", w")
> +(match_operand:SVE_I 2 "register_operand" ", w")))]
> +   UNSPEC_PRED_X)
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_DUP)))]
> +  "TARGET_SVE2"
> +  "@
> +  bsl1n\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;bsl1n\t%0.d, %0.d, %.d, %3.d"
> +  "&& !CONSTANT_P (operands[4])"
> +  {
> +operands[4] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise select with inverted second operand.

[patch,avr,testsuite,committed]: Fix location of an expected diagnostic.

2019-10-17 Thread Georg-Johann Lay

Hi, committed this patchlet that fixes a test case.

Johann

* gcc.target/avr/progmem-error-1.cpp: Fix location of the
expected diagnostic.

Index: gcc.target/avr/progmem-error-1.cpp
===
--- gcc.target/avr/progmem-error-1.cpp  (revision 277095)
+++ gcc.target/avr/progmem-error-1.cpp  (revision 277096)
@@ -2,7 +2,4 @@

 #include "progmem.h"

-char str[] PROGMEM = "Hallo";
-/* This is the line number of the PROGMEM definition in progmem.h.  Keep it
-   absolute.  */
-/* { dg-error "must be const" "" { target avr-*-* } 1 } */
+char str[] PROGMEM = "Hallo"; /* { dg-error "must be const" "" { target avr-*-* } } */


[PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-10-17 Thread Dennis Zhang
Hi,

The Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and as a
lightweight lock-and-key system.

This patch adds new intrinsics that use MTE instructions to create, set,
read, and manipulate tags.
The intrinsics are part of Arm ACLE extension: 
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at 
https://developer.arm.com/docs/ddi0487/latest chapter D6.
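Not part of the patch, but as a rough usage sketch: the intrinsics listed in the
ChangeLog below could be used along these lines.  The prototypes are inferred from
the builtin signatures set up in aarch64_init_memtag_builtins, and the option
spelling in the comment is an assumption; the ACLE document above is authoritative.

/* Illustrative sketch only; compile for an MTE-enabled target
   (e.g. -march=armv8.5-a+memtag, option spelling assumed).  */
#include <arm_acle.h>
#include <stdint.h>
#include <stddef.h>

/* Tag a 16-byte granule and return the tagged pointer.  */
void *
tag_block (void *p)
{
  void *t = __arm_mte_create_random_tag (p, 0);   /* IRG  */
  __arm_mte_set_tag (t);                          /* STG  */
  return t;
}

uint64_t
exclude (void *p)
{
  return __arm_mte_exclude_tag (p, 0);            /* GMI  */
}

void *
next_tag (void *p)
{
  return __arm_mte_increment_tag (p, 1);          /* ADDG */
}

ptrdiff_t
distance (void *a, void *b)
{
  return __arm_mte_ptrdiff (a, b);                /* SUBP */
}

void *
load_tag (void *p)
{
  return __arm_mte_get_tag (p);                   /* LDG  */
}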

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please check whether it's OK for trunk.

Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
(aarch64_init_memtag_builtins): New.
(AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
(aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
(aarch64_expand_builtin_memtag): New.
(aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
(AARCH64_BUILTIN_SUBCODE): New macro.
(aarch64_resolve_overloaded_memtag): New.
(aarch64_resolve_overloaded_builtin): New hook. Call
aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_MEMORY_TAGGING when enabled.
* config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin):
Add declaration.
* config/aarch64/aarch64.c (TARGET_RESOLVE_OVERLOADED_BUILTIN):
New hook.
* config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
(TARGET_MEMTAG): Likewise.
* config/aarch64/aarch64.md (define_c_enum "unspec"): Add
UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
(irg, gmi, subp, addg, ldg, stg): New instructions.
* config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
(__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
(__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
(aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
* config/arm/types.md (memtag): New.
* doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-10-16  Dennis Zhang  

* gcc.target/aarch64/acle/memtag_1.c: New test.
* gcc.target/aarch64/acle/memtag_2.c: New test.
* gcc.target/aarch64/acle/memtag_3.c: New test.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index e02ece8672a..b77bcc42eab 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -445,6 +445,15 @@ enum aarch64_builtins
   AARCH64_TME_BUILTIN_TCOMMIT,
   AARCH64_TME_BUILTIN_TTEST,
   AARCH64_TME_BUILTIN_TCANCEL,
+  /* MEMTAG builtins.  */
+  AARCH64_MEMTAG_BUILTIN_START,
+  AARCH64_MEMTAG_BUILTIN_IRG,
+  AARCH64_MEMTAG_BUILTIN_GMI,
+  AARCH64_MEMTAG_BUILTIN_SUBP,
+  AARCH64_MEMTAG_BUILTIN_INC_TAG,
+  AARCH64_MEMTAG_BUILTIN_SET_TAG,
+  AARCH64_MEMTAG_BUILTIN_GET_TAG,
+  AARCH64_MEMTAG_BUILTIN_END,
   AARCH64_BUILTIN_MAX
 };
 
@@ -,6 +1120,52 @@ aarch64_init_tme_builtins (void)
    AARCH64_TME_BUILTIN_TCANCEL);
 }
 
+/* Initialize the memory tagging extension (MTE) builtins.  */
+struct
+{
+  tree ftype;
+  enum insn_code icode;
+} aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_END -
+			  AARCH64_MEMTAG_BUILTIN_START - 1];
+
+static void
+aarch64_init_memtag_builtins (void)
+{
+  tree fntype = NULL;
+
+#define AARCH64_INIT_MEMTAG_BUILTINS_DECL(F, N, I, T) \
+  aarch64_builtin_decls[AARCH64_MEMTAG_BUILTIN_##F] \
+= aarch64_general_add_builtin ("__builtin_aarch64_memtag_"#N, \
+   T, AARCH64_MEMTAG_BUILTIN_##F); \
+  aarch64_memtag_builtin_data[AARCH64_MEMTAG_BUILTIN_##F - \
+			  AARCH64_MEMTAG_BUILTIN_START - 1] = \
+{T, CODE_FOR_##I};
+
+  fntype = build_function_type_list (ptr_type_node, ptr_type_node,
+ uint64_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (IRG, irg, irg, fntype);
+
+  fntype = build_function_type_list (uint64_type_node, ptr_type_node,
+ uint64_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (GMI, gmi, gmi, fntype);
+
+  fntype = build_function_type_list (ptrdiff_type_node, ptr_type_node,
+ ptr_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (SUBP, subp, subp, fntype);
+
+  fntype = build_function_type_list (ptr_type_node, ptr_type_node,
+ unsigned_type_node, NULL);
+  AARCH64_INIT_MEMTAG_BUILTINS_DECL (INC_TAG, 

Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Segher Boessenkool
On Thu, Oct 17, 2019 at 10:37:33AM +0100, Iain Sandoe wrote:
> Segher Boessenkool  wrote:
> 
> > Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
> > again in two weeks, maybe?  I know it's important for the darwin port, but
> > the generic part is a little scary.
> 
> No problem (I would like to get this in to the final issue of 7, if possible).
> 
> FWIW, this implementation is completely guarded on TARGET_MACHO.

Yeah, I misread that; somehow I thought the big change was inside
mem_operand_gpr.  It isn't, and it is perfectly safe elsewhere, so no
special care is needed here at all for backports.

> >> --- a/gcc/config/rs6000/rs6000.md
> >> +++ b/gcc/config/rs6000/rs6000.md
> >> @@ -6894,13 +6894,6 @@
> >> ;; do the load 16-bits at a time.  We could do this by loading from memory,
> >> ;; and this is even supposed to be faster, but it is simpler not to get
> >> ;; integers in the TOC.
> >> -(define_insn "movsi_low"
> > 
> > Should the preceding comment be moved elsewhere / changed / deleted?
> 
> It seemed to be a comment about the following code - or something that should
> have been deleted long ago - it mentions the TOC, which Darwin does not use
> so not a Darwin-related thing.
> 
> Happy to do a separate patch to delete it if that's desired.

I think the comment is more confusing than helpful, currently.  So sure,
removing it is pre-approved.


Segher


[PATCH] Fix TARGET_MEM_REF handling in PRE

2019-10-17 Thread Richard Biener


It's not exercised but I ran into this with trying PRE after
IVOPTs.

Committed as obvious.

Richard.

2019-10-17  Richard Biener  

* tree-ssa-pre.c (create_component_ref_by_pieces_1): Fix
TARGET_MEM_REF creation.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 277094)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -2492,7 +2492,7 @@ create_component_ref_by_pieces_1 (basic_
 case TARGET_MEM_REF:
   {
tree genop0 = NULL_TREE, genop1 = NULL_TREE;
-   vn_reference_op_t nextop = &ref->operands[++*operand];
+   vn_reference_op_t nextop = &ref->operands[(*operand)++];
tree baseop = create_component_ref_by_pieces_1 (block, ref, operand,
stmts);
if (!baseop)


Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Iain Sandoe
Segher Boessenkool  wrote:

> Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
> again in two weeks, maybe?  I know it's important for the darwin port, but
> the generic part is a little scary.

No problem (I would like to get this in to the final issue of 7, if possible).

FWIW, this implementation is completely guarded on TARGET_MACHO.

I made some comments about “maybe the generic code would care about 
similar things” because when debugging and tracking through stuff, as you
note below, it wasn’t obvious.

> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
>> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
>> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
>> 
>> - note, that rs6000_offsettable_memref_p () will not handle these so that
>>   would return early, producing the issue with unsatisfiable constraints.
>> 
>>  - I do wonder if that's also the case for some non-Darwin lo_sum cases.
>> 
>> (some things might be hard to detect, since the code will generally fall
>> back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>> less efficient than it could be).
> 
> I'm putting this on the Big List of things I may some day have time to
> look at ;-)
> 
>> * config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete
> 
> Full stop.
> 
>> +  /* We only care if the access(es) would cause a change to the high part.  */
>> +  offset = ((offset & 0x) ^ 0x8000) - 0x8000;
>> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
> 
> So this works because the "extra" part only is relevant for positive
> offsets.  Okay.  Tricky.
> 
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -6894,13 +6894,6 @@
>> ;; do the load 16-bits at a time.  We could do this by loading from memory,
>> ;; and this is even supposed to be faster, but it is simpler not to get
>> ;; integers in the TOC.
>> -(define_insn "movsi_low"
> 
> Should the preceding comment be moved elsewhere / changed / deleted?

It seemed to be a comment about the following code - or something that should
have been deleted long ago - it mentions the TOC, which Darwin does not use
so not a Darwin-related thing.

Happy to do a separate patch to delete it if that's desired.

cheers
Iain





Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Segher Boessenkool
On Thu, Oct 17, 2019 at 01:46:41PM +1030, Alan Modra wrote:
> On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
> > On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> > > For 32bit cases this isn't a problem since we can load/store to unaligned
> > > addresses using D-mode insns.
> > 
> > Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
> > six years ago or so...  Alan, do you remember?  It required some assembler
> > work IIRC.
> 
> Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
> Rather than defining a whole series of _DS (and _DQ!) relocs, the
> linker inspects the instruction being relocated and complains if the
> relocation would modify opcode bits.  See is_insn_ds_form in
> bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

Ah right, that was it.  So it uses the D reloc but with DS or DQ
restrictions.  Gotcha.  For the compiler this is just as if those DS and
DQ relocs *do* exist.

> > I'll have another looke through this (esp. the generic part) when I'm fresh
> > awake (but not before coffee!).  Alan, can you have a look as well please?
> 
> It looks reasonable to me.

Thanks Alan!


Segher


Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Segher Boessenkool
Okay for trunk.  For backports maybe wait a bit longer than usual?  So ask
again in two weeks, maybe?  I know it's important for the darwin port, but
the generic part is a little scary.

On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> 2) To resolve this we need to extend the handling of the  mem_operand_gpr to
> allow looking through Mach-O PIC UNSPECs in the lo_sum cases.
> 
>  - note, that rs6000_offsettable_memref_p () will not handle these so that
>would return early, producing the issue with unsatisfiable constraints.
> 
>   - I do wonder if that's also the case for some non-Darwin lo_sum cases.
> 
> (some things might be hard to detect, since the code will generally fall
>  back to doing " la  Rx xxx@l ; ld Ry 0(Rx)" so it won't fail - just be
>  less efficient than it could be).

I'm putting this on the Big List of things I may some day have time to
look at ;-)

>   * config/rs6000/darwin.md (movdi_low, movsi_low_st): Delete

Full stop.

> +  /* We only care if the access(es) would cause a change to the high part.  */
> +  offset = ((offset & 0x) ^ 0x8000) - 0x8000;
> +  return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);

So this works because the "extra" part only is relevant for positive
offsets.  Okay.  Tricky.
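To make the trick concrete (a standalone sketch; note the mask constant is garbled in
the archived diff above, so 0xffff is assumed here): the expression sign-extends the low
16 bits of the offset, and SIGNED_16BIT_OFFSET_EXTRA_P then checks whether adding the
extra bytes could carry into the high part.

#include <stdio.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of OFF, as the masked expression does.  */
static int64_t
low16_sext (int64_t off)
{
  return ((off & 0xffff) ^ 0x8000) - 0x8000;
}

int
main (void)
{
  printf ("%lld\n", (long long) low16_sext (0x12345678)); /* 22136 */
  printf ("%lld\n", (long long) low16_sext (0x1234fff0)); /* -16   */
  return 0;
}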

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6894,13 +6894,6 @@
>  ;; do the load 16-bits at a time.  We could do this by loading from memory,
>  ;; and this is even supposed to be faster, but it is simpler not to get
>  ;; integers in the TOC.
> -(define_insn "movsi_low"

Should the preceding comment be moved elsewhere / changed / deleted?


Segher


Re: [PATCH] Fix an omission in the recent strlen optimization (PR tree-optimization/92056)

2019-10-17 Thread Richard Biener
On Thu, 17 Oct 2019, Jakub Jelinek wrote:

> Hi!
> 
> objsz computation has two modes.  One is a cheap one that doesn't handle
> SSA_NAMEs and is used in say random builtin folding.  The other is
> where compute_builtin_object_size is called in between init_object_sizes ()
> and fini_object_sizes () calls, where those set up data structures and the
> call then handles SSA_NAMEs and caches results for them.  This second mode
> is what the objsz pass uses, and in some cases the strlen pass too, but in
> other cases it doesn't.  While fini_object_sizes (); is called
> unconditionally at the end of strlen pass, init_object_sizes () is only
> called when the strlen pass calls handle_printf_call which calls
> get_destination_size; after that, any strcmp etc. takes advantage of that,
> but if no *printf is encountered, it will not.  Note, init_object_sizes ()
> can be called multiple times and does nothing the second and following time,
> unless fini_object_sizes () has been called.  And fini_object_sizes () can
> be called multiple times and doesn't do anything if since the last
> fini_object_sizes () no init_object_sizes () has been called.
> 
> So, on the following testcase without the patch, we set the value range
> of the first strcmp call to ~[0, 0], because we determine the buffer holding
> the first operand is at most 7 bytes long, but the second operand is a
> string literal with 7 characters + terminating NUL, but on the second call
> we don't, because no sprintf has been called in the function (and processed
> before the call).
> 
> Fixed thusly, ok for trunk if it passes bootstrap/regtest?

OK.

Richard.

> 2019-10-17  Jakub Jelinek  
> 
>   PR tree-optimization/92056
>   * tree-ssa-strlen.c (determine_min_objsize): Call init_object_sizes
>   before calling compute_builtin_object_size.
> 
>   * gcc.dg/tree-ssa/pr92056.c: New test.
> 
> --- gcc/tree-ssa-strlen.c.jj  2019-10-17 00:18:09.851648007 +0200
> +++ gcc/tree-ssa-strlen.c 2019-10-17 10:19:19.546086865 +0200
> @@ -3462,6 +3462,8 @@ determine_min_objsize (tree dest)
>  {
>unsigned HOST_WIDE_INT size = 0;
>  
> +  init_object_sizes ();
> +
>if (compute_builtin_object_size (dest, 2, &size))
>  return size;
>  
> --- gcc/testsuite/gcc.dg/tree-ssa/pr92056.c.jj2019-10-17 
> 10:18:25.819907087 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr92056.c   2019-10-17 10:17:56.201359262 
> +0200
> @@ -0,0 +1,36 @@
> +/* PR tree-optimization/92056 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "strcmp \\(" "optimized" } } */
> +
> +void bar (int, char *);
> +
> +int
> +foo (int x, char *y, const char *z)
> +{
> +  char *a;
> +  __builtin_sprintf (y, z);
> +  if (x == 3)
> +a = __builtin_malloc (5);
> +  else if (x == 7)
> +a = __builtin_malloc (6);
> +  else
> +a = __builtin_malloc (7);
> +  bar (x, a);
> +  return __builtin_strcmp (a, "abcdefg") != 0;
> +}
> +
> +int
> +baz (int x)
> +{
> +  char *a;
> +  if (x == 3)
> +a = __builtin_malloc (5);
> +  else if (x == 7)
> +a = __builtin_malloc (6);
> +  else
> +a = __builtin_malloc (7);
> +  bar (x, a);
> +  return __builtin_strcmp (a, "abcdefg") != 0;
> +}
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

RE: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-17 Thread Yuliang Wang
Hi Richard,

Thanks for the suggestions, updated.

Regards,
Yuliang


gcc/ChangeLog:

2019-10-17  Yuliang Wang  

* config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3)
(aarch64_sve2_nor, aarch64_sve2_nand)
(aarch64_sve2_bsl, aarch64_sve2_nbsl)
(aarch64_sve2_bsl1n, aarch64_sve2_bsl2n):
New combine patterns.
* config/aarch64/iterators.md (BSL_DUP): New int iterator for the above.
(bsl_1st, bsl_2nd, bsl_dup, bsl_mov): Attributes for the above.
* config/aarch64/aarch64.h (AARCH64_ISA_SVE2_SHA3): New ISA flag macro.
(TARGET_SVE2_SHA3): New CPU target.

gcc/testsuite/ChangeLog:

2019-10-17  Yuliang Wang  

* gcc.target/aarch64/sve2/eor3_1.c: New test.
* gcc.target/aarch64/sve2/eor3_2.c: As above.
* gcc.target/aarch64/sve2/nlogic_1.c: As above.
* gcc.target/aarch64/sve2/nlogic_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_1.c: As above.
* gcc.target/aarch64/sve2/bitsel_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_3.c: As above.
* gcc.target/aarch64/sve2/bitsel_4.c: As above.


diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 
b018f5b0bc9b51edf831e2571f0f5a9af2210829..08d5214a3debb9e9a0796da0af3009ed3ff55774
 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -142,3 +142,189 @@
   }
 )
 
+;; Unpredicated 3-way exclusive OR.
+(define_insn "*aarch64_sve2_eor3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, ?")
+   (xor:SVE_I
+ (xor:SVE_I
+   (match_operand:SVE_I 1 "register_operand" "0, w, w, w")
+   (match_operand:SVE_I 2 "register_operand" "w, 0, w, w"))
+ (match_operand:SVE_I 3 "register_operand" "w, w, 0, w")))]
+  "TARGET_SVE2_SHA3"
+  "@
+  eor3\t%0.d, %0.d, %2.d, %3.d
+  eor3\t%0.d, %0.d, %1.d, %3.d
+  eor3\t%0.d, %0.d, %1.d, %2.d
+  movprfx\t%0, %1\;eor3\t%0.d, %0.d, %2.d, %3.d"
+  [(set_attr "movprfx" "*,*,*,yes")]
+)
+
+;; Use NBSL for vector NOR.
+(define_insn_and_rewrite "*aarch64_sve2_nor"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
+ [(match_operand 3)
+  (and:SVE_I
+(not:SVE_I
+  (match_operand:SVE_I 1 "register_operand" "%0, w"))
+(not:SVE_I
+  (match_operand:SVE_I 2 "register_operand" "w, w")))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE2"
+  "@
+  nbsl\t%0.d, %0.d, %2.d, %0.d
+  movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %0.d"
+  "&& !CONSTANT_P (operands[3])"
+  {
+operands[3] = CONSTM1_RTX (mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Use NBSL for vector NAND.
+(define_insn_and_rewrite "*aarch64_sve2_nand"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
+ [(match_operand 3)
+  (ior:SVE_I
+(not:SVE_I
+  (match_operand:SVE_I 1 "register_operand" "%0, w"))
+(not:SVE_I
+  (match_operand:SVE_I 2 "register_operand" "w, w")))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE2"
+  "@
+  nbsl\t%0.d, %0.d, %2.d, %2.d
+  movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %2.d"
+  "&& !CONSTANT_P (operands[3])"
+  {
+operands[3] = CONSTM1_RTX (mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Unpredicated bitwise select.
+;; N.B. non-canonical equivalent form due to expand pass.
+;; (op3 ? bsl_mov : bsl_dup) == (((bsl_mov ^ bsl_dup) & op3) ^ bsl_dup)
+(define_insn "*aarch64_sve2_bsl"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (xor:SVE_I
+ (and:SVE_I
+   (xor:SVE_I
+ (match_operand:SVE_I 1 "register_operand" ", w")
+ (match_operand:SVE_I 2 "register_operand" ", w"))
+   (match_operand:SVE_I 3 "register_operand" "w, w"))
+ (match_dup BSL_DUP)))]
+  "TARGET_SVE2"
+  "@
+  bsl\t%0.d, %0.d, %.d, %3.d
+  movprfx\t%0, %\;bsl\t%0.d, %0.d, %.d, %3.d"
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Unpredicated bitwise inverted select.
+;; N.B. non-canonical equivalent form.
+;; (~(op3 ? bsl_mov : bsl_dup)) == (~(((bsl_mov ^ bsl_dup) & op3) ^ bsl_dup))
+(define_insn_and_rewrite "*aarch64_sve2_nbsl"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
+   (unspec:SVE_I
+ [(match_operand 4)
+  (not:SVE_I
+(xor:SVE_I
+  (and:SVE_I
+(xor:SVE_I
+  (match_operand:SVE_I 1 "register_operand" ", w")
+  (match_operand:SVE_I 2 "register_operand" ", w"))
+(match_operand:SVE_I 3 "register_operand" "w, w"))
+  (match_dup BSL_DUP)))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE2"
+  "@
+  nbsl\t%0.d, %0.d, %.d, %3.d
+  movprfx\t%0, %\;nbsl\t%0.d, %0.d, %.d, %3.d"
+  "&& !CONSTANT_P (operands[4])"
+  {
+operands[4] = CONSTM1_RTX (mode);
+  }
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Unpredicated bitwise select with inverted first operand.
+;; N.B. non-canonical 

[PATCH] Fix an omission in the recent strlen optimization (PR tree-optimization/92056)

2019-10-17 Thread Jakub Jelinek
Hi!

objsz computation has two modes.  One is a cheap one that doesn't handle
SSA_NAMEs and is used in say random builtin folding.  The other is
where compute_builtin_object_size is called in between init_object_sizes ()
and fini_object_sizes () calls, where those set up data structures and the
call then handles SSA_NAMEs and caches results for them.  This second mode
is what the objsz pass uses, and in some cases the strlen pass too, but in
other cases it doesn't.  While fini_object_sizes (); is called
unconditionally at the end of strlen pass, init_object_sizes () is only
called when the strlen pass calls handle_printf_call which calls
get_destination_size; after that, any strcmp etc. takes advantage of that,
but if no *printf is encountered, it will not.  Note, init_object_sizes ()
can be called multiple times and does nothing the second and following time,
unless fini_object_sizes () has been called.  And fini_object_sizes () can
be called multiple times and doesn't do anything if since the last
fini_object_sizes () no init_object_sizes () has been called.

So, on the following testcase without the patch, we set the value range
of the first strcmp call to ~[0, 0], because we determine the buffer holding
the first operand is at most 7 bytes long, but the second operand is a
string literal with 7 characters + terminating NUL, but on the second call
we don't, because no sprintf has been called in the function (and processed
before the call).

Fixed thusly, ok for trunk if it passes bootstrap/regtest?

2019-10-17  Jakub Jelinek  

PR tree-optimization/92056
* tree-ssa-strlen.c (determine_min_objsize): Call init_object_sizes
before calling compute_builtin_object_size.

* gcc.dg/tree-ssa/pr92056.c: New test.

--- gcc/tree-ssa-strlen.c.jj2019-10-17 00:18:09.851648007 +0200
+++ gcc/tree-ssa-strlen.c   2019-10-17 10:19:19.546086865 +0200
@@ -3462,6 +3462,8 @@ determine_min_objsize (tree dest)
 {
   unsigned HOST_WIDE_INT size = 0;
 
+  init_object_sizes ();
+
   if (compute_builtin_object_size (dest, 2, &size))
 return size;
 
--- gcc/testsuite/gcc.dg/tree-ssa/pr92056.c.jj  2019-10-17 10:18:25.819907087 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr92056.c 2019-10-17 10:17:56.201359262 
+0200
@@ -0,0 +1,36 @@
+/* PR tree-optimization/92056 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "strcmp \\(" "optimized" } } */
+
+void bar (int, char *);
+
+int
+foo (int x, char *y, const char *z)
+{
+  char *a;
+  __builtin_sprintf (y, z);
+  if (x == 3)
+a = __builtin_malloc (5);
+  else if (x == 7)
+a = __builtin_malloc (6);
+  else
+a = __builtin_malloc (7);
+  bar (x, a);
+  return __builtin_strcmp (a, "abcdefg") != 0;
+}
+
+int
+baz (int x)
+{
+  char *a;
+  if (x == 3)
+a = __builtin_malloc (5);
+  else if (x == 7)
+a = __builtin_malloc (6);
+  else
+a = __builtin_malloc (7);
+  bar (x, a);
+  return __builtin_strcmp (a, "abcdefg") != 0;
+}

Jakub


[PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-10-17 Thread Feng Xue OS
IPA does not allow constant propagation on a parameter that is used to
control function recursion.

recur_fn (i)
{
  if ( !terminate_recursion (i))
{
  ...  
  recur_fn (i + 1);
  ...
}
  ...
}

This patch enables multi-versioning for self-recursive functions; the number
of versioned copies is limited by a specified option.
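As an illustration only (the clone names below are invented and the limit of three is
arbitrary), the kind of specialization this is meant to allow looks roughly like this
after constant propagation into the recursive parameter:

#include <stdio.h>

static int
terminate_recursion (int i)
{
  return i >= 3;
}

/* Generic copy, kept as before.  */
static void
recur_fn (int i)
{
  if (!terminate_recursion (i))
    {
      printf ("generic %d\n", i);
      recur_fn (i + 1);
    }
}

/* Hypothetical clones with I folded to a constant; the last one falls
   back to the generic copy once the versioning limit is reached.  */
static void recur_fn_cp_2 (void) { printf ("clone 2\n"); recur_fn (3); }
static void recur_fn_cp_1 (void) { printf ("clone 1\n"); recur_fn_cp_2 (); }
static void recur_fn_cp_0 (void) { printf ("clone 0\n"); recur_fn_cp_1 (); }

int
main (void)
{
  recur_fn_cp_0 ();   /* What a call recur_fn (0) could become.  */
  return 0;
}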

Feng
---
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 045072e02ec..6255a766e4d 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -229,7 +229,9 @@ public:
   inline bool set_contains_variable ();
   bool add_value (valtype newval, cgraph_edge *cs,
  ipcp_value *src_val = NULL,
- int src_idx = 0, HOST_WIDE_INT offset = -1);
+ int src_idx = 0, HOST_WIDE_INT offset = -1,
+ ipcp_value **val_pos_p = NULL,
+ bool unlimited = false);
   void print (FILE * f, bool dump_sources, bool dump_benefits);
 };
 
@@ -1579,22 +1581,37 @@ allocate_and_init_ipcp_value 
(ipa_polymorphic_call_context source)
 /* Try to add NEWVAL to LAT, potentially creating a new ipcp_value for it.  CS,
SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
meaning.  OFFSET -1 means the source is scalar and not a part of an
-   aggregate.  */
+   aggregate.  If non-NULL, VAL_POS_P specifies position in value list,
+   after which newly created ipcp_value will be inserted, and it is also
+   used to record address of the added ipcp_value before function returns.
+   UNLIMITED means whether value count should not exceed the limit given
+   by PARAM_IPA_CP_VALUE_LIST_SIZE.  */
 
 template 
 bool
 ipcp_lattice::add_value (valtype newval, cgraph_edge *cs,
  ipcp_value *src_val,
- int src_idx, HOST_WIDE_INT offset)
+ int src_idx, HOST_WIDE_INT offset,
+ ipcp_value **val_pos_p,
+ bool unlimited)
 {
   ipcp_value *val;
 
+  if (val_pos_p)
+{
+  for (val = values; val && val != *val_pos_p; val = val->next);
+  gcc_checking_assert (val);
+}
+
   if (bottom)
 return false;
 
   for (val = values; val; val = val->next)
 if (values_equal_for_ipcp_p (val->value, newval))
   {
+   if (val_pos_p)
+ *val_pos_p = val;
+
if (ipa_edge_within_scc (cs))
  {
ipcp_value_source *s;
@@ -1609,7 +1626,8 @@ ipcp_lattice::add_value (valtype newval, 
cgraph_edge *cs,
return false;
   }
 
-  if (values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
+  if (!unlimited
+  && values_count == PARAM_VALUE (PARAM_IPA_CP_VALUE_LIST_SIZE))
 {
   /* We can only free sources, not the values themselves, because sources
 of other values in this SCC might point to them.   */
@@ -1623,6 +1641,9 @@ ipcp_lattice::add_value (valtype newval, 
cgraph_edge *cs,
}
}
 
+  if (val_pos_p)
+   *val_pos_p = NULL;
+
   values = NULL;
   return set_to_bottom ();
 }
@@ -1630,8 +1651,54 @@ ipcp_lattice::add_value (valtype newval, 
cgraph_edge *cs,
   values_count++;
   val = allocate_and_init_ipcp_value (newval);
   val->add_source (cs, src_val, src_idx, offset);
-  val->next = values;
-  values = val;
+  if (val_pos_p)
+{
+  val->next = (*val_pos_p)->next;
+  (*val_pos_p)->next = val;
+  *val_pos_p = val;
+}
+  else
+{
+  val->next = values;
+  values = val;
+}
+
+  return true;
+}
+
+/* Return true if ipcp_value VAL originated from a parameter value of a
+   self-feeding recursive function by applying a non-passthrough arithmetic
+   transformation.  */
+
+static bool
+self_recursively_generated_p (ipcp_value *val)
+{
+  class ipa_node_params *info = NULL;
+
+  for (ipcp_value_source *src = val->sources; src; src = src->next)
+{
+  cgraph_edge *cs = src->cs;
+
+  if (!src->val || cs->caller != cs->callee->function_symbol ()
+ || src->val == val)
+   return false;
+
+  if (!info)
+   info = IPA_NODE_REF (cs->caller);
+
+  class ipcp_param_lattices *plats = ipa_get_parm_lattices (info,
+   src->index);
+  ipcp_lattice *src_lat = src->offset == -1 ? >itself
+ : plats->aggs;
+  ipcp_value *src_val;
+
+  for (src_val = src_lat->values; src_val && src_val != val;
+  src_val = src_val->next);
+
+  if (!src_val)
+   return false;
+}
+
   return true;
 }
 
@@ -1649,20 +1716,72 @@ propagate_vals_across_pass_through (cgraph_edge *cs, 
ipa_jump_func *jfunc,
   ipcp_value *src_val;
   bool ret = false;
 
-  /* Do not create new values when propagating within an SCC because if there
- are arithmetic functions with circular dependencies, there is infinite
- number of them and we would just make lattices bottom.  If this condition
- is ever relaxed 

Re: Add a constant_range_value_p function (PR 92033)

2019-10-17 Thread Richard Sandiford
Christophe Lyon  writes:
> On Tue, 15 Oct 2019 at 12:36, Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On October 14, 2019 2:32:43 PM GMT+02:00, Richard Sandiford 
>> >  wrote:
>> >>Richard Biener  writes:
>> >>> On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
>> >>>  wrote:
>> 
>>  The range-tracking code has a pretty hard-coded assumption that
>>  is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
>>  ADDR_EXPR".  It seems better to add a predicate specifically for
>>  that rather than continually fight cases in which it can't handle
>>  other invariants.
>> 
>>  Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> >>>
>> >>> ICK.  Nobody is going to remember this new restriction and
>> >>> constant_range_value_p reads like constant_value_range_p ;)
>> >>>
>> >>> Btw, is_gimple_invariant_address shouldn't have been exported,
>> >>> it's only use could have used is_gimple_min_invariant...
>> >>
>> >>What do you think we should do instead?
>> >
>> > Just handle POLY_INT_CST in a few place to quickly enough drop to varying.
>>
>> OK, how about this?  Aldy's suggestion would be fine by me too,
>> but I thought I'd try this first given Aldy's queasiness about
>> allowing POLY_INT_CSTs further in.
>>
>> The main case in which this gives useful ranges is a lower bound
>> of A + B * X becoming A when B >= 0.  E.g.:
>>
>>   (1) [32 + 16X, 100] -> [32, 100]
>>   (2) [32 + 16X, 32 + 16X] -> [32, MAX]
>>
>> But the same thing can be useful for the upper bound with negative
>> X coefficients.
>>
>> We can revisit this later if keeping a singleton range for (2)
>> would be better.
>>
>> Tested as before.
>>
>> Richard
>>
>>
> Hi Richard,
>
> This patch did improve aarch64 results quite a lot, however, there are
> still a few failures that used to pass circa r276650:
> gcc.target/aarch64/sve/loop_add_6.c -march=armv8.2-a+sve
> scan-assembler \\tfsub\\tz[0-9]+\\.d, p[0-7]/m
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
> z[0-9]+\\.b\\n 1
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
> z[0-9]+\\.h\\n 1
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tand\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
> z[0-9]+\\.b\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tand\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tand\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
> z[0-9]+\\.h\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tand\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\teor\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
> z[0-9]+\\.b\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\teor\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\teor\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
> z[0-9]+\\.h\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\teor\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tfadd\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 1
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tfadd\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
> z[0-9]+\\.h\\n 1
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\tfadd\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 1
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\torr\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
> z[0-9]+\\.b\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\torr\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
> z[0-9]+\\.d\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\torr\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
> z[0-9]+\\.h\\n 2
> gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> scan-assembler-times \\torr\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
> z[0-9]+\\.s\\n 2
> gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfsub\\tz[0-9]+\\.d, p[0-7]/m 1
> gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
> scan-assembler-times \\tfsub\\tz[0-9]+\\.s, 

Re: Add a constant_range_value_p function (PR 92033)

2019-10-17 Thread Christophe Lyon
On Tue, 15 Oct 2019 at 12:36, Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On October 14, 2019 2:32:43 PM GMT+02:00, Richard Sandiford 
> >  wrote:
> >>Richard Biener  writes:
> >>> On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
> >>>  wrote:
> 
>  The range-tracking code has a pretty hard-coded assumption that
>  is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
>  ADDR_EXPR".  It seems better to add a predicate specifically for
>  that rather than continually fight cases in which it can't handle
>  other invariants.
> 
>  Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>>
> >>> ICK.  Nobody is going to remember this new restriction and
> >>> constant_range_value_p reads like constant_value_range_p ;)
> >>>
> >>> Btw, is_gimple_invariant_address shouldn't have been exported,
> >>> it's only use could have used is_gimple_min_invariant...
> >>
> >>What do you think we should do instead?
> >
> > Just handle POLY_INT_CST in a few place to quickly enough drop to varying.
>
> OK, how about this?  Aldy's suggestion would be fine by me too,
> but I thought I'd try this first given Aldy's queasiness about
> allowing POLY_INT_CSTs further in.
>
> The main case in which this gives useful ranges is a lower bound
> of A + B * X becoming A when B >= 0.  E.g.:
>
>   (1) [32 + 16X, 100] -> [32, 100]
>   (2) [32 + 16X, 32 + 16X] -> [32, MAX]
>
> But the same thing can be useful for the upper bound with negative
> X coefficients.
>
> We can revisit this later if keeping a singleton range for (2)
> would be better.
>
> Tested as before.
>
> Richard
>
>
Hi Richard,

This patch did improve aarch64 results quite a lot, however, there are
still a few failures that used to pass circa r276650:
gcc.target/aarch64/sve/loop_add_6.c -march=armv8.2-a+sve
scan-assembler \\tfsub\\tz[0-9]+\\.d, p[0-7]/m
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tadd\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
z[0-9]+\\.b\\n 1
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tadd\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
z[0-9]+\\.d\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tadd\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
z[0-9]+\\.h\\n 1
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tadd\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
z[0-9]+\\.s\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tand\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
z[0-9]+\\.b\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tand\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
z[0-9]+\\.d\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tand\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
z[0-9]+\\.h\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tand\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
z[0-9]+\\.s\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\teor\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
z[0-9]+\\.b\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\teor\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
z[0-9]+\\.d\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\teor\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
z[0-9]+\\.h\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\teor\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
z[0-9]+\\.s\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tfadd\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
z[0-9]+\\.d\\n 1
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tfadd\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
z[0-9]+\\.h\\n 1
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\tfadd\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
z[0-9]+\\.s\\n 1
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\torr\\tz[0-9]+\\.b, p[0-7]/m, z[0-9]+\\.b,
z[0-9]+\\.b\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\torr\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d,
z[0-9]+\\.d\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\torr\\tz[0-9]+\\.h, p[0-7]/m, z[0-9]+\\.h,
z[0-9]+\\.h\\n 2
gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
scan-assembler-times \\torr\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.s,
z[0-9]+\\.s\\n 2
gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
scan-assembler-times \\tfsub\\tz[0-9]+\\.d, p[0-7]/m 1
gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
scan-assembler-times \\tfsub\\tz[0-9]+\\.s, p[0-7]/m 1
gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
scan-assembler-times \\tsub\\tz[0-9]+\\.b, p[0-7]/m 1
gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
scan-assembler-times 

Re: [PATCH, Fortran] Fix Automatics in equivalence test cases was Re: Automatics in Equivalences failures

2019-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2019 at 08:43:29AM +0100, Mark Eggleston wrote:
> Please find attached patch for additional test cases. Are these sufficient?
> If so, OK to commit?
> 
> Change log:
> 
>     Mark Eggleston 
> 
>     * gfortran.dg/auto_in_equiv_3.f90: New test.
>     * gfortran.dg/auto_in_equiv_4.f90: New test.
>     * gfortran.dg/auto_in_equiv_5.f90: New test.
>     * gfortran.dg/auto_in_equiv_6.f90: New test.
>     * gfortran.dg/auto_in_equiv_7.f90: New test.

Ok, with small nits:

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/auto_in_equiv_4.f90
> @@ -0,0 +1,18 @@
> +! { dg-do compile }
> +! { dg-options "-fdec-static -fno-automatic -fdump-tree-original" }
> +!
> +! Neither of the local variable have the auotmatic attribute so they

s/auotmatic/automatic/

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/auto_in_equiv_5.f90
> @@ -0,0 +1,18 @@
> +! { dg-do compile }
> +! { dg-options "-fdump-tree-original" }
> +!
> +! Neither of the local variable have the auotmatic attribute so they
> +! not be allocated on the stack.

Likewise.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/auto_in_equiv_6.f90
> @@ -0,0 +1,18 @@
> +! { dg-do compile }
> +! { dg-options "-fdec-static -fdump-tree-original" }
> +!
> +! Neither of the local variable have the auotmatic attribute so they
> +! not be allocated on the stack.

Likewise.

Jakub


Re: [PATCH] i386: Add clear_ratio to processor_costs

2019-10-17 Thread Richard Biener
On Thu, Oct 17, 2019 at 8:47 AM Uros Bizjak  wrote:
>
> On Wed, Oct 16, 2019 at 5:06 PM H.J. Lu  wrote:
> >
> > i386.h has
> >
> >  #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
> >
> > It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
> > to processor_costs, sets it to the minimum of 6 and move_ratio in all
> > cost models and defines CLEAR_RATIO with clear_ratio.
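Presumably the new definition then looks something like the sketch below (inferred from
the description above, not copied from the patch), with the cap moving into each cost
table's clear_ratio field:

/* Sketch only: the speed != 0 case now reads the tunable field directly.  */
#define CLEAR_RATIO(speed) ((speed) ? ix86_cost->clear_ratio : 2)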
> >
> > * config/i386/i386.h (processor_costs): Add clear_ratio.
> > (CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
> > * config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
> > of 6 and move_ratio in all cost models.
> >
> > OK for trunk?
>
> LGTM. Are these numbers backed by some benchmark results?

Note CLEAR_RATIO is also used by simplification of static
initializers and thus changing it can have "interesting" effects
(on the testsuite).

Richard.

> Thanks,
> Uros.
>
> >
> > --
> > H.J.


Re: [PATCH] Communicate lto-wrapper and ld through a file

2019-10-17 Thread Richard Biener
On Wed, Oct 16, 2019 at 7:46 PM Giuliano Belinassi
 wrote:
>
> Hi,
>
> Previously, the lto-wrapper communicated with ld by creating a pipe from
> lto-wrapper's stdout to ld's stdin. This patch uses a temporary file for
> this communication, releasing stdout to be used for debugging.

I'm not sure it is a good idea on its own.  Then you have to consider that
the lto-plugin is used to drive different GCC versions (and thus lto-wrappers)
and you are breaking compatibility with older versions which makes it
really not an option.

There's stderr for debugging...

> I've run the full testsuite and bootstrapped with LTO on x86_64-linux, and
> found no issues so far.  Do I need to write a testcase for this feature?
>
> Giuliano.
>
> gcc/ChangeLog
> 2019-10-16  Giuliano Belinassi  
>
> * lto-wrapper.c (STATIC_LEN): New macro.
> (to_ld): New.
> (find_crtofftable): Print to file to_ld.
> (find_ld_list_file): New.
> (main): Check if to_ld is valid or is stdout.
>
> gcc/libiberty
> 2019-10-16  Giuliano Belinassi  
>
> * pex-unix.c (pex_unix_exec_child): Check for the PEX_KEEP_STD_IO flag.
> (to_ld): New.
>
> gcc/include
> 2019-10-16  Giuliano Belinassi  
>
> * libiberty.h (PEX_KEEP_STD_IO): New macro.
>
> gcc/lto-plugin
> 2019-10-16  Giuliano Belinassi  
>
> * lto-plugin.c (exec_lto_wrapper): Replace pipe from stdout to temporary
> file, and pass its name in argv.
>


Re: [PATCH, Fortran] Fix Automatics in equivalence test cases was Re: Automatics in Equivalences failures

2019-10-17 Thread Mark Eggleston


On 04/10/2019 11:39, Jakub Jelinek wrote:

On Wed, Oct 02, 2019 at 03:31:53PM +0100, Mark Eggleston wrote:

It was there because the code base has -fno-automatic for procedures that
aren't recursive, and -frecursive was used because the recursive routines
don't have the recursive keyword (legacy issues...).  It is to ensure that
the use of -fno-automatic does not affect local variables in recursive
routines.

Ah, it seems that while -frecursive -fno-automatic emits a warning that can
be read as if -frecursive is ignored, it just forces all (unless explicitly
automatic) variables to be saved, but still allows recursion.  One just has
to manually arrange for the variables that shouldn't be saved to be
automatic.  So, I'm not against that test.

If a procedure is marked with the keyword recursive, all its local variables
are always automatic by default.  If a procedure is not marked with the
keyword recursive, its variables are not automatic when -fno-automatic is
used unless they have the automatic attribute, specified directly or acquired
via an equivalence statement; recursion can still be effected using
-frecursive provided all the variables used by the recursion are automatic.


+! { dg-warning "Flag '-fno-automatic' overwrites '-frecursive'" "warning" { 
target *-*-* } 0 }

I think you want one runtime test (e.g. the one you wrote in
automatics_in_equivalence_1.f90)

The errors checked there only check that automatic can be used, which is
already covered by dec_static_3.f90, and that automatic can't be used in an
equivalence with the use of -fdec-static.

   and the rest just dg-do compile tests that
will check the original or gimple dumps to verify what happened in addition
to checking diagnostics (none) from the compilation, one testing the
default, another -fno-automatic, but in both cases without -frecursive.

Not sure what you mean, here. Are these already handled by the tests for
-fdec-static?

I meant add a couple of new tests.  One like:
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
! { dg-final { scan-tree-dump "static union" "original" } }

subroutine foo
   integer, save :: a, b
   equivalence (a, b)
   a = 5
   if (b.ne.5) stop 1
end subroutine

which verifies that the equivalence is saved in that case.  Another one
like:
! { dg-do compile }
! { dg-options "-fdec-static -fdump-tree-original -fno-automatic" }
! { dg-final { scan-tree-dump-not "static union" "original" } }

subroutine foo
   integer, automatic :: a
   integer :: b
   equivalence (a, b)
   a = 5
   if (b.ne.5) stop 1
end subroutine

another one perhaps with swapped ", automatic" between a and b.
Another two without the -fno-automatic?

Jakub


Please find attached patch for additional test cases. Are these 
sufficient? If so, OK to commit?


Change log:

    Mark Eggleston 

    * gfortran.dg/auto_in_equiv_3.f90: New test.
    * gfortran.dg/auto_in_equiv_4.f90: New test.
    * gfortran.dg/auto_in_equiv_5.f90: New test.
    * gfortran.dg/auto_in_equiv_6.f90: New test.
    * gfortran.dg/auto_in_equiv_7.f90: New test.

--
https://www.codethink.co.uk/privacy.html

>From 55f816f614e22d2733b31d5e3a246caaa2809044 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Thu, 10 Oct 2019 15:01:11 +0100
Subject: [PATCH 1/2] New Automatics in equivalences test cases

---
 gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 | 19 +++
 gcc/testsuite/gfortran.dg/auto_in_equiv_4.f90 | 18 ++
 gcc/testsuite/gfortran.dg/auto_in_equiv_5.f90 | 18 ++
 gcc/testsuite/gfortran.dg/auto_in_equiv_6.f90 | 18 ++
 gcc/testsuite/gfortran.dg/auto_in_equiv_7.f90 | 19 +++
 5 files changed, 92 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_6.f90
 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_7.f90

diff --git a/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 b/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
new file mode 100644
index 000..35f6e0fa27d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-fdec-static -fdump-tree-original" }
+!
+
+subroutine foo
+  integer, automatic :: a
+  integer :: b
+  equivalence (a, b)
+  a = 5
+  if (b.ne.5) stop 1
+end subroutine
+
+! { dg-final { scan-tree-dump "union" "original" } }
+! { dg-final { scan-tree-dump-not "static union" "original" } }
+! { dg-final { scan-tree-dump "integer\\(kind=4\\) a" "original" } }
+! { dg-final { scan-tree-dump-not "static integer\\(kind=4\\) a" "original" } }
+! { dg-final { scan-tree-dump "integer\\(kind=4\\) b" "original" } }
+! { dg-final { scan-tree-dump-not "static integer\\(kind=4\\) b" "original" } }
+
diff --git a/gcc/testsuite/gfortran.dg/auto_in_equiv_4.f90 

Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-17 Thread Iain Sandoe
Thanks for the reviews, Segher, Alan,

Alan Modra  wrote:

> On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
>> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
>>> For 32bit cases this isn't a problem since we can load/store to unaligned
>>> addresses using D-mode insns.
>> 
>> Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
>> six years ago or so...  Alan, do you remember?  It required some assembler
>> work IIRC.
> 
> Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
> Rather than defining a whole series of _DS (and _DQ!) relocs, the
> linker inspects the instruction being relocated and complains if the
> relocation would modify opcode bits.  See is_insn_ds_form in
> bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

Ah, that makes a lot of sense - and ld64 also makes this check (for the same
underlying reason, I am sure - we are short of reloc space anyway in Mach-O).

>> I'll have another looke through this (esp. the generic part) when I'm fresh
>> awake (but not before coffee!).  Alan, can you have a look as well please?
> 
> It looks reasonable to me.

So, OK for trunk?
(and backports after some bake time)?

thanks
Iain



[PATCH] Fix PR92129

2019-10-17 Thread Richard Biener


Committed as obvious.

Richard.

2019-10-17  Richard Biener  

PR tree-optimization/92129
* tree-vect-loop.c (vectorizable_reduction): Also fail
on GIMPLE_SINGLE_RHS.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277091)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5772,6 +5772,7 @@ vectorizable_reduction (stmt_vec_info st
   break;
 
 case GIMPLE_UNARY_RHS:
+case GIMPLE_SINGLE_RHS:
   return false;
 
 default:


Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2019 at 03:15:28AM -0400, Aldy Hernandez wrote:
> On 10/16/19 3:46 AM, Jakub Jelinek wrote:
> > On Wed, Oct 16, 2019 at 03:38:38AM -0400, Aldy Hernandez wrote:
> > > Would you take care of this, or shall I?
> > 
> > Will defer to you, I have quite a lot of stuff on my plate ATM.
> > 
> > Jakub
> > 
> 
> No problem.  Thanks for your analysis though.
> 
> The attached patch fixes the regression.
> 
> OK pending tests?

> gcc/
> 
>   PR tree-optimization/92131
>   * tree-vrp.c (value_range_base::dump): Display +INF for both
>   pointers and integers when appropriate.
> 
> gcc/testsuite/
> 
>   * gcc.dg/tree-ssa/evrp4.c: Check for +INF instead of -1.

LGTM.

Jakub


[C++ PATCH] Fix handling of location wrappers in constexpr evaluation (PR c++/92015)

2019-10-17 Thread Jakub Jelinek
Hi!

As written in the PR, location wrappers are stripped by
cxx_eval_constant_expression as the first thing it does after dealing with
jump_target.  The problem is that when this function is called on a
CONSTRUCTOR that is TREE_CONSTANT and satisfies reduced_constant_expression_p
(which allows location wrappers around constants), we don't recurse on the
CONSTRUCTOR elements, so nothing strips them away.  Then in
cxx_eval_component_reference or cxx_eval_bit_field_ref we actually can run
into those location wrappers, and the callers assume they don't appear
anymore.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

I admit I'm not 100% sure about the second case which will call the function
to do the stripping for all the elts until the one found.  The alternative
would be:
   tree bitpos = bit_position (field);
   if (bitpos == start && DECL_SIZE (field) == TREE_OPERAND (t, 1))
-   return value;
+   {
+ STRIP_ANY_LOCATION_WRAPPER (value);
+ return value;
+   }
   if (TREE_CODE (TREE_TYPE (field)) == INTEGER_TYPE
- && TREE_CODE (value) == INTEGER_CST
  && tree_fits_shwi_p (bitpos)
  && tree_fits_shwi_p (DECL_SIZE (field)))
{
  HOST_WIDE_INT bit = tree_to_shwi (bitpos);
  HOST_WIDE_INT sz = tree_to_shwi (DECL_SIZE (field));
  HOST_WIDE_INT shift;
  if (bit >= istart && bit + sz <= istart + isize)
{
+ STRIP_ANY_LOCATION_WRAPPER (value);
+ if (TREE_CODE (value) != INTEGER_CST)
+   continue;
but then we'd do those tree_fits_shwi_p/tree_to_shwi.  Guess
a micro-optimization without a clear winner.

2019-10-17  Jakub Jelinek  

PR c++/92015
* constexpr.c (cxx_eval_component_reference, cxx_eval_bit_field_ref):
Use STRIP_ANY_LOCATION_WRAPPER on CONSTRUCTOR elts.

* g++.dg/cpp0x/constexpr-92015.C: New test.

--- gcc/cp/constexpr.c.jj   2019-10-16 09:30:57.300112739 +0200
+++ gcc/cp/constexpr.c  2019-10-16 17:06:03.943539476 +0200
@@ -2887,7 +2887,10 @@ cxx_eval_component_reference (const cons
  : field == part)
{
  if (value)
-   return value;
+   {
+ STRIP_ANY_LOCATION_WRAPPER (value);
+ return value;
+   }
  else
/* We're in the middle of initializing it.  */
break;
@@ -2977,6 +2980,7 @@ cxx_eval_bit_field_ref (const constexpr_
   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (whole), i, field, value)
 {
   tree bitpos = bit_position (field);
+  STRIP_ANY_LOCATION_WRAPPER (value);
   if (bitpos == start && DECL_SIZE (field) == TREE_OPERAND (t, 1))
return value;
   if (TREE_CODE (TREE_TYPE (field)) == INTEGER_TYPE
--- gcc/testsuite/g++.dg/cpp0x/constexpr-92015.C.jj 2019-10-16 
17:16:44.204871319 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-92015.C2019-10-16 
17:14:14.773127884 +0200
@@ -0,0 +1,7 @@
+// PR c++/92015
+// { dg-do compile { target c++11 } }
+
+struct S1 { char c[6] {'h', 'e', 'l', 'l', 'o', 0}; };
+struct S2 { char c[6] = "hello"; };
+static_assert (S1{}.c[0] == 'h', "");
+static_assert (S2{}.c[0] == 'h', "");

Jakub


Re: [PATCH] Fix objsz ICE (PR tree-optimization/92056)

2019-10-17 Thread Richard Biener
On Thu, 17 Oct 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following bug was introduced when cond_expr_object_size was added in
> 2007.  We want to treat a COND_EXPR like a PHI with 2 arguments,
> and PHI is handled in a loop that breaks if the lhs value is unknown, and
> then does the if (TREE_CODE (arg) == SSA_NAME) merge_object_sizes else
> expr_object_size which is used even in places that handle just a single
> operand (with the lhs value initialized to the opposite value of unknown
> first).  At least expr_object_size asserts that the lhs value is not
> unknown at the start.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

> 2019-10-17  Jakub Jelinek  
> 
>   PR tree-optimization/92056
>   * tree-object-size.c (cond_expr_object_size): Return early if then_
>   processing resulted in unknown size.
> 
>   * gcc.c-torture/compile/pr92056.c: New test.
> 
> --- gcc/tree-object-size.c.jj 2019-10-05 09:35:14.895967464 +0200
> +++ gcc/tree-object-size.c2019-10-16 15:34:11.414769994 +0200
> @@ -903,6 +903,9 @@ cond_expr_object_size (struct object_siz
>else
>  expr_object_size (osi, var, then_);
>  
> +  if (object_sizes[object_size_type][varno] == unknown[object_size_type])
> +return reexamine;
> +
>if (TREE_CODE (else_) == SSA_NAME)
>  reexamine |= merge_object_sizes (osi, var, else_, 0);
>else
> --- gcc/testsuite/gcc.c-torture/compile/pr92056.c.jj  2019-10-16 
> 15:42:56.042848440 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr92056.c 2019-10-16 
> 15:42:41.595066602 +0200
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/92056 */
> +
> +const char *d;
> +
> +void
> +foo (int c, char *e, const char *a, const char *b)
> +{
> +  switch (c)
> +{
> +case 33:
> +  for (;; d++)
> +if (__builtin_strcmp (b ? : "", d))
> +  return;
> +  break;
> +case 4:
> +  __builtin_sprintf (e, a);
> +}
> +}
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] Fix ifcombine ICE (PR tree-optimization/92115)

2019-10-17 Thread Richard Biener
On Thu, 17 Oct 2019, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because ifcombine_ifandif attempts to set
> GIMPLE_COND condition to a condition that might trap (which is allowed
> only in COND_EXPR/VEC_COND_EXPR).
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

> 2019-10-17  Jakub Jelinek  
> 
>   PR tree-optimization/92115
>   * tree-ssa-ifcombine.c (ifcombine_ifandif): Force condition into
>   temporary if it could trap.
> 
>   * gcc.dg/pr92115.c: New test.
> 
> --- gcc/tree-ssa-ifcombine.c.jj   2019-09-20 12:25:42.232479343 +0200
> +++ gcc/tree-ssa-ifcombine.c  2019-10-16 10:05:06.826174814 +0200
> @@ -599,6 +599,12 @@ ifcombine_ifandif (basic_block inner_con
>t = canonicalize_cond_expr_cond (t);
>if (!t)
>   return false;
> +  if (!is_gimple_condexpr_for_cond (t))
> + {
> +   gsi = gsi_for_stmt (inner_cond);
> +   t = force_gimple_operand_gsi_1 (, t, is_gimple_condexpr_for_cond,
> +   NULL, true, GSI_SAME_STMT);
> + }
>gimple_cond_set_condition_from_tree (inner_cond, t);
>update_stmt (inner_cond);
>  
> --- gcc/testsuite/gcc.dg/pr92115.c.jj 2019-10-16 10:07:35.923924633 +0200
> +++ gcc/testsuite/gcc.dg/pr92115.c2019-10-16 10:06:56.831514691 +0200
> @@ -0,0 +1,10 @@
> +/* PR tree-optimization/92115 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fexceptions -ffinite-math-only -fnon-call-exceptions -fsignaling-nans -fno-signed-zeros" } */
> +
> +void
> +foo (double x)
> +{
> +  if (x == 0.0 && !__builtin_signbit (x))
> +__builtin_abort ();
> +}
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-17 Thread Aldy Hernandez

On 10/16/19 3:46 AM, Jakub Jelinek wrote:
> On Wed, Oct 16, 2019 at 03:38:38AM -0400, Aldy Hernandez wrote:
>> Would you take care of this, or shall I?
>
> Will defer to you, I have quite a lot of stuff on my plate ATM.
>
> 	Jakub

No problem.  Thanks for your analysis though.

The attached patch fixes the regression.

OK pending tests?
gcc/

	PR tree-optimization/92131
	* tree-vrp.c (value_range_base::dump): Display +INF for both
	pointers and integers when appropriate.

gcc/testsuite/

	* gcc.dg/tree-ssa/evrp4.c: Check for +INF instead of -1.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp4.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp4.c
index ba2f6b9b430..6710e6b5eff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/evrp4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp4.c
@@ -17,4 +17,4 @@ int bar (struct st *s)
   foo (&s->a);
 }
 
-/* { dg-final { scan-tree-dump "\\\[1B, -1B\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "\\\[1B, \\+INF\\\]" "evrp" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 21910b36518..8d4f16e9e1f 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -428,8 +428,8 @@ value_range_base::dump (FILE *file) const
 
   fprintf (file, ", ");
 
-  if (INTEGRAL_TYPE_P (ttype)
-	  && vrp_val_is_max (max ())
+  if (supports_type_p (ttype)
+	  && vrp_val_is_max (max (), true)
 	  && TYPE_PRECISION (ttype) != 1)
 	fprintf (file, "+INF");
   else


Order symbols before section copying in the lto streamer

2019-10-17 Thread Jan Hubicka
Hi,
this patch orders the symbols whose sections we copy to match the order
of files on the command line.  This speeds up the streaming process
since we no longer open and close files randomly and we read them more
sequentially.  It saves some kernel time, though I think more can be
done if we avoid the pair of mmap/munmap calls for every file section
we read.
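
A minimal, self-contained sketch (hypothetical types and data, not GCC
internals) of the three-level ordering that the cmp_symbol_files
comparator in the patch below implements: command-line file order
first, then the sub-id within a static library, then the original
definition order.

#include <stdio.h>
#include <stdlib.h>

struct sym
{
  const char *name;
  int file_order;	/* position of the file on the command line */
  int file_id;		/* sub-id within a static library */
  int order;		/* definition order */
};

static int
cmp_syms (const void *p1, const void *p2)
{
  const struct sym *s1 = p1, *s2 = p2;
  if (s1->file_order != s2->file_order)
    return s1->file_order - s2->file_order;
  if (s1->file_id != s2->file_id)
    return s1->file_id - s2->file_id;
  return s1->order - s2->order;
}

int
main (void)
{
  struct sym syms[] = {
    { "c", 1, 0, 5 }, { "a", 0, 0, 2 }, { "b", 0, 0, 1 }
  };
  qsort (syms, 3, sizeof syms[0], cmp_syms);
  for (int i = 0; i < 3; i++)
    printf ("%s\n", syms[i].name);	/* prints b, a, c */
  return 0;
}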

We also read files in random order in ipa-cp and during devirt.
I guess summary streaming could also be refactored to stream all
summaries for a given file instead of reading one summary from all
files.

Bootstrapped/regtested on x86_64-linux; I plan to commit it this
afternoon if there are no complaints.

Honza

* lto-common.c (lto_file_finalize): Add order attribute.
(lto_create_files_from_ids): Pass order.
(lto_file_read): Update call of lto_create_files_from_ids.
* lto-streamer-out.c (output_constructor): Push CTORS_OUT timevar.
(cmp_symbol_files): New.
(lto_output): Copy sections in file order.
* lto-streamer.h (lto_file_decl_data): Add field order.
Index: lto/lto-common.c
===
--- lto/lto-common.c	(revision 276986)
+++ lto/lto-common.c	(working copy)
@@ -2177,7 +2177,8 @@ create_subid_section_table (struct lto_s
 /* Read declarations and other initializations for a FILE_DATA.  */
 
 static void
-lto_file_finalize (struct lto_file_decl_data *file_data, lto_file *file)
+lto_file_finalize (struct lto_file_decl_data *file_data, lto_file *file,
+  int order)
 {
   const char *data;
   size_t len;
@@ -2195,6 +2196,7 @@ lto_file_finalize (struct lto_file_decl_
 
   file_data->renaming_hash_table = lto_create_renaming_table ();
   file_data->file_name = file->filename;
+  file_data->order = order;
 #ifdef ACCEL_COMPILER
   lto_input_mode_table (file_data);
 #else
@@ -2231,9 +2233,9 @@ lto_file_finalize (struct lto_file_decl_
 
 static int
 lto_create_files_from_ids (lto_file *file, struct lto_file_decl_data 
*file_data,
-  int *count)
+  int *count, int order)
 {
-  lto_file_finalize (file_data, file);
+  lto_file_finalize (file_data, file, order);
   if (symtab->dump_file)
 fprintf (symtab->dump_file,
 "Creating file %s with sub id " HOST_WIDE_INT_PRINT_HEX "\n",
@@ -2285,9 +2287,10 @@ lto_file_read (lto_file *file, FILE *res
   lto_resolution_read (file_ids, resolution_file, file);
 
   /* Finalize each lto file for each submodule in the merged object.  */
+  int order = 0;
   for (file_data = file_list.first; file_data != NULL;
file_data = file_data->next)
-lto_create_files_from_ids (file, file_data, count);
+lto_create_files_from_ids (file, file_data, count, order++);
 
   splay_tree_delete (file_ids);
   htab_delete (section_hash_table);
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 276986)
+++ lto-streamer-out.c  (working copy)
@@ -2217,6 +2224,7 @@ output_constructor (struct varpool_node
 fprintf (streamer_dump_file, "\nStreaming constructor of %s\n",
 node->name ());
 
+  timevar_push (TV_IPA_LTO_CTORS_OUT);
   ob = create_output_block (LTO_section_function_body);
 
   clear_line_info (ob);
@@ -2236,6 +2244,7 @@ output_constructor (struct varpool_node
   if (streamer_dump_file)
 fprintf (streamer_dump_file, "Finished streaming %s\n",
 node->name ());
+  timevar_pop (TV_IPA_LTO_CTORS_OUT);
 }
 
 
@@ -2416,6 +2425,30 @@ produce_lto_section ()
   destroy_output_block (ob);
 }
 
+/* Compare symbols to get them sorted by filename (to optimize streaming).  */
+
+static int
+cmp_symbol_files (const void *pn1, const void *pn2)
+{
+  const symtab_node *n1 = *(const symtab_node * const *)pn1;
+  const symtab_node *n2 = *(const symtab_node * const *)pn2;
+
+  int file_order1 = n1->lto_file_data ? n1->lto_file_data->order : -1;
+  int file_order2 = n2->lto_file_data ? n2->lto_file_data->order : -1;
+
+  /* Order files the same way as they appeared on the command line to
+     reduce seeking while copying sections.  */
+  if (file_order1 != file_order2)
+return file_order1 - file_order2;
+
+  /* Order within static library.  */
+  if (n1->lto_file_data && n1->lto_file_data->id != n2->lto_file_data->id)
+return n1->lto_file_data->id - n2->lto_file_data->id;
+
+  /* And finally order by the definition order.  */
+  return n1->order - n2->order;
+}
+
 /* Main entry point from the pass manager.  */
 
 void
@@ -2424,8 +2457,9 @@ lto_output (void)
   struct lto_out_decl_state *decl_state;
   bitmap output = NULL;
   bitmap_obstack output_obstack;
-  int i, n_nodes;
+  unsigned int i, n_nodes;
   lto_symtab_encoder_t encoder = lto_get_out_decl_state ()->symtab_node_encoder;
+  auto_vec<symtab_node *> symbols_to_copy;
 
   prune_offload_funcs ();
 
@@ -2441,32 +2475,17 @@ lto_output (void)
   produce_lto_section ();
 
   n_nodes = lto_symtab_encoder_size 

[PATCH] Fix objsz ICE (PR tree-optimization/92056)

2019-10-17 Thread Jakub Jelinek
Hi!

The following bug was introduced when cond_expr_object_size was added
in 2007.  We want to treat a COND_EXPR like a PHI with 2 arguments.
PHIs are handled in a loop that breaks as soon as the lhs value becomes
unknown and otherwise does the "if (TREE_CODE (arg) == SSA_NAME)
merge_object_sizes else expr_object_size" pattern, which is also used
in places that handle just a single operand (with the lhs value first
initialized to the opposite of unknown).  expr_object_size in particular
asserts that the lhs value is not unknown on entry, so the COND_EXPR
code must not call it for the else_ arm once then_ processing has made
the value unknown.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?
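
As a hedged, standalone illustration (not part of the patch) of the
COND_EXPR-as-PHI behaviour above: __builtin_object_size on a
conditional expression merges the sizes of both arms, and the merge
has to give up as soon as one arm's size is already unknown.

/* Hedged illustration, not from the patch: the object size of a
   COND_EXPR merges the sizes of both arms, much like a two-argument
   PHI.  Here the first arm (p) has unknown size, so the maximum
   object size of the whole conditional is unknown as well.  */
#include <stddef.h>
#include <stdio.h>

char buf[32];

size_t
cond_size (int c, char *p)
{
  /* c ? p : buf is a COND_EXPR; p contributes "unknown".  */
  return __builtin_object_size (c ? p : buf, 0);
}

int
main (void)
{
  /* Typically prints (size_t)-1, i.e. "unknown".  */
  printf ("%zu\n", cond_size (0, (char *) 0));
  return 0;
}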

2019-10-17  Jakub Jelinek  

PR tree-optimization/92056
* tree-object-size.c (cond_expr_object_size): Return early if then_
processing resulted in unknown size.

* gcc.c-torture/compile/pr92056.c: New test.

--- gcc/tree-object-size.c.jj   2019-10-05 09:35:14.895967464 +0200
+++ gcc/tree-object-size.c  2019-10-16 15:34:11.414769994 +0200
@@ -903,6 +903,9 @@ cond_expr_object_size (struct object_siz
   else
 expr_object_size (osi, var, then_);
 
+  if (object_sizes[object_size_type][varno] == unknown[object_size_type])
+return reexamine;
+
   if (TREE_CODE (else_) == SSA_NAME)
 reexamine |= merge_object_sizes (osi, var, else_, 0);
   else
--- gcc/testsuite/gcc.c-torture/compile/pr92056.c.jj2019-10-16 
15:42:56.042848440 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr92056.c   2019-10-16 
15:42:41.595066602 +0200
@@ -0,0 +1,18 @@
+/* PR tree-optimization/92056 */
+
+const char *d;
+
+void
+foo (int c, char *e, const char *a, const char *b)
+{
+  switch (c)
+{
+case 33:
+  for (;; d++)
+if (__builtin_strcmp (b ? : "", d))
+  return;
+  break;
+case 4:
+  __builtin_sprintf (e, a);
+}
+}

Jakub


[PATCH] Fix ifcombine ICE (PR tree-optimization/92115)

2019-10-17 Thread Jakub Jelinek
Hi!

The following testcase ICEs because ifcombine_ifandif attempts to set
the GIMPLE_COND condition to a condition that might trap (trapping
conditions are allowed only in COND_EXPR/VEC_COND_EXPR).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?
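
As a hedged, standalone illustration (not part of the patch) of why the
combined condition can trap: with signaling NaNs, an ordered comparison
such as x == 0.0 raises FE_INVALID, and under -fnon-call-exceptions the
comparison may therefore throw, so it cannot be placed directly in a
GIMPLE_COND.

/* Hedged illustration, not from the patch.  Link with -lm on glibc;
   strictly this also wants #pragma STDC FENV_ACCESS ON.  */
#include <fenv.h>
#include <stdio.h>

int
main (void)
{
  volatile double snan = __builtin_nans ("");	/* signaling NaN */
  feclearexcept (FE_INVALID);
  volatile int eq = (snan == 0.0);	/* ordered compare: raises FE_INVALID */
  printf ("eq = %d, FE_INVALID raised = %d\n",
	  eq, fetestexcept (FE_INVALID) != 0);
  return 0;
}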

2019-10-17  Jakub Jelinek  

PR tree-optimization/92115
* tree-ssa-ifcombine.c (ifcombine_ifandif): Force condition into
temporary if it could trap.

* gcc.dg/pr92115.c: New test.

--- gcc/tree-ssa-ifcombine.c.jj 2019-09-20 12:25:42.232479343 +0200
+++ gcc/tree-ssa-ifcombine.c2019-10-16 10:05:06.826174814 +0200
@@ -599,6 +599,12 @@ ifcombine_ifandif (basic_block inner_con
   t = canonicalize_cond_expr_cond (t);
   if (!t)
return false;
+  if (!is_gimple_condexpr_for_cond (t))
+   {
+ gsi = gsi_for_stmt (inner_cond);
+ t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
+ NULL, true, GSI_SAME_STMT);
+   }
   gimple_cond_set_condition_from_tree (inner_cond, t);
   update_stmt (inner_cond);
 
--- gcc/testsuite/gcc.dg/pr92115.c.jj   2019-10-16 10:07:35.923924633 +0200
+++ gcc/testsuite/gcc.dg/pr92115.c  2019-10-16 10:06:56.831514691 +0200
@@ -0,0 +1,10 @@
+/* PR tree-optimization/92115 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fexceptions -ffinite-math-only -fnon-call-exceptions -fsignaling-nans -fno-signed-zeros" } */
+
+void
+foo (double x)
+{
+  if (x == 0.0 && !__builtin_signbit (x))
+__builtin_abort ();
+}

Jakub


[committed] Add testcase for already fixed omp simd ICE (PR fortran/87752)

2019-10-17 Thread Jakub Jelinek
Hi!

This has already been fixed with r273096, but it is useful to also have
a Fortran testcase.  Committed to trunk as obvious.

2019-10-17  Jakub Jelinek  

PR fortran/87752
* gfortran.dg/gomp/pr87752.f90: New test.

--- gcc/testsuite/gfortran.dg/gomp/pr87752.f90.jj   2019-10-16 
10:19:09.584439754 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr87752.f90  2019-10-16 10:19:02.343549209 
+0200
@@ -0,0 +1,12 @@
+! PR fortran/87752
+! { dg-do compile }
+! { dg-additional-options "-Ofast" }
+
+subroutine foo (n, u, v)
+  integer :: n
+  real, pointer :: u(:), v(:)
+  !$omp parallel do simd
+  do i = 1, n
+u(:) = v(:)
+  end do
+end

Jakub


Re: [PATCH] [x86] Add detection of Icelake Client and Server

2019-10-17 Thread Uros Bizjak
> gcc/ChangeLog:
> * config/i386/driver-i386.c (host_detect_local_cpu): Handle
>   icelake-client and icelake-server.
> * testsuite/gcc.target/i386/builtin_target.c (check_intel_cpu_model):
>   Verify icelakes are detected correctly.
>
> libgcc/ChangeLog:
> * config/i386/cpuinfo.c (get_intel_cpu): Handle icelake-client
>   and icelake-server.

Please also state how you bootstrapped and tested the patch.

Otherwise OK.

Thanks,
Uros.


Re: [PATCH] i386: Add clear_ratio to processor_costs

2019-10-17 Thread Uros Bizjak
On Wed, Oct 16, 2019 at 5:06 PM H.J. Lu  wrote:
>
> i386.h has
>
>  #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
>
> It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
> to processor_costs, sets it to the minimum of 6 and move_ratio in all
> cost models, and defines CLEAR_RATIO in terms of clear_ratio.
>
> * config/i386/i386.h (processor_costs): Add clear_ratio.
> (CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
> * config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
> of 6 and move_ratio in all cost models.
>
> OK for trunk?

LGTM. Are these numbers backed by some benchmark results?

Thanks,
Uros.

>
> --
> H.J.
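
For reference, a hedged before/after sketch of the CLEAR_RATIO change
described above; the clear_ratio field layout in processor_costs and the
per-CPU values are only in the patch and are not reproduced here.

/* Before: the ratio is always capped at 6, regardless of the cost table.  */
#define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)

/* After (per the ChangeLog entry "Remove MIN and use
   ix86_cost->clear_ratio"): each cost model carries its own clear_ratio,
   initialized to MIN (6, move_ratio) in x86-tune-costs.h, so the cap now
   lives in the cost tables rather than in the macro.  */
#define CLEAR_RATIO(speed) ((speed) ? ix86_cost->clear_ratio : 2)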