[PATCH] sccvn: Handle bitfields in push_partial_def [PR93582]

2020-02-21 Thread Jakub Jelinek
Hi!

The following patch adds support for bitfields to push_partial_def.
Previously pd.offset and pd.size were counted in bytes and maxsizei in
bits; now everything is counted in bits.

Not really sure how much of the further code can be outlined and moved;
the full def and partial def code don't have much in common (the partial
defs case basically has some load bit range and a set of store bit ranges
that at least partially overlap, and we need to handle all the different
cases: negative or non-negative pd.offset, little vs. big endian, a size
so small that we need to preserve the original bits on both sides of the
byte, a size that fits, or one that is too large).
Perhaps the storing of some value into the middle of an existing buffer
(i.e. what push_partial_def now does in the loop) could be shared, but the
candidate for sharing would most likely be store-merging rather than the
other spots in sccvn, and I think it is better not to touch store-merging
at this stage.

Yes, I've thought about trying to do everything in place, but the code is
quite hard to understand and get right already, and if we tried to do the
optimization on the fly, it would need more special cases and would need
more testcases for gcov coverage.  Most of the time the sizes will be
small.  Furthermore, for bitfields native_encode_expr actually stores the
number of bytes in the mode rather than, say, the actual bitsize rounded
up to bytes, so it wouldn't be just a matter of saving/restoring bytes at
the start and end; we might need up to 7 further bytes, e.g. for __int128
bitfields.  Perhaps we could have just a fast path for the case where
everything is byte aligned (and, for integral types, the mode bitsize is
equal to the size too)?
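
As a concrete illustration, here is a hedged sketch in the spirit of the
new pr93582-* testcases (the real tests are in the patch below; this
fragment is illustrative only): nothing in it is byte aligned or byte
sized, yet FRE can now fold the load:

struct S { unsigned int a : 3, b : 9, c : 20; };

int
foo (void)
{
  struct S s;
  s.a = 5;   /* Partial defs with bit-granular pd.offset/pd.size.  */
  s.b = 42;
  s.c = 7;
  return s.b;  /* With the patch, FRE folds this to 42.  */
}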

Bootstrapped/regtested on {x86_64,i686,powerpc64{,le}}-linux, on
powerpc64-linux with -m32/-m64 testing.  During the {x86_64,i686}-linux
bootstraps/regtests I've also gathered statistics: the new code (where
something in the partial defs handling wasn't byte aligned/sized and
still found a constant) triggered 5266 times.  Attached is a
sort | uniq -c | sort -n list of those, i.e. the first column is the
number of times it hit in the same file/function/wordsize (across the
2 bootstraps/regtests), the second is BITS_PER_WORD, the third is the
filename and the last is current_function_name ().

Ok for trunk?

2020-02-22  Jakub Jelinek  

PR tree-optimization/93582
* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Consider
pd.offset and pd.size to be counted in bits rather than bytes, add
support for maxsizei that is not a multiple of BITS_PER_UNIT and
handle bitfield stores and loads.
(vn_reference_lookup_3): Don't call ranges_known_overlap_p with
uncomparable quantities - bytes vs. bits.  Allow push_partial_def
on offsets/sizes that aren't multiple of BITS_PER_UNIT and adjust
pd.offset/pd.size to be counted in bits rather than bytes.
Formatting fix.  Rename shadowed len variable to buflen.

* gcc.dg/tree-ssa/pr93582-4.c: New test.
* gcc.dg/tree-ssa/pr93582-5.c: New test.
* gcc.dg/tree-ssa/pr93582-6.c: New test.
* gcc.dg/tree-ssa/pr93582-7.c: New test.
* gcc.dg/tree-ssa/pr93582-8.c: New test.

--- gcc/tree-ssa-sccvn.c.jj 2020-02-18 08:52:26.156952846 +0100
+++ gcc/tree-ssa-sccvn.c	2020-02-18 15:44:53.446837342 +0100
@@ -1774,7 +1774,11 @@ vn_walk_cb_data::push_partial_def (const
   const HOST_WIDE_INT bufsize = 64;
   /* We're using a fixed buffer for encoding so fail early if the object
  we want to interpret is bigger.  */
-  if (maxsizei > bufsize * BITS_PER_UNIT)
+  if (maxsizei > bufsize * BITS_PER_UNIT
+  || CHAR_BIT != 8
+  || BITS_PER_UNIT != 8
+  /* Not prepared to handle PDP endian.  */
+  || BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
 return (void *)-1;
 
   bool pd_constant_p = (TREE_CODE (pd.rhs) == CONSTRUCTOR
@@ -1854,41 +1858,39 @@ vn_walk_cb_data::push_partial_def (const
   /* Now we have merged newr into the range tree.  When we have covered
  [offseti, sizei] then the tree will contain exactly one node which has
  the desired properties and it will be 'r'.  */
-  if (!known_subrange_p (0, maxsizei / BITS_PER_UNIT, r->offset, r->size))
+  if (!known_subrange_p (0, maxsizei, r->offset, r->size))
 /* Continue looking for partial defs.  */
 return NULL;
 
   /* Now simply native encode all partial defs in reverse order.  */
   unsigned ndefs = partial_defs.length ();
   /* We support up to 512-bit values (for V8DFmode).  */
-  unsigned char buffer[bufsize];
+  unsigned char buffer[bufsize + 1];
+  unsigned char this_buffer[bufsize + 1];
   int len;
 
+  memset (buffer, 0, bufsize + 1);
+  unsigned needed_len = ROUND_UP (maxsizei, BITS_PER_UNIT) / BITS_PER_UNIT;
   while (!partial_defs.is_empty ())
 {
   pd_data pd = partial_defs.pop ();
-  gcc_checking_assert (pd.offset < bufsize);
+  unsigned int amnt;
   if (TREE_CODE (pd.rhs) =

Re: [PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-21 Thread Feng Xue OS
It is a good solution.

Thanks,
Feng

From: Martin Jambor 
Sent: Saturday, February 22, 2020 2:15 AM
To: Feng Xue OS; Tamar Christina; Jan Hubicka; gcc-patches@gcc.gnu.org
Cc: nd
Subject: Re: [PATCH] Fix bug in recursiveness check for function to be cloned 
(ipa/pr93707)

Hi,

On Thu, Feb 20 2020, Feng Xue OS wrote:
> This is a simple and nice fix, but could suppress some CP opportunities for
> self-recursive calls.  Using the test case as an example, the first should be a
> for-all-context clone, and the call "recur_fn (i, 1, depth + 1)" is replaced
> with a newly created recursive node.  Thus, in the next round of CP iteration,
> the way to do CP for the 2nd argument "1" is blocked, because its coming edge
> cannot pass the check by cgraph_edge_brings_value_p().
>
> +__attribute__((noinline)) static int recur_fn (int i, int j, int depth)
> +{
> +   if (depth > 10)
> + return 1;
> +
> +   data[i + j]++;
> +
> +   if (depth & 3)
> + recur_fn (i, 1, depth + 1);
> +   else
> + recur_fn (i, j & 1, depth + 1);
> +
> +   foo();
> +
> +   return i + j;
> +}
> +
> +int caller (int v, int depth)
> +{
> +  recur_fn (1, v, depth);
> +
> +  return 0;
> +}
>
>
>>However, I believe that his approach mostly papers over a bug that
>>happens earlier, specifically that cgraph_edge_brings_value_p returned
>>true for the self-recursive edge from an all-context clone to itself, even
>>when evaluating the second argument.  We assume that all-context
>>clones get at least as many constants as any other potential clone, but
>>that does not work for self-recursive edges with pass-through parameters
>>that just pass along the received constant.
>
> The following check on the value in cgraph_edge_brings_value_p could determine
> whether the value can reach the dest node or not.  If the value is a constant
> without a source, as "1" in the above example, this is allowed.  Otherwise, the
> code snippet enclosed by "if (caller_info->ipcp_orig_node)" could capture the
> for-all-context clone.

there has not been any "following check" in your email but I believe I
understand what you mean, and I added such a check to my patch so that the
edge carrying the non-pass-through jump function was accepted by the
cgraph_edge_brings_value_p predicate.

However, that led to the same assert in
find_more_scalar_values_for_callers_subset because on that edge it tried
to compute the depth + 1 value before it had any value to calculate it
from.

So after staring at the problem for another while I realized that the
users of self_recursive_pass_through_p and
self_recursive_agg_pass_through_p would be OK if they returned false for
self-recursive calls from/to a node which is already a clone - clones
have their known constant values set at the point of their creation -
and that doing so avoids this problem.  So that is what the patch below
does.  I have still kept the cgraph_edge_brings_value_p hunks too, so
that edges are collected reliably.

Bootstrapped and tested on an x86_64-linux, LTO bootstrap underway.

What do you think?

Martin


2020-02-21  Martin Jambor  
Feng Xue  

PR ipa/93707
* ipa-cp.c (same_node_or_its_all_contexts_clone_p): Replaced with
new function calls_same_node_or_its_all_contexts_clone_p.
(cgraph_edge_brings_value_p): Use it.
(cgraph_edge_brings_value_p): Likewise.
(self_recursive_pass_through_p): Return false if caller is a clone.
(self_recursive_agg_pass_through_p): Likewise.

testsuite/
* gcc.dg/ipa/pr93707.c: New test.
---
 gcc/ChangeLog  | 11 +
 gcc/ipa-cp.c   | 38 +-
 gcc/testsuite/ChangeLog|  6 +
 gcc/testsuite/gcc.dg/ipa/pr93707.c | 31 
 4 files changed, 69 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93707.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c481407de9..a965cae4f07 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2020-02-21  Martin Jambor  
+   Feng Xue  
+
+   PR ipa/93707
+   * ipa-cp.c (same_node_or_its_all_contexts_clone_p): Replaced with
+   new function calls_same_node_or_its_all_contexts_clone_p.
+   (cgraph_edge_brings_value_p): Use it.
+   (cgraph_edge_brings_value_p): Likewise.
+   (self_recursive_pass_through_p): Return false if caller is a clone.
+   (self_recursive_agg_pass_through_p): Likewise.
+
 2020-02-17  Richard Biener  

PR c/86134
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4f5b72e6994..aa228df1204 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4033,15 +4033,24 @@ edge_clone_summary_t::duplicate (cgraph_edge *src_edge, cgraph_edge *dst_edge,
   src_data->next_clone = dst_edge;
 }

-/* Return true is NODE is DEST or its clone for all contexts.  */
+/* Return true is CS calls DEST or its clone for all contexts, except for
+   self-recursive nodes in which it has to be

Re: [PATCH] issues with configure --enable-checking option

2020-02-21 Thread Sandra Loosemore

On 2/11/20 7:46 AM, Roman Zhuykov wrote:

[snip]
Since I have to ask again about backports, I've decided to take a few more
steps and, with Alexander's help, created a new patch which rewords the
whole option description and covers items (3), (4) and (8).  CCing Jakub
and Richard as release managers; also asking Sandra to take a quick look
at whether the new wording is alright.  The new patch suits all active
branches.  OK for 10-9-8?


I'm not an expert on the content, but the new text reads OK except for 
using future tense to describe current behavior.  Namely:



+requested complexity.  This will slow down the compiler and may only work


s/will slow/slows/


+must be explicitly requested.  Disabling assertions will make the compiler
+and runtime slightly faster but increase the risk of undetected internal


s/will make/makes/
s/increase/increases/


+option is specified the stage1 compiler will be built with @samp{yes}


s/will be/is/

OK with those fixes applied.

-Sandra


Re: [PATCH] avoid user-constructible types in reshape_init_array (PR 90938)

2020-02-21 Thread Martin Sebor

On 2/17/20 10:54 AM, Jason Merrill wrote:

On 2/14/20 9:06 PM, Martin Sebor wrote:

On 2/13/20 3:59 PM, Jason Merrill wrote:

On 2/12/20 9:21 PM, Martin Sebor wrote:

On 2/11/20 5:28 PM, Jason Merrill wrote:

On 2/11/20 9:00 PM, Martin Sebor wrote:

r270155, committed in GCC 9, introduced a transformation that strips
redundant trailing zero initializers from array initializer lists in
order to support string literals as template arguments.

The transformation neglected to consider the case of array elements
of trivial class types with user-defined conversion ctors and either
defaulted or deleted default ctors.  (It didn't occur to me that
those qualify as trivial types despite the user-defined ctors.)  As
a result, some valid initialization expressions are rejected when
the explicit zero-initializers are dropped in favor of the (deleted)
default ctor,


Hmm, a type with only a deleted default constructor is not trivial, 
that should have been OK already.


For Marek's test case:
   struct A { A () = delete; A (int) = delete; };

trivial_type_p() returns true (as does __is_trivial (A) in both GCC
and Clang).

[class.prop] says that

   A trivial class is a class that is trivially copyable and has one
   or more default constructors (10.3.4.1), all of which are either
   trivial or deleted and at least one of which is not deleted.

That sounds like A above is not trivial because it doesn't have
at least one default ctor that's not deleted, but both GCC and
Clang say it is.  What am I missing?  Is there some other default
constructor hiding in there that I don't know about?


and others are eliminated in favor of the defaulted
ctor instead of invoking a user-defined conversion ctor, leading to
wrong code.


This seems like a bug in type_initializer_zero_p; it shouldn't 
treat 0 as a zero initializer for any class.


That does fix it, and it seems like the right solution to me as well.
Thanks for the suggestion.  I'm a little unsure about the condition
I put in place though.

Attached is an updated patch tested on x86_64-linux.



-  if (sized_array_p && trivial_type_p (elt_type))
+  if (sized_array_p
+  && trivial_type_p (elt_type)
+  && !TYPE_NEEDS_CONSTRUCTING (elt_type))


Do we still need this change?  If so, please add a comment about the 
trivial_type_p bug.


The change isn't needed with my patch as it was, but it would still
be needed with the changes you suggested (even then it doesn't help
with the problem I describe below).




   if (TREE_CODE (init) != CONSTRUCTOR

I might change this to

  if (!CP_AGGREGATE_TYPE_P (type))
    return initializer_zerop (init);


This behaves differently in C++ 2a mode (the whole condition evaluates
to true for class A below) than in earlier modes and causes a failure
in the new array55.C test:


True, my suggestion above does the wrong thing for non-aggregate classes.


+  /* A class can only be initialized by a non-class type if it has
+ a ctor that converts from that type.  Such classes are excluded
+ since their semantics are unknown.  */
+  if (RECORD_OR_UNION_TYPE_P (type)
+  && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (init)))
+    return false;


How about if (!SCALAR_TYPE_P (type)) here?

More broadly, it seems like doing this optimization here at all is 
questionable, since we don't yet know whether there's a valid conversion 
from the zero-valued initializer to the element type.  It would seem 
better to do it in process_init_constructor_array after the call to 
massage_init_elt, when we know the actual value of the element.


Yes, it seems that it might be better to do it there.  Attached
is a revised patch that implements your suggestion.  It's probably
more intrusive than you envisioned.  The stripping of the redundant
trailing initializers was straightforward.  Most of the rest of
the changes are only necessary to strip redundant initializers
for pointers to data members.

Martin

PS I'm uneasy about this patch this late in the cycle.  The bug I'm
fixing was introduced at the end of the last release, as a result
of a last minute patch not unlike this one.  It caused at least two
codegen regressions in all language modes.  I'd hate for this change
to have similar repercussions.
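
To make the failure mode concrete, here is a hedged sketch of the kind of
initialization PR 90938 is about (illustrative only, not one of the new
testcases): if the trailing zero initializer is stripped, the element
would be initialized from the deleted default ctor instead of via A(int).

struct A
{
  A () = delete;
  A (int) { }
};

A a[2] = { A (0), 0 };  // valid: 0 converts via A(int); must not be dropped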

PR c++/90938 - Initializing array with {1} works but not {0}

gcc/cp/ChangeLog:

	PR c++/90938
	* decl.c (reshape_init_array_1): Avoid stripping redundant trailing
	zero initializers here...
	* typeck2.c (process_init_constructor_array): ...and instead strip
	them here.  Extend the range of same trailing implicit initializers
	to also include preceding explicit initializers.

gcc/testsuite/ChangeLog:

	PR c++/90938
	* g++.dg/init/array55.C: New test.
	* g++.dg/init/array56.C: New test.
	* g++.dg/cpp2a/nontype-class33.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 31a556a0a1f..b2259fc6f20 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6010,9 +6010,6 @@ reshape_init_array_1 (tree elt_type, tree max_index, reshape_iter *d,
 	max_index_c

[committed] Fix passing of homogeneous SFmode and DFmode on hppa

2020-02-21 Thread John David Anglin
The attached patch fixes a problem noted in the libffi testsuite.  This is a 
particular problem
on 32-bit hppa*-hpux*.

We pass homogeneous SFmode and DFmode aggregates in the general registers 
instead of the floating
point registers that are used to pass SFmode and DFmode values.  
ASM_DECLARE_FUNCTION_NAME did not
correctly indicate that the general registers were being used.  The HP linker 
adjusts the call and
return values to compensate for the registers being used.  Since the indication 
was wrong, this
resulted in passed and return values being clobbered.

I also adjusted the value size check in pa_function_value as it could accept 
arguments with a size
other than word or double word.
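
For illustration, a sketch (not the libffi testcase itself) of a
homogeneous SFmode aggregate of the kind affected; on 32-bit hppa*-hpux*
it is passed and returned in general registers, so the ARGWn=/RTNVAL=
annotations emitted by ASM_DECLARE_FUNCTION_NAME must say GR rather
than FR:

struct pair { float x, y; };

struct pair
flip (struct pair p)
{
  struct pair r = { p.y, p.x };
  return r;
}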

Tested on hppa2.0w-hp-hpux11.11 and hppa-unknown-linux-gnu.  Committed to 
trunk, gcc-9 and gcc-8.

Dave

2020-02-21  John David Anglin  

* gcc/config/pa/pa.c (pa_function_value): Fix check for word and
double-word size when handling aggregate return values.
* gcc/config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate
that homogeneous SFmode and DFmode aggregates are passed and returned
in general registers.

diff --git a/gcc/config/pa/pa.c b/gcc/config/pa/pa.c
index 24b88304637..a662de96ac9 100644
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@@ -9335,7 +9335,7 @@ pa_function_value (const_tree valtype,
   HOST_WIDE_INT valsize = int_size_in_bytes (valtype);

   /* Handle aggregates that fit exactly in a word or double word.  */
-  if ((valsize & (UNITS_PER_WORD - 1)) == 0)
+  if (valsize == UNITS_PER_WORD || valsize == 2 * UNITS_PER_WORD)
return gen_rtx_REG (TYPE_MODE (valtype), 28);

   if (TARGET_64BIT)
diff --git a/gcc/config/pa/som.h b/gcc/config/pa/som.h
index 95c3bd238fe..505fdd65d79 100644
--- a/gcc/config/pa/som.h
+++ b/gcc/config/pa/som.h
@@ -98,8 +98,8 @@ do {  \

 
 #define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL) \
-do { tree fntype = TREE_TYPE (TREE_TYPE (DECL));   \
-tree tree_type = TREE_TYPE (DECL); \
+do { tree tree_type = TREE_TYPE (DECL);\
+tree fntype = TREE_TYPE (tree_type);   \
 tree parm; \
 int i; \
 if (TREE_PUBLIC (DECL) || TARGET_GAS)  \
@@ -121,9 +121,11 @@ do {   \
   {\
 tree type = DECL_ARG_TYPE (parm);  \
 machine_mode mode = TYPE_MODE (type);  \
-if (mode == SFmode && ! TARGET_SOFT_FLOAT) \
+if (!AGGREGATE_TYPE_P (type)   \
+&& mode == SFmode && ! TARGET_SOFT_FLOAT)  \
   fprintf (FILE, ",ARGW%d=FR", i++);   \
-else if (mode == DFmode && ! TARGET_SOFT_FLOAT)\
+else if (!AGGREGATE_TYPE_P (type)  \
+ && mode == DFmode && ! TARGET_SOFT_FLOAT) \
   {\
 if (i <= 2)\
   {\
@@ -158,9 +160,13 @@ do {   \
 for (; i < 4; i++) \
   fprintf (FILE, ",ARGW%d=GR", i); \
   }\
-if (TYPE_MODE (fntype) == DFmode && ! TARGET_SOFT_FLOAT)   \
+if (!AGGREGATE_TYPE_P (fntype) \
+&& TYPE_MODE (fntype) == DFmode\
+&& ! TARGET_SOFT_FLOAT)\
   fputs (DFMODE_RETURN_STRING, FILE);  \
-else if (TYPE_MODE (fntype) == SFmode && ! TARGET_SOFT_FLOAT) \
+else if (!AGGREGATE_TYPE_P (fntype)\
+ && TYPE_MODE (fntype) == SFmode   \
+ && ! TARGET_SOFT_FLOAT)   \
   fputs (SFMODE_RETURN_STRING, FILE);  \
 else if (fntype != void_type_node) \
   fputs (",RTNVAL=GR", FILE);  \


Re: [PATCH v2] c++: Fix ICE with ill-formed array list-initialization [PR93712]

2020-02-21 Thread Marek Polacek
On Thu, Feb 20, 2020 at 12:28:42AM +, Jason Merrill wrote:
> On 2/19/20 7:30 PM, Marek Polacek wrote:
> > On Fri, Feb 14, 2020 at 09:12:58AM +0100, Jason Merrill wrote:
> > > On 2/13/20 8:56 PM, Marek Polacek wrote:
> > > > My P0388R4 patch changed build_array_conv to create an identity
> > > > conversion at the start of the conversion chain.
> > > 
> > > Hmm, an identity conversion of {} suggests that it has a type, which it
> > > doesn't in the language.  I'm not strongly against it, but what was the
> > > reason for this change?
> > 
> > There are two reasons:
> > 1) without it we couldn't get to the original expression at the start
> > of the conversion chain (saved in .u.expr), this is needed in compare_ics:
> > 10660   tree n1 = nelts_initialized_by_list_init (t1);
> > 10661   tree n2 = nelts_initialized_by_list_init (t2);
> > and nelts_initialized_by_list_init uses conv_get_original_expr for
> > arrays that have no dimensions.
> 
> Ah, ck_aggr and ck_list probably should have used u.expr like ck_identity
> and ck_ambig
> 
> > 2) struct conversion says
> > /* An implicit conversion sequence, in the sense of [over.best.ics].
> > The first conversion to be performed is at the end of the chain.
> > That conversion is always a cr_identity conversion.  */
> > and we were breaking that promise.
> 
> ...or, if we're going to enforce this, ck_ambig will need to change as well.
> 
> And build_aggr_conv and build_complex_conv will need adjusting (one way or
> another).

Changed my mind :-).  It turned out using u.expr for ck_aggr is a much smaller
change.  Moreover, the comment in struct conversion talks about cr_identity, a
rank, not ck_identity.

ck_list uses u.list, so I'm not changing that.

-- >8 --
My P0388R4 patch changed build_array_conv to create an identity
conversion at the start of the conversion chain and now we crash
in convert_like_real:

 7457 case ck_identity:
 7458   if (BRACE_ENCLOSED_INITIALIZER_P (expr))
 7459 {
 7460   int nelts = CONSTRUCTOR_NELTS (expr);
 7461   if (nelts == 0)
 7462 expr = build_value_init (totype, complain);
 7463   else if (nelts == 1)
 7464 expr = CONSTRUCTOR_ELT (expr, 0)->value;
 7465   else
 7466 gcc_unreachable ();  // HERE
 7467 }

in a test like this

  int f (int const (&)[2])
  { return f({1, "M"}); }

Instead of creating a ck_identity at the start of the conversion chain,
so that conv_get_original_expr can be used with a ck_aggr, let's set
u.expr for a ck_aggr, and adjust next_conversion not to try to see
what's next in the chain if it gets a ck_aggr.

Bootstrapped/regtested on x86_64-linux, built cmcstl2 and Boost, ok for
trunk?

2020-02-21  Marek Polacek  

PR c++/93712 - ICE with ill-formed array list-initialization.
* call.c (next_conversion): Return NULL for ck_aggr.
(build_aggr_conv): Set u.expr instead of u.next.
(build_array_conv): Likewise.
(build_complex_conv): Likewise.
(conv_get_original_expr): Handle ck_aggr.

* g++.dg/cpp0x/initlist-array11.C: New test.
---
 gcc/cp/call.c | 17 +
 gcc/testsuite/g++.dg/cpp0x/initlist-array11.C | 10 ++
 2 files changed, 19 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-array11.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f47f96bf1c2..84230b9ecb8 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -117,13 +117,13 @@ struct conversion {
 /* The next conversion in the chain.  Since the conversions are
arranged from outermost to innermost, the NEXT conversion will
actually be performed before this conversion.  This variant is
-   used only when KIND is neither ck_identity, ck_ambig nor
+   used only when KIND is neither ck_identity, ck_aggr, ck_ambig nor
ck_list.  Please use the next_conversion function instead
of using this field directly.  */
 conversion *next;
 /* The expression at the beginning of the conversion chain.  This
-   variant is used only if KIND is ck_identity or ck_ambig.  You can
-   use conv_get_original_expr to get this expression.  */
+   variant is used only if KIND is ck_identity, ck_aggr, or ck_ambig.
+   You can use conv_get_original_expr to get this expression.  */
 tree expr;
 /* The array of conversions for an initializer_list, so this
variant is used only when KIND is ck_list.  */
@@ -861,7 +861,8 @@ next_conversion (conversion *conv)
   if (conv == NULL
   || conv->kind == ck_identity
   || conv->kind == ck_ambig
-  || conv->kind == ck_list)
+  || conv->kind == ck_list
+  || conv->kind == ck_aggr)
 return NULL;
   return conv->u.next;
 }
@@ -1030,7 +1031,7 @@ build_aggr_conv (tree type, tree ctor, int flags, tsubst_flags_t complain)
   c->rank = cr_exact;
   c->user_conv_p = true;
   c->check_narrowing = true;
-  c->u

RE: [PATCH][AARCH64] Fix for PR86901

2020-02-21 Thread Modi Mo via gcc-patches
> > Sounds good. I'll get those setup and running and will report back on
> > findings. What's the preferred way to measure codesize? I'm assuming
> > by default the code pages are aligned so smaller differences would need to 
> > trip
> over the boundary to actually show up.
> 
> You can use the size command on the binaries:
> 
> >size /bin/ls
>    text    data     bss     dec     hex filename
>  107271    2024    3472  112767   1b87f /bin/ls
> 
> As you can see it shows the text size in bytes. It is not rounded up to a 
> page, so it
> is an accurate measure of the codesize. Generally -O2 size is most useful to
> check (since that is what most applications build with), but -Ofast -flto can 
> be
> useful as well (the global inlining means you get instruction combinations 
> which
> appear less often with -O2).
> 
> Cheers,
> Wilco
Alrighty, I've got spec 2017 and spec 2006 set up and building, using default
configurations, so -O2 in spec2006 and -O3 in spec2017.  Testing the patch as
last sent showed a 1% code size regression in spec2017 perlbench, which turned
out to be a missing pattern for tbnz and all its variants:

(define_insn "*tb<optab><mode>1"
  [(set (pc) (if_then_else
	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand"
"r")  <--- only matches against zero_extract:DI
		    (const_int 1)
		    (match_operand 1
		      "aarch64_simd_shift_imm_<mode>" "n"))

The zero extract now matching against other modes would generate a test +
branch rather than the combined instruction, which led to the code size
regression.  I've updated the patch so that tbnz etc. match GPI, and that
brings the code size regression down to <0.2% in spec2017 and <0.4% in
spec2006.
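
For reference, a sketch (not the perlbench code) of the kind of
single-bit test that relies on this pattern; it should combine into one
tbnz/tbz rather than a separate test and branch:

extern void g (void);

void
f (int x)
{
  if (x & 32)   /* bit 5 set: should emit tbnz w0, 5, ... */
    g ();
}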

Spec results are attached for reference.

@Wilco I've gotten instruction on my side to set up an individual contributor's 
license for the time being. Can you send me the necessary documents to make 
that happen? Thanks!

ChangeLog:
2020-02-21  Di Mo  

gcc/
* config/aarch64/aarch64.md: Add GPI modes to extsv/extv patterns.
Allow the tb<optab><mode>1 pattern to match against zero_extract:GPI.
* expmed.c (extract_bit_field_using_extv): Change gen_lowpart to 
gen_lowpart_if_possible to avoid compiler assert building libgcc.
testsuite/
* gcc.target/aarch64/pr86901.c: Add new test.

Modi



Re: [PATCH v2] RISC-V: Adjust floating point code gen for LTGT compare

2020-02-21 Thread Jim Wilson
On Fri, Feb 21, 2020 at 1:04 AM Kito Cheng  wrote:
> * config/riscv/riscv.c (riscv_emit_float_compare): Change the code gen
> for LTGT.
> (riscv_rtx_costs): Update cost model for LTGT.

Thanks.  This looks good to me.

Jim


Re: [PATCH] libstdc++: P0769R2 Add shift to <algorithm>

2020-02-21 Thread Patrick Palka
On Fri, 21 Feb 2020, Patrick Palka wrote:

> This patch adds std::shift_left and std::shift_right.  Although these are
> STL-style algos, they are nonetheless placed in <bits/ranges_algo.h> because
> they make use of some functions in the ranges namespace that are more easily
> reachable from <bits/ranges_algo.h> than from <bits/stl_algo.h>, namely
> ranges::next and ranges::swap_ranges.
> 
> This implementation of std::shift_right for non-bidirectional iterators 
> deviates
> from the reference implementation a bit.  The main difference is that this
> implementation is significantly simpler, and yet saves around n/2 additional
> iterator increments and n/2 iter_moves at the cost of around n/2 additional
> iter_swaps (where n is the shift amount).

On second thought, this simplification of shift_right is not a good
idea, because the objects that were shifted and that are no longer a part
of the new range do not end up in a moved-from state at the end of the
algorithm.  Here is a version of the patch that instead adds something
akin to the reference implementation, and improves the tests to verify
this moved-from property of both algorithms.
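
For reference, a quick usage sketch (assuming a C++20 compiler with this
patch applied; both algos are reachable via <algorithm>):

#include <algorithm>
#include <vector>

int main()
{
  std::vector<int> v{1, 2, 3, 4, 5};

  // Shift left by 2: the first three elements become {3, 4, 5} and the
  // returned iterator points at the new logical end.
  auto new_end = std::shift_left(v.begin(), v.end(), 2);
  // new_end == v.begin() + 3; [new_end, v.end()) hold moved-from values.

  // Shift right by 1 within that prefix: {3, 4} move up one slot.
  auto new_begin = std::shift_right(v.begin(), new_end, 1);
  // new_begin == v.begin() + 1; *new_begin == 3.
  return *new_begin == 3 ? 0 : 1;
}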

-- >8 --

Subject: [PATCH] libstdc++: P0769R2 Add shift to <algorithm>

This patch adds std::shift_left and std::shift_right as per P0769R2.  Although
these are STL-style algos, this patch places them in <bits/ranges_algo.h>
because they make use of some functions in the ranges namespace that are more
easily reachable from <bits/ranges_algo.h> than from <bits/stl_algo.h>, namely
ranges::next.  In order to place these algos in <bits/stl_algo.h>, we would
need to include <bits/ranges_algo.h> from <bits/stl_algo.h>, which would
undesirably increase the size of <bits/stl_algo.h>.

libstdc++-v3/ChangeLog:

P0769R2 Add shift to <algorithm>
* include/bits/ranges_algo.h (shift_left, shift_right): New.
* testsuite/25_algorithms/shift_left/1.cc: New test.
* testsuite/25_algorithms/shift_right/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   |  92 
 .../testsuite/25_algorithms/shift_left/1.cc   | 104 ++
 .../testsuite/25_algorithms/shift_right/1.cc  | 103 +
 3 files changed, 299 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/shift_left/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/shift_right/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
index 7de1072abf0..7d7dbf04103 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3683,6 +3683,98 @@ namespace ranges
   inline constexpr __prev_permutation_fn prev_permutation{};
 
 } // namespace ranges
+
+  template<typename ForwardIterator>
+    constexpr ForwardIterator
+    shift_left(ForwardIterator __first, ForwardIterator __last,
+	       typename iterator_traits<ForwardIterator>::difference_type __n)
+{
+  __glibcxx_assert(__n >= 0);
+  if (__n == 0)
+   return __last;
+
+  auto __mid = ranges::next(__first, __n, __last);
+  if (__mid == __last)
+   return __first;
+      return std::move(std::move(__mid), std::move(__last), std::move(__first));
+}
+
+  template<typename ForwardIterator>
+    constexpr ForwardIterator
+    shift_right(ForwardIterator __first, ForwardIterator __last,
+		typename iterator_traits<ForwardIterator>::difference_type __n)
+{
+  __glibcxx_assert(__n >= 0);
+  if (__n == 0)
+   return __first;
+
+      using _Cat = typename iterator_traits<ForwardIterator>::iterator_category;
+  if constexpr (derived_from<_Cat, bidirectional_iterator_tag>)
+   {
+ auto __mid = ranges::next(__last, -__n, __first);
+ if (__mid == __first)
+   return __last;
+
+ return std::move_backward(std::move(__first), std::move(__mid),
+   std::move(__last));
+   }
+  else
+   {
+ auto __result = ranges::next(__first, __n, __last);
+ if (__result == __last)
+   return __last;
+
+ auto __dest_head = __first, __dest_tail = __result;
+ while (__dest_head != __result)
+   {
+ if (__dest_tail == __last)
+   {
+ // If we get here, then we must have
+ // 2*n >= distance(__first, __last)
+ // i.e. we are shifting out at least half of the range.  In
+ // this case we can safely perform the shift with a single
+ // move.
+ std::move(std::move(__first), std::move(__dest_head),
+   std::move(__result));
+ return __result;
+   }
+ ++__dest_head;
+ ++__dest_tail;
+   }
+
+ for (;;)
+   {
+ // At the start of each iteration of this outer loop, the range
+ // [__first, __result) contains those elements that after shifting
+ // the whole range right by __n, should end up in
+ // [__dest_head, __dest_tail) in order.
+
+ // The below inner loop swaps the elements of [__first, __result)
+ // and [__dest_head, __dest_tail), while simultaneously shifti

Re: [PATCH] testsuite: Require vect_multiple_sizes for scan-tree-dump in vect-epilogues.c

2020-02-21 Thread Jeff Law
On Wed, 2020-02-19 at 11:02 +0100, Uros Bizjak wrote:
> Default testsuite flags do not enable V8QI (MMX) vector mode for
> 32bit x86 targets.  Require vect_multiple_sizes effective target in
> scan-tree-dump to avoid "LOOP EPILOGUE VECTORIZED" failure.
> 
> Tested on x86_64-linux-gnu {,-m32}.
> 
> 2020-02-19  Uroš Bizjak  
> 
> * gcc.dg/vect/vect-epilogues.c (scan-tree-dump): Require
> vect_multiple_sizes effective target.
> 
> OK for mainline?
OK
jeff



Re: [PATCH] lra: Stop registers being incorrectly marked live v2 [PR92989]

2020-02-21 Thread Jeff Law
On Wed, 2020-02-19 at 12:59 +, Richard Sandiford wrote:
> This PR is about a case in which the clobbers at the start of
> an EH receiver can lead to registers becoming unnecessarily
> live in predecessor blocks.  My first attempt at fixing this
> made sure that we update the bb liveness info based on the
> real live set:
> 
>   http://gcc.gnu.org/g:e648e57efca6ce6d751ef8c2038608817b514fb4
> 
> But it turns out that the clobbered registers were also added to
> the "gen" set of LRA's private liveness problem, where "gen" in
> this context means "generates a requirement for a live value".
> So the clobbered registers could still end up live via that
> mechanism instead.
> 
> This patch therefore reverts the patch above and takes the other
> approach floated in the original patch description: model the full
> clobber by making the registers live and then dead again.
> 
> There's no specific need to revert the original patch, since the
> code should no longer be sensitive to the order of the bb liveness
> update and the modelling of the clobber.  But given that there's
> no specific need to keep the original patch either, it seemed better
> to restore the code to the more well-tested order.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2020-02-19  Richard Sandiford  
> 
> gcc/
>   PR rtl-optimization/92989
>   * lra-lives.c (process_bb_lives): Restore the original order
>   of the bb liveness update.  Call make_hard_regno_dead for each
>   register clobbered at the start of an EH receiver.
Thanks.  Installed.
jeff
> 



Re: [fixinc] Allow CONFIG_SHELL to override build-time shell in mkheaders

2020-02-21 Thread Jeff Law
On Thu, 2020-02-20 at 22:17 -0300, Alexandre Oliva wrote:
> mkheaders.in uses substitutions of @SHELL@ to run fixinc.sh and
> mkinstalldirs.  Problem is, SHELL comes from CONFIG_SHELL for the
> build system, and it needs not match whatever is available at an
> unrelated host system after installation, when mkheaders is supposed
> to be run.
> 
> I considered ditching the hardcoding altogether, but decided to retain
> it, but allowing CONFIG_SHELL and SHELL to override it, if any of them
> can successfully run mkinstalldirs, and if those and the substituted
> @SHELL@ fail, fallback to /bin/sh and to plain execution of the
> script, which appears to enable at least one shell on a system that
> doesn't typically have a shell to recognize a script by #!/bin/sh and
> reinvoke itself to run it.
> 
> If all of these fail, we fail, but only after telling the user to
> retry after setting CONFIG_SHELL, that fixincl itself also uses.
> 
> Tested with a x86_64-linux-gnu native, and on various combinations of
> build, host and targets, including cross and canadian crosses involving
> host systems that don't have a built-in Bourne Shell.  I'm going ahead
> and checking it in as part of the build machinery, though I acknowledge
> this might be stretching configury a bit.  Please let me know if you
> have any objections to this stretching, or to the change itself.
> 
> 
> for  fixincludes/ChangeLog
> 
>   * mkheaders.in: Don't require build-time shell on host.
OK
jeff
> 



Re: [PATCH] Do not propagate self-dependent value (PR ipa/93763)

2020-02-21 Thread Jeff Law
On Fri, 2020-02-21 at 18:59 +0100, Martin Jambor wrote:
> Hi,
> 
> On Tue, Feb 18 2020, Feng Xue OS wrote:
> Currently, for a self-recursive call, we never use a value originated from a
> non-pass-through jump function as a source, to avoid propagation explosion,
> but the self-dependent value case is missed.  This patch is made to fix the
> bug.
> > 
> > Bootstrapped/regtested on x86_64-linux and aarch64-linux.
> > 
> > Feng
> > ---
> > 2020-02-18  Feng Xue  
> > 
> > PR ipa/93763
> > * ipa-cp.c (self_recursively_generated_p): Mark self-dependent 
> > value as
> > self-recursively generated.
> 
> Honza, this is OK.
Thanks.  I went ahead and installed Feng's patch.

jeff
> 



Darwin: Fix wrong quoting on an error string (PR93860).

2020-02-21 Thread Iain Sandoe
The quotes should surround all of the literal content from the pragma
that has incorrect usage.
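
For context, a sketch of the pragma sequence that triggers the
diagnostic (hypothetical source, not from the PR):

#pragma options align=mac68k
struct s { char c; short h; };
#pragma options align=reset
#pragma options align=reset  /* error: too many '#pragma options align=reset' */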

Fixed as below,
tested on x86_64-apple-darwin16,
applied to master,
thanks
Iain

2020-02-21  Iain Sandoe  

PR target/93860
* config/darwin-c.c (pop_field_alignment): Adjust quoting of
error string.

diff --git a/gcc/config/darwin-c.c b/gcc/config/darwin-c.c
index 85d775f056a..e3b999e166b 100644
--- a/gcc/config/darwin-c.c
+++ b/gcc/config/darwin-c.c
@@ -79,7 +79,7 @@ pop_field_alignment (void)
   free (entry);
 }
   else
-error ("too many %<#pragma options%> align=reset");
+error ("too many %<#pragma options align=reset%>");
 }
 
 /* Handlers for Darwin-specific pragmas.  */



Re: [Patch, fortran] PR fortran/92621 Problems with memory handling with allocatable intent(out) arrays with bind(c)

2020-02-21 Thread José Rui Faustino de Sousa

On 21/02/20 12:38, Tobias Burnus wrote:
Hmm, that sounds like papering over a real bug. 



It is possible.  I tried using DECL_INITIAL to nullify cfi.n but it did
not make any difference.


I tried to play with optimization, and up to -O1 it does not seem to
crash, but it always seems to crash at -O2 (without the
-static-libgfortran switch).  Looking at the generated code I could not
see anything obviously different...


Before the patch to PR92123 there were situations where there would be
out-of-bounds memory writes, but those would cause a crash at a later time.


I have no idea of what it could be...

Best regards,
José Rui



[PATCH] libstdc++: P0769R2 Add shift to <algorithm>

2020-02-21 Thread Patrick Palka
This patch adds std::shift_left and std::shift_right.  Although these are
STL-style algos, they are nonetheless placed in <bits/ranges_algo.h> because
they make use of some functions in the ranges namespace that are more easily
reachable from <bits/ranges_algo.h> than from <bits/stl_algo.h>, namely
ranges::next and ranges::swap_ranges.

This implementation of std::shift_right for non-bidirectional iterators deviates
from the reference implementation a bit.  The main difference is that this
implementation is significantly simpler, and yet saves around n/2 additional
iterator increments and n/2 iter_moves at the cost of around n/2 additional
iter_swaps (where n is the shift amount).

libstdc++-v3/ChangeLog:

P0769R2 Add shift to <algorithm>
* include/bits/ranges_algo.h (shift_left, shift_right): New.
* testsuite/25_algorithms/shift_left/1.cc: New test.
* testsuite/25_algorithms/shift_right/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 48 +
 .../testsuite/25_algorithms/shift_left/1.cc   | 67 +++
 .../testsuite/25_algorithms/shift_right/1.cc  | 67 +++
 3 files changed, 182 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/shift_left/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/shift_right/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
index 7de1072abf0..c36afc6e19b 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3683,6 +3683,54 @@ namespace ranges
   inline constexpr __prev_permutation_fn prev_permutation{};
 
 } // namespace ranges
+
+  template<typename ForwardIterator>
+    constexpr ForwardIterator
+    shift_left(ForwardIterator __first, ForwardIterator __last,
+	       typename iterator_traits<ForwardIterator>::difference_type __n)
+{
+  __glibcxx_assert(__n >= 0);
+  if (__n == 0)
+   return __last;
+
+  auto __mid = ranges::next(__first, __n, __last);
+  if (__mid == __last)
+   return __first;
+      return std::move(std::move(__mid), std::move(__last), std::move(__first));
+}
+
+  template<typename ForwardIterator>
+    constexpr ForwardIterator
+    shift_right(ForwardIterator __first, ForwardIterator __last,
+		typename iterator_traits<ForwardIterator>::difference_type __n)
+{
+  __glibcxx_assert(__n >= 0);
+  if (__n == 0)
+   return __first;
+
+      using _Cat = typename iterator_traits<ForwardIterator>::iterator_category;
+  if constexpr (derived_from<_Cat, bidirectional_iterator_tag>)
+   {
+ auto __mid = ranges::next(__last, -__n, __first);
+ if (__mid == __first)
+   return __last;
+ return std::move_backward(std::move(__first), std::move(__mid),
+   std::move(__last));
+   }
+  else
+   {
+ auto __result = ranges::next(__first, __n, __last);
+ if (__result == __last)
+   return __last;
+ auto __dest = __result;
+ do
+   __dest = ranges::swap_ranges(__first, __result,
+std::move(__dest), __last).in2;
+ while (__dest != __last);
+ return __result;
+   }
+}
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // concepts
diff --git a/libstdc++-v3/testsuite/25_algorithms/shift_left/1.cc 
b/libstdc++-v3/testsuite/25_algorithms/shift_left/1.cc
new file mode 100644
index 000..9bdb843adbc
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/shift_left/1.cc
@@ -0,0 +1,67 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include 
+#include 
+#include 
+
+using __gnu_test::test_container;
+using __gnu_test::forward_iterator_wrapper;
+using __gnu_test::bidirectional_iterator_wrapper;
+using __gnu_test::random_access_iterator_wrapper;
+
+template<int K, template<typename> typename Wrapper>
+void
+test01()
+{
+  static_assert(K == 10 || K == 11);
+  for (int i = 0; i < 15; i++)
+{
+  int x[K];
+  for (int t = 0; t < K; t++)
+   x[t] = t;
+      test_container<int, Wrapper> cx(x);
+  auto out = std::shift_left(cx.begin(), cx.end(), i);
+  if (i < K)
+   {
+ VERIFY( out.ptr == x+(K-i) );
+ for (int j = i; j < K-i; j++)
+   VERIFY( x[j] == i+j );
+   }
+  e

Re: [PATCH] c++: Implement P1957R2, T* to bool should be considered narrowing.

2020-02-21 Thread Marek Polacek
On Fri, Feb 21, 2020 at 02:14:26PM -0500, Marek Polacek wrote:
> This was approved in the Prague 2020 WG21 meeting so let's adjust the
> comment.  Since it's supposed to be a DR I think we should no longer
> limit it to C++20.

Which is what clang++ trunk does.

Marek



[PATCH] c++: Implement P1957R2, T* to bool should be considered narrowing.

2020-02-21 Thread Marek Polacek
This was approved in the Prague 2020 WG21 meeting so let's adjust the
comment.  Since it's supposed to be a DR I think we should no longer
limit it to C++20.
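
A small sketch of what now gets diagnosed (illustrative only):

int *p = nullptr;

bool b1 = p;             // OK: ordinary implicit conversion, not list-init
bool b2{p};              // error: int* -> bool is narrowing in list-init
bool b3{p != nullptr};   // fine: make the comparison explicit instead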

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2020-02-21  Marek Polacek  

P1957R2
* typeck2.c (check_narrowing): Consider T* to bool narrowing
in C++11 and up.

* g++.dg/cpp0x/initlist92.C: Don't expect an error in C++20 only.
---
 gcc/cp/typeck2.c| 7 ---
 gcc/testsuite/g++.dg/cpp0x/initlist92.C | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 48920894b3b..68bc2e5c170 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1036,9 +1036,10 @@ check_narrowing (tree type, tree init, tsubst_flags_t complain,
 }
   else if (TREE_CODE (type) == BOOLEAN_TYPE
   && (TYPE_PTR_P (ftype) || TYPE_PTRMEM_P (ftype)))
-/* This hasn't actually made it into C++20 yet, but let's add it now to get
-   an idea of the impact.  */
-ok = (cxx_dialect < cxx2a);
+/* C++20 P1957R2: converting from a pointer type or a pointer-to-member
+   type to bool should be considered narrowing.  This is a DR so is not
+   limited to C++20 only.  */
+ok = false;
 
   bool almost_ok = ok;
   if (!ok && !CONSTANT_CLASS_P (init) && (complain & tf_warning_or_error))
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist92.C b/gcc/testsuite/g++.dg/cpp0x/initlist92.C
index 319264ae274..213b192d441 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist92.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist92.C
@@ -23,7 +23,7 @@ bool Test4(std::initializer_list);
 
 int main () 
 {
-  ( Test1({"false"}) );// { dg-error "narrowing" "" { target c++2a } }
+  ( Test1({"false"}) );// { dg-error "narrowing" }
   ( Test2({123}) );
   ( Test3({456}) );
   ( Test4({"false"}) );

base-commit: dbfba41e95d1d93b17e907b7f516b52ed3a3c415
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-21 Thread Martin Jambor
Hi,

On Thu, Feb 20 2020, Feng Xue OS wrote:
> This is a simple and nice fix, but could suppress some CP opportunities for
> self-recursive calls.  Using the test case as an example, the first should be a
> for-all-context clone, and the call "recur_fn (i, 1, depth + 1)" is replaced
> with a newly created recursive node.  Thus, in the next round of CP iteration,
> the way to do CP for the 2nd argument "1" is blocked, because its coming edge
> cannot pass the check by cgraph_edge_brings_value_p().
>
> +__attribute__((noinline)) static int recur_fn (int i, int j, int depth)
> +{
> +   if (depth > 10)
> + return 1;
> +
> +   data[i + j]++;
> +
> +   if (depth & 3)
> + recur_fn (i, 1, depth + 1);
> +   else
> + recur_fn (i, j & 1, depth + 1);
> +
> +   foo();
> +
> +   return i + j;
> +}
> +
> +int caller (int v, int depth)
> +{
> +  recur_fn (1, v, depth);
> +
> +  return 0;
> +}
>
>
>>However, I believe that his approach mostly papers over a bug that
>>happens earlier, specifically that cgraph_edge_brings_value_p returned
>>true for the self-recursive edge from an all-context clone to itself, even
>>when evaluating the second argument.  We assume that all-context
>>clones get at least as many constants as any other potential clone, but
>>that does not work for self-recursive edges with pass-through parameters
>>that just pass along the received constant.
>
> The following check on the value in cgraph_edge_brings_value_p could determine
> whether the value can reach the dest node or not.  If the value is a constant
> without a source, as "1" in the above example, this is allowed.  Otherwise, the
> code snippet enclosed by "if (caller_info->ipcp_orig_node)" could capture the
> for-all-context clone.

there has not been any "following check" in your email but I believe I
understand what you mean, and I added such a check to my patch so that the
edge carrying the non-pass-through jump function was accepted by the
cgraph_edge_brings_value_p predicate.

However, that led to the same assert in
find_more_scalar_values_for_callers_subset because on that edge it tried
to compute the depth + 1 value before it had any value to calculate it
from.

So after staring at the problem for another while I realized that the
users of self_recursive_pass_through_p and
self_recursive_agg_pass_through_p would be OK if they returned false for
self-recursive calls from/to a node which is already a clone - clones
have their known constant values set at the point of their creation -
and that doing so avoids this problem.  So that is what the patch below
does.  I have still kept the cgraph_edge_brings_value_p hunks too, so
that edges are collected reliably.

Bootstrapped and tested on an x86_64-linux, LTO bootstrap underway.

What do you think?

Martin


2020-02-21  Martin Jambor  
Feng Xue  

PR ipa/93707
* ipa-cp.c (same_node_or_its_all_contexts_clone_p): Replaced with
new function calls_same_node_or_its_all_contexts_clone_p.
(cgraph_edge_brings_value_p): Use it.
(cgraph_edge_brings_value_p): Likewise.
(self_recursive_pass_through_p): Return false if caller is a clone.
(self_recursive_agg_pass_through_p): Likewise.

testsuite/
* gcc.dg/ipa/pr93707.c: New test.
---
 gcc/ChangeLog  | 11 +
 gcc/ipa-cp.c   | 38 +-
 gcc/testsuite/ChangeLog|  6 +
 gcc/testsuite/gcc.dg/ipa/pr93707.c | 31 
 4 files changed, 69 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93707.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c481407de9..a965cae4f07 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2020-02-21  Martin Jambor  
+   Feng Xue  
+
+   PR ipa/93707
+   * ipa-cp.c (same_node_or_its_all_contexts_clone_p): Replaced with
+   new function calls_same_node_or_its_all_contexts_clone_p.
+   (cgraph_edge_brings_value_p): Use it.
+   (cgraph_edge_brings_value_p): Likewise.
+   (self_recursive_pass_through_p): Return false if caller is a clone.
+   (self_recursive_agg_pass_through_p): Likewise.
+
 2020-02-17  Richard Biener  
 
PR c/86134
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4f5b72e6994..aa228df1204 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4033,15 +4033,24 @@ edge_clone_summary_t::duplicate (cgraph_edge *src_edge, cgraph_edge *dst_edge,
   src_data->next_clone = dst_edge;
 }
 
-/* Return true is NODE is DEST or its clone for all contexts.  */
+/* Return true is CS calls DEST or its clone for all contexts, except for
+   self-recursive nodes in which it has to be DEST itself.  */
 
 static bool
-same_node_or_its_all_contexts_clone_p (cgraph_node *node, cgraph_node *dest)
+calls_same_node_or_its_all_contexts_clone_p (cgraph_edge *cs, cgraph_node *dest,
+					     bool allow_recursion_to_clone)
 {
-  if (node == dest)
+  enum availabil

[PATCH] c++: Fix ICE with -Wmismatched-tags [PR93869]

2020-02-21 Thread Marek Polacek
This is a crash in cp_parser_check_class_key:
  tree type_decl = TYPE_MAIN_DECL (type);
  tree name = DECL_NAME (type_decl); // HERE
because TYPE_MAIN_DECL of type was null as it's not a class type.
Instead of checking CLASS_TYPE_P we should simply check class_key
a bit earlier (in this case it was typename_type).

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2020-02-21  Marek Polacek  

PR c++/93869 - ICE with -Wmismatched-tags.
* parser.c (cp_parser_check_class_key): Check class_key earlier.

* g++.dg/warn/Wmismatched-tags-2.C: New test.
---
 gcc/cp/parser.c| 14 +++---
 gcc/testsuite/g++.dg/warn/Wmismatched-tags-2.C |  6 ++
 2 files changed, 13 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wmismatched-tags-2.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ee534b5db21..d5f2c14d951 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -30987,6 +30987,13 @@ cp_parser_check_class_key (cp_parser *parser, location_t key_loc,
   if (!warn_mismatched_tags && !warn_redundant_tags)
 return;
 
+  /* Only consider the true class-keys below and ignore typename_type,
+ etc. that are not C++ class-keys.  */
+  if (class_key != class_type
+  && class_key != record_type
+  && class_key != union_type)
+return;
+
   tree type_decl = TYPE_MAIN_DECL (type);
   tree name = DECL_NAME (type_decl);
   /* Look up the NAME to see if it unambiguously refers to the TYPE
@@ -30995,13 +31002,6 @@ cp_parser_check_class_key (cp_parser *parser, location_t key_loc,
   tree decl = cp_parser_lookup_name_simple (parser, name, input_location);
   pop_deferring_access_checks ();
 
-  /* Only consider the true class-keys below and ignore typename_type,
- etc. that are not C++ class-keys.  */
-  if (class_key != class_type
-  && class_key != record_type
-  && class_key != union_type)
-return;
-
   /* The class-key is redundant for uses of the CLASS_TYPE that are
  neither definitions of it nor declarations, and for which name
  lookup returns just the type itself.  */
diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-tags-2.C b/gcc/testsuite/g++.dg/warn/Wmismatched-tags-2.C
new file mode 100644
index 000..00193f02f61
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wmismatched-tags-2.C
@@ -0,0 +1,6 @@
+// PR c++/93869 - ICE with -Wmismatched-tags.
+// { dg-do compile }
+// { dg-options "-Wmismatched-tags" }
+
+namespace N { typedef int T; }
+typename N::T x;

base-commit: dbfba41e95d1d93b17e907b7f516b52ed3a3c415
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PATCH] Do not propagate self-dependent value (PR ipa/93763)

2020-02-21 Thread Martin Jambor
Hi,

On Tue, Feb 18 2020, Feng Xue OS wrote:
> Currently, for a self-recursive call, we never use a value originated from a
> non-pass-through jump function as a source, to avoid propagation explosion,
> but the self-dependent value case is missed.  This patch is made to fix the
> bug.
>
> Bootstrapped/regtested on x86_64-linux and aarch64-linux.
>
> Feng
> ---
> 2020-02-18  Feng Xue  
>
> PR ipa/93763
> * ipa-cp.c (self_recursively_generated_p): Mark self-dependent value 
> as
> self-recursively generated.

Honza, this is OK.

Thanks,

Martin



> From 1ff803f33de0fe86d526deb23af2d08c15028ff9 Mon Sep 17 00:00:00 2001
> From: Feng Xue 
> Date: Mon, 17 Feb 2020 17:07:04 +0800
> Subject: [PATCH] Do not propagate self-dependent value (PR ipa/93763)
>
> ---
>  gcc/ipa-cp.c   | 10 ---
>  gcc/testsuite/g++.dg/ipa/pr93763.C | 15 ++
>  gcc/testsuite/gcc.dg/ipa/pr93763.c | 46 ++
>  3 files changed, 67 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr93763.C
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93763.c
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 4f5b72e6994..1d0c1ac0f35 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1897,8 +1897,8 @@ ipcp_lattice::add_value (valtype newval, 
> cgraph_edge *cs,
>  }
>  
>  /* Return true, if a ipcp_value VAL is orginated from parameter value of
> -   self-feeding recursive function by applying non-passthrough arithmetic
> -   transformation.  */
> +   self-feeding recursive function via some kind of pass-through jump
> +   function.  */
>  
>  static bool
>  self_recursively_generated_p (ipcp_value *val)
> @@ -1909,10 +1909,12 @@ self_recursively_generated_p (ipcp_value *val)
>  {
>cgraph_edge *cs = src->cs;
>  
> -  if (!src->val || cs->caller != cs->callee->function_symbol ()
> -   || src->val == val)
> +  if (!src->val || cs->caller != cs->callee->function_symbol ())
>   return false;
>  
> +  if (src->val == val)
> + continue;
> +
>if (!info)
>   info = IPA_NODE_REF (cs->caller);
>  


[committed] cxx-status: Update -std= instructions for C++20.

2020-02-21 Thread Marek Polacek
We merged support for -std=c++20 to trunk, so -std=c++2a is only
needed in GCC 9 and earlier.
---
 htdocs/projects/cxx-status.html | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 1b54ea97..47278613 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -37,7 +37,8 @@
   standard, which is expected to be published in 2020.
 
   C++2a features are available since GCC 8. To enable C++2a
-  support, add the command-line parameter -std=c++2a
+  support, add the command-line parameter -std=c++20
+  (use -std=c++2a in GCC 9 and earlier)
   to your g++ command line. Or, to enable GNU
   extensions in addition to C++2a features,
 add -std=gnu++2a.

base-commit: b2543147f043404eb90ebefa744d210fa7ca6566
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



[committed] cxx-status: Update with Prague 2020 WG21 meeting papers.

2020-02-21 Thread Marek Polacek
I'm assuming the new papers have not been implemented yet.
(P1957R1 is implemented though, I'll update the page separately.)

Pushed.
---
 htdocs/projects/cxx-status.html | 161 +---
 1 file changed, 126 insertions(+), 35 deletions(-)

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 3f69d443..1b54ea97 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -110,20 +110,47 @@

 
 
-   Concepts 
-  http://wg21.link/p0734r0";>P0734R0
-http://wg21.link/p0857r0";>P0857R0
-   http://wg21.link/p1084r2";>P1084R2
-   http://wg21.link/p1141r2";>P1141R2
-   http://wg21.link/p0848r3";>P0848R3
-   http://wg21.link/p1616r1";>P1616R1
-   http://wg21.link/p1452r2";>P1452R2
-   
-   https://wg21.link/p1972r0";>P1972R0
-   
-   https://wg21.link/p1980r0";>P1980R0
-   10 
-   __cpp_concepts >= 201907 
+   Concepts 
+  http://wg21.link/p0734r0";>P0734R0
+   10 
+   __cpp_concepts >= 201907 
+
+
+  http://wg21.link/p0857r0";>P0857R0
+
+
+  http://wg21.link/p1084r2";>P1084R2
+
+
+  http://wg21.link/p1141r2";>P1141R2
+
+
+  http://wg21.link/p0848r3";>P0848R3
+
+
+  http://wg21.link/p1616r1";>P1616R1
+
+
+  http://wg21.link/p1452r2";>P1452R2
+
+
+  
+  https://wg21.link/p1972r0";>P1972R0
+
+
+  
+  https://wg21.link/p1980r0";>P1980R0
+
+
+
+  https://wg21.link/p2103r0";>P2103R0
+  No
+
+
+  https://wg21.link/p2092r0";>P2092R0
+
+
+  https://wg21.link/p2113r0";>P2113R0
 
 
Range-based for statements with initializer 
@@ -156,18 +183,42 @@
__cpp_constexpr_in_decltype >= 201711 
 
 
-   Consistent comparison (operator<=>) 
-  <a href="http://wg21.link/p0515r3">P0515R3</a>
-   <a href="http://wg21.link/p0905r1">P0905R1</a>
-   <a href="http://wg21.link/p1120r0">P1120R0</a>
-   <a href="http://wg21.link/p1185r2">P1185R2</a>
-   <a href="http://wg21.link/p1186r3">P1186R3</a>
-   <a href="http://wg21.link/p1630r1">P1630R1</a>
-   
-   <a href="https://wg21.link/p1946r0">P1946R0</a>
-   
-  10
-   __cpp_impl_three_way_comparison >= 201711 
+   Consistent comparison (operator<=>)
+  <a href="http://wg21.link/p0515r3">P0515R3</a>
+  10
+   __cpp_impl_three_way_comparison >= 201711 
+
+
+  <a href="http://wg21.link/p0905r1">P0905R1</a>
+
+
+  <a href="http://wg21.link/p1120r0">P1120R0</a>
+
+
+  <a href="http://wg21.link/p1185r2">P1185R2</a>
+
+
+  <a href="http://wg21.link/p1186r3">P1186R3</a>
+
+
+  <a href="http://wg21.link/p1630r1">P1630R1</a>
+
+
+  
+  <a href="https://wg21.link/p1946r0">P1946R0</a>
+
+
+  
+  <a href="https://wg21.link/p1959r0">P1959R0</a>
+
+
+  
+  <a href="https://wg21.link/p2002r1">P2002R1</a>
+  No
+
+
+  
+  <a href="https://wg21.link/p2085r0">P2085R0</a>
 
 
Access checking on specializations 
@@ -286,11 +337,16 @@
__cpp_char8_t >= 201811 
 
 
-   Immediate functions (consteval) 
+   Immediate functions (consteval) 
   <a href="http://wg21.link/p1073r3">P1073R3</a>
10
 (partial, no consteval virtual support) 
-   __cpp_consteval >= 201811 
+   __cpp_consteval >= 201811 
+
+
+  
+  <a href="https://wg21.link/p1937r2">P1937R2</a>
+  No
 
 
std::is_constant_evaluated 
@@ -343,10 +399,10 @@

 
 
-   Modules 
+   Modules 
   <a href="http://wg21.link/p1103r3">P1103R3</a>
-  No (Modules Wiki) 
-   
+  No (Modules Wiki) 
+   
 
 
   <a href="http://wg21.link/p1766r1">P1766R1</a>
@@ -365,6 +421,22 @@
   
   <a href="https://wg21.link/p1979r0">P1979R0</a>
 
+
+  
+  <a href="https://wg21.link/p1779r3">P1779R3</a>
+
+
+  
+  <a href="https://wg21.link/p1857r3">P1857R3</a>
+
+
+  
+  <a href="https://wg21.link/p2115r0">P2115R0</a>
+
+
+  
+  <a href="https://wg21.link/p1815r2">P1815R2</a>
+
 
Coroutines 
   <a href="http://wg21.link/p0912r5">P0912R5</a>
@@ -389,6 +461,13 @@
   No (<a href="https://gcc.gnu.org/PR93529">PR93529</a>)
   
 
+
+  
+  DR: Converting from T* to bool should be considered narrowing
+  <a href="https://wg21.link/p1957r2">P1957R2</a>
+  No
+  
+
 
Stronger Unicode requirements 
   <a href="http://wg21.link/p1041r4">P1041R4</a>
@@ -430,10 +509,15 @@
   
 
 
-  Class template argument deduction for aggregates
+  Class template argument deduction for aggregates
   <a href="http://wg21.link/p1816r0">P1816R0</a>
   10
-  __cpp_deduction_guides >= 201907L
+  __cpp_deduction_guides >= 201907L
+
+
+  
+  <a href="https://wg21.link/p2082r1">P2082R1</a>
+  No
 
 
   Class template argument deduction for alias templates
@@ -453,9 +537,16 @@
   __cpp_constinit >= 201907
 
 
-  More implicit moves (merge P0527R1 and P1155R3)
+  DR: More implicit move

[PATCH] libstdc++: Define <=> for tuple, optional and variant

2020-02-21 Thread Jonathan Wakely
Another piece of P1614R2.

* include/std/optional (operator<=>(optional, optional))
(operator<=>(optional, nullopt), operator<=>(optional, U)):
Define for C++20.
* include/std/tuple (__tuple_cmp): New helper function for <=>.
(operator<=>(tuple, tuple...)): Define for C++20.
* include/std/variant (operator<=>(variant, variant))
(operator<=>(monostate, monostate)): Define for C++20.
* testsuite/20_util/optional/relops/three_way.cc: New test.
* testsuite/20_util/tuple/comparison_operators/three_way.cc: New test.
* testsuite/20_util/variant/89851.cc: Move to ...
* testsuite/20_util/variant/relops/89851.cc: ... here.
* testsuite/20_util/variant/90008.cc: Move to ...
* testsuite/20_util/variant/relops/90008.cc: ... here.
* testsuite/20_util/variant/relops/three_way.cc: New test.

Tested powerpc64le-linux, committed to trunk.
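
For illustration, the kind of code this enables in C++20 mode (a usage
sketch, not part of the patch):

  #include <compare>
  #include <optional>
  #include <tuple>

  void
  example ()
  {
    std::optional<int> a{1}, b{2};
    auto r1 = a <=> b;                     // strong_ordering::less
    auto r2 = a <=> std::nullopt;          // greater: engaged beats empty
    auto r3 = std::tuple{1, 2} <=> std::tuple{1, 3};  // lexicographic: less
    (void) r1; (void) r2; (void) r3;
  }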

commit 9e58988061f4175896de11af0caf9bdd48c9b046
Author: Jonathan Wakely 
Date:   Fri Feb 21 12:02:15 2020 +

libstdc++: Define <=> for tuple, optional and variant

Another piece of P1614R2.

* include/std/optional (operator<=>(optional, optional))
(operator<=>(optional, nullopt), operator<=>(optional, U)):
Define for C++20.
* include/std/tuple (__tuple_cmp): New helper function for <=>.
(operator<=>(tuple, tuple...)): Define for C++20.
* include/std/variant (operator<=>(variant, variant))
(operator<=>(monostate, monostate)): Define for C++20.
* testsuite/20_util/optional/relops/three_way.cc: New test.
* testsuite/20_util/tuple/comparison_operators/three_way.cc: New 
test.
* testsuite/20_util/variant/89851.cc: Move to ...
* testsuite/20_util/variant/relops/89851.cc: ... here.
* testsuite/20_util/variant/90008.cc: Move to ...
* testsuite/20_util/variant/relops/90008.cc: ... here.
* testsuite/20_util/variant/relops/three_way.cc: New test.

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index b920a1453ba..37c2ba7a025 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -41,6 +41,9 @@
 #include 
 #include 
 #include 
+#if __cplusplus > 201703L
+# include <compare>
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -1027,12 +1030,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return !__rhs || (static_cast<bool>(__lhs) && *__lhs >= *__rhs);
 }
 
+#ifdef __cpp_lib_three_way_comparison
+  template<typename _Tp, three_way_comparable_with<_Tp> _Up>
+constexpr compare_three_way_result_t<_Tp, _Up>
+operator<=>(const optional<_Tp>& __x, const optional<_Up>& __y)
+{
+  return __x && __y ? *__x <=> *__y : bool(__x) <=> bool(__y);
+}
+#endif
+
   // Comparisons with nullopt.
   template<typename _Tp>
 constexpr bool
 operator==(const optional<_Tp>& __lhs, nullopt_t) noexcept
 { return !__lhs; }
 
+#ifdef __cpp_lib_three_way_comparison
+  template<typename _Tp>
+constexpr strong_ordering
+operator<=>(const optional<_Tp>& __x, nullopt_t) noexcept
+{ return bool(__x) <=> false; }
+#else
   template<typename _Tp>
 constexpr bool
 operator==(nullopt_t, const optional<_Tp>& __rhs) noexcept
@@ -1087,6 +1105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr bool
 operator>=(nullopt_t, const optional<_Tp>& __rhs) noexcept
 { return !__rhs; }
+#endif // three-way-comparison
 
   // Comparisons with value type.
   template<typename _Tp, typename _Up>
@@ -1161,6 +1180,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 -> __optional_relop_t<decltype(declval<_Up>() >= declval<_Tp>())>
 { return !__rhs || __lhs >= *__rhs; }
 
+#ifdef __cpp_lib_three_way_comparison
+  template<typename _Tp, typename _Up>
+constexpr compare_three_way_result_t<_Tp, _Up>
+operator<=>(const optional<_Tp>& __x, const _Up& __v)
+{ return bool(__x) ? *__x <=> __v : strong_ordering::less; }
+#endif
+
   // Swap and creation functions.
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 3829d844e0b..808947781ae 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -39,6 +39,9 @@
 #include 
 #include 
 #include 
+#if __cplusplus > 201703L
+# include <compare>
+#endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -1397,6 +1400,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __compare::__eq(__t, __u);
 }
 
+#if __cpp_lib_three_way_comparison
+  template<typename _Cat, typename _Tp, typename _Up>
+constexpr _Cat
+__tuple_cmp(const _Tp&, const _Up&, index_sequence<>)
+{ return _Cat::equivalent; }
+
+  template<typename _Cat, typename _Tp, typename _Up,
+	   size_t _Idx0, size_t... _Idxs>
+constexpr _Cat
+__tuple_cmp(const _Tp& __t, const _Up& __u,
+   index_sequence<_Idx0, _Idxs...>)
+{
+  auto __c
+   = __detail::__synth3way(std::get<_Idx0>(__t), std::get<_Idx0>(__u));
+  if (__c != 0)
+   return __c;
+  return std::__tuple_cmp<_Cat>(__t, __u, index_sequence<_Idxs...>());
+}
+
+  template<typename... _Tps, typename... _Ups>
+constexpr
+common_comparison_category_t<

PING [PATCH] drop weakref attribute on function definitions (PR 92799)

2020-02-21 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00883.html

On 2/14/20 3:41 PM, Martin Sebor wrote:

Because attribute weakref introduces a kind of a definition, it can
only be applied to declarations of symbols that are not defined.  GCC
normally issues a warning when the attribute is applied to a defined
symbol, but PR 92799 shows that it misses some cases, where it then
leads to an ICE.

The ICE was introduced in GCC 4.5.  Prior to then, GCC accepted such
invalid definitions and silently dropped the weakref attribute.

The attached patch avoids the ICE while again dropping the invalid
attribute from the definition, except with the (now) usual warning.
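
For illustration, the invalid usage looks roughly like this (hypothetical
code, not the testcase from the PR):

  /* Valid: weakref makes f a weak alias for "target", which is not
     defined in this translation unit.  */
  static void f (void) __attribute__ ((weakref ("target")));

  /* Invalid: the symbol is also defined, so the attribute is dropped
     with a warning (and no longer ICEs).  */
  static void g (void) __attribute__ ((weakref ("target")));
  static void g (void) { }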

Tested on x86_64-linux.

I also looked for code bases that make use of attribute weakref to
rebuild them as another test but couldn't find any.  (There are
a couple of instances in the Linux kernel but they look #ifdef'd
out).  Does anyone know of any that do use it that I could try to
build on Linux?

Martin




[PATCH] Avoid collect2 calling signal unsafe functions and/or unlink, with uninitialized memory (for gcc-9 branch)

2020-02-21 Thread Bernd Edlinger
Hi,

this fixes the signal handler calling the signal-unsafe vfprintf and/or
passing uninitialized memory to unlink in the signal handler.

This time it is the patch for the gcc-9 branch.

The difference from the gcc-8 branch is in tool_cleanup:
the variable that suppresses the vfprintf is verbose = false,
not debug = false, to match the different logic in maybe_unlink.
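
To illustrate the second point (an illustrative sketch, not the collect2
code itself; files, n and make_temp_name are made-up names): the handler
walks a NULL-terminated pointer array, so the storage must be zeroed
before any entry is filled in.

  #include <stdlib.h>
  #include <unistd.h>

  static char **files;  /* visible to the signal handler */

  static void
  handler_cleanup (void)
  {
    /* Stops safely at the first NULL entry because the array was zeroed.  */
    if (files)
      for (char **p = files; *p; p++)
        unlink (*p);
  }

  static void
  make_files (size_t n, char *(*make_temp_name) (size_t))
  {
    /* Zero-initialized and NULL-terminated, as XCNEWVEC provides.  With
       plain XNEWVEC, a signal arriving before this loop finishes would
       hand handler_cleanup garbage pointers.  */
    files = (char **) calloc (n + 1, sizeof (char *));
    for (size_t i = 0; i < n; i++)
      files[i] = make_temp_name (i);
  }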


Bootstrapped and reg-tested with x86_64-pc-linux-gnu.
Is it OK for the gcc-9 branch?


Thanks
Bernd.
From d52ac2c0394f0432e183511f8a6d4f96b49f88a5 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Mon, 17 Feb 2020 17:40:07 +0100
Subject: [PATCH] Avoid collect2 calling signal unsafe functions and/or unlink
 with uninitialized memory

2020-02-20  Bernd Edlinger  

	* collect2.c (tool_cleanup): Avoid calling non-signal-safe
	functions.
	(maybe_run_lto_and_relink): Avoid possible signal handler
	access to uninitialized memory (lto_o_files).
---
 gcc/collect2.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index eb84f84..8f092e7 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -384,6 +384,10 @@ static void scan_prog_file (const char *, scanpass, scanfilter);
 void
 tool_cleanup (bool from_signal)
 {
+  /* maybe_unlink may call notice, which is not signal safe.  */
+  if (from_signal)
+verbose = false;
+
   if (c_file != 0 && c_file[0])
 maybe_unlink (c_file);
 
@@ -743,7 +747,10 @@ maybe_run_lto_and_relink (char **lto_ld_argv, char **object_lst,
 	  ++num_files;
 	  }
 
-	lto_o_files = XNEWVEC (char *, num_files + 1);
+	/* The signal handler may access uninitialized memory and delete
+	   whatever it points to if lto_o_files is not allocated with
+	   calloc.  */
+	lto_o_files = XCNEWVEC (char *, num_files + 1);
 	lto_o_files[num_files] = NULL;
 	start = XOBFINISH (&temporary_obstack, char *);
 	for (i = 0; i < num_files; ++i)
-- 
2.7.4



Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-21 Thread Kyrill Tkachov

Hi Roman,

On 2/21/20 3:49 PM, Roman Zhuykov wrote:

11.02.2020 14:00, Richard Earnshaw (lists) wrote:

+(define_insn "*doloop_end"
+  [(parallel [(set (pc)
+   (if_then_else
+   (ne (reg:SI LR_REGNUM) (const_int 1))
+ (label_ref (match_operand 0 "" ""))
+ (pc)))
+  (set (reg:SI LR_REGNUM)
+   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  "le\tlr, %l0")

Is it deliberate that this pattern name has a '*' prefix?  doloop_end
is a named expansion pattern according to md.texi.

R.

21.02.2020 18:30, Kyrill Tkachov wrote:

+;; Originally expanded by 'doloop_end'.
+(define_insn "doloop_end_internal"

We usually prefer to name these patterns with a '*' in front to
prevent the gen* machinery from generating unneeded gen_* expanders
for them if they're not used.


It seems you and Richard are asking Andrea to do opposite things.
:) LOL.patch



Almost, but not exactly incompatible things ;)

doloop_end is a standard name, and if we wanted to use it directly it 
cannot have a '*', as Richard rightly points out.


Once "doloop_end" is moved to its own expander and the define_insn is 
doloop_end_internal, there is no reason for it to not have a '*' as its 
gen_* form is never called.


Thanks,

Kyrill



Roman

PS. I have no idea which approach is correct.


Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-21 Thread Roman Zhuykov
11.02.2020 14:00, Richard Earnshaw (lists) wrote:
> +(define_insn "*doloop_end"
> +  [(parallel [(set (pc)
> +   (if_then_else
> +   (ne (reg:SI LR_REGNUM) (const_int 1))
> + (label_ref (match_operand 0 "" ""))
> + (pc)))
> +  (set (reg:SI LR_REGNUM)
> +   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
> +  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
> +  "le\tlr, %l0")
>
> Is it deliberate that this pattern name has a '*' prefix?  doloop_end
> is a named expansion pattern according to md.texi.
>
> R.

21.02.2020 18:30, Kyrill Tkachov wrote:
> +;; Originally expanded by 'doloop_end'.
> +(define_insn "doloop_end_internal"
>
> We usually prefer to name these patterns with a '*' in front to
> prevent the gen* machinery from generating unneeded gen_* expanders
> for them if they're not used.
>

It seems you and Richard are asking Andrea to do opposite things.
:) LOL.patch

Roman

PS. I have no idea which approach is correct.


Re: [Ping][PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-21 Thread Dennis Zhang

Hi Kyrill,

On 21/02/2020 11:47, Kyrill Tkachov wrote:

Hi Dennis,

On 2/11/20 12:03 PM, Dennis Zhang wrote:

Hi all,

On 16/12/2019 13:45, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the Arm Armv8.6-A CLI patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html.
> It also depends on the Armv8.6-A effective target checking patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
> It also depends on the ARMv8.6-A I8MM dot product patch for using the
> same builtin qualifier
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html.
>
> This patch adds intrinsics for matrix multiply-accumulate operations
> including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
>
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for arm-none-linux-gnueabi-armv8.2-a.
>
> Is it OK for trunk please?
>


This is ok.

Thanks,

Kyrill



Thanks a lot for the approval.
The patch has been pushed as 436016f45694c7236e2e9f9db2adb0b4d9bf6b94.

Bests
Dennis


Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-21 Thread Roman Zhuykov
Andrea Corallo writes:

> With this patch the first insn of the low loop overhead 'doloop_begin'
> is expanded by 'doloop_modify' in loop-doloop.c.  The same does not
> happen with SMS.

That certainly works correctly, since in your first patch the doloop_begin
pattern also has the "!flag_modulo_sched" condition.

> My understanding is that to have it working in that
> case too the machine dependent reorg pass should add it later.  Am I
> correct on this?

IMHO, this is not needed in your case.  Currently, the list of platforms
(actually, gcc/config subfolders) which have doloop_end is rather big:
aarch64*, arc, arm*, bfin, c6x, ia64, pdp11, pru, rs6000, s390, sh,
tilegx*, tilepro, v850 and xtensa.  I marked three of them with a star -
they actually have a fake pattern, which is applied only with SMS. 
Reorg_loops from hw-doloop.c (see also
https://gcc.gnu.org/ml/gcc-patches/2011-06/msg01593.html and
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg00133.html) is used only
in arc, bfin, c6x, and xtensa.  Certainly some other platforms may have
additional loop reorg steps in the target-specific part (e.g. pru), but not
all of them.  And that reorg is actually needed independently, whether
SMS is on or off.

Actually, the question was: what goes wrong if you remove that
"!flag_modulo_sched" condition from the three new patterns?  I had actually
made one step forward, removed the "!flag_modulo_sched" parts in your
patch, and did the following simple testing of the modified patch.
I've built and then compared regtest results of two ARM cross-compilers:
the first was built from clean trunk, the second with the patch.  Both
compilers were configured --with-march=armv8.1-m.main and had common.opt
modified to enable -fmodulo-sched and -fmodulo-sched-allow-regmoves by
default.  The regtest results are identical.
> Second version of the patch here addressing comments.

Thank you, now I see in second patch that aspect was solved.

> SMS is disabled in the tests so as not to break them when SMS does loop versioning.

And I'm not really sure about this.  First of all, there are a lot of
scan-assembler-times tests which fail when the modulo scheduler is enabled;
probably the same happens when some unrolling parameters are not
default.  It seems that any non-default optimization which creates more
instruction copies can break a scan-assembler-times check.  IMHO, it is
not necessary to work around this in a few particular tests.  Second, I'm
not sure how the dg-skip-if directive works.  When one enables SMS by
setting "Init(1)" directly in common.opt, this won't be caught, would it?

Roman


Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-21 Thread Kyrill Tkachov

Hi Andrea,

On 2/19/20 1:01 PM, Andrea Corallo wrote:

Hi all,

Second version of the patch here addressing comments.

This patch enables the Armv8.1-M Mainline LOB (low overhead branch)
extension's low overhead loops (LOL) feature by using the 'loop-doloop'
pass.

Given the following function:

void
loop (int *a)
{
  for (int i = 0; i < 1000; i++)
    a[i] = i;
}

The 'doloop_begin' and 'doloop_end' patterns translate into 'dls' and 'le',
giving:

 loop:
 movw    r2, #1
 movs    r3, #0
 subs    r0, r0, #4
 push    {lr}
 dls lr, r2
 .L2:
 str r3, [r0, #4]!
 adds    r3, r3, #1
 le  lr, .L2
 ldr pc, [sp], #4

SMS is disabled in the tests so as not to break them when SMS does loop versioning.

Bootstrapped on arm-none-linux-gnueabihf; does not introduce testsuite 
regressions.



This should be aimed at GCC 11 at this point.

Some comments inline...




Andrea

gcc/ChangeLog:

2020-??-??  Andrea Corallo  
    Mihail-Calin Ionescu 
    Iain Apreotesei  

    * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
    (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
    * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
    * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
    Add new patterns.
    * config/arm/unspecs.md: Add new unspec.

gcc/testsuite/ChangeLog:

2020-??-??  Andrea Corallo  
    Mihail-Calin Ionescu 
    Iain Apreotesei  

    * gcc.target/arm/lob.h: New header.
    * gcc.target/arm/lob1.c: New testcase.
    * gcc.target/arm/lob2.c: Likewise.
    * gcc.target/arm/lob3.c: Likewise.
    * gcc.target/arm/lob4.c: Likewise.
    * gcc.target/arm/lob5.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.



lol.patch

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e07cf03538c5..1269f40bd77c 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -586,6 +586,9 @@ extern int arm_arch_bf16;
 
 /* Target machine storage Layout.  */
 
+/* Nonzero if this chip provides Armv8.1-M Mainline
+   LOB (low overhead branch features) extension instructions.  */
+#define TARGET_HAVE_LOB (arm_arch8_1m_main)
 
 /* Define this macro if it is advisable to hold scalars in registers

in a wider mode than that declared by the program.  In such cases,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e5621..7c2a7b7e9e97 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -833,6 +833,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
 
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP arm_invalid_within_doloop
+
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
@@ -32937,6 +32940,27 @@ arm_ge_bits_access (void)

   return true;
 }
 
+/* NULL if INSN is valid within a low-overhead loop.
+   Otherwise return why doloop cannot be applied.  */
+
+static const char *
+arm_invalid_within_doloop (const rtx_insn *insn)
+{
+  if (!TARGET_HAVE_LOB)
+return default_invalid_within_doloop (insn);
+
+  if (CALL_P (insn))
+return "Function call in the loop.";
+
+  if (tablejump_p (insn, NULL, NULL) || computed_jump_p (insn))
+return "Computed branch in the loop.";
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, LR_REGNUM), insn))
+return "LR is used inside loop.";
+
+  return NULL;
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md

index b0d3bd1cf1c4..4aff1a0838d8 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1555,8 +1555,11 @@
   using a certain 'count' register and (2) the loop count can be
   adjusted by modifying this register prior to the loop.
   ??? The possible introduction of a new block to initialize the
-  new IV can potentially affect branch optimizations.  */
-   if (optimize > 0 && flag_modulo_sched)
+  new IV can potentially affect branch optimizations.
+
+  Also used to implement the low overhead loops feature, which is part of
+  the Armv8.1-M Mainline Low Overhead Branch (LOB) extension.  */
+   if (optimize > 0 && (flag_modulo_sched || TARGET_HAVE_LOB))
{
  rtx s0;
  rtx bcomp;
@@ -1569,6 +1572,11 @@
FAIL;
 
  s0 = operands [0];

+
+ /* Low overhead loop instructions require the first operand to be LR.  */
+ if (TARGET_HAVE_LOB)
+   s0 = gen_rtx_REG (SImode, LR_REGNUM);
+
  if (TARGET_THUMB2)
insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
  else
@@ -1650,3 +1658,30 @@
   "TARGET_HAVE_MVE"
   "lsrl%?\\t%Q0, %R0, %1"
   [(set_attr "predicable" "yes")])
+
+;; Originally expanded by 'doloop_end'.
+(define_insn "doloop_end_internal"

We usually prefer to name these patterns with a '*' in front to prevent the 
gen* mach

[committed] testsuite: Add -fcommon to gcc.target/i386/pr69052.c

2020-02-21 Thread Uros Bizjak
This testcase is susceptible to memory location details and started to fail
with the default of -fno-common.  Use -fcommon to set the expected testing conditions.
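
For reference (illustrative): with -fcommon, an uninitialized file-scope
definition like the one in the test is a tentative definition that becomes
a common symbol whose final placement is up to the linker; with -fno-common
(the GCC 10 default) it is an ordinary definition placed in .bss, which
changes the memory layout the scan pattern depends on.

  int look_nbits[256];  /* common symbol with -fcommon,
                           ordinary .bss definition with -fno-common */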

Tested on x86_64-linux-gnu {,-m32}.

2020-02-21  Uroš Bizjak  

* gcc.target/i386/pr69052.c: Require target ia32.
(dg-options): Add -fcommon and remove -pie.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr69052.c 
b/gcc/testsuite/gcc.target/i386/pr69052.c
index 6f491e9ab539..19bc3c8a77dd 100644
--- a/gcc/testsuite/gcc.target/i386/pr69052.c
+++ b/gcc/testsuite/gcc.target/i386/pr69052.c
@@ -1,6 +1,6 @@
-/* { dg-do compile } */
+/* { dg-do compile { target ia32 } } */
 /* { dg-require-effective-target pie } */
-/* { dg-options "-O2 -fPIE -pie" } */
+/* { dg-options "-O2 -fPIE -fcommon" } */
 
 int look_nbits[256], loop_sym[256];
 const int ind[] = {
@@ -51,4 +51,4 @@ void foo (int *l1, int *l2, int *v, int *v1, int *m1, int i)
 }
 }
 
-/* { dg-final { scan-assembler-not "leal\[ \t\]ind@GOTOFF\\(%\[^,\]*\\), %" { target ia32 } } } */
+/* { dg-final { scan-assembler-not "leal\[ \t\]ind@GOTOFF\\(%\[^,\]*\\), %" } } */


Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-02-21 Thread Delia Burduv

Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is how the 
aarch64 versions are implemented too.


Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patch.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> have commit rights, so if this is ok can someone please commit it 
for me?

>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4BF, V8BF.
>>  (q): Added V4BF, V8BF.
>>  *config/arm/neon.md (vst2): Used new iterators.
>>  (vst3): Used new iterators.
>>  (vst3qa): Used new iterators.
>>  (vst3qb): Used new iterators.
>>  (vst4): Used new iterators.
>>  (vst4qa): Used new iterators.
>>  (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * gcc.target/arm/simd/bf16_vstn_1.c: New test.


One thing I just noticed in this and the other arm bfloat16 patches...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, 
float32x4_t __a, float32x4_t __b,

    return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the main 
arm_neon.h file, right?

I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


  +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+



[PATCH, committed][OpenACC] Adapt libgomp acc_get_property.f90 test

2020-02-21 Thread Harwath, Frederik
Hi,
The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

This obvious patch fixes that problem. Committed as 
r10-6782-g83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a.

Best regards,
Frederik
From 83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 21 Feb 2020 15:26:02 +0100
Subject: [PATCH] Adapt libgomp acc_get_property.f90 test

The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

2020-02-21  Frederik Harwath  

	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
	integer(acc_device_property) for the type of the return value of
	acc_get_property.
---
 libgomp/ChangeLog  | 7 +++
 .../testsuite/libgomp.oacc-fortran/acc_get_property.f90| 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 3c640c7350b..bff3ae58c9a 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-21  Frederik Harwath  
+
+	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
+	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
+	integer(acc_device_property) for the type of the return value of
+	acc_get_property.
+
 2020-02-19  Tobias Burnus  
 
 	* .gitattributes: New; whitespace handling for Fortran's openacc_lib.h.
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
index 80ae292f41f..1af7cc3b988 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
@@ -26,13 +26,14 @@ end program test
 ! and do basic device independent validation.
 subroutine print_device_properties (device_type)
   use openacc
+  use iso_c_binding, only: c_size_t
   implicit none
 
   integer, intent(in) :: device_type
 
   integer :: device_count
   integer :: device
-  integer(acc_device_property) :: v
+  integer(c_size_t) :: v
   character*256 :: s
 
   device_count = acc_get_num_devices(device_type)
-- 
2.17.1



[committed] amdgcn: Use correct offset mode for gather/scatter

2020-02-21 Thread Andrew Stubbs
When I forward ported the scatter/gather patterns from my GCC 9 
implementation I didn't notice that GCC 10 has a different naming 
scheme. :-(


The patterns were being used because all GCN vector loads end up being 
scatter/gather, but not by the actual vectorizer. The test failures were 
there to see, but there are still a lot of those to work through.


This patch uses the new two-mode naming scheme and implements the 
offsets correctly. This is actually a step forward for GCN because the 
offsets are always SImode, regardless of the primary mode.


Andrew
amdgcn: Use correct offset mode for gather/scatter

The scatter/gather pattern names changed for GCC 10, but I hadn't noticed.
This switches the patterns to the new offset mode scheme.

2020-02-21  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (gather_load<mode>): Rename to ...
	(gather_load<mode>v64si): ... this and set operand 2 to V64SI.
	(scatter_store<mode>): Rename to ...
	(scatter_store<mode>v64si): ... this and set operand 1 to V64SI.
	(scatter<mode>_exec): Delete. Move contents ...
	(mask_scatter_store<mode>): ... here, and rename that to ...
	(mask_gather_load<mode>v64si): ... this. Set operand 2 to V64SI.
	Remove mode conversion.
	(mask_gather_load<mode>): Rename to ...
	(mask_scatter_store<mode>v64si): ... this. Set operand 1 to V64SI.
	Remove mode conversion.
	* config/gcn/gcn.c (gcn_expand_scaled_offsets): Remove mode conversion.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index d5e6d0a625a..a0cc9a2d8fc 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -679,10 +679,10 @@
 ;;   fields normally found in a MEM.
 ;; - Multiple forms of address expression are supported, below.
 
-(define_expand "gather_load"
+(define_expand "gather_loadv64si"
   [(match_operand:VEC_ALLREG_MODE 0 "register_operand")
(match_operand:DI 1 "register_operand")
-   (match_operand 2 "register_operand")
+   (match_operand:V64SI 2 "register_operand")
(match_operand 3 "immediate_operand")
(match_operand:SI 4 "gcn_alu_operand")]
   ""
@@ -811,9 +811,9 @@
   [(set_attr "type" "flat")
(set_attr "length" "12")])
 
-(define_expand "scatter_store"
+(define_expand "scatter_storev64si"
   [(match_operand:DI 0 "register_operand")
-   (match_operand 1 "register_operand")
+   (match_operand:V64SI 1 "register_operand")
(match_operand 2 "immediate_operand")
(match_operand:SI 3 "gcn_alu_operand")
(match_operand:VEC_ALLREG_MODE 4 "register_operand")]
@@ -833,34 +833,6 @@
 DONE;
   })
 
-(define_expand "scatter_exec"
-  [(match_operand:DI 0 "register_operand")
-   (match_operand 1 "register_operand")
-   (match_operand 2 "immediate_operand")
-   (match_operand:SI 3 "gcn_alu_operand")
-   (match_operand:VEC_ALLREG_MODE 4 "register_operand")
-   (match_operand:DI 5 "gcn_exec_reg_operand")]
-  ""
-  {
-operands[5] = force_reg (DImode, operands[5]);
-
-rtx addr = gcn_expand_scaled_offsets (DEFAULT_ADDR_SPACE, operands[0],
-	  operands[1], operands[3],
-	  INTVAL (operands[2]), operands[5]);
-
-if (GET_MODE (addr) == V64DImode)
-  emit_insn (gen_scatter<mode>_insn_1offset_exec (addr, const0_rtx,
-		  operands[4], const0_rtx,
-		  const0_rtx,
-		  operands[5]));
-else
-  emit_insn (gen_scatter<mode>_insn_2offsets_exec (operands[0], addr,
-		   const0_rtx, operands[4],
-		   const0_rtx, const0_rtx,
-		   operands[5]));
-DONE;
-  })
-
 ; Allow any address expression
 (define_expand "scatter_expr"
   [(set (mem:BLK (scratch))
@@ -2795,10 +2767,10 @@
 DONE;
   })
 
-(define_expand "mask_gather_load"
+(define_expand "mask_gather_loadv64si"
   [(match_operand:VEC_ALLREG_MODE 0 "register_operand")
(match_operand:DI 1 "register_operand")
-   (match_operand 2 "register_operand")
+   (match_operand:V64SI 2 "register_operand")
(match_operand 3 "immediate_operand")
(match_operand:SI 4 "gcn_alu_operand")
(match_operand:DI 5 "")]
@@ -2806,16 +2778,6 @@
   {
 rtx exec = force_reg (DImode, operands[5]);
 
-/* TODO: more conversions will be needed when more types are vectorized. */
-if (GET_MODE (operands[2]) == V64DImode)
-  {
-	rtx tmp = gen_reg_rtx (V64SImode);
-	emit_insn (gen_truncv64div64si2_exec (tmp, operands[2],
-	  gcn_gen_undef (V64SImode),
-	  exec));
-	operands[2] = tmp;
-  }
-
 rtx addr = gcn_expand_scaled_offsets (DEFAULT_ADDR_SPACE, operands[1],
 	  operands[2], operands[4],
 	  INTVAL (operands[3]), exec);
@@ -2836,9 +2798,9 @@
 DONE;
   })
 
-(define_expand "mask_scatter_store"
+(define_expand "mask_scatter_storev64si"
   [(match_operand:DI 0 "register_operand")
-   (match_operand 1 "register_operand")
+   (match_operand:V64SI 1 "register_operand")
(match_operand 2 "immediate_operand")
(match_operand:SI 3 "gcn_alu_operand")
(match_operand:VEC_ALLREG_MODE 4 "register_operand")
@@ -2847,18 +2809,20 @@
   {
 rtx exec = force_reg (DImode, operands[5]);
 
-/* TODO: more conversions will be needed when mo

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-02-21 Thread Kyrill Tkachov

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> 
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics) 


>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patch.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> have commit rights, so if this is ok can someone please commit it 
for me?

>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4BF, V8BF.
>>  (q): Added V4BF, V8BF.
>>  *config/arm/neon.md (vst2): Used new iterators.
>>  (vst3): Used new iterators.
>>  (vst3qa): Used new iterators.
>>  (vst3qb): Used new iterators.
>>  (vst4): Used new iterators.
>>  (vst4qa): Used new iterators.
>>  (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * gcc.target/arm/simd/bf16_vstn_1.c: New test.


One thing I just noticed in this and the other arm bfloat16 patches...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t 
__a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
 
+#pragma GCC push_options

+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the main 
arm_neon.h file, right?
I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


 +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+



Re: [Patch, fortran] PR fortran/92621 Problems with memory handling with allocatable intent(out) arrays with bind(c)

2020-02-21 Thread Tobias Burnus

On 2/21/20 1:56 PM, José Rui Faustino de Sousa wrote:
Since the cfi.n pointer is uninitialized in some infrequent situations 
(using -static-libgfortran seems to do the trick), the pointer seems to 
contain garbage and a segmentation fault is generated


Hmm, that sounds like papering over a real bug. At a glance, I do not 
see anything in the test case which is undefined behaviour – and if it 
is not undefined behaviour, it should work with all optimization options 
and with libgfortran linked both statically and dynamically.


Tobias

PS: I am woefully aware that there are still several patches to be reviewed.



[Patch, fortran] PR fortran/92621 Problems with memory handling with allocatable intent(out) arrays with bind(c)

2020-02-21 Thread José Rui Faustino de Sousa

Hi all!

Proposed patch to solve problems with memory handling with allocatable 
intent(out) arrays with bind(c).


The patch also seems to affect PR92189.

Patch tested only on x86_64-pc-linux-gnu.

The code currently generated tries to deallocate the artificial cfi.n 
pointer before it is associated with the allocatable array.


Since the cfi.n pointer is uninitialized in some infrequent situations 
(using -static-libgfortran seems to do the trick), the pointer seems to 
contain garbage and a segmentation fault is generated.


Since the deallocation is done prior to the cfi.n pointer being 
associated with the allocatable array, the memory is never freed and the 
array will be passed still allocated; consequently, attempts to 
allocate it will fail.


A diff of only the main code changes without spacing changes is attached 
to facilitate review.


Thank you very much.

Best regards,
José Rui

2020-2-21  José Rui Faustino de Sousa  

 PR fortran/92621
 * trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Add code to deallocate
 allocatable intent(out) dummy array arguments, slightly rearrange code.
 (gfc_conv_procedure_call): Split if conditional in two branches; remove
 unnecessary checks for is_bind_c and obsolete comments from the second
 branch.

2020-02-21  José Rui Faustino de Sousa  

 PR fortran/92621
 * bind-c-intent-out.f90: Change dg-do compile to run, change regex to
 match the changes in code generation.

2020-02-21  José Rui Faustino de Sousa  

 PR fortran/92621
 * PR92621.f90: New test.


diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 5825a4b..70dd9be 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -5248,6 +5248,39 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, 
gfc_expr *e, gfc_symbol *fsym)

   if (POINTER_TYPE_P (TREE_TYPE (parmse->expr)))
parmse->expr = build_fold_indirect_ref_loc (input_location,
parmse->expr);
+}
+  else
+gfc_conv_expr (parmse, e);
+
+  if (POINTER_TYPE_P (TREE_TYPE (parmse->expr)))
+parmse->expr = build_fold_indirect_ref_loc (input_location,
+   parmse->expr);
+
+  /* If an ALLOCATABLE dummy argument has INTENT(OUT) and is
+ allocated on entry, it must be deallocated.  */
+  if (fsym && fsym->attr.allocatable
+  && fsym->attr.intent == INTENT_OUT)
+{
+  tmp = parmse->expr;
+
+  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp)))
+   tmp = gfc_conv_descriptor_data_get (tmp);
+  tmp = gfc_deallocate_with_status (tmp, NULL_TREE, NULL_TREE,
+   NULL_TREE, NULL_TREE, true,
+   e,
+   GFC_CAF_COARRAY_NOCOARRAY);
+  if (fsym->attr.optional
+ && e->expr_type == EXPR_VARIABLE
+ && e->symtree->n.sym->attr.optional)
+   tmp = fold_build3_loc (input_location, COND_EXPR,
+  void_type_node,
+  gfc_conv_expr_present (e->symtree->n.sym),
+  tmp, build_empty_stmt (input_location));
+  gfc_add_expr_to_block (&parmse->pre, tmp);
+}
+   
+  if (e->rank != 0)
+{
   bool is_artificial = (INDIRECT_REF_P (parmse->expr)
? DECL_ARTIFICIAL (TREE_OPERAND (parmse->expr, 0))
: DECL_ARTIFICIAL (parmse->expr));
@@ -5293,16 +5326,8 @@ gfc_conv_gfc_desc_to_cfi_desc (gfc_se *parmse, 
gfc_expr *e, gfc_symbol *fsym)

}
 }
   else
-{
-  gfc_conv_expr (parmse, e);
-
-  if (POINTER_TYPE_P (TREE_TYPE (parmse->expr)))
-   parmse->expr = build_fold_indirect_ref_loc (input_location,
-   parmse->expr);
-
-  parmse->expr = gfc_conv_scalar_to_descriptor (parmse,
-   parmse->expr, attr);
-}
+parmse->expr = gfc_conv_scalar_to_descriptor (parmse,
+ parmse->expr, attr);

   /* Set the CFI attribute field through a temporary value for the
  gfc attribute.  */
@@ -6170,113 +6195,113 @@ gfc_conv_procedure_call (gfc_se * se, 
gfc_symbol * sym,

/* Implement F2018, C.12.6.1: paragraph (2).  */
gfc_conv_gfc_desc_to_cfi_desc (&parmse, e, fsym);

- else if (e->expr_type == EXPR_VARIABLE
-   && is_subref_array (e)
-   && !(fsym && fsym->attr.pointer))
-   /* The actual argument is a component reference to an
-  array of derived types.  In this case, the argument
-  is converted to a temporary, which is passed and then
-  written back after the procedure call.  */
-   gfc_conv_subref_array_arg (&parmse, e, nodesc_arg,
-   fsym ? fsym->attr.intent : INTENT_INOUT,
-

Re: [PATCH] Add c++2a binary_semaphore

2020-02-21 Thread Sebastian Huber

On 18/02/2020 15:30, Jonathan Wakely wrote:

On 18/02/20 14:48 +0100, Sebastian Huber wrote:

Hello,

On 18/02/2020 07:46, Thomas Rodgers wrote:
This patch adds the c++2a semaphore header and binary_semaphore type. 
The implementation is not complete; this patch is just to solicit 
initial feedback.


How do you plan to implement the binary semaphores? For example, do 
you want to add them to gthr.h, implement them via a mutex and a 
condition variable, or via some futex stuff? I ask because I would like 
to support this in RTEMS.


Futexes where possible.


It would be nice if this could be done through an extension of the 
gthr.h API similar to the condition variables (__GTHREAD_HAS_COND).



Using POSIX semaphores might be a good alternative for the
std::counting_semaphore type, 


Yes.


but for std::binary_semaphore we talked
about just using a spinlock based on std::atomic.


I would like to use the binary semaphores for task/interrupt 
synchronization. For this a blocking solution is required.
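
For illustration, a minimal blocking binary semaphore of the kind being
discussed could look like this (a sketch on C++20 std::atomic wait/notify,
not the proposed libstdc++ implementation; the class name is made up):

  #include <atomic>

  class binary_semaphore_sketch
  {
    std::atomic<int> counter;

  public:
    explicit binary_semaphore_sketch (int d) : counter (d) { }

    void release ()
    {
      counter.store (1, std::memory_order_release);
      counter.notify_one ();  // wake one blocked acquire
    }

    void acquire ()
    {
      for (;;)
        {
          int old = 1;
          if (counter.compare_exchange_strong (old, 0,
                                               std::memory_order_acquire))
            return;
          counter.wait (0);  // block while the value is still 0
        }
    }
  };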


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.


Re: [PATCH] sra: Only verify sizes of scalar accesses (PR 93845)

2020-02-21 Thread Richard Biener
On Fri, 21 Feb 2020, Martin Jambor wrote:

> Hi,
> 
> the testcase is another example - in addition to recent PR 93516 - where
> the SRA access verifier is confused by the fact that get_ref_base_and_extent
> can return different sizes for the same type, depending on whether they are
> COMPONENT_REF or not.  In the previous bug I decided to keep the
> verifier check for aggregate type even though it is not really important
> and instead avoid easily detectable type-within-the-same-type situation.
> This testcase is however a result of a fairly random looking type cast
> and so cannot be handled in the same way.
> 
> Because the check is not really important for aggregates, this patch
> simply disables it for non-register types.
> 
> Bootstrapped and tested on x86_64-linux.  OK for trunk?

OK.

Richard.

> Thanks,
> 
> Martin
> 
> 2020-02-20  Martin Jambor  
> 
>   PR tree-optimization/93845
>   * tree-sra.c (verify_sra_access_forest): Only test access size of
>   scalar types.
> 
>   testsuite/
>   * g++.dg/tree-ssa/pr93845.C: New test.
> ---
>  gcc/ChangeLog   |  6 +
>  gcc/testsuite/ChangeLog |  5 +
>  gcc/testsuite/g++.dg/tree-ssa/pr93845.C | 30 +
>  gcc/tree-sra.c  |  3 ++-
>  4 files changed, 43 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr93845.C
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr93845.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr93845.C
> new file mode 100644
> index 000..72e473fffcd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr93845.C
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1" } */
> +
> +struct g;
> +struct h {
> +  g *operator->();
> +};
> +class i {
> +  void *a;
> +  int b;
> +
> +public:
> +  template  f j() { return *static_cast(this); }
> +};
> +struct k : i {};
> +struct l : k {};
> +struct m {
> +  i n();
> +  i o(l, l, int);
> +};
> +struct g {
> +  void m_fn4(k);
> +};
> +h a;
> +i b;
> +i m::n() {
> +  l c, d, e = o(d, c, 0).j();
> +  a->m_fn4(e);
> +  return b;
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 0cfac0a8192..1439f11f15a 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -2339,7 +2339,8 @@ verify_sra_access_forest (struct access *root)
>gcc_assert (offset == access->offset);
>gcc_assert (access->grp_unscalarizable_region
> || size == max_size);
> -  gcc_assert (max_size == access->size);
> +  gcc_assert (!is_gimple_reg_type (access->type)
> +   || max_size == access->size);
>gcc_assert (reverse == access->reverse);
>  
>if (access->first_child)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH, GCC/ARM] Fix MVE scalar shift tests

2020-02-21 Thread Kyrill Tkachov



On 2/21/20 11:51 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 2/19/20 4:27 PM, Mihail Ionescu wrote:

Hi Christophe,

On 01/23/2020 09:34 AM, Christophe Lyon wrote:
> On Mon, 20 Jan 2020 at 19:01, Mihail Ionescu
>  wrote:
>>
>> Hi,
>>
>> This patch fixes the scalar shifts tests added in:
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01195.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01196.html
>> By adding mthumb and ensuring that the target supports
>> thumb2 instructions.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2020-01-20  Mihail-Calin Ionescu 
>>
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c: 
Add mthumb and target check.
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c: 
Likewise.

>>
>>
>> Is this ok for trunk?
>>
>
> Why not add a new entry in check_effective_target_arm_arch_FUNC_ok?
> (there are already plenty, including v8m_main for instance)
>

Sorry for the delay, we were going to add the check_effective_target
to the MVE framework patches and then update this one. But I came
across some big endian issues and decided to update this now.

I've added the target check and changed the patch so it also
disables the scalar shift patterns when generating big endian
code. At the moment they are broken because the MVE shift instructions
have the restriction of having an even gp register specified first,
followed by the odd one, which requires swapping the data twice in
big endian. In this case, the previous code gen is preferred.



*** gcc/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * config/arm/arm.md (ashldi3, ashrdi3, lshrdi3): Prevent scalar
    shifts from being used when big endian is enabled.

*** gcc/testsuite/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * gcc.target/arm/armv8_1m-shift-imm-1.c: Add MVE target checks.
    * gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise.
    * lib/target-supports.exp
    (check_effective_target_arm_v8_1m_mve_ok_nocache): New.
    (check_effective_target_arm_v8_1m_mve_ok): New.
    (add_options_for_v8_1m_mve): New.

Is this ok for trunk?



This is ok, but please do a follow up patch to add the new effective 
target check to sourcebuild.texi (I know, we tend to forget to do it!)




I should say that such a patch is pre-approved.

Thanks,

Kyrill







> Christophe
>
>>
>> Regards,
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply    
###

>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> index 
5ffa3769e6ba42466242d3038857734e87b2f1fc..9822f59643c662c9302ad43c09057c59f3cbe07a 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>
>>   long long longval1;
>>   long long unsigned longval2;
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> index 
a97e9d687ef66e9642dd1d735125c8ee941fb151..a9aa7ed3ad9204c03d2c15dc6920ca3159403fa0 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok  } */
>>
>>   long long longval2;
>>   int intval2;
>>

Regards,
Mihail


[committed] amdgcn: fix mode in vec_series

2020-02-21 Thread Andrew Stubbs

This patch fixes an obvious typo in the definition of vec_seriesv64di.

It's never worked, so the fact it's taken this long for me to notice 
shows how little the middle-end takes advantage of this pattern. :-(


Andrew

amdgcn: fix mode in vec_series

2020-02-20  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (vec_seriesv64di): Use gen_vec_duplicatev64di.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index ecdd60b8190..edac362fd46 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3156,7 +3156,7 @@
 rtx op1vec = gen_reg_rtx (V64DImode);
 
 emit_insn (gen_mulv64di3_zext_dup2 (tmp, v1, operands[2]));
-emit_insn (gen_vec_duplicatev64si (op1vec, operands[1]));
+emit_insn (gen_vec_duplicatev64di (op1vec, operands[1]));
 emit_insn (gen_addv64di3 (operands[0], tmp, op1vec));
 DONE;
   })



[committed] amdgcn: Align VGPR pairs

2020-02-21 Thread Andrew Stubbs
This patch changes the way VGPR register pairs (for 64-bit values) are 
allocated.


There are no hardware restrictions on the alignment of such pairs 
(unlike for scalar registers), but there's also not a full set of 64-bit 
instructions, meaning that many operations get decomposed into two or 
more real instructions.


This creates an early-clobber problem when the inputs and outputs 
partially overlap, so up to now we've been adding the early-clobber 
constraint and fixing it that way.
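
In C terms, the shape of the problem is roughly this (a sketch, not the
actual RTL):

  #include <stdint.h>

  /* A 64-bit add decomposed into two 32-bit operations.  If the
     destination register pair partially overlaps the source pair,
     writing the low half first clobbers an input that the second
     instruction still needs -- hence the early-clobber constraints.  */
  uint64_t
  add64 (uint32_t alo, uint32_t ahi, uint32_t blo, uint32_t bhi)
  {
    uint32_t lo = alo + blo;
    uint32_t carry = lo < alo;        /* carry out of the low half */
    uint32_t hi = ahi + bhi + carry;  /* must see the original ahi/bhi */
    return ((uint64_t) hi << 32) | lo;
  }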


To complicate matters, most of the patterns don't have any trouble if 
the inputs and output registers match exactly, and often having them do 
so reduces register pressure, so we've been adding '0' match-constraints 
to allow that.


All this works, but there have been several bugs with missed 
early-clobber cases, and several more where the match-constraints 
conflict with other "real" match constraints, or generally confuse LRA. 
The fix is usually to explode the number of alternatives. The presence 
of these constraints also tends to mess up the alternative scoring 
system, which can make for suboptimal decisions. To make things worse 
the exact effects tend to change over time, creating an ongoing 
maintenance burden.


This patch forces register pairs to be allocated aligned, and removes 
all the early-clobber work-arounds, leaving only actual early-clobber cases.


Andrew

amdgcn: Align VGPR pairs

Aligning the registers is not needed by the architecture, but doing so
allows us to remove the requirement for bug-prone early-clobber
constraints from many split patterns (and avoid adding more in future).

2020-02-20  Andrew Stubbs  

	gcc/
	* config/gcn/gcn.c (gcn_hard_regno_mode_ok): Align VGPR pairs.
	* config/gcn/gcn-valu.md (addv64di3): Remove early-clobber.
	(addv64di3_exec): Likewise.
	(subv64di3): Likewise.
	(subv64di3_exec): Likewise.
	(addv64di3_zext): Likewise.
	(addv64di3_zext_exec): Likewise.
	(addv64di3_zext_dup): Likewise.
	(addv64di3_zext_dup_exec): Likewise.
	(addv64di3_zext_dup2): Likewise.
	(addv64di3_zext_dup2_exec): Likewise.
	(addv64di3_sext_dup2): Likewise.
	(addv64di3_sext_dup2_exec): Likewise.
	(<expander>v64di3): Likewise.
	(<expander>v64di3_exec): Likewise.
	(*<reduc_op>_dpp_shr_v64di): Likewise.
	(*plus_carry_dpp_shr_v64di): Likewise.
	* config/gcn/gcn.md (adddi3): Likewise.
	(addptrdi3): Likewise.
	(<expander>di3): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 6b774b1ef4c..5519c12d03c 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -1143,10 +1143,10 @@
(set_attr "length" "4,8,4,8")])
 
 (define_insn_and_split "addv64di3"
-  [(set (match_operand:V64DI 0 "register_operand"   "= &v,  &v")
+  [(set (match_operand:V64DI 0 "register_operand"   "=  v")
 	(plus:V64DI
-	  (match_operand:V64DI 1 "register_operand" "%vDb,vDb0")
-	  (match_operand:V64DI 2 "gcn_alu_operand"  "vDb0, vDb")))
+	  (match_operand:V64DI 1 "register_operand" "%vDb")
+	  (match_operand:V64DI 2 "gcn_alu_operand"  " vDb")))
(clobber (reg:DI VCC_REG))]
   ""
   "#"
@@ -1172,14 +1172,13 @@
(set_attr "length" "8")])
 
 (define_insn_and_split "addv64di3_exec"
-  [(set (match_operand:V64DI 0 "register_operand"	 "= &v,  &v, &v")
+  [(set (match_operand:V64DI 0 "register_operand"		  "=  v")
 	(vec_merge:V64DI
 	  (plus:V64DI
-	(match_operand:V64DI 1 "register_operand"	 "%vDb,vDb0,vDb")
-	(match_operand:V64DI 2 "gcn_alu_operand"	 "vDb0, vDb,vDb"))
-	  (match_operand:V64DI 3 "gcn_register_or_unspec_operand"
-			 "   U,   U,  0")
-	  (match_operand:DI 4 "gcn_exec_reg_operand"	 "   e,   e,  e")))
+	(match_operand:V64DI 1 "register_operand"		  "%vDb")
+	(match_operand:V64DI 2 "gcn_alu_operand"		  " vDb"))
+	  (match_operand:V64DI 3 "gcn_register_or_unspec_operand" "  U0")
+	  (match_operand:DI 4 "gcn_exec_reg_operand"		  "   e")))
(clobber (reg:DI VCC_REG))]
   ""
   "#"
@@ -1210,10 +1209,10 @@
(set_attr "length" "8")])
 
 (define_insn_and_split "subv64di3"
-  [(set (match_operand:V64DI 0 "register_operand"  "=&v,  &v,  &v, &v")
-	(minus:V64DI 
-	  (match_operand:V64DI 1 "gcn_alu_operand" "vDb,vDb0,   v, v0")
-	  (match_operand:V64DI 2 "gcn_alu_operand" " v0,   v,vDb0,vDb")))
+  [(set (match_operand:V64DI 0 "register_operand"  "= v,  v")
+	(minus:V64DI
+	  (match_operand:V64DI 1 "gcn_alu_operand" "vDb,  v")
+	  (match_operand:V64DI 2 "gcn_alu_operand" "  v,vDb")))
(clobber (reg:DI VCC_REG))]
   ""
   "#"
@@ -1239,14 +1238,13 @@
(set_attr "length" "8")])
 
 (define_insn_and_split "subv64di3_exec"
-  [(set (match_operand:V64DI 0 "register_operand""= &v,   &v,   &v,  &v")
+  [(set (match_operand:V64DI 0 "register_operand"		 "=  v,   v")
 	(vec_merge:V64DI 
 	  (minus:V64DI   
-	(match_operand:V64DI 1 "gcn_alu_operand" "vSvB,vSvB0,v,  v0")

[PATCH] sra: Only verify sizes of scalar accesses (PR 93845)

2020-02-21 Thread Martin Jambor
Hi,

the testcase is another example - in addition to recent PR 93516 - where
the SRA access verifier is confused by the fact that get_ref_base_and_extent
can return different sizes for the same type, depending on whether they are
COMPONENT_REF or not.  In the previous bug I decided to keep the
verifier check for aggregate type even though it is not really important
and instead avoid easily detectable type-within-the-same-type situation.
This testcase is however a result of a fairly random looking type cast
and so cannot be handled in the same way.

Because the check is not really important for aggregates, this patch
simply disables it for non-register types.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin

2020-02-20  Martin Jambor  

PR tree-optimization/93845
* tree-sra.c (verify_sra_access_forest): Only test access size of
scalar types.

testsuite/
* g++.dg/tree-ssa/pr93845.C: New test.
---
 gcc/ChangeLog   |  6 +
 gcc/testsuite/ChangeLog |  5 +
 gcc/testsuite/g++.dg/tree-ssa/pr93845.C | 30 +
 gcc/tree-sra.c  |  3 ++-
 4 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr93845.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr93845.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr93845.C
new file mode 100644
index 000..72e473fffcd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr93845.C
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+struct g;
+struct h {
+  g *operator->();
+};
+class i {
+  void *a;
+  int b;
+
+public:
+  template  f j() { return *static_cast(this); }
+};
+struct k : i {};
+struct l : k {};
+struct m {
+  i n();
+  i o(l, l, int);
+};
+struct g {
+  void m_fn4(k);
+};
+h a;
+i b;
+i m::n() {
+  l c, d, e = o(d, c, 0).j();
+  a->m_fn4(e);
+  return b;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0cfac0a8192..1439f11f15a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2339,7 +2339,8 @@ verify_sra_access_forest (struct access *root)
   gcc_assert (offset == access->offset);
   gcc_assert (access->grp_unscalarizable_region
  || size == max_size);
-  gcc_assert (max_size == access->size);
+  gcc_assert (!is_gimple_reg_type (access->type)
+ || max_size == access->size);
   gcc_assert (reverse == access->reverse);
 
   if (access->first_child)
-- 
2.25.0



Re: [PATCH, GCC/ARM] Fix MVE scalar shift tests

2020-02-21 Thread Kyrill Tkachov

Hi Mihail,

On 2/19/20 4:27 PM, Mihail Ionescu wrote:

Hi Christophe,

On 01/23/2020 09:34 AM, Christophe Lyon wrote:
> On Mon, 20 Jan 2020 at 19:01, Mihail Ionescu
>  wrote:
>>
>> Hi,
>>
>> This patch fixes the scalar shifts tests added in:
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01195.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01196.html
>> By adding mthumb and ensuring that the target supports
>> thumb2 instructions.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2020-01-20  Mihail-Calin Ionescu 
>>
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c: Add 
mthumb and target check.
>>  * gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c: 
Likewise.

>>
>>
>> Is this ok for trunk?
>>
>
> Why not add a new entry in check_effective_target_arm_arch_FUNC_ok?
> (there are already plenty, including v8m_main for instance)
>

Sorry for the delay, we were going to add the check_effective_target
to the MVE framework patches and then update this one. But I came
across some big endian issues and decided to update this now.

I've added the target check and changed the patch so it also
disables the scalar shift patterns when generating big endian
code. At the moment they are broken because the MVE shift instructions
have the restriction of having an even gp register specified first,
followed by the odd one, which requires swapping the data twice in
big endian. In this case, the previous code gen is preferred.



*** gcc/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * config/arm/arm.md (ashldi3, ashrdi3, lshrdi3): Prevent scalar
    shifts from being used when big endian is enabled.

*** gcc/testsuite/ChangeLog ***

2020-02-19  Mihail-Calin Ionescu 

    * gcc.target/arm/armv8_1m-shift-imm-1.c: Add MVE target checks.
    * gcc.target/arm/armv8_1m-shift-reg-1.c: Likewise.
    * lib/target-supports.exp
    (check_effective_target_arm_v8_1m_mve_ok_nocache): New.
    (check_effective_target_arm_v8_1m_mve_ok): New.
    (add_options_for_v8_1m_mve): New.
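
With those in place, a test would typically start like this (directive
names derived from the ChangeLog above, so the exact add-options name is
an assumption; sketch only, not the actual tests):

  /* { dg-do compile } */
  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
  /* { dg-add-options arm_v8_1m_mve } */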

Is this ok for trunk?



This is ok, but please do a follow up patch to add the new effective 
target check to sourcebuild.texi (I know, we tend to forget to do it!)


Thanks,

Kyrill




> Christophe
>
>>
>> Regards,
>> Mihail
>>
>>
>> ### Attachment also inlined for ease of reply    
###

>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> index 
5ffa3769e6ba42466242d3038857734e87b2f1fc..9822f59643c662c9302ad43c09057c59f3cbe07a 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-imm-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve -mfloat-abi=softfp" 
} */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>
>>   long long longval1;
>>   long long unsigned longval2;
>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c 
b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> index 
a97e9d687ef66e9642dd1d735125c8ee941fb151..a9aa7ed3ad9204c03d2c15dc6920ca3159403fa0 
100644

>> --- a/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-shift-reg-1.c
>> @@ -1,5 +1,6 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -march=armv8.1-m.main+mve -mfloat-abi=softfp" 
} */
>> +/* { dg-options "-O2 -mthumb -march=armv8.1-m.main+mve 
-mfloat-abi=softfp" } */

>> +/* { dg-require-effective-target arm_thumb2_ok  } */
>>
>>   long long longval2;
>>   int intval2;
>>

Regards,
Mihail


Re: [Ping][PATCH][Arm] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-21 Thread Kyrill Tkachov

Hi Dennis,

On 2/11/20 12:03 PM, Dennis Zhang wrote:

Hi all,

On 16/12/2019 13:45, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on the Arm Armv8.6-A CLI patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html.
> It also depends on the Armv8.6-A effective target checking patch,
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html.
> It also depends on the ARMv8.6-A I8MM dot product patch for using the
> same builtin qualifier
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00945.html.
>
> This patch adds intrinsics for matrix multiply-accumulate operations
> including vmmlaq_s32, vmmlaq_u32, and vusmmlaq_s32.
>
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for arm-none-linux-gnueabi-armv8.2-a.
>
> Is it OK for trunk please?
>


This is ok.

Thanks,

Kyrill




> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2019-12-10  Dennis Zhang  
>
>  * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, vusmmlaq_s32): 
New.

>  * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
>  * config/arm/iterators.md (MATMUL): New.
>  (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
>  (mmla_sfx): New.
>  * config/arm/neon.md (neon_mmlav16qi): New.
>  * config/arm/unspecs.md (UNSPEC_MATMUL_S): New.
>  (UNSPEC_MATMUL_U, UNSPEC_MATMUL_US): New.
>
> gcc/testsuite/ChangeLog:
>
> 2019-12-10  Dennis Zhang  
>
>  * gcc.target/arm/simd/vmmla_1.c: New test.

This patch has been updated according to the feedback on the related
AArch64 version at https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01591.html

Regtested. OK to commit please?

Many thanks
Dennis

gcc/ChangeLog:

2020-02-11  Dennis Zhang  

    * config/arm/arm-builtins.c (USTERNOP_QUALIFIERS): New macro.
    * config/arm/arm_neon.h (vmmlaq_s32, vmmlaq_u32, 
vusmmlaq_s32): New.

    * config/arm/arm_neon_builtins.def (smmla, ummla, usmmla): New.
    * config/arm/iterators.md (MATMUL): New iterator.
    (sup): Add UNSPEC_MATMUL_S, UNSPEC_MATMUL_U, and UNSPEC_MATMUL_US.
    (mmla_sfx): New attribute.
    * config/arm/neon.md (neon_mmlav16qi): New.
    * config/arm/unspecs.md (UNSPEC_MATMUL_S, UNSPEC_MATMUL_U): New.
    (UNSPEC_MATMUL_US): New.

gcc/testsuite/ChangeLog:

2020-02-11  Dennis Zhang  

    * gcc.target/arm/simd/vmmla_1.c: New test.
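
For reference, a minimal usage sketch of the new intrinsics (signatures as
in the ACLE; illustration only, not part of the patch):

  #include <arm_neon.h>

  /* 8-bit matrix tile multiply-accumulate into 32-bit accumulators:
     signed x signed, unsigned x unsigned, and unsigned x signed.
     Assumes a target with the i8mm extension enabled.  */
  int32x4_t  acc_ss (int32x4_t r, int8x16_t a, int8x16_t b)
  { return vmmlaq_s32 (r, a, b); }

  uint32x4_t acc_uu (uint32x4_t r, uint8x16_t a, uint8x16_t b)
  { return vmmlaq_u32 (r, a, b); }

  int32x4_t  acc_us (int32x4_t r, uint8x16_t a, int8x16_t b)
  { return vusmmlaq_s32 (r, a, b); }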


Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD

2020-02-21 Thread Kyrill Tkachov

Hi Delia,

On 2/19/20 5:23 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch.  It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches.


Thanks,
Delia

On 1/31/20 3:23 PM, Delia Burduv wrote:
Here is the updated patch. The changes are minor, so let me know if 
there is anything else to fix or if it can be committed.


Thank you,
Delia

On 1/30/20 2:55 PM, Kyrill Tkachov wrote:

Hi Delia,


On 1/28/20 4:44 PM, Delia Burduv wrote:

Ping.
 


*From:* Delia Burduv 
*Sent:* 22 January 2020 17:26
*To:* gcc-patches@gcc.gnu.org 
*Cc:* ni...@redhat.com ; Richard Earnshaw 
; Ramana Radhakrishnan 
; Kyrylo Tkachov 

*Subject:* Re: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla 
and vfma for AArch32 AdvSIMD

Ping.

I have read Richard Sandiford's comments on the AArch64 patches and I
will apply what is relevant to this patch as well. Particularly, I 
will
change the tests to use the exact input and output registers and I 
will

change the types of the rtl patterns.



Please send the updated patches so that someone can commit them for 
you once they're reviewed.


Thanks,

Kyrill




On 12/20/19 6:44 PM, Delia Burduv wrote:
> This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and 
vfmat

> as part of the BFloat16 extension.
> (https://developer.arm.com/docs/101028/latest.)
> The intrinsics are declared in arm_neon.h and the RTL patterns are
> defined in neon.md.
> Two new tests are added to check assembler output and lane indices.
>
> This patch depends on the Arm back-end patch.
> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>
> Tested for regression on arm-none-eabi and armeb-none-eabi. I 
don't have

> commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-11-12  Delia Burduv 
>
>  * config/arm/arm_neon.h (vbfmmlaq_f32): New.
>  (vbfmlalbq_f32): New.
>  (vbfmlaltq_f32): New.
>  (vbfmlalbq_lane_f32): New.
>  (vbfmlaltq_lane_f32): New.
>  (vbfmlalbq_laneq_f32): New.
>  (vbfmlaltq_laneq_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfmmla): New.
>  (vbfmab): New.
>  (vbfmat): New.
>  (vbfmab_lane): New.
>  (vbfmat_lane): New.
>  (vbfmab_laneq): New.
>  (vbfmat_laneq): New.
>  * config/arm/iterators.md (BF_MA): New int iterator.
>  (bt): New int attribute.
>  (VQXBF): Copy of VQX with V8BF.
>  (V_HALF): Added V8BF.
>  * config/arm/neon.md (neon_vbfmmlav8hi): New insn.
>  (neon_vbfmav8hi): New insn.
>  (neon_vbfma_lanev8hi): New insn.
>  (neon_vbfma_laneqv8hi): New expand.
>  (neon_vget_high): Changed iterator to VQXBF.
>  * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC.
>  (UNSPEC_BFMAB): New UNSPEC.
>  (UNSPEC_BFMAT): New UNSPEC.
>
> 2019-11-12  Delia Burduv 
>
>  * gcc.target/arm/simd/bf16_ma_1.c: New test.
>  * gcc.target/arm/simd/bf16_ma_2.c: New test.
>  * gcc.target/arm/simd/bf16_mmla_1.c: New test.


This looks good, a few minor things though...


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
3c78f435009ab027f92693d00ab5b40960d5419d..81f8008ea6a5fb11eb09f6685ba24bb0c54fb248
 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,64 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t 
__a, float32x4_t __b,
   return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
 }
 
+#pragma GCC push_options

+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmmlav8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmabv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_neon_vbfmatv8bf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribu
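
A minimal usage sketch of the three intrinsics declared in the hunk above
(target options are an assumption; illustration only, not part of the patch):

  #include <arm_neon.h>

  /* Assumes something like -march=armv8.2-a+bf16 and a suitable
     float ABI; sketch only.  */
  float32x4_t
  bf16_kernels (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
  {
    r = vbfmmlaq_f32 (r, a, b);      /* BFMMLA: 2x4 * 4x2 matrix mult-acc  */
    r = vbfmlalbq_f32 (r, a, b);     /* BFMLALB: widening fma, even lanes  */
    return vbfmlaltq_f32 (r, a, b);  /* BFMLALT: widening fma, odd lanes   */
  }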

[committed] aarch64: Add SVE support for -mlow-precision-sqrt

2020-02-21 Thread Richard Sandiford
SVE was missing support for -mlow-precision-sqrt, which meant that
-march=armv8.2-a+sve -mlow-precision-sqrt could cause a performance
regression compared to -march=armv8.2-a -mlow-precision-sqrt.
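
For context, the expansion uses the usual FRSQRTE estimate refined by
FRSQRTS Newton-Raphson steps.  A scalar C sketch of one step, assuming the
architectural definition FRSQRTS(a, b) = (3 - a*b)/2 (illustration only,
not the patch's code):

  /* Refine an estimate x of 1/sqrt(d); sqrt(d) is then obtained as
     d * (1/sqrt(d)), with zero inputs special-cased in the real code.  */
  static double
  rsqrt_step (double d, double x)
  {
    return x * ((3.0 - (d * x) * x) / 2.0);
  }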

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Add SVE
support.  Use aarch64_emit_mult instead of emitting multiplication
instructions directly.
* config/aarch64/aarch64-sve.md (sqrt2, rsqrt2)
(@aarch64_rsqrte, @aarch64_rsqrts): New expanders.

gcc/testsuite/
* gcc.target/aarch64/sve/rsqrt_1.c: New test.
* gcc.target/aarch64/sve/rsqrt_1_run.c: Likewise.
* gcc.target/aarch64/sve/sqrt_1.c: Likewise.
* gcc.target/aarch64/sve/sqrt_1_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 56 +-
 gcc/config/aarch64/aarch64.c  | 58 +--
 gcc/config/aarch64/iterators.md   | 13 +
 .../gcc.target/aarch64/sve/rsqrt_1.c  | 27 +
 .../gcc.target/aarch64/sve/rsqrt_1_run.c  | 27 +
 gcc/testsuite/gcc.target/aarch64/sve/sqrt_1.c | 30 ++
 .../gcc.target/aarch64/sve/sqrt_1_run.c   | 27 +
 7 files changed, 219 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/rsqrt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/rsqrt_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sqrt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sqrt_1_run.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index e3b1da89c1a..a661b257109 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -76,6 +76,8 @@
 ;;  [INT] Logical inverse
 ;;  [FP<-INT] General unary arithmetic that maps to unspecs
 ;;  [FP] General unary arithmetic corresponding to unspecs
+;;  [FP] Square root
+;;  [FP] Reciprocal square root
 ;;  [PRED] Inverse
 
 ;; == Binary arithmetic
@@ -3246,7 +3248,7 @@ (define_insn "@aarch64_sve_"
 ;; - FRINTP
 ;; - FRINTX
 ;; - FRINTZ
-;; - FRSQRT
+;; - FRSQRTE
 ;; - FSQRT
 ;; -
 
@@ -3267,7 +3269,7 @@ (define_expand "2"
  [(match_dup 2)
   (const_int SVE_RELAXED_GP)
   (match_operand:SVE_FULL_F 1 "register_operand")]
- SVE_COND_FP_UNARY))]
+ SVE_COND_FP_UNARY_OPTAB))]
   "TARGET_SVE"
   {
 operands[2] = aarch64_ptrue_reg (mode);
@@ -3357,6 +3359,56 @@ (define_insn_and_rewrite "*cond__any"
   [(set_attr "movprfx" "*,yes,yes")]
 )
 
+;; -
+;;  [FP] Square root
+;; -
+
+(define_expand "sqrt2"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+   (unspec:SVE_FULL_F
+ [(match_dup 2)
+  (const_int SVE_RELAXED_GP)
+  (match_operand:SVE_FULL_F 1 "register_operand")]
+ UNSPEC_COND_FSQRT))]
+  "TARGET_SVE"
+{
+  if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
+DONE;
+  operands[2] = aarch64_ptrue_reg (mode);
+})
+
+;; -
+;;  [FP] Reciprocal square root
+;; -
+
+(define_expand "rsqrt2"
+  [(set (match_operand:SVE_FULL_SDF 0 "register_operand")
+   (unspec:SVE_FULL_SDF
+ [(match_operand:SVE_FULL_SDF 1 "register_operand")]
+ UNSPEC_RSQRT))]
+  "TARGET_SVE"
+{
+  aarch64_emit_approx_sqrt (operands[0], operands[1], true);
+  DONE;
+})
+
+(define_expand "@aarch64_rsqrte"
+  [(set (match_operand:SVE_FULL_SDF 0 "register_operand")
+   (unspec:SVE_FULL_SDF
+ [(match_operand:SVE_FULL_SDF 1 "register_operand")]
+ UNSPEC_RSQRTE))]
+  "TARGET_SVE"
+)
+
+(define_expand "@aarch64_rsqrts"
+  [(set (match_operand:SVE_FULL_SDF 0 "register_operand")
+   (unspec:SVE_FULL_SDF
+ [(match_operand:SVE_FULL_SDF 1 "register_operand")
+  (match_operand:SVE_FULL_SDF 2 "register_operand")]
+ UNSPEC_RSQRTS))]
+  "TARGET_SVE"
+)
+
 ;; -
 ;;  [PRED] Inverse
 ;; -
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c1bbc4917c7..703f69a8b42 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12790,6 +12790,9 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
 /* Caller assumes we cannot fail.  */
 gcc_assert (use_rsqrt_p (mode));
 
+  rtx pg = NULL_RTX;
+  if (aarch64_sve_mode_p (mode))
+pg = aarch64_ptrue_reg (aarch64_sve_pred_mode (mode));
   machine_mode mmsk = (VECTOR_

[committed] aarch64: Add SVE support for -mlow-precision-div

2020-02-21 Thread Richard Sandiford
SVE was missing support for -mlow-precision-div, which meant that
-march=armv8.2-a+sve -mlow-precision-div could cause a performance
regression compared to -march=armv8.2-a -mlow-precision-div.
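
The division approximation follows the same shape with FRECPE/FRECPS; a
scalar C sketch of one refinement step, assuming the architectural
definition FRECPS(a, b) = 2 - a*b (illustration only):

  /* Refine an estimate x of 1/b; a/b is then approximated as a * (1/b).  */
  static double
  recip_step (double b, double x)
  {
    return x * (2.0 - b * x);
  }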

I ended up doing this much later than originally intended, sorry...

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_emit_mult): New function.
(aarch64_emit_approx_div): Add SVE support.  Use aarch64_emit_mult
instead of emitting multiplication instructions directly.
* config/aarch64/iterators.md (SVE_COND_FP_BINARY_OPTAB): New iterator.
* config/aarch64/aarch64-sve.md (div3, @aarch64_frecpe)
(@aarch64_frecps): New expanders.

gcc/testsuite/
* gcc.target/aarch64/sve/recip_1.c: New test.
* gcc.target/aarch64/sve/recip_1_run.c: Likewise.
* gcc.target/aarch64/sve/recip_2.c: Likewise.
* gcc.target/aarch64/sve/recip_2_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 44 ++-
 gcc/config/aarch64/aarch64.c  | 29 ++--
 gcc/config/aarch64/iterators.md   | 11 +
 .../gcc.target/aarch64/sve/recip_1.c  | 27 
 .../gcc.target/aarch64/sve/recip_1_run.c  | 27 
 .../gcc.target/aarch64/sve/recip_2.c  | 27 
 .../gcc.target/aarch64/sve/recip_2_run.c  | 30 +
 7 files changed, 191 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index fa3852992e1..e3b1da89c1a 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -99,6 +99,7 @@
 ;;  [FP] Subtraction
 ;;  [FP] Absolute difference
 ;;  [FP] Multiplication
+;;  [FP] Division
 ;;  [FP] Binary logical operations
 ;;  [FP] Sign copying
 ;;  [FP] Maximum and minimum
@@ -4719,7 +4720,7 @@ (define_expand "3"
   (const_int SVE_RELAXED_GP)
   (match_operand:SVE_FULL_F 1 "")
   (match_operand:SVE_FULL_F 2 "")]
- SVE_COND_FP_BINARY))]
+ SVE_COND_FP_BINARY_OPTAB))]
   "TARGET_SVE"
   {
 operands[3] = aarch64_ptrue_reg (mode);
@@ -5455,6 +5456,47 @@ (define_insn "@aarch64_mul_lane_"
   "fmul\t%0., %1., %2.[%3]"
 )
 
+;; -
+;;  [FP] Division
+;; -
+;; The patterns in this section are synthetic.
+;; -
+
+(define_expand "div3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+   (unspec:SVE_FULL_F
+ [(match_dup 3)
+  (const_int SVE_RELAXED_GP)
+  (match_operand:SVE_FULL_F 1 "nonmemory_operand")
+  (match_operand:SVE_FULL_F 2 "register_operand")]
+ UNSPEC_COND_FDIV))]
+  "TARGET_SVE"
+  {
+if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+  DONE;
+
+operands[1] = force_reg (mode, operands[1]);
+operands[3] = aarch64_ptrue_reg (mode);
+  }
+)
+
+(define_expand "@aarch64_frecpe"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+   (unspec:SVE_FULL_F
+ [(match_operand:SVE_FULL_F 1 "register_operand")]
+ UNSPEC_FRECPE))]
+  "TARGET_SVE"
+)
+
+(define_expand "@aarch64_frecps"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+   (unspec:SVE_FULL_F
+ [(match_operand:SVE_FULL_F 1 "register_operand")
+  (match_operand:SVE_FULL_F 2 "register_operand")]
+ UNSPEC_FRECPS))]
+  "TARGET_SVE"
+)
+
 ;; -
 ;;  [FP] Binary logical operations
 ;; -
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0acaa06b91c..c1bbc4917c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12739,6 +12739,25 @@ aarch64_builtin_reciprocal (tree fndecl)
   gcc_unreachable ();
 }
 
+/* Emit code to perform the floating-point operation:
+
+ DST = SRC1 * SRC2
+
+   where all three operands are already known to be registers.
+   If the operation is an SVE one, PTRUE is a suitable all-true
+   predicate.  */
+
+static void
+aarch64_emit_mult (rtx dst, rtx ptrue, rtx src1, rtx src2)
+{
+  if (ptrue)
+emit_insn (gen_aarch64_pred (UNSPEC_COND_FMUL, GET_MODE (dst),
+dst, ptrue, src1, src2,
+gen_int_mode (SVE_RELAXED_GP, SImode)));
+  else
+emit_set_insn (dst,

[wwwdocs] Document more libstdc++ changes

2020-02-21 Thread Jonathan Wakely
Committed to wwwdocs git.


commit 578a32e2f9215ccf96bd580d275fa12c22aa45a5
Author: Jonathan Wakely 
Date:   Fri Feb 21 10:27:39 2020 +

Document more libstdc++ changes

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 5a959a10..2920714a 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -348,13 +348,22 @@ a work-in-progress.
   
std::ssize, std::to_array. 
   
- and some parts of
-.
+Library concepts in  and
+.
   
-   . 
-   . 
+  
+Constrained algorithms in  and
+ (thanks to Patrick Palka).
+  
+   Three-way comparisons in . 
+  
+std::construct_at, std::destroy,
+constexpr std::allocator.
+  
+   Mathematical constants in . 
 
   
+  Support for RDSEED in std::random_device.
   
 Reduced header dependencies, leading to faster compilation for some code.
   


[committed] aarch64: Bump AARCH64_APPROX_MODE to 64 bits

2020-02-21 Thread Richard Sandiford
We now have more than 32 scalar and vector float modes, so the
32-bit AARCH64_APPROX_MODE would invoke UB for some of them.
Bumping to a 64-bit mask fixes that... for now.

Ideally we'd have a static assert to trap this, but logically
it would go at file scope.  I think it would be better to wait
until the switch to C++11, so that we can use static_assert
directly.
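
Such an assert might look like this (a sketch built from the mode-range
arithmetic in AARCH64_APPROX_MODE; hypothetical, not committed code):

  static_assert ((MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)
                 + (MAX_MODE_VECTOR_FLOAT - MIN_MODE_VECTOR_FLOAT + 1) <= 64,
                 "AARCH64_APPROX_MODE mask is too narrow");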

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (AARCH64_APPROX_MODE): Operate
on and produce uint64_ts rather than ints.
(AARCH64_APPROX_NONE, AARCH64_APPROX_ALL): Change to uint64_ts.
(cpu_approx_modes): Change the fields from unsigned int to uint64_t.
---
 gcc/config/aarch64/aarch64-protos.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d29975a8921..d6d668ea920 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -212,20 +212,20 @@ struct cpu_branch_cost
 /* Control approximate alternatives to certain FP operators.  */
 #define AARCH64_APPROX_MODE(MODE) \
   ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
-   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   ? ((uint64_t) 1 << ((MODE) - MIN_MODE_FLOAT)) \
: (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
- ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
- + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+ ? ((uint64_t) 1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
++ MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
  : (0))
-#define AARCH64_APPROX_NONE (0)
-#define AARCH64_APPROX_ALL (-1)
+#define AARCH64_APPROX_NONE ((uint64_t) 0)
+#define AARCH64_APPROX_ALL (~(uint64_t) 0)
 
 /* Allowed modes for approximations.  */
 struct cpu_approx_modes
 {
-  const unsigned int division; /* Division.  */
-  const unsigned int sqrt; /* Square root.  */
-  const unsigned int recip_sqrt;   /* Reciprocal square root.  */
+  const uint64_t division; /* Division.  */
+  const uint64_t sqrt; /* Square root.  */
+  const uint64_t recip_sqrt;   /* Reciprocal square root.  */
 };
 
 /* Cache prefetch settings for prefetch-loop-arrays.  */


[committed] aarch64: Avoid creating an unused register

2020-02-21 Thread Richard Sandiford
The rsqrt path of aarch64_emit_approx_sqrt created a pseudo
register that it never used.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Don't create
an unused xmsk register when handling approximate rsqrt.
---
 gcc/config/aarch64/aarch64.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6fb567ae4bf..0acaa06b91c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12774,14 +12774,17 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
   machine_mode mmsk = (VECTOR_MODE_P (mode)
   ? related_int_vector_mode (mode).require ()
   : int_mode_for_mode (mode).require ());
-  rtx xmsk = gen_reg_rtx (mmsk);
+  rtx xmsk = NULL_RTX;
   if (!recp)
-/* When calculating the approximate square root, compare the
-   argument with 0.0 and create a mask.  */
-emit_insn (gen_rtx_SET (xmsk,
-   gen_rtx_NEG (mmsk,
-gen_rtx_EQ (mmsk, src,
-CONST0_RTX (mode);
+{
+  /* When calculating the approximate square root, compare the
+argument with 0.0 and create a mask.  */
+  xmsk = gen_reg_rtx (mmsk);
+  emit_insn (gen_rtx_SET (xmsk,
+ gen_rtx_NEG (mmsk,
+  gen_rtx_EQ (mmsk, src,
+  CONST0_RTX (mode);
+}
 
   /* Estimate the approximate reciprocal square root.  */
   rtx xdst = gen_reg_rtx (mode);


[committed] aarch64: Fix inverted approx_sqrt condition

2020-02-21 Thread Richard Sandiford
The fix for PR80530 included an accidental flipping of the
flag_finite_math_only check, so that -ffinite-math-only (and thus
-ffast-math) disabled approximate sqrt rather than enabling it.

This is tested by later patches but seemed worth splitting out.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Fix inverted
flag_finite_math_only condition.
---
 gcc/config/aarch64/aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4a34dce5d79..6fb567ae4bf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12761,7 +12761,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
& AARCH64_APPROX_MODE (mode
return false;
 
-  if (flag_finite_math_only
+  if (!flag_finite_math_only
  || flag_trapping_math
  || !flag_unsafe_math_optimizations
  || optimize_function_for_size_p (cfun))


[PATCH] fix -fdebug-prefix-map without gas .file support

2020-02-21 Thread Richard Biener
This applies the file mapping when emitting the directory table
directly, instead of using the assembler's .file directive, where
we already correctly apply the map.  Notably, the non-assembler
path is used for the early debug emission for LTO.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2020-02-21  Mark Williams  

* dwarf2out.c (file_name_acquire): Call remap_debug_filename.
---
 gcc/dwarf2out.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 577be3d52d3..ba9da0f2cc2 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -12205,8 +12205,9 @@ file_name_acquire (dwarf_file_data **slot, 
file_name_acquire_data *fnad)
 
   fi = fnad->files + fnad->used_files++;
 
+  f = remap_debug_filename (d->filename);
+
   /* Skip all leading "./".  */
-  f = d->filename;
   while (f[0] == '.' && IS_DIR_SEPARATOR (f[1]))
 f += 2;
 
-- 
2.16.4


Re: [PATCH] rs6000: Fix infinite loop building ghostscript and icu [PR93658]

2020-02-21 Thread Richard Biener
On Thu, Feb 20, 2020 at 6:33 PM Peter Bergner  wrote:
>
> On 2/20/20 1:47 AM, Segher Boessenkool wrote:
> > On Wed, Feb 19, 2020 at 09:17:26PM -0600, Peter Bergner wrote:
> >> This passed bootstrap and regtesting on powerpc64le-linux and 
> >> powerpc64-linux
> >> (in both 32-bit and 64-bit modes) with no regressions.  Ok for trunk?
> >> The same bug exists in FSF 9 and FSF 8 branches.  Ok for those too after
> >> some burn in on trunk and clean regtests on the backports?
> >
> > Okay for all.  You may want to check it into 9 a bit faster than usual,
> > to meet the release schedule.  It should be perfectly safe enough for
> > that.  Do run the regstraps, of course ;-)
>
> Ok, I pushed the trunk fix now.  I'll kick off the release branch
> backports now.
>
> Jakub, I know you're getting the GCC 8.4 release ready.  Is this fix ok
> for FSF 8 now or do you want me to wait until after 8.4 is out?

It's OK for 8.4.

Richard.

>
> Peter
>
>


Re: [PATCH] tree-optimization/93586 - bogus path-based disambiguation

2020-02-21 Thread Richard Biener
On Thu, 20 Feb 2020, Jan Hubicka wrote:

> > This fixes bogus path-based disambiguation of mismatched array shape
> > accesses.
> > 
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > 
> > Honza, is this the correct place to detect this or were we not
> > supposed to arrive there?
> > 
> > Thanks,
> > Richard.
> > 
> > 2020-02-17  Richard Biener  
> > 
> > PR tree-optimization/93586
> > * tree-ssa-alias.c (nonoverlapping_array_refs_p): Fail when
> > we're obviously not looking at same-shaped array accesses.
> > 
> > * gcc.dg/torture/pr93586.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/torture/pr93586.c | 21 +
> >  gcc/tree-ssa-alias.c   |  5 +
> >  2 files changed, 26 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr93586.c
> > 
> > diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> > index b8df66ac1f2..e7caf9bee77 100644
> > --- a/gcc/tree-ssa-alias.c
> > +++ b/gcc/tree-ssa-alias.c
> > @@ -1291,6 +1291,11 @@ nonoverlapping_array_refs_p (tree ref1, tree ref2)
> >  
> >tree elmt_type1 = TREE_TYPE (TREE_TYPE (TREE_OPERAND (ref1, 0)));
> >tree elmt_type2 = TREE_TYPE (TREE_TYPE (TREE_OPERAND (ref2, 0)));
> > +  /* If one element is an array but not the other there's an obvious
> > + mismatch in dimensionality.  */
> > +  if ((TREE_CODE (elmt_type1) == ARRAY_TYPE)
> > +  != (TREE_CODE (elmt_type2) == ARRAY_TYPE))
> > +return -1;
> 
> The problem happens earlier.  The function is not supposed to give
> meaningful results when the bases of ref1 and ref2 are not the same or
> completely disjoint, and here it is called on c[0][j_2][0] and c[0][1],
> so the bases in the sense of this function are "c[0][j_2]" and "c[0]",
> which do partially overlap.
> 
> The problem is in nonoverlapping_array_refs, which walks a
> pair of array references and in this case fails to note that once
> it has walked across the first mismatched pair it is no longer safe to
> compare the rest.
> 
> The reason it continues matching is that it hopes it will
> eventually get a pair of COMPONENT_REFs of types of the same size and use
> TBAA to conclude that their addresses must be either the same or
> completely disjoint.
> 
> This patch makes the loop terminate early while popping all the
> remaining pairs so that walking can continue.  We could re-synchronize on
> arrays of the same size with TBAA, but this is a bit fishy (because we
> try to support some sort of partial array overlap) and hard to implement
> (because of zero-sized arrays and VLAs), so I think it is not worth the
> effort.
> 
> In addition, I noticed that the function is not !flag_strict_aliasing safe
> and added early exits at the places where we set seen_unmatched_ref_p,
> since later we do not check that in:
> 
>/* If we skipped array refs on type of different sizes, we can
>no longer be sure that there are not partial overlaps.  */
>if (seen_unmatched_ref_p
> && !operand_equal_p (TYPE_SIZE (type1), TYPE_SIZE (type2), 0))
>   {
> ++alias_stats
>   .nonoverlapping_refs_since_match_p_may_alias;
>   }
> 
> Bootstrapped/regtested ppc64-linux, OK?

OK.

Thanks,
Richard.

>   * tree-ssa-alias.c (nonoverlapping_array_refs_p): Finish array walk
>   after mismatched array refs; do not use type size information to
>   recover from unmatched references with !flag_strict_aliasing.
>   * gcc.dg/torture/pr93586.c: New testcase.
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index fd78105..8509f75 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -1486,9 +1489,27 @@ nonoverlapping_refs_since_match_p (tree match1, tree 
> ref1,
>   .nonoverlapping_refs_since_match_p_no_alias;
> return 1;
>   }
> -   partial_overlap = false;
> if (cmp == -1)
> - seen_unmatched_ref_p = true;
> + {
> +   seen_unmatched_ref_p = true;
> +   /* We can not maintain the invariant that bases are either
> +  same or completely disjoint.  However we can still recover
> +	  from type based alias analysis if we reach references to
> +  same sizes.  We do not attempt to match array sizes, so
> +  just finish array walking and look for component refs.  */
> +   if (!flag_strict_aliasing)
> + {
> +   ++alias_stats.nonoverlapping_refs_since_match_p_may_alias;
> +   return -1;
> + }
> +   for (i++; i < narray_refs1; i++)
> + {
> +   component_refs1.pop ();
> +   component_refs2.pop ();
> + }
> +   break;
> + }
> +   partial_overlap = false;
>   }
>   }
>  
> @@ -1503,7 +1524,14 @@ nonoverlapping_refs_since_match_p (tree match1, tree 
> ref1,
>   }
> ref1 = comp

Re: [PATCH] Fix stack pointer handling in ms_hook_prologue functions for i386 target.

2020-02-21 Thread Paul Gofman
Hello,

    ping for patch https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00554.html.

Thanks,
    Paul.

On 2/10/20 19:22, Paul Gofman wrote:
> ChangeLog:
> PR target/91489
> * config/i386/i386.md (simple_return): Also check
> for ms_hook_prologue function attribute.
> * config/i386/i386.c (ix86_can_use_return_insn_p):
> Also check for ms_hook_prologue function attribute.
>
> testsuite/ChangeLog:
> PR target/91489
> * gcc.target/i386/ms_hook_prologue.c: Expand testcase
> to reproduce PR target/91489 issue.
>
> Signed-off-by: Paul Gofman 
> ---
>  gcc/config/i386/i386-protos.h|  1 +
>  gcc/config/i386/i386.c   |  3 +++
>  gcc/config/i386/i386.md  |  5 -
>  gcc/testsuite/gcc.target/i386/ms_hook_prologue.c | 13 -
>  4 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 266381ca5a6..966ce426a18 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -26,6 +26,7 @@ extern bool ix86_handle_option (struct gcc_options *opts,
>  /* Functions in i386.c */
>  extern bool ix86_target_stack_probe (void);
>  extern bool ix86_can_use_return_insn_p (void);
> +extern bool ix86_function_ms_hook_prologue (const_tree fn);
>  extern void ix86_setup_frame_addresses (void);
>  extern bool ix86_rip_relative_addr_p (struct ix86_address *parts);
>  
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 44bc0e0176a..68e2a7519f4 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4954,6 +4954,9 @@ symbolic_reference_mentioned_p (rtx op)
>  bool
>  ix86_can_use_return_insn_p (void)
>  {
> +  if (ix86_function_ms_hook_prologue (current_function_decl))
> +return false;
> +
>if (ix86_function_naked (current_function_decl))
>  return false;
>  
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index f14683cd14f..a7302b886c6 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -13445,10 +13445,13 @@
>  ;; static chain pointer - the first instruction has to be pushl %esi
>  ;; and it can't be moved around, as we use alternate entry points
>  ;; in that case.
> +;; Also disallow for ms_hook_prologue functions which have frame
> +;; pointer set up in function label which is correctly handled in
> +;; ix86_expand_{prologue|epilogue}() only.
>  
>  (define_expand "simple_return"
>[(simple_return)]
> -  "!TARGET_SEH && !ix86_static_chain_on_stack"
> +  "!TARGET_SEH && !ix86_static_chain_on_stack && 
> !ix86_function_ms_hook_prologue (cfun->decl)"
>  {
>if (crtl->args.pops_args)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/ms_hook_prologue.c 
> b/gcc/testsuite/gcc.target/i386/ms_hook_prologue.c
> index e11bcc049cb..12e54c0e4ad 100644
> --- a/gcc/testsuite/gcc.target/i386/ms_hook_prologue.c
> +++ b/gcc/testsuite/gcc.target/i386/ms_hook_prologue.c
> @@ -4,6 +4,8 @@
>  /* { dg-require-effective-target ms_hook_prologue } */
>  /* { dg-options "-O2 -fomit-frame-pointer" } */
>  
> +#include <stdio.h>
> +
>  int __attribute__ ((__ms_hook_prologue__)) foo ()
>  {
>unsigned char *ptr = (unsigned char *) foo;
> @@ -32,7 +34,16 @@ int __attribute__ ((__ms_hook_prologue__)) foo ()
>return 0;
>  }
>  
> +unsigned int __attribute__ ((noinline, __ms_hook_prologue__)) test_func()
> +{
> +  static int value;
> +
> +  if (value++) puts("");
> +
> +  return 0;
> +}
> +
>  int main ()
>  {
> -  return foo();
> +  return foo() || test_func();
>  }




[PATCH v2] RISC-V: Adjust floating point code gen for LTGT compare

2020-02-21 Thread Kito Cheng
 - Using gcc.dg/torture/pr91323.c as the testcase, so no new testcase is
   introduced.

 - We previously used 3 EQ compares for LTGT, in order to prevent exception
   flags from being set when either input is NaN.

 - According to the latest GCC documentation and the discussion in PR 91323,
   LTGT should signal on NaNs, like GE/GT/LE/LT (see the sketch below).

 - So we now expand (LTGT a b) to ((LT a b) | (GT a b)) to match the
   documentation.

 - Tested rv64gc/rv32gc bare-metal/linux on qemu and
   rv64gc on a HiFive Unleashed board with linux.
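
 - A minimal C sketch of the signalling behaviour being matched
   (illustration only, not part of the patch):

     #include <fenv.h>
     #include <math.h>
     #include <stdio.h>

     int
     main (void)
     {
       volatile double a = NAN, b = 1.0;

       feclearexcept (FE_INVALID);
       int ltgt = (a < b) || (a > b);     /* signalling: raises FE_INVALID */
       printf ("ltgt=%d invalid=%d\n", ltgt, fetestexcept (FE_INVALID) != 0);

       feclearexcept (FE_INVALID);
       int quiet = islessgreater (a, b);  /* quiet: leaves FE_INVALID clear */
       printf ("quiet=%d invalid=%d\n", quiet, fetestexcept (FE_INVALID) != 0);
       return 0;
     }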

ChangeLog

gcc/

Kito Cheng  

* config/riscv/riscv.c (riscv_emit_float_compare): Change the code gen
for LTGT.
(riscv_rtx_costs): Update cost model for LTGT.
---
 gcc/config/riscv/riscv.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 54de0a667a4..d45b19d861b 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -1703,12 +1703,17 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case UNEQ:
-case LTGT:
   /* (FEQ(A, A) & FEQ(B, B)) compared against FEQ(A, B).  */
   mode = GET_MODE (XEXP (x, 0));
   *total = tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (3);
   return false;
 
+case LTGT:
+  /* (FLT(A, B) || FGT(A, B)).  */
+  mode = GET_MODE (XEXP (x, 0));
+  *total = tune_info->fp_add[mode == DFmode] + COSTS_N_INSNS (2);
+  return false;
+
 case UNGE:
 case UNGT:
 case UNLE:
@@ -2239,9 +2244,8 @@ riscv_emit_float_compare (enum rtx_code *code, rtx *op0, 
rtx *op1)
   break;
 
 case UNEQ:
-case LTGT:
   /* ordered(a, b) > (a == b) */
-  *code = fp_code == LTGT ? GTU : EQ;
+  *code = EQ;
   tmp0 = riscv_force_binary (word_mode, EQ, cmp_op0, cmp_op0);
   tmp1 = riscv_force_binary (word_mode, EQ, cmp_op1, cmp_op1);
   *op0 = riscv_force_binary (word_mode, AND, tmp0, tmp1);
@@ -2293,6 +2297,13 @@ riscv_emit_float_compare (enum rtx_code *code, rtx *op0, 
rtx *op1)
   *op1 = const0_rtx;
   break;
 
+case LTGT:
+  /* (a < b) | (a > b) */
+  *code = IOR;
+  *op0 = riscv_force_binary (word_mode, LT, cmp_op0, cmp_op1);
+  *op1 = riscv_force_binary (word_mode, GT, cmp_op0, cmp_op1);
+  break;
+
 default:
   gcc_unreachable ();
 }
-- 
2.25.0