Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c

2019-03-28 Thread Alan Modra
On Thu, Mar 28, 2019 at 01:08:55PM -0500, Segher Boessenkool wrote:
> > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order
> > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS
> > as an allocno class.  It is similar to the aarch64 version but without
> > any selection by regno mode if the best class is a union class.
> 
> It would be nice if we could do without such hacks.  Alas.

Yeah.  We have some serious problems with register pressure
calculations in sched1 and ira.  This is merely a workaround for the
regression.  I intend to poke a little more at the scheduler to see
whether I can figure out a proper fix, but for now this is the best I
have.

> > OK assuming Pat's latest spec test run doesn't contain any nasty
> > surprises?
> 
> Not for stage 4, no.  Sorry.  But it should be great in stage 1 :-)

Good.  I'm happy to leave this until next stage 1.

> > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
> > index 9fb36e41e7d..20c59f89c8f 100644
> > --- a/gcc/config/rs6000/darwin.h
> > +++ b/gcc/config/rs6000/darwin.h
> > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands;
> >&& reg_class_subset_p (BASE_REGS, (CLASS)))  \
> > ? BASE_REGS \
> > : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT\
> > -  && (CLASS) == GEN_OR_FLOAT_REGS) \
> > +  && ((CLASS) == GEN_OR_FLOAT_REGS \
> > + || (CLASS) == GEN_OR_VSX_REGS))   \
> > ? GENERAL_REGS  \
> > : (CLASS))
> 
> Darwin doesn't do VSX at all...  But maybe there is something that can get
> allocated to both FPRs and VRs, sure.  And GPRs.
> 
> This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places
> where we should care about this change for correctness.

For sure there are other places.  A new union register class trips all
those places where union classes fail.  For example,
ira.c:setup_class_translate_array doesn't give useful answers for
GEN_OR_VSX_REGS, or GEN_OR_FLOAT_REGS for that matter.  That makes
uses of ira_allocno_class_translate and ira_pressure_class_translate
for pseudos suspect whenever one of the union classes is involved.

rs6000_ira_change_pseudo_allocno_class helps of course, in that
GEN_OR_VSX_REGS mostly won't be used.

> > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum 
> > reg_class rclass)
> >return NO_REGS;
> >  }
> >  
> > -  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
> > +  if (GET_MODE_CLASS (mode) == MODE_INT
> > +  && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS))
> >  return GENERAL_REGS;
> 
> Maybe do this whenever rclass contains the GPRs?

Well, you caught me out here.  Like the darwin.h change I made this
change early on in the process of fixing register_move_cost, on the
grounds that whatever we did for GEN_OR_FLOAT_REGS probably should be
done for GEN_OR_VSX_REGS.  That's not a really good reason
particularly since this code is so old (git a99459e46d, svn r35162).
Maybe it's just a reload hack?

So you might be better questioning the need for this change at all.
I'll see how the test results look if I remove it.  Int modes can live
in VSX regs, after all.

> > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode,
> >reg_class_t from, reg_class_t to)
> >  {
> >int ret;
> > +  reg_class_t rclass;
> >  
> >if (TARGET_DEBUG_COST)
> >  dbg_cost_ctrl++;
> >  
> > +  /* If we have VSX, we can easily move between FPR or Altivec registers,
> > + otherwise we can only easily move within classes.
> > + Do this first so we give best-case answers for union classes
> > + containing both gprs and vsx regs.  */
> > +  HARD_REG_SET to_vsx, from_vsx;
> > +  COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]);
> > +  AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]);
> > +  COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]);
> > +  AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]);
> 
> This is a bit expensive to run at every call of rs6000_register_move_cost.
> Can it be precomputed easily?

register_move_cost tends to be cached by callers of this function, so
I'm inclined to go for simple and correct rather than fast and
complicated.

> > +   {
> > + if (TARGET_DIRECT_MOVE
> > + && (rs6000_tune == PROCESSOR_POWER8
> > + || rs6000_tune == PROCESSOR_POWER9))
> 
> TARGET_DIRECT_MOVE is always on for these CPUs.  Should this also use the
> m*vsr* cost with say -mcpu=power7 -mtune=power9?

No, because if we don't generate m*vsr*, and we shouldn't, then that
would be telling a lie.

> This list makes me think...  Should there be a GEN_OR_ALTIVEC as well?

That might make sense for pre-vsx processors, if you can find a
testcase where the GENERAL_REGS cost is

[PR89862] Fix ARM lto bootstrap

2019-03-28 Thread Kugan Vivekanandarajah
Hi All,
LTO bootstrap for ARM fails with the commit

commit 67c18bce7054934528ff5930cca283b4ac967dca
* combine.c (record_dead_and_set_regs_1): Record the source unmodified
for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.

It fails with an  internal compiler error: in operator+=, at
profile-count.h:792.

With the commit now we are not  generating gen_lowpart for CONST_INT as in
(set (subreg:SI (reg:QI 1434) 0)
(const_int 224 [0xe0])) and likes.

As discussed in the PR, attached patch fixes this and fixes the
bootstrap failure. I am not able to create a reduced testcase for
this. However, it is being tested with LTO bootstrap for ARM. I
therefore believe that it is OK.
I have also tested the patch with x86_64-linux-gnu with no new regressions.
Is this OK for trunk?

Thanks,
Kugan
diff --git a/gcc/rtl.h b/gcc/rtl.h
index f991919..52ecd5a 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4401,6 +4401,7 @@ word_register_operation_p (const_rtx x)
 {
   switch (GET_CODE (x))
 {
+case CONST_INT:
 case ROTATE:
 case ROTATERT:
 case SIGN_EXTRACT:


log
Description: Binary data


Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c

2019-03-28 Thread Iain Sandoe


> On 28 Mar 2019, at 18:08, Segher Boessenkool  
> wrote:
> 

>> diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
>> index 9fb36e41e7d..20c59f89c8f 100644
>> --- a/gcc/config/rs6000/darwin.h
>> +++ b/gcc/config/rs6000/darwin.h
>> @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands;
>>   && reg_class_subset_p (BASE_REGS, (CLASS)))\
>>? BASE_REGS   \
>>: (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT  \
>> -  && (CLASS) == GEN_OR_FLOAT_REGS)  \
>> +  && ((CLASS) == GEN_OR_FLOAT_REGS  \
>> +  || (CLASS) == GEN_OR_VSX_REGS))   \
>>? GENERAL_REGS\
>>: (CLASS))
> 
> Darwin doesn't do VSX at all…  

Well.. Darwin doesn’t currently run on any CPU with VSX hardware…
However, in their wisdom, the folks who were implementing it way back when
made Darwin have a soft implementation of V2DF and V2DI (in case that
matters, which seems unlikely in this context).

> But maybe there is something that can get
> allocated to both FPRs and VRs, sure.  And GPRs.

not sure what is being asked or stated here - is there a question about ABI, or
just the assertion that some quantities could be placed in GPRs, FPRs or VRs?

(the latter seems reasonable to me, the former I’d need to think some more
about).

Iain



Re: [PATCH] [ARC] PR 88409: miscompilation due to missing cc clobber in longlong.h macros

2019-03-28 Thread Vineet Gupta
On 3/28/19 5:07 PM, Marc Glisse wrote:
> On Thu, 28 Mar 2019, Vineet Gupta wrote:
>
>> simple test such as below was failing.
>>
>> | void main(int argc, char *argv[])
>> | {
>> |size_t total_time = 115424;   // expected 115.424
>> |double secs = (double)total_time/(double)1000;
>> |printf("%s %d %lf\n", "secs", total_time, secs);  // prints 113.504
>> |printf("%d\n", (size_t)secs);
>> | }
>>
>> The printf eventually called into glibc stdlib/divrem.c:__mpn_divrem()
>> which uses the __arc__ specific inline asm macros from longlong.h which
>> were causing miscompilation.
> Hello,
>
> do you intend to post similar patches for glibc and gmp, which both have a 
> similar longlong.h?

Yeah, glibc typically "syncs" longlong.h from gcc so once gcc patch is merged,
I'll post a sync patch to glibc.
Good tip about gmp, I wasn't aware of that. I suppose I could post there too 
once
dust settles on gcc side.

Thx,
-Vinet


Re: [PATCH] [ARC] PR 88409: miscompilation due to missing cc clobber in longlong.h macros

2019-03-28 Thread Marc Glisse

On Thu, 28 Mar 2019, Vineet Gupta wrote:


simple test such as below was failing.

| void main(int argc, char *argv[])
| {
|size_t total_time = 115424;   // expected 115.424
|double secs = (double)total_time/(double)1000;
|printf("%s %d %lf\n", "secs", total_time, secs);  // prints 113.504
|printf("%d\n", (size_t)secs);
| }

The printf eventually called into glibc stdlib/divrem.c:__mpn_divrem()
which uses the __arc__ specific inline asm macros from longlong.h which
were causing miscompilation.


Hello,

do you intend to post similar patches for glibc and gmp, which both have a 
similar longlong.h?


--
Marc Glisse


Re: [C++ debug PATCH] [PR88534] accept VAR_DECL in class literal template parms

2019-03-28 Thread Martin Sebor

On 3/28/19 11:45 AM, Jason Merrill wrote:

On 3/27/19 6:56 PM, Martin Sebor wrote:

On 3/27/19 3:11 PM, Martin Sebor wrote:

On 3/27/19 4:44 AM, Jonathan Wakely wrote:

On 21/03/19 15:03 -0400, Jason Merrill wrote:

On 3/20/19 6:06 PM, Marek Polacek wrote:

On Wed, Mar 20, 2019 at 10:58:32PM +0100, Jakub Jelinek wrote:

On Wed, Mar 20, 2019 at 05:55:04PM -0400, Marek Polacek wrote:

On Wed, Mar 20, 2019 at 04:56:33PM -0300, Alexandre Oliva wrote:

On Mar 20, 2019, Marek Polacek  wrote:


This test fails with
pr88534.C:58:1: sorry, unimplemented: string literal in 
function template signature


Interesting...  gcc-8 rejected it with an error message 
rejecting the
template parameter, but my latest trunk build (dated Mar 13, 
r269641)
compiles it all right.  Was there a subsequent fix, maybe?  I 
didn't

realize it was supposed to be rejected.


Ah, that problem only started with r269814, namely this hunk:


Maybe this is done too early and should be postponed to 
genericization

(perhaps except for TREE_STATIC vars)?


Or skip when DECL is template_parm_object_p.


Or handle it in mangle.c.  I don't see any reason we shouldn't accept

struct A
{
 char c[4];
};

template  struct B { };
B b;

Probably we should use the same mangling whether the initializer 
for c was spelled as a string literal or list of integers.


The thing we still don't want to allow is mangling the *address* of 
a string literal.


Will that help PR 47488 as well?


What I have (attached) accepts all three test cases from PR 47488
(comment #0, #1, #2, and #5) but with different mangling.

The difference in the mangling of the function in the test case
in comment #0 is

   Clang: _Z1gIiEDTcl1fcvT__ELA1_KcEEERKS0_
   GCC 9: _Z1gIiEDTcl1fcvT__ELKc0EEERKS0_

I'm not very familiar with the C++ ABI mangling but from what
I can tell, Clang treats the type as a literal array of 1 const
char element (LA1_Kc) without actually encoding its value, while
with my patch GCC encodes it as a constant literal initializer
consisting of 1 null char (LKc0).  In other words, Clang would
mangle these two the same for the same T:

   template < typename T >
   decltype (f (T(), "123")) g (const T&);

   template < typename T >
   decltype (f (T(), "abc")) g (const T&);

while GCC would mangle them differently.

Which is correct?  Clang's seems correct in this case but would
it also be correct to mangle Jason's B the same as
B?


After some poking around I found the issue below that seems
to answer the question at least for Jason's example:

   https://github.com/itanium-cxx-abi/cxx-abi/issues/64

But this is an open issue that leaves some other questions
unanswered, mainly that of the exact encoding of the literal
(the length of the directly encoded subsequence and the hash
algorithm to compute the hash value of the string).

In any event, using the same encoding as for initializer
lists (i.e., (elt-type, elt-value)...) doesn't follow this
approach.

Should I prototype the solution outlined in the issue for
GCC 9?


I think let's leave that for the future, and use the aggregate encoding 
for GCC 9.



I suppose there's still the question of compatibility between
initializer lists and string literals, namely whether this

   B

is supposed to mangle the same as

   B

The ABI issue only talks about string literals and not braced
initializer lists.  I see in Revision History [150204] Add
mangling for braced initializer lists, so presumably there
is a difference, but I can't find what it is.


The issue for mangling template arguments of class type is

   https://github.com/itanium-cxx-abi/cxx-abi/issues/63


Whoops.  Off-by-one error.  Thanks!



The mangling of braced-init-lists is

# type {expr-list}, conversion with braced-init-list argument
::= tl  * E

# {expr-list}, braced-init-list in any other context
::= il * E

Your patch doesn't use this mangling.  Here's a variant of my example 
that actually causes these to show up in the mangling:


struct A { char c[4]; };
template  struct B { };
void f(B) {}
void g(B) {}


Right, that's also what I'm testing.  There are few interesting
variations.  Given

  struct A { char c[5]; };

all of these are equivalent and so should mangle the same:

  void f (B) {}
  void f (B) {}
  void f (B) {}
  void f (B) {}
  void f (B) {}

Interior nuls need to be mangled but the trailing ones, either
explicit or implicit, don't.  That also needs to be specified
in the ABI.

I believe we also need to handle these the same, both during
overloading and mangling:

  void f (B) {}
  void f (B) {}
  void f (B) {}
  void f (B) {}

GCC currently treats them as distinct irrespective of A's member's
type.  I opened bug 89878 for this since I don't expect to deal
with the overloading part in this patch.

Martin


[PATCH] [ARC] PR 88409: miscompilation due to missing cc clobber in longlong.h macros

2019-03-28 Thread Vineet Gupta
simple test such as below was failing.

| void main(int argc, char *argv[])
| {
|size_t total_time = 115424;   // expected 115.424
|double secs = (double)total_time/(double)1000;
|printf("%s %d %lf\n", "secs", total_time, secs);  // prints 113.504
|printf("%d\n", (size_t)secs);
| }

The printf eventually called into glibc stdlib/divrem.c:__mpn_divrem()
which uses the __arc__ specific inline asm macros from longlong.h which
were causing miscompilation.

include/
2019-03-28  Vineet Gupta 

PR 89877

* longlong.h [__arc__] (add_ss): Add cc clobber
(sub_ddmmss): Likewise.

Signed-off-by: Vineet Gupta 
---
 include/ChangeLog  | 7 +++
 include/longlong.h | 6 --
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/ChangeLog b/include/ChangeLog
index be08141deeb9..96d967d10a52 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,10 @@
+2019-03-28  Vineet Gupta 
+
+   PR 89877
+
+   * longlong.h [__arc__] (add_ss): Add cc clobber
+   (sub_ddmmss): Likewise.
+
 2019-02-11  Philippe Waroquiers  
 
* splay-tree.h (splay_tree_delete_key_fn): Update comment.
diff --git a/include/longlong.h b/include/longlong.h
index 3dd8dc3aa80c..1f0ce4204255 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -199,7 +199,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, 
UDItype);
   : "%r" ((USItype) (ah)), \
 "rICal" ((USItype) (bh)),  \
 "%r" ((USItype) (al)), \
-"rICal" ((USItype) (bl)))
+"rICal" ((USItype) (bl))   \
+  : "cc")
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub.f  %1, %4, %5\n\tsbc   %0, %2, %3" \
   : "=r" ((USItype) (sh)), \
@@ -207,7 +208,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, 
UDItype);
   : "r" ((USItype) (ah)),  \
 "rICal" ((USItype) (bh)),  \
 "r" ((USItype) (al)),  \
-"rICal" ((USItype) (bl)))
+"rICal" ((USItype) (bl))   \
+  : "cc")
 
 #define __umulsidi3(u,v) ((UDItype)(USItype)u*(USItype)v)
 #ifdef __ARC_NORM__
-- 
2.7.4



[committed] Fix omp lowering VLA ICE (PR middle-end/89621)

2019-03-28 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because remap_type remaps all VLA types, even
those that shouldn't be and e.g. the Fortran FE then asserts
TYPE_MAIN_VARIANT equality, which is not true because of that spurious
remapping.

During normal inlining and most of other remap_type uses, id->copy_decl is
copy_decl_no_change or something that calls it unconditionally and so it is
guaranteed we really need to remap all VLA types.  Similarly, during OpenMP
lowering of certain constructs like parallel, task, target we really better
remap all VLA types, while the id->copy_decl hook doesn't guarantee
returning a new decl, it is usually just for for global vars and global vars
typically aren't used in VLA types after gimplification, we want to record
the value at certain point rather than change the bounds of the type if
the global var changes.  In other OpenMP constructs, such as worksharing,
distribute, simd etc., we return new decls only for decls that have explicit
clauses on the constructs, everything else is shared and just uses decls
from outer context.  If we remap_type some type in such context, we create a
new type, but all remap_decls return the decl passed to it, so we end up
with multiple VLA types which aren't compatible, but have the same
dimensions etc.

The following patch makes sure we don't really remap such types.  In order
not to slow down normal inlining etc., there is an extra copy_body_data flag
to request this extra behavior and we then first check if we need to
actually remap and only remap if we need to.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-03-28  Jakub Jelinek  

PR middle-end/89621
* tree-inline.h (struct copy_body_data): Add
dont_remap_vla_if_no_change flag.
* tree-inline.c (remap_type_3, remap_type_2): New functions.
(remap_type): Don't remap vla types if id->dont_remap_vla_if_no_change
and remap_type_2 returns false.
* omp-low.c (new_omp_context): Set ctx->cb.dont_remap_vla_if_no_change.
Move ctx->cb.adjust_array_error_bounds setting to the outermost ctx
only from where it is copied to nested contexts.

* gfortran.dg/gomp/pr89621.f90: New test.

--- gcc/tree-inline.h.jj2019-03-27 18:13:40.446862001 +0100
+++ gcc/tree-inline.h   2019-03-27 23:09:20.576018070 +0100
@@ -123,6 +123,13 @@ struct copy_body_data
  an uninitialized VAR_DECL temporary.  */
   bool adjust_array_error_bounds;
 
+  /* Usually copy_decl callback always creates new decls, in that case
+ we want to remap all variably_modified_type_p types.  If this flag
+ is set, remap_type will do further checks to see if remap_decl
+ of any decls mentioned in the type will remap to anything but itself
+ and only in that case will actually remap the type.  */
+  bool dont_remap_vla_if_no_change;
+
   /* A function to be called when duplicating BLOCK nodes.  */
   void (*transform_lang_insert_block) (tree);
 
--- gcc/tree-inline.c.jj2019-03-27 18:13:40.408862615 +0100
+++ gcc/tree-inline.c   2019-03-28 14:11:14.421419730 +0100
@@ -598,6 +598,92 @@ remap_type_1 (tree type, copy_body_data
   return new_tree;
 }
 
+/* Helper function for remap_type_2, called through walk_tree.  */
+
+static tree
+remap_type_3 (tree *tp, int *walk_subtrees, void *data)
+{
+  copy_body_data *id = (copy_body_data *) data;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+
+  else if (DECL_P (*tp) && remap_decl (*tp, id) != *tp)
+return *tp;
+
+  return NULL_TREE;
+}
+
+/* Return true if TYPE needs to be remapped because remap_decl on any
+   needed embedded decl returns something other than that decl.  */
+
+static bool
+remap_type_2 (tree type, copy_body_data *id)
+{
+  tree t;
+
+#define RETURN_TRUE_IF_VAR(T) \
+  do   \
+{  \
+  tree _t = (T);   \
+  if (_t)  \
+   {   \
+ if (DECL_P (_t) && remap_decl (_t, id) != _t) \
+   return true;\
+ if (!TYPE_SIZES_GIMPLIFIED (type) \
+ && walk_tree (&_t, remap_type_3, id, NULL))   \
+   return true;\
+   }   \
+}  \
+  while (0)
+
+  switch (TREE_CODE (type))
+{
+case POINTER_TYPE:
+case REFERENCE_TYPE:
+case FUNCTION_TYPE:
+case METHOD_TYPE:
+  return remap_type_2 (TREE_TYPE (type), id);
+
+case INTEGER_TYPE:
+case REAL_TYPE:
+case FIXED_POINT_TYPE:
+case ENUMERAL_TYPE:
+case BOOLEAN_TYPE:
+  RETURN_TRUE_IF_VAR (TYPE_MIN_VALUE (type));
+  RETURN_TRUE_IF_VAR (TYPE_MAX_VALUE (type

Re: [PATCH] Fix PR71598, aliasing between enums and compatible types

2019-03-28 Thread Jason Merrill

On 3/26/19 4:40 AM, Richard Biener wrote:

On Mon, 18 Mar 2019, Richard Biener wrote:


On Fri, 15 Mar 2019, Jason Merrill wrote:


On 3/15/19 9:33 AM, Richard Biener wrote:


The following is an attempt to fix PR71598 where C (and C++?) have
an implementation-defined compatible integer type for each enum
and the TBAA rules mandate that accesses using a compatible type
are allowed.


This does not apply to C++; an enum does not alias its underlying type.


Thus the following different patch, introducing c_get_alias_set and
only doing the special handling for C family frontends (I assume
that at least ObjC is also affected).

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK?


Ping.  Also consider the additional testcase below to be added.

Richard.

2019-03-18  Richard Biener  

 PR c/71598
 * gimple.c: Include langhooks.h.
 (gimple_get_alias_set): Treat enumeral types as the underlying
 integer type.


Won't this affect all languages?

Jason


Re: C++ PATCH for c++/89612 - ICE with member friend template with noexcept

2019-03-28 Thread Jason Merrill

On 3/28/19 2:59 PM, Marek Polacek wrote:

On Thu, Mar 21, 2019 at 04:00:41PM -0400, Jason Merrill wrote:

On 3/19/19 11:45 AM, Marek Polacek wrote:

On Thu, Mar 14, 2019 at 04:22:41PM -0400, Jason Merrill wrote:

On 3/7/19 4:52 PM, Marek Polacek wrote:

This was one of those PRs where the more you poke, the more ICEs turn up.
This patch fixes the ones I could find.  The original problem was that
maybe_instantiate_noexcept got a TEMPLATE_DECL created for the member
friend template in do_friend.  Its noexcept-specification was deferred,
so we went to the block with push_access_scope, but that crashes on a
TEMPLATE_DECL.  One approach could be to somehow not defer noexcept-specs
for friend templates, I guess, but I didn't want to do that.



How does it make sense to instantiate the noexcept-specifier of a template?
We should only get there for fully-instantiated function decls.


Hmm, but duplicate_decls calls check_redeclaration_exception_specification even
for DECL_FUNCTION_TEMPLATE_Ps.
...
Note the crash happens in tsubst_friend_function.  I wouldn't know when to
check the noexcept-specifier of such a TEMPLATE_DECL for a template friend
if not there.


Hmm, true, I guess we do need to do a partial instantiation of the
noexcept-specifier in order to compare it.


*nod*


That broke in register_parameter_specializations but we don't need this
code anyway, so let's do away with it -- the current_class_{ref,ptr}
code is enough to fix the PR that register_parameter_specializations was
introduced for.


What about uses of non-'this' parameters in the noexcept-specification?

template 
struct C {
   template 
   friend void foo(T t) noexcept(sizeof(decltype(t)) > 1);
};

template 
void foo(int i) noexcept { }

C c;


Still works.  I extended the test to see if we detect the scenario when the
noexcept-specifiers don't match, and we do.  It's noexcept39.C.


Lastly, I found an invalid testcase that was breaking because a template code
leaked to constexpr functions.  This I fixed similarly to the recent explicit
PR fix (r269131).


This spot should probably also use build_converted_constant_expr.


Ok, I'll address this.


I'm finding this repeated pattern awkward.  Earlier you changed
check_narrowing to use maybe_constant_value instead of
fold_non_dependent_expr, but perhaps whatever that fixed should have been
fixed instead with a processing_template_decl_sentinel in the enclosing code
that already instantiated the expression.  That ought to avoid any need to
change this spot or r269131.


So this also came up in the other patch.  Why don't I drop this part (and the
noexcept1.C test) and open a new PR for this issue, so that we don't conflate
two problems?  The following patch fixes the original issue.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.

Jason



2019-03-28  Marek Polacek  

PR c++/89612 - ICE with member friend template with noexcept.
* pt.c (maybe_instantiate_noexcept): For function templates, use their
template result (function decl).  Don't set up local specializations.
Temporarily turn on processing_template_decl.  Update the template type
too.

* g++.dg/cpp0x/noexcept38.C: New test.
* g++.dg/cpp0x/noexcept39.C: New test.
* g++.dg/cpp1z/noexcept-type21.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index 05d5371d8a6..fa30a7f00c8 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -24192,6 +24192,17 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  
if (DECL_CLONED_FUNCTION_P (fn))

  fn = DECL_CLONED_FUNCTION (fn);
+
+  tree orig_fn = NULL_TREE;
+  /* For a member friend template we can get a TEMPLATE_DECL.  Let's use
+ its FUNCTION_DECL for the rest of this function -- push_access_scope
+ doesn't accept TEMPLATE_DECLs.  */
+  if (DECL_FUNCTION_TEMPLATE_P (fn))
+{
+  orig_fn = fn;
+  fn = DECL_TEMPLATE_RESULT (fn);
+}
+
fntype = TREE_TYPE (fn);
spec = TYPE_RAISES_EXCEPTIONS (fntype);
  
@@ -24228,37 +24239,41 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)

  push_deferring_access_checks (dk_no_deferred);
  input_location = DECL_SOURCE_LOCATION (fn);
  
-	  /* A new stack interferes with pop_access_scope.  */

- {
-   /* Set up the list of local specializations.  */
-   local_specialization_stack lss (lss_copy);
-
-   tree save_ccp = current_class_ptr;
-   tree save_ccr = current_class_ref;
-   /* If needed, set current_class_ptr for the benefit of
-  tsubst_copy/PARM_DECL.  */
-   tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn));
-   if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl))
- {
-   tree this_parm = DECL_ARGUMENTS (tdecl);
-   current_class_ptr = NULL_TREE;
-   current_class_ref = cp_build_fold_indirect_ref (this_parm);
-   current_class_ptr = this_parm;
- }
+ tree save_

Re: C++ PATCH for c++/87145 - bogus error converting class type in template argument list

2019-03-28 Thread Jason Merrill

On 3/20/19 4:12 PM, Marek Polacek wrote:

The fix for 77656 caused us to call convert_nontype_argument even for
value-dependent arguments, to perform the conversion in order to avoid
a bogus warning.

In this case, the argument is Pod{N}.  The call to build_converted_constant_expr
in convert_nontype_argument produces Pod::operator Enum(&{N}).  It doesn't crash
because we're in a template and build_address no longer crashes on CONSTRUCTORs
in a template.


Yeah, we shouldn't be preserving lower level codes like this ADDR_EXPR; 
we should probably return an IMPLICIT_CONV_EXPR rather than make the 
call explicit.


Jason


Re: [PATCH, RFC] Wrong warning message fix for gcc 9

2019-03-28 Thread Martin Sebor

On 3/28/19 11:49 AM, Roman Zhuykov wrote:

Ping

A week ago I decided it is so minor that I can report here with a
patch without creating bugzilla PR.
Technically it is a "9 regression" in new functionality added back in summer.

Patch was succesfully bootstrapped and regtested on x86_64.

Ok for trunk?


Thanks for the patch!  The change makes sense to me.  Can you
please also add a test case to the test suite?

I can't approve patches, even obvious ones, so please wait for
an approval from a maintainer before committing it.

Martin



чт, 21 мар. 2019 г. в 18:53, Roman Zhuykov :


Hello, I have found a minor diagnostic issue.

Since r262910 we got a little odd error messages in cases like this:
$ gcc -flto-compression-level=2-O2 -c empty.c
gcc: error: argument to '-flto-compression-level=' is not between 0 and 9
$ g++ -fabi-version=2-O2 -c empty.cpp
cc1plus: error: '-fabi-version=1' is no longer supported

Old messages were more correct:
$ gcc-8 -flto-compression-level=2-O2 -c empty.c
gcc: error: argument to '-flto-compression-level=' should be a
non-negative integer
$ g++-8 -fabi-version=2-O2 -c empty.cpp
g++: error: argument to '-fabi-version=' should be a non-negative integer

When UInteger option value string is not a number, but it's first char
is a digit, integral_argument function returns -1 without setting
*err.

Proposed untested patch attached.

--
Best Regards,
Roman Zhuykov

gcc/ChangeLog:

2019-03-21  Roman Zhuykov  

* opts-common.c (integral_argument): Set errno properly in one case.

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -205,8 +205,10 @@ integral_argument (const char *arg, int *err,
bool byte_size_suffix)
 value = strtoull (arg, &end, 0);
 if (*end)
   {
-   /* errno is most likely EINVAL here.  */
-   *err = errno;
+   if (errno)
+ *err = errno;
+   else
+ *err = EINVAL;
 return -1;
   }




[PATCH, i386]: Correct RMW operation with LEA peephole (PR 89865)

2019-03-28 Thread Uros Bizjak
Attached patch corrects RMW operation with LEA peephole pattern. The
mode of the LEA is either SImode (for QImode, HImode or SImode
operation) or DImode.

2019-03-28  Uroš Bizjak  

PR target/89865
* config/i386/i386.md (RMW operation with LEA peephole):
Use LEAMODE mode attribute instead of SWI mode iterator for
LEA pattern.

The patch triggers

FAIL: gcc.target/i386/pr49095.c scan-assembler-times ), % 45

testsuite failure also for x86_64-linux-gnu. The adjusted number of
found patterns was wrong from the beginning and hid the uncovered
problem with LEA operation.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 269995)
+++ config/i386/i386.md (working copy)
@@ -18684,14 +18684,16 @@
 (define_peephole2
   [(set (match_operand:SWI 0 "register_operand")
(match_operand:SWI 1 "memory_operand"))
-   (set (match_operand:SWI 3 "register_operand")
-   (plus:SWI (match_dup 0)
- (match_operand:SWI 2 "")))
-   (set (match_dup 1) (match_dup 3))
-   (set (reg FLAGS_REG) (compare (match_dup 3) (const_int 0)))]
+   (set (match_operand: 3 "register_operand")
+   (plus: (match_operand: 4 "register_operand")
+   (match_operand: 2 "")))
+   (set (match_dup 1) (match_operand:SWI 5 "register_operand"))
+   (set (reg FLAGS_REG) (compare (match_dup 5) (const_int 0)))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && REGNO (operands[4]) == REGNO (operands[0])
+   && REGNO (operands[5]) == REGNO (operands[3])
&& peep2_reg_dead_p (4, operands[3])
-   && (rtx_equal_p (operands[0], operands[3])
+   && ((REGNO (operands[0]) == REGNO (operands[3]))
|| peep2_reg_dead_p (2, operands[0]))
&& !reg_overlap_mentioned_p (operands[0], operands[1])
&& !reg_overlap_mentioned_p (operands[3], operands[1])
@@ -18700,17 +18702,17 @@
|| immediate_operand (operands[2], QImode)
|| any_QIreg_operand (operands[2], QImode))
&& ix86_match_ccmode (peep2_next_insn (3), CCGOCmode)"
-  [(parallel [(set (match_dup 4) (match_dup 6))
- (set (match_dup 1) (match_dup 5))])]
+  [(parallel [(set (match_dup 6) (match_dup 8))
+ (set (match_dup 1) (match_dup 7))])]
 {
-  operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
-  operands[5]
+  operands[6] = SET_DEST (PATTERN (peep2_next_insn (3)));
+  operands[7]
 = gen_rtx_PLUS (mode,
copy_rtx (operands[1]),
-   operands[2]);
-  operands[6]
-= gen_rtx_COMPARE (GET_MODE (operands[4]),
-  copy_rtx (operands[5]),
+   gen_lowpart (mode, operands[2]));
+  operands[8]
+= gen_rtx_COMPARE (GET_MODE (operands[6]),
+  copy_rtx (operands[7]),
   const0_rtx);
 })
 


Re: [PATCH v3] luoxhu - backport r250477, r255555, r257253 and r258137

2019-03-28 Thread Segher Boessenkool
Hi Xiong,

Sorry this took so long.

On Mon, Mar 04, 2019 at 07:15:31PM -0600, luo...@linux.ibm.com wrote:
> This is a backport of r250477, r25, r257253 and r258137 from trunk to
> gcc-7-branch to support built-in functions:
> vec_extract_fp_from_shorth, vec_extract_fp_from_shortl,
> vec_extract_fp32_from_shorth and vec_extract_fp32_from_shortl, etc.
> The patches were on trunk before GCC 8 forked already.  r257253 and r258137
> are dependent testcases require vsx support need merge to avoid regression.
> 
> The discussion for the patch r250477 that went into trunk is:
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00624.html
> The discussion for the patch r25 that went into trunk is:
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00394.html
> VSX support for patch r257253 and r258137:
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02391.html
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01506.html
> 
> Regression-tested on Linux POWER8 LE.

Okay for GCC 7.  Thank you!


Segher


[PATCH, i386]: Fix PR89848, ICE: in convert_op

2019-03-28 Thread Uros Bizjak
On Tue, Mar 26, 2019 at 8:05 PM Uros Bizjak  wrote:
>
> Attached patch fixes a corner case in STV pass where the shift operand
> register equals shift count register. The specialization for shift
> insns marked register as processed, but didn't process shift input
> operand, leaving an unprocessed DImode register.

There is another instance of the same problem in make_vector_copies.

2019-03-28  Uroš Bizjak  

PR target/89848
* config/i386/i386.c (dimode_scalar_chain::make_vector_copies):
Also process XEXP (src, 0) of a shift insn.

testsuite/ChangeLog:

2019-03-28  Uroš Bizjak  

PR target/89848
* gcc.target/i386/pr89848.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN, will be backported to gcc-8 branch.

Uros.
Index: testsuite/gcc.target/i386/pr89848.c
===
--- testsuite/gcc.target/i386/pr89848.c (nonexistent)
+++ testsuite/gcc.target/i386/pr89848.c (revision 270003)
@@ -0,0 +1,11 @@
+/* PR target/89848 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -msse2 -mtune=pentium3m" } */
+
+long long
+foo (long long x)
+{
+  x >>= 3;
+  x <<= x;
+  return x;
+}
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 270002)
+++ config/i386/i386.c  (revision 270003)
@@ -1901,7 +1901,10 @@
 || GET_CODE (src) == LSHIFTRT)
&& !CONST_INT_P (XEXP (src, 1))
&& reg_or_subregno (XEXP (src, 1)) == regno)
- XEXP (src, 1) = vreg;
+ {
+   XEXP (src, 0) = replace_with_subreg (XEXP (src, 0), reg, reg);
+   XEXP (src, 1) = vreg;
+ }
  }
else
  replace_with_subreg_in_insn (insn, reg, vreg);


Re: C++ PATCH for c++/89612 - ICE with member friend template with noexcept

2019-03-28 Thread Marek Polacek
On Thu, Mar 21, 2019 at 04:00:41PM -0400, Jason Merrill wrote:
> On 3/19/19 11:45 AM, Marek Polacek wrote:
> > On Thu, Mar 14, 2019 at 04:22:41PM -0400, Jason Merrill wrote:
> > > On 3/7/19 4:52 PM, Marek Polacek wrote:
> > > > This was one of those PRs where the more you poke, the more ICEs turn 
> > > > up.
> > > > This patch fixes the ones I could find.  The original problem was that
> > > > maybe_instantiate_noexcept got a TEMPLATE_DECL created for the member
> > > > friend template in do_friend.  Its noexcept-specification was deferred,
> > > > so we went to the block with push_access_scope, but that crashes on a
> > > > TEMPLATE_DECL.  One approach could be to somehow not defer 
> > > > noexcept-specs
> > > > for friend templates, I guess, but I didn't want to do that.
> 
> > > How does it make sense to instantiate the noexcept-specifier of a 
> > > template?
> > > We should only get there for fully-instantiated function decls.
> > 
> > Hmm, but duplicate_decls calls check_redeclaration_exception_specification 
> > even
> > for DECL_FUNCTION_TEMPLATE_Ps.
> > ...
> > Note the crash happens in tsubst_friend_function.  I wouldn't know when to
> > check the noexcept-specifier of such a TEMPLATE_DECL for a template friend
> > if not there.
> 
> Hmm, true, I guess we do need to do a partial instantiation of the
> noexcept-specifier in order to compare it.

*nod*

> > That broke in register_parameter_specializations but we don't need this
> > code anyway, so let's do away with it -- the current_class_{ref,ptr}
> > code is enough to fix the PR that register_parameter_specializations was
> > introduced for.
> 
> What about uses of non-'this' parameters in the noexcept-specification?
> 
> template 
> struct C {
>   template 
>   friend void foo(T t) noexcept(sizeof(decltype(t)) > 1);
> };
> 
> template 
> void foo(int i) noexcept { }
> 
> C c;

Still works.  I extended the test to see if we detect the scenario when the
noexcept-specifiers don't match, and we do.  It's noexcept39.C.

> > > > Lastly, I found an invalid testcase that was breaking because a 
> > > > template code
> > > > leaked to constexpr functions.  This I fixed similarly to the recent 
> > > > explicit
> > > > PR fix (r269131).
> > > 
> > > This spot should probably also use build_converted_constant_expr.
> > 
> > Ok, I'll address this.
> 
> I'm finding this repeated pattern awkward.  Earlier you changed
> check_narrowing to use maybe_constant_value instead of
> fold_non_dependent_expr, but perhaps whatever that fixed should have been
> fixed instead with a processing_template_decl_sentinel in the enclosing code
> that already instantiated the expression.  That ought to avoid any need to
> change this spot or r269131.

So this also came up in the other patch.  Why don't I drop this part (and the
noexcept1.C test) and open a new PR for this issue, so that we don't conflate
two problems?  The following patch fixes the original issue.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-03-28  Marek Polacek  

PR c++/89612 - ICE with member friend template with noexcept.
* pt.c (maybe_instantiate_noexcept): For function templates, use their
template result (function decl).  Don't set up local specializations.
Temporarily turn on processing_template_decl.  Update the template type
too.

* g++.dg/cpp0x/noexcept38.C: New test.
* g++.dg/cpp0x/noexcept39.C: New test.
* g++.dg/cpp1z/noexcept-type21.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index 05d5371d8a6..fa30a7f00c8 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -24192,6 +24192,17 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
 
   if (DECL_CLONED_FUNCTION_P (fn))
 fn = DECL_CLONED_FUNCTION (fn);
+
+  tree orig_fn = NULL_TREE;
+  /* For a member friend template we can get a TEMPLATE_DECL.  Let's use
+ its FUNCTION_DECL for the rest of this function -- push_access_scope
+ doesn't accept TEMPLATE_DECLs.  */
+  if (DECL_FUNCTION_TEMPLATE_P (fn))
+{
+  orig_fn = fn;
+  fn = DECL_TEMPLATE_RESULT (fn);
+}
+
   fntype = TREE_TYPE (fn);
   spec = TYPE_RAISES_EXCEPTIONS (fntype);
 
@@ -24228,37 +24239,41 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t 
complain)
  push_deferring_access_checks (dk_no_deferred);
  input_location = DECL_SOURCE_LOCATION (fn);
 
- /* A new stack interferes with pop_access_scope.  */
- {
-   /* Set up the list of local specializations.  */
-   local_specialization_stack lss (lss_copy);
-
-   tree save_ccp = current_class_ptr;
-   tree save_ccr = current_class_ref;
-   /* If needed, set current_class_ptr for the benefit of
-  tsubst_copy/PARM_DECL.  */
-   tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn));
-   if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl))
- {
-   tree this_parm = DECL_ARGUMENTS (tdecl);
-   curren

Re: C++ PATCH for c++/89852 - ICE with C++11 functional cast with { }

2019-03-28 Thread Jason Merrill

On 3/27/19 5:45 PM, Marek Polacek wrote:

Here we have a non-dependent constructor in a template:

   { VIEW_CONVERT_EXPR(j) }

In digest_init we call massage_init_elt, which calls digest_init_r on the
element.  We convert the element, but we're in a template, so
perform_implicit_conversion added an IMPLICIT_CONV_EXPR around it.  And then
massage_init_elt calls maybe_constant_init on the element and the usual sadness
ensues.


Only after fold_non_dependent_expr.  Perhaps we want a 
fold_non_dependent_init?


Jason


Re: C++ PATCH for c++/89836 - bool constant expression and explicit conversions

2019-03-28 Thread Marek Polacek
On Thu, Mar 28, 2019 at 02:02:47PM -0400, Jason Merrill wrote:
> On 3/27/19 3:08 PM, Marek Polacek wrote:
> > I noticed that this test doesn't compile because 
> > build_converted_constant_expr
> > failed to consider explicit conversion functions.  That's wrong: while Core
> > Issue 1981 specifies that explicit conversion functions are not considered 
> > for
> > contextual conversions, they are considered in contextual conversions to 
> > bool,
> > as defined in Core 2039.
> > 
> > Fixed by adding a new wrapper that uses LOOKUP_NORMAL instead of
> > LOOKUP_IMPLICIT.
> > 
> > Besides [dcl.fct.spec], these: [except.spec], [dcl.dcl]/6 (static_assert),
> > [stmt.if] (constexpr if) all talk about "a contextually converted constant
> > expression of type bool", so it would seem to make sense to use
> > build_converted_constant_bool_expr for them also.  E.g. use that instead of
> > perform_implicit_conversion_flags in build_noexcept_spec.  But that doesn't
> > work for this test:
> > 
> > int e();
> > int fn() noexcept(e);
> > 
> > because build_converted_constant_expr would issue a conversion error.  We're
> > converting "e" to "bool".  We have a ck_lvalue conversion from "int f()" to
> > "int (*f)()" and then a ck_std conversion from "int (*f)()" to bool.  Those
> > should work fine but we issue an error for the ck_std conversion.
> 
> Converting a pointer to bool is a "boolean conversion", which is not allowed
> under the rules for a converted constant expression ([expr.const]p7).  So we
> should reject that testcase.

Oh, right (though clang also accepts it).  I'll see if there's anything else
that breaks if I use build_converted_constant_expr instead of
perform_implicit_conversion.

> > Also build_converted_constant_expr doesn't have the processing_template_decl
> > handling that perform_implicit_conversion_flags does, which (I believe) 
> > lead me
> > to changing check_narrowing to use maybe_constant_value instead of
> > fold_non_dependent_expr.
> 
> As I was saying elsewhere, I think that change was probably a mistake, and
> that we should have fixed that bug by changing the caller instead.

I need to address that.  Unfortunately adding a sentinel in the caller broke
so much stuff :(.

> > Anyway, this patch should be safe.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> OK.

Thanks, committed.

Marek


Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c

2019-03-28 Thread Segher Boessenkool
On Thu, Mar 28, 2019 at 09:15:54PM +1030, Alan Modra wrote:
> On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote:
> > Also, using a register move cost of 2 for for power9 direct moves
> > gives these fails, even with the .md file tweaks:
> > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler-not mtvsrwz
> > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler vspltisb
> > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler xxspltib
> > +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler-not mtvsrwz
> > +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler vspltisw
> > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler-not mtvsrwz
> > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler vspltish
> > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler xxspltib
> > +FAIL: gcc.target/powerpc/vsx-himode3.c scan-assembler lxsihzx
> > +FAIL: gcc.target/powerpc/vsx-qimode3.c scan-assembler lxsibzx
> > These can be all be fixed by removing "?"s disparaging vector
> > alternatives in movsi_internal1 and mov_internal.
> 
> Like this.  Bootstrapped and regression tested powerpc64le-linux.
> OK for stage1?

Yup, together with the previous patch.  Thanks,


Segher


Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c

2019-03-28 Thread Segher Boessenkool
Hi!

On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote:
> This patch makes a number of corrections to rs6000_register_move_cost,
> adds a new register union class, GEN_OR_VSX_REGS, and adjusts insn
> alternative to suit.

Cool beans.

[ snip various great explanations, thanks! ]

> TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order
> to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS
> as an allocno class.  It is similar to the aarch64 version but without
> any selection by regno mode if the best class is a union class.

It would be nice if we could do without such hacks.  Alas.

> OK assuming Pat's latest spec test run doesn't contain any nasty
> surprises?

Not for stage 4, no.  Sorry.  But it should be great in stage 1 :-)

> diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
> index 9fb36e41e7d..20c59f89c8f 100644
> --- a/gcc/config/rs6000/darwin.h
> +++ b/gcc/config/rs6000/darwin.h
> @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands;
>&& reg_class_subset_p (BASE_REGS, (CLASS)))\
> ? BASE_REGS   \
> : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT  \
> -  && (CLASS) == GEN_OR_FLOAT_REGS)   \
> +  && ((CLASS) == GEN_OR_FLOAT_REGS   \
> +   || (CLASS) == GEN_OR_VSX_REGS))   \
> ? GENERAL_REGS\
> : (CLASS))

Darwin doesn't do VSX at all...  But maybe there is something that can get
allocated to both FPRs and VRs, sure.  And GPRs.

This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places
where we should care about this change for correctness.

> @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_class 
> rclass)
>return NO_REGS;
>  }
>  
> -  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
> +  if (GET_MODE_CLASS (mode) == MODE_INT
> +  && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS))
>  return GENERAL_REGS;

Maybe do this whenever rclass contains the GPRs?

> @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode,
>  reg_class_t from, reg_class_t to)
>  {
>int ret;
> +  reg_class_t rclass;
>  
>if (TARGET_DEBUG_COST)
>  dbg_cost_ctrl++;
>  
> +  /* If we have VSX, we can easily move between FPR or Altivec registers,
> + otherwise we can only easily move within classes.
> + Do this first so we give best-case answers for union classes
> + containing both gprs and vsx regs.  */
> +  HARD_REG_SET to_vsx, from_vsx;
> +  COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]);
> +  AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]);
> +  COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]);
> +  AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]);

This is a bit expensive to run at every call of rs6000_register_move_cost.
Can it be precomputed easily?

> + {
> +   if (TARGET_DIRECT_MOVE
> +   && (rs6000_tune == PROCESSOR_POWER8
> +   || rs6000_tune == PROCESSOR_POWER9))

TARGET_DIRECT_MOVE is always on for these CPUs.  Should this also use the
m*vsr* cost with say -mcpu=power7 -mtune=power9?  If so you can simplify
this condition.  Or maybe it give problems elsewhere?  It's not really
worth spending to much time on, separate -mtune isn't used much at all.

> +static reg_class_t
> +rs6000_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED,
> + reg_class_t allocno_class,
> + reg_class_t best_class)
> +{
> +  switch (allocno_class)
> +{
> +default:
> +  break;

Please put default cases at the end.

> +  gcc_checking_assert (best_class == GEN_OR_VSX_REGS
> +|| best_class == GEN_OR_FLOAT_REGS
> +|| best_class == VSX_REGS
> +|| best_class == ALTIVEC_REGS
> +|| best_class == FLOAT_REGS
> +|| best_class == GENERAL_REGS
> +|| best_class == BASE_REGS);

This list makes me think...  Should there be a GEN_OR_ALTIVEC as well?


Segher


Re: C++ PATCH for c++/89214 - ICE when initializing aggregates with bases

2019-03-28 Thread Marek Polacek
On Thu, Mar 28, 2019 at 01:30:46PM -0400, Jason Merrill wrote:
> On 3/26/19 11:06 AM, Marek Polacek wrote:
> > On Tue, Mar 26, 2019 at 12:34:03PM +0100, Andreas Schwab wrote:
> > > I don't see any of the warnings in the tests on ia64.
> > > 
> > > FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 33)
> > > FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 34)
> > > FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 35)
> > > FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 45)
> > > FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 46)
> > > FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 47)
> > > PASS: g++.dg/cpp1z/aggr-base8.C  -std=c++17 (test for excess errors)
> > > 
> > > FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 22)
> > > FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 23)
> > > FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 31)
> > > FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 32)
> > > PASS: g++.dg/cpp1z/aggr-base9.C  -std=c++17 (test for excess errors)
> > 
> > I don't have access to an ia64 box -- I see none on Compile Farm.  But if
> > it doesn't ICE, can we just add ! { target ia64-*-* } to the dg-warnings?
> 
> Hmm, this behavior shouldn't be dependent on the target.

True.  But I can't reproduce the missing warning on e.g. ppc64.

Andreas, could you please find out why we're not hitting this code in
digest_init_r:

1210   tree elt = CONSTRUCTOR_ELT (stripped_init, 0)->value;
1211   if (reference_related_p (type, TREE_TYPE (elt)))

or if we are, is the reference_related_p condition not true?

Marek


Re: C++ PATCH for c++/89836 - bool constant expression and explicit conversions

2019-03-28 Thread Jason Merrill

On 3/27/19 3:08 PM, Marek Polacek wrote:

I noticed that this test doesn't compile because build_converted_constant_expr
failed to consider explicit conversion functions.  That's wrong: while Core
Issue 1981 specifies that explicit conversion functions are not considered for
contextual conversions, they are considered in contextual conversions to bool,
as defined in Core 2039.

Fixed by adding a new wrapper that uses LOOKUP_NORMAL instead of
LOOKUP_IMPLICIT.

Besides [dcl.fct.spec], these: [except.spec], [dcl.dcl]/6 (static_assert),
[stmt.if] (constexpr if) all talk about "a contextually converted constant
expression of type bool", so it would seem to make sense to use
build_converted_constant_bool_expr for them also.  E.g. use that instead of
perform_implicit_conversion_flags in build_noexcept_spec.  But that doesn't
work for this test:

int e();
int fn() noexcept(e);

because build_converted_constant_expr would issue a conversion error.  We're
converting "e" to "bool".  We have a ck_lvalue conversion from "int f()" to
"int (*f)()" and then a ck_std conversion from "int (*f)()" to bool.  Those
should work fine but we issue an error for the ck_std conversion.


Converting a pointer to bool is a "boolean conversion", which is not 
allowed under the rules for a converted constant expression 
([expr.const]p7).  So we should reject that testcase.



Also build_converted_constant_expr doesn't have the processing_template_decl
handling that perform_implicit_conversion_flags does, which (I believe) lead me
to changing check_narrowing to use maybe_constant_value instead of
fold_non_dependent_expr.


As I was saying elsewhere, I think that change was probably a mistake, 
and that we should have fixed that bug by changing the caller instead.



Anyway, this patch should be safe.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.

Jason


Re: [PATCH][Tree-optimization/PR89730]grant always_inline when -flive-patching=inline-only-static

2019-03-28 Thread Qing Zhao


> On Mar 28, 2019, at 11:30 AM, Qing Zhao  wrote:
> 
> Thanks.
> 
> I updated ipa-inline.c as you suggested, and tested the gcc with all 
> live-patching-*.c testing cases, all were fine.
> bootstrapped on aarch64, and now the regression testing is running.

the regression test passed without any issue.

qing
> 
> the new patch is as following:
> 
> Okay for trunk?
> 
> thanks.
> 
> Qing
> 
> gcc/ChangeLog:
> 
> 2019-03-28  qing zhao  
> 
>   PR tree-optimization/89730
>   * ipa-inline.c (can_inline_edge_p): Delete the checking for
>   -flive-patching=inline-only-static.
>   (can_inline_edge_by_limits_p): Add the checking for 
>   -flive-patching=inline-only-static and grant always_inline
>   even when -flive-patching=inline-only-static is specified. 
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-03-28  qing zhao  
> 
>   PR tree-optimization/89730
>   * gcc.dg/live-patching-4.c: New test.
> 
> 



Re: [PATCH, RFC] Wrong warning message fix for gcc 9

2019-03-28 Thread Roman Zhuykov
Ping

A week ago I decided it is so minor that I can report here with a
patch without creating bugzilla PR.
Technically it is a "9 regression" in new functionality added back in summer.

Patch was succesfully bootstrapped and regtested on x86_64.

Ok for trunk?

чт, 21 мар. 2019 г. в 18:53, Roman Zhuykov :
>
> Hello, I have found a minor diagnostic issue.
>
> Since r262910 we got a little odd error messages in cases like this:
> $ gcc -flto-compression-level=2-O2 -c empty.c
> gcc: error: argument to '-flto-compression-level=' is not between 0 and 9
> $ g++ -fabi-version=2-O2 -c empty.cpp
> cc1plus: error: '-fabi-version=1' is no longer supported
>
> Old messages were more correct:
> $ gcc-8 -flto-compression-level=2-O2 -c empty.c
> gcc: error: argument to '-flto-compression-level=' should be a
> non-negative integer
> $ g++-8 -fabi-version=2-O2 -c empty.cpp
> g++: error: argument to '-fabi-version=' should be a non-negative integer
>
> When UInteger option value string is not a number, but it's first char
> is a digit, integral_argument function returns -1 without setting
> *err.
>
> Proposed untested patch attached.
>
> --
> Best Regards,
> Roman Zhuykov
>
> gcc/ChangeLog:
>
> 2019-03-21  Roman Zhuykov  
>
> * opts-common.c (integral_argument): Set errno properly in one case.
>
> diff --git a/gcc/opts-common.c b/gcc/opts-common.c
> --- a/gcc/opts-common.c
> +++ b/gcc/opts-common.c
> @@ -205,8 +205,10 @@ integral_argument (const char *arg, int *err,
> bool byte_size_suffix)
> value = strtoull (arg, &end, 0);
> if (*end)
>   {
> -   /* errno is most likely EINVAL here.  */
> -   *err = errno;
> +   if (errno)
> + *err = errno;
> +   else
> + *err = EINVAL;
> return -1;
>   }


Re: [C++ debug PATCH] [PR88534] accept VAR_DECL in class literal template parms

2019-03-28 Thread Jason Merrill

On 3/27/19 6:56 PM, Martin Sebor wrote:

On 3/27/19 3:11 PM, Martin Sebor wrote:

On 3/27/19 4:44 AM, Jonathan Wakely wrote:

On 21/03/19 15:03 -0400, Jason Merrill wrote:

On 3/20/19 6:06 PM, Marek Polacek wrote:

On Wed, Mar 20, 2019 at 10:58:32PM +0100, Jakub Jelinek wrote:

On Wed, Mar 20, 2019 at 05:55:04PM -0400, Marek Polacek wrote:

On Wed, Mar 20, 2019 at 04:56:33PM -0300, Alexandre Oliva wrote:

On Mar 20, 2019, Marek Polacek  wrote:


This test fails with
pr88534.C:58:1: sorry, unimplemented: string literal in 
function template signature


Interesting...  gcc-8 rejected it with an error message 
rejecting the
template parameter, but my latest trunk build (dated Mar 13, 
r269641)
compiles it all right.  Was there a subsequent fix, maybe?  I 
didn't

realize it was supposed to be rejected.


Ah, that problem only started with r269814, namely this hunk:


Maybe this is done too early and should be postponed to 
genericization

(perhaps except for TREE_STATIC vars)?


Or skip when DECL is template_parm_object_p.


Or handle it in mangle.c.  I don't see any reason we shouldn't accept

struct A
{
 char c[4];
};

template  struct B { };
B b;

Probably we should use the same mangling whether the initializer for 
c was spelled as a string literal or list of integers.


The thing we still don't want to allow is mangling the *address* of 
a string literal.


Will that help PR 47488 as well?


What I have (attached) accepts all three test cases from PR 47488
(comment #0, #1, #2, and #5) but with different mangling.

The difference in the mangling of the function in the test case
in comment #0 is

   Clang: _Z1gIiEDTcl1fcvT__ELA1_KcEEERKS0_
   GCC 9: _Z1gIiEDTcl1fcvT__ELKc0EEERKS0_

I'm not very familiar with the C++ ABI mangling but from what
I can tell, Clang treats the type as a literal array of 1 const
char element (LA1_Kc) without actually encoding its value, while
with my patch GCC encodes it as a constant literal initializer
consisting of 1 null char (LKc0).  In other words, Clang would
mangle these two the same for the same T:

   template < typename T >
   decltype (f (T(), "123")) g (const T&);

   template < typename T >
   decltype (f (T(), "abc")) g (const T&);

while GCC would mangle them differently.

Which is correct?  Clang's seems correct in this case but would
it also be correct to mangle Jason's B the same as
B?


After some poking around I found the issue below that seems
to answer the question at least for Jason's example:

   https://github.com/itanium-cxx-abi/cxx-abi/issues/64

But this is an open issue that leaves some other questions
unanswered, mainly that of the exact encoding of the literal
(the length of the directly encoded subsequence and the hash
algorithm to compute the hash value of the string).

In any event, using the same encoding as for initializer
lists (i.e., (elt-type, elt-value)...) doesn't follow this
approach.

Should I prototype the solution outlined in the issue for
GCC 9?


I think let's leave that for the future, and use the aggregate encoding 
for GCC 9.



I suppose there's still the question of compatibility between
initializer lists and string literals, namely whether this

   B

is supposed to mangle the same as

   B

The ABI issue only talks about string literals and not braced
initializer lists.  I see in Revision History [150204] Add
mangling for braced initializer lists, so presumably there
is a difference, but I can't find what it is.


The issue for mangling template arguments of class type is

  https://github.com/itanium-cxx-abi/cxx-abi/issues/63

The mangling of braced-init-lists is

# type {expr-list}, conversion with braced-init-list argument
::= tl  * E

# {expr-list}, braced-init-list in any other context
::= il * E

Your patch doesn't use this mangling.  Here's a variant of my example 
that actually causes these to show up in the mangling:


struct A { char c[4]; };
template  struct B { };
void f(B) {}
void g(B) {}

Jason


Re: C++ PATCH for c++/89214 - ICE when initializing aggregates with bases

2019-03-28 Thread Jason Merrill

On 3/26/19 11:06 AM, Marek Polacek wrote:

On Tue, Mar 26, 2019 at 12:34:03PM +0100, Andreas Schwab wrote:

I don't see any of the warnings in the tests on ia64.

FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 33)
FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 34)
FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 35)
FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 45)
FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 46)
FAIL: g++.dg/cpp1z/aggr-base8.C  -std=c++17  (test for warnings, line 47)
PASS: g++.dg/cpp1z/aggr-base8.C  -std=c++17 (test for excess errors)

FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 22)
FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 23)
FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 31)
FAIL: g++.dg/cpp1z/aggr-base9.C  -std=c++17  (test for warnings, line 32)
PASS: g++.dg/cpp1z/aggr-base9.C  -std=c++17 (test for excess errors)


I don't have access to an ia64 box -- I see none on Compile Farm.  But if
it doesn't ICE, can we just add ! { target ia64-*-* } to the dg-warnings?


Hmm, this behavior shouldn't be dependent on the target.

Jason


Re: [PATCH][Tree-optimization/PR89730]grant always_inline when -flive-patching=inline-only-static

2019-03-28 Thread Qing Zhao
Thanks.

I updated ipa-inline.c as you suggested, and tested the gcc with all 
live-patching-*.c testing cases, all were fine.
bootstrapped on aarch64, and now the regression testing is running.

the new patch is as following:

Okay for trunk?

thanks.

Qing

gcc/ChangeLog:

2019-03-28  qing zhao  

PR tree-optimization/89730
* ipa-inline.c (can_inline_edge_p): Delete the checking for
-flive-patching=inline-only-static.
(can_inline_edge_by_limits_p): Add the checking for 
-flive-patching=inline-only-static and grant always_inline
even when -flive-patching=inline-only-static is specified. 


gcc/testsuite/ChangeLog:

2019-03-28  qing zhao  

PR tree-optimization/89730
* gcc.dg/live-patching-4.c: New test.



fixing-PR89730.patch
Description: Binary data


Re: [C++ PATCH] Fix SWITCH_STMT handling in potential_constant_expression_1 (PR c++/89785)

2019-03-28 Thread Jason Merrill

On 3/26/19 6:28 PM, Jakub Jelinek wrote:

Hi!

As the testcase shows, the SWITCH_STMT handling in 
potential_constant_expression_1
is quite conservative, it doesn't recurse on the body of the switch stmt,
because at least for the case where the switch condition isn't some easily
determinable constant, we'd need to try all possible values of the switch
expression (or start walking from all possible case labels, see PR for
details), but right now potential_constant_expression_1 doesn't have the
careful stmt skipping logic cxx_eval_* has anyway.

So, the following patch still doesn't recurse on SWITCH_STMT_BODY using
potential_constant_expression_1, but instead checks if the body contains
a return or continue stmt (the latter only if not nested in some loop body
inside of the switch body).  If there is no return nor continue, assuming
there is no endless loop (which wouldn't be constant expression due to the
ops limit) the switch body must fall through to the code after it.
If there is a return or continue, we set *jump_target to it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.

Jason



Re: [PATCH] optinfo-emit-json.cc: don't call get_fnname_from_decl (PR middle-end/89725)

2019-03-28 Thread David Malcolm
On Thu, 2019-03-28 at 15:14 +0100, Jakub Jelinek wrote:
> On Thu, Mar 28, 2019 at 11:02:52AM -0400, David Malcolm wrote:
> > optrecord_json_writer::optinfo_to_json can in theory be called from
> > any
> > optimization pass, but currently uses get_fnname_from_decl, which
> > is RTL-specific, which can lead to an ICE (PR middle-end/89725).
> 
> The ICE is a separate problem.  The problem with DECL_RTL early is
> that it is undesirable to create RTL for decls during gimple passes,
> we then need to make sure to clear it when versioning the function
> etc.
> 

Ah, my bad; thanks for the clarification.

Dave


[PATCH] Updated patch for PR84101

2019-03-28 Thread Richard Biener


This is an update from last years attempt to tame down vectorization
when it runs in to ABI inefficiencies at function return.  I've
updated the patch to look for multi-reg returns instead of
!vector ones (due to word_mode vectorization) and handle a bit
more returns, esp. the common pattern of

  ret = vector;
  D.1234 = ret;
  ret = {CLOBBER};
  return D.1234;

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

I know this isn't the ultimate solution but we keep getting
duplicates of the PR.

Richard.

2019-03-28  Richard Biener  

PR tree-optimization/84101
* tree-vect-stmts.c: Include explow.h for hard_function_value,
regs.h for hard_regno_nregs.
(cfun_returns): New helper.
(vect_model_store_cost): When vectorizing a store to a decl
we return and the function ABI returns in a multi-reg location
account for the possible spilling that will happen.

* gcc.target/i386/pr84101.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 269987)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
 #include "tree-cfg.h"
 #include "tree-ssa-loop-manip.h"
 #include "cfgloop.h"
+#include "explow.h"
 #include "tree-ssa-loop.h"
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
@@ -52,6 +53,7 @@ along with GCC; see the file COPYING3.
 #include "vec-perm-indices.h"
 #include "tree-ssa-loop-niter.h"
 #include "gimple-fold.h"
+#include "regs.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -948,6 +950,37 @@ vect_model_promotion_demotion_cost (stmt
  "prologue_cost = %d .\n", inside_cost, prologue_cost);
 }
 
+/* Returns true if the current function returns DECL.  */
+
+static bool
+cfun_returns (tree decl)
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+{
+  greturn *ret = safe_dyn_cast  (last_stmt (e->src));
+  if (!ret)
+   continue;
+  if (gimple_return_retval (ret) == decl)
+   return true;
+  /* We often end up with an aggregate copy to the result decl,
+ handle that case as well.  First skip intermediate clobbers
+though.  */
+  gimple *def = ret;
+  do
+   {
+ def = SSA_NAME_DEF_STMT (gimple_vuse (def));
+   }
+  while (gimple_clobber_p (def));
+  if (is_a  (def)
+ && gimple_assign_lhs (def) == gimple_return_retval (ret)
+ && gimple_assign_rhs1 (def) == decl)
+   return true;
+}
+  return false;
+}
+
 /* Function vect_model_store_cost
 
Models cost for stores.  In the case of grouped accesses, one access
@@ -1032,6 +1065,37 @@ vect_model_store_cost (stmt_vec_info stm
   vec_to_scalar, stmt_info, 0, vect_body);
 }
 
+  /* When vectorizing a store into the function result assign
+ a penalty if the function returns in a multi-register location.
+ In this case we assume we'll end up with having to spill the
+ vector result and do piecewise loads.  */
+  tree base = get_base_address (STMT_VINFO_DATA_REF (stmt_info)->ref);
+  if (base
+  && (TREE_CODE (base) == RESULT_DECL
+ || (DECL_P (base) && cfun_returns (base)))
+  && !aggregate_value_p (base, cfun->decl))
+{
+  rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
+  /* ???  Handle PARALLEL in some way.  */
+  if (REG_P (reg))
+   {
+ int nregs = hard_regno_nregs (REGNO (reg), GET_MODE (reg));
+ /* Assume that a reg-reg move is possible and cheap,
+do not account for vector to gp register move cost.  */
+ if (nregs > 1)
+   {
+ /* Spill.  */
+ prologue_cost += record_stmt_cost (cost_vec, ncopies,
+vector_store,
+stmt_info, 0, vect_epilogue);
+ /* Loads.  */
+ prologue_cost += record_stmt_cost (cost_vec, ncopies * nregs,
+scalar_load,
+stmt_info, 0, vect_epilogue);
+   }
+   }
+}
+
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
  "vect_model_store_cost: inside_cost = %d, "
Index: gcc/testsuite/gcc.target/i386/pr84101.c
===
--- gcc/testsuite/gcc.target/i386/pr84101.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr84101.c (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slp2-details" } */
+
+typedef struct uint64_pair uint64_pair_t ;
+struct uint64_pair
+{
+  unsigned long w0 ;
+  unsigned long w1 ;
+} ;
+
+uint64_pair_t pair(int num)
+{
+  uint64_pair_t p ;
+
+  p.w0 = num << 1 ;

Re: [PATCH] Follow-up regcprop fixes

2019-03-28 Thread Jeff Law
On 3/27/19 4:24 PM, Jakub Jelinek wrote:
> On Wed, Mar 27, 2019 at 04:32:15PM +0100, Jakub Jelinek wrote:
>> On Wed, Mar 27, 2019 at 09:24:46AM -0600, Jeff Law wrote:
 2) another thing I've noticed today, we queue up the debug insn changes,
 but if we iterate the second time for any bb, we overwrite and thus lose 
 any
 of the debug insn queued changes from the first iteration, so those changes
 aren't applied (wrong-debug?)
>>> This is actually what worries me, both with the current bits and with
>>> any bits that change to a worklist of blocks to reprocess.  As is I
>>> think we preserve the debug stuff across the iterations which should
>>> keep debug info correct.  Managing that in a world where we queue up the
>>> iteration step is going to be tougher.
>>
>> The patch I've posted would do df_analyze, all bbs once, then df_analyze,
>> process queued debug changes, and if anything needs to be repeated (just
>> once), redo the bbs from worklist and repeat the above.
>> That seems to me the easiest way to get rid of the per-bb df_analyze calls
>> as well as fixing the debug stuff.  Because otherwise, we'd need to somehow
>> merge the queued debug insns from the first and second pass and figure out
>> how to deal with that.
> 
> Here is everything in patch on top of current trunk which contains your
> regcprop.c changes.
> 
> In addition to the previously mentioned changes, as you've requested it
> clears the visited sbitmap between the two passes, fixes clearing of the
> all_vd[bb->index].e[regno].debug_insn_changes pointers after
> cprop_hardreg_debug processes them and fixes up the n_debug_insn_changes
> updates, so that the invariant that it always is the sum of length of the
> debug_insn_changes chains for all hard regs in the bb is true (before that
> n_debug_insn_changes could be higher, and in the final debug insn processing
> case wasn't actually decremented at all, so the == 0 check was useless
> there).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if it
> succeeds on some other targets too?
It's run through my tester without any issues.

> 
> 2019-03-27  Jakub Jelinek  
> 
>   * regcprop.c (copyprop_hardreg_forward_1): Remove redundant INSN_P
>   test.
>   (cprop_hardreg_bb, cprop_hardreg_debug): New functions.
>   (pass_cprop_hardreg::execute): Use those.  Don't repeat bb processing
>   immediately after first one with df_analyze in between, but rather
>   process all bbs, queueing ones that need second pass in a worklist,
>   df_analyze, process queued debug insn changes and if second pass is
>   needed, process bbs from worklist, df_analyze, process queued debug
>   insns again.
OK.

Jeff




Re: [PATCH] optinfo-emit-json.cc: don't call get_fnname_from_decl (PR middle-end/89725)

2019-03-28 Thread Jakub Jelinek
On Thu, Mar 28, 2019 at 11:02:52AM -0400, David Malcolm wrote:
> optrecord_json_writer::optinfo_to_json can in theory be called from any
> optimization pass, but currently uses get_fnname_from_decl, which
> is RTL-specific, which can lead to an ICE (PR middle-end/89725).

The ICE is a separate problem.  The problem with DECL_RTL early is
that it is undesirable to create RTL for decls during gimple passes,
we then need to make sure to clear it when versioning the function etc.
> 
> In that PR, Jakub suggested using either DECL_ASSEMBLER_NAME or the
> "printable name" (via current_function_name).
> 
> This patch makes it use DECL_ASSEMBLER_NAME.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
>   PR middle-end/89725
>   * optinfo-emit-json.cc (optrecord_json_writer::optinfo_to_json):
>   Use DECL_ASSEMBLER_NAME rather than get_fnname_from_decl.

Ok, thanks.

Jakub


[PATCH] optinfo-emit-json.cc: don't call get_fnname_from_decl (PR middle-end/89725)

2019-03-28 Thread David Malcolm
optrecord_json_writer::optinfo_to_json can in theory be called from any
optimization pass, but currently uses get_fnname_from_decl, which
is RTL-specific, which can lead to an ICE (PR middle-end/89725).

In that PR, Jakub suggested using either DECL_ASSEMBLER_NAME or the
"printable name" (via current_function_name).

This patch makes it use DECL_ASSEMBLER_NAME.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
PR middle-end/89725
* optinfo-emit-json.cc (optrecord_json_writer::optinfo_to_json):
Use DECL_ASSEMBLER_NAME rather than get_fnname_from_decl.
---
 gcc/optinfo-emit-json.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
index 814446b..1cfcdfe 100644
--- a/gcc/optinfo-emit-json.cc
+++ b/gcc/optinfo-emit-json.cc
@@ -411,7 +411,8 @@ optrecord_json_writer::optinfo_to_json (const optinfo 
*optinfo)
 
   if (current_function_decl)
 {
-  const char *fnname = get_fnname_from_decl (current_function_decl);
+  const char *fnname
+   = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (current_function_decl));
   obj->set ("function", new json::string (fnname));
 }
 
-- 
1.8.5.3



Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities

2019-03-28 Thread Jakub Jelinek
On Thu, Mar 28, 2019 at 09:55:46AM +0100, Richard Biener wrote:
> On Wed, Mar 27, 2019 at 4:26 PM Jeff Law  wrote:
> >
> > On 3/27/19 8:36 AM, Jakub Jelinek wrote:
> > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote:
> > >> However, I'm increasingly of the opinion that MIPS targets need to drop
> > >> off the priority platform list.  Given the trajectory I see for MIPS
> > >> based processors in industry, it's really hard to justify spending this
> > >> much time on them, particularly for low priority code quality issues.
> > >
> > > Besides what has been discussed on IRC for the PR89826 fix, that we really
> > > need a df_analyze before processing the first block, because otherwise we
> > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, 
> > > but I
> > > admit I don't know much about df nor regcprop.
> > RIght.  I plan to commit that today along with the test reordering you
> > pointed out.
> >
> > >
> > > 1) the df_analyze () after every (successful) processing of a basic block
> > > is IMHO way too expensive, I would be very surprised if df_analyze () 
> > > isn't
> > > quadratic in number of basic blocks and so one could construct testcases
> > > with millions of basic blocks and at least one regcprop change in each bb
> > > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the
> > > 95% bbs you said won't have any changes at all)
> > I'm going to look this further today.
> 
> Look at 
> https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest
> and you'll see multiple testcases with 'hard reg cprop' >10% compile-time.
> It's indeed a hog for no good reason.

I've tried https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071#c1
in --enable-checking=yes,rtl,extra bootstrapped cc1 at -O2, without and with
the patch.
The important times in -ftime-report with vanilla trunk:
 phase opt and generate : 250.76 (100%)   2.00 ( 96%) 253.36 (100%) 
 768860 kB ( 99%)
 df live regs   :  19.95 (  8%)   0.03 (  1%)  19.39 (  8%) 
  0 kB (  0%)
 df live&initialized regs   :  20.29 (  8%)   0.05 (  2%)  19.73 (  8%) 
  0 kB (  0%)
 df reg dead/unused notes   : 158.66 ( 63%)   0.02 (  1%) 160.12 ( 63%) 
   4665 kB (  1%)
 hard reg cprop :  21.03 (  8%)   0.01 (  0%)  21.39 (  8%) 
509 kB (  0%)
 TOTAL  : 250.85  2.09253.57
 776940 kB
(ignoring everything <2% in the first % column).
Configure with --enable-checking=release to disable checks.
With the https://gcc.gnu.org/ml/gcc-patches/2019-03/msg01335.html patch the
same testcase with -O2 -ftime-report results in identical assembly, but:
 phase opt and generate :  28.92 (100%)   1.82 ( 95%)  30.85 ( 99%) 
 768882 kB ( 99%)
 CFG verifier   :   1.66 (  6%)   0.02 (  1%)   1.69 (  5%) 
  0 kB (  0%)
 df live regs   :   0.63 (  2%)   0.00 (  0%)   0.61 (  2%) 
  0 kB (  0%)
 df live&initialized regs   :   1.01 (  3%)   0.03 (  2%)   1.00 (  3%) 
  0 kB (  0%)
 df must-initialized regs   :   1.51 (  5%)   0.93 ( 48%)   2.46 (  8%) 
  0 kB (  0%)
 tree SSA verifier  :   2.79 ( 10%)   0.01 (  1%)   2.78 (  9%) 
  0 kB (  0%)
 tree STMT verifier :   2.00 (  7%)   0.00 (  0%)   1.99 (  6%) 
  0 kB (  0%)
 dominance computation  :   0.61 (  2%)   0.00 (  0%)   0.59 (  2%) 
  0 kB (  0%)
 out of ssa :   0.61 (  2%)   0.04 (  2%)   0.65 (  2%) 
  1 kB (  0%)
 loop init  :   0.58 (  2%)   0.00 (  0%)   0.63 (  2%) 
 38 kB (  0%)
 combiner   :   0.44 (  2%)   0.02 (  1%)   0.47 (  2%) 
  17926 kB (  2%)
 integrated RA  :   2.24 (  8%)   0.08 (  4%)   2.35 (  8%) 
 205177 kB ( 26%)
 LRA non-specific   :   1.46 (  5%)   0.05 (  3%)   1.50 (  5%) 
  19172 kB (  2%)
 LRA create live ranges :   1.23 (  4%)   0.00 (  0%)   1.23 (  4%) 
   2589 kB (  0%)
 reload CSE regs:   0.54 (  2%)   0.00 (  0%)   0.51 (  2%) 
   8456 kB (  1%)
 scheduling 2   :   0.73 (  3%)   0.09 (  5%)   0.81 (  3%) 
   2715 kB (  0%)
 verify RTL sharing :   1.19 (  4%)   0.00 (  0%)   1.15 (  4%) 
  0 kB (  0%)
 TOTAL  :  29.02  1.92 31.07
 776962 kB
So 8.5x usr time speedup with that patch.

Jakub


Re: [v3 PATCH] Don't revisit a variant we are already visiting.

2019-03-28 Thread Jonathan Wakely

On 28/03/19 15:21 +0200, Ville Voutilainen wrote:

On Thu, 28 Mar 2019 at 15:07, Ville Voutilainen
 wrote:


This shaves off some unnecessary codegen. In the assignment
operators and swap, we are already visiting the rhs, so we shouldn't
visit it again to get to its guts, we already have them in-hand.

2019-03-28  Ville Voutilainen  

Don't revisit a variant we are already visiting.
* include/std/variant (__variant_construct_single): New.
(__variant_construct): Use it.
(_M_destructive_move): Likewise, turn into a template.
(_M_destructive_copy): Likewise.
(_Copy_assign_base::operator=): Adjust.
(_Move_assign_base::operator=): Likewise.
(swap): Likewise.


Minor adjustment, use direct-init in all cases:

-+  auto __tmp = std::move(__rhs_mem);
++  auto __tmp(std::move(__rhs_mem));



diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 3932a8a..fe96cc8 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -428,20 +428,25 @@ namespace __variant
using _Variant_storage_alias =
_Variant_storage<_Traits<_Types...>::_S_trivial_dtor, _Types...>;

+  template
+  void __variant_construct_single(_Tp&& __lhs, _Up&& __rhs_mem)


Please indent this line to match the opening brace, not the
template-head.


@@ -490,34 +495,38 @@ namespace __variant
  std::forward<_Move_ctor_base>(__rhs));
  }

-  void _M_destructive_move(_Move_ctor_base&& __rhs)
-  {
-   this->_M_reset();
-   __try
- {
-   __variant_construct<_Types...>(*this,
- std::forward<_Move_ctor_base>(__rhs));
- }
-   __catch (...)
- {
-   this->_M_index = variant_npos;
-   __throw_exception_again;
- }
-  }
+  template
+void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs)


Does __rhs_index need to be a template parameter in these function
templates? It seems to always be passed size_t, but the value will
always fit in unsigned short. Could it just use size_t? Or even
unsigned, since we don't really need to pass 64 bits?

I think typename _Base::__index_type is strictly correct, but that's a
lot to type and to read.

OK with the indentation tweak, and I'll leave changing the _Idx or not
to your judgment.



Re: [PATCH] PR c/79022 fix mismatch parameter order in declaratio

2019-03-28 Thread Richard Biener
On Thu, Mar 28, 2019 at 2:28 PM Jonathan Wakely  wrote:
>
> The declaration of create_nested_ptr_option in the header has the 'from'
> and 'to' parameters in the opposite order from the definition in
> gengtype.c:
>
>   /* Return an options structure for a "nested_ptr" option.  */
>   options_p
>   create_nested_ptr_option (options_p next, type_p t,
> const char *to, const char *from)
>
> and the only caller in gengtype-parse.c:
>
>   return create_nested_ptr_option (prev, ty, to, from);
>
> This patch swaps the parameter names in the declaration.
>
> PR c/79022
> * gengtype.h (create_nested_ptr_option): Fix parameter names to match
> definition.
>
>
> I've rebuilt the compiler, but not run the tests, because this doesn't
> change the meaning of any code and seems almost obvious.
>
> OK for trunk?

OK.

>
>


[PATCH] PR c/79022 fix mismatch parameter order in declaratio

2019-03-28 Thread Jonathan Wakely

The declaration of create_nested_ptr_option in the header has the 'from'
and 'to' parameters in the opposite order from the definition in
gengtype.c:

 /* Return an options structure for a "nested_ptr" option.  */
 options_p
 create_nested_ptr_option (options_p next, type_p t,
   const char *to, const char *from)

and the only caller in gengtype-parse.c:

 return create_nested_ptr_option (prev, ty, to, from);

This patch swaps the parameter names in the declaration.

PR c/79022
* gengtype.h (create_nested_ptr_option): Fix parameter names to match
definition.


I've rebuilt the compiler, but not run the tests, because this doesn't
change the meaning of any code and seems almost obvious.

OK for trunk?


commit 434ffd58b7ecb9e750d6cb2d955567ba1c0f4af2
Author: Jonathan Wakely 
Date:   Thu Mar 28 13:07:30 2019 +

PR c/79022 fix mismatch parameter order in declaratio

The declaration of create_nested_ptr_option in the header has the 'from'
and 'to' parameters in the opposite order from the definition in
gengtype.c:

  /* Return an options structure for a "nested_ptr" option.  */
  options_p
  create_nested_ptr_option (options_p next, type_p t,
const char *to, const char *from)

and the only caller in gengtype-parse.c:

  return create_nested_ptr_option (prev, ty, to, from);

This patch swaps the parameter names in the declaration.

PR c/79022
* gengtype.h (create_nested_ptr_option): Fix parameter names to 
match
definition.

diff --git a/gcc/gengtype.h b/gcc/gengtype.h
index db9cb0f401a..02be0c16b55 100644
--- a/gcc/gengtype.h
+++ b/gcc/gengtype.h
@@ -203,8 +203,8 @@ options_p create_nested_option (options_p next, const char* 
name,
struct nested_ptr_data* info);
 
 /* Create a nested pointer option.  */
-options_p create_nested_ptr_option (options_p, type_p t,
-const char *from, const char *to);
+options_p create_nested_ptr_option (options_p next, type_p t,
+   const char *to, const char *from);
 
 /* A name and a type.  */
 struct pair {


Re: [v3 PATCH] Don't revisit a variant we are already visiting.

2019-03-28 Thread Ville Voutilainen
On Thu, 28 Mar 2019 at 15:07, Ville Voutilainen
 wrote:
>
> This shaves off some unnecessary codegen. In the assignment
> operators and swap, we are already visiting the rhs, so we shouldn't
> visit it again to get to its guts, we already have them in-hand.
>
> 2019-03-28  Ville Voutilainen  
>
> Don't revisit a variant we are already visiting.
> * include/std/variant (__variant_construct_single): New.
> (__variant_construct): Use it.
> (_M_destructive_move): Likewise, turn into a template.
> (_M_destructive_copy): Likewise.
> (_Copy_assign_base::operator=): Adjust.
> (_Move_assign_base::operator=): Likewise.
> (swap): Likewise.

Minor adjustment, use direct-init in all cases:

-+  auto __tmp = std::move(__rhs_mem);
++  auto __tmp(std::move(__rhs_mem));
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 3932a8a..fe96cc8 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -428,20 +428,25 @@ namespace __variant
 using _Variant_storage_alias =
 	_Variant_storage<_Traits<_Types...>::_S_trivial_dtor, _Types...>;
 
+  template
+  void __variant_construct_single(_Tp&& __lhs, _Up&& __rhs_mem)
+{
+  void* __storage = std::addressof(__lhs._M_u);
+  using _Type = remove_reference_t;
+  if constexpr (!is_same_v<_Type, __variant_cookie>)
+::new (__storage)
+	  _Type(std::forward(__rhs_mem));
+}
+
   template
 void __variant_construct(_Tp&& __lhs, _Up&& __rhs)
 {
   __lhs._M_index = __rhs._M_index;
-  void* __storage = std::addressof(__lhs._M_u);
-  __do_visit([__storage](auto&& __rhs_mem) mutable
+  __do_visit([&__lhs](auto&& __rhs_mem) mutable
 		 -> __detail::__variant::__variant_cookie
 {
-	  using _Type = remove_reference_t;
-	  if constexpr (is_same_v<__remove_cvref_t,
-			  remove_cv_t<_Type>>
-			&& !is_same_v<_Type, __variant_cookie>)
-	::new (__storage)
-	  _Type(std::forward(__rhs_mem));
+	  __variant_construct_single(std::forward<_Tp>(__lhs),
+ std::forward(__rhs_mem));
 	  return {};
 	}, __variant_cast<_Types...>(std::forward(__rhs)));
 }
@@ -490,34 +495,38 @@ namespace __variant
 	  std::forward<_Move_ctor_base>(__rhs));
   }
 
-  void _M_destructive_move(_Move_ctor_base&& __rhs)
-  {
-	this->_M_reset();
-	__try
-	  {
-	__variant_construct<_Types...>(*this,
-	  std::forward<_Move_ctor_base>(__rhs));
-	  }
-	__catch (...)
-	  {
-	this->_M_index = variant_npos;
-	__throw_exception_again;
-	  }
-  }
+  template
+void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __try
+	{
+	  __variant_construct_single(*this,
+	 std::forward<_Up>(__rhs));
+	}
+	  __catch (...)
+	{
+	  this->_M_index = variant_npos;
+	  __throw_exception_again;
+	}
+	}
 
-  void _M_destructive_copy(const _Move_ctor_base& __rhs)
-  {
-	this->_M_reset();
-	__try
-	  {
-	__variant_construct<_Types...>(*this, __rhs);
-	  }
-	__catch (...)
-	  {
-	this->_M_index = variant_npos;
-	__throw_exception_again;
-	  }
-  }
+  template
+void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __try
+	{
+	  __variant_construct_single(*this, __rhs);
+	}
+	  __catch (...)
+	{
+	  this->_M_index = variant_npos;
+	  __throw_exception_again;
+	}
+	}
 
   _Move_ctor_base(const _Move_ctor_base&) = default;
   _Move_ctor_base& operator=(const _Move_ctor_base&) = default;
@@ -530,17 +539,21 @@ namespace __variant
   using _Base = _Copy_ctor_alias<_Types...>;
   using _Base::_Base;
 
-  void _M_destructive_move(_Move_ctor_base&& __rhs)
-  {
-	this->_M_reset();
-	__variant_construct<_Types...>(*this,
-  std::forward<_Move_ctor_base>(__rhs));
-  }
-  void _M_destructive_copy(const _Move_ctor_base& __rhs)
-  {
-	this->_M_reset();
-	__variant_construct<_Types...>(*this, __rhs);
-  }
+  template
+void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __variant_construct_single(*this,
+ std::forward<_Up>(__rhs));
+	}
+  template
+void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __variant_construct_single(*this, __rhs);
+	}
 };
 
   template
@@ -580,12 +593,13 @@ namespace __variant
 		if constexpr (is_nothrow_copy_constructible_v<__rhs_type>
 		  || !is_nothrow_move_constructible_v<__rhs_type>)
 		  {
-			this->_M_destructive_copy(__rhs);
+			this->_M_destructive_copy(__rhs._M_index, __rhs_mem);
 		  }
 		else
 		  {
-			_Copy_assign_base __tmp(__rhs);
-			this->_M_destructive_move(std::move(__tmp));
+			a

Re: [PATCH 2/3] Extend format of -fdbg-cnt: add aux_base filter.

2019-03-28 Thread Martin Liška
On 3/28/19 1:53 PM, Richard Biener wrote:
> On Wed, Mar 27, 2019 at 10:52 AM marxin  wrote:
>>
> 
> Hmm, I don't think we want this - it looks too much like a hack to me.
> 
> I guess for an easier way to figure out the min/max ranges (do we
> handle anti-ranges?) would be to specify a function name the
> debug counter would be enabled for.  That might also help in your
> situation.

Well, it's a bit hack, but a useful one. In my case I had a couple
files which triggered the debug counter limit. Function filter will
work, but note that you have also an IPA debug counters. Here function
context can be not clear.

> 
> What I've usually done is do the bisection via dbg-counter on
> a single file, manually link the spec binary and then
> do a runcpu w/o recompiling the binary.

Having a complex build system, adjusting just -dbg-cnt is much more convenient
than doing a manual setup.

Martin

> 
> Richard.
> 
>> gcc/ChangeLog:
>>
>> 2019-03-27  Martin Liska  
>>
>> * dbgcnt.c (dbg_cnt_set_limit_by_name): Add new argument
>> aux_base and filter based on aux_base_name.
>> (dbg_cnt_process_single_pair): Parse aux_base.
>> * doc/invoke.texi: Document new extended format.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-03-27  Martin Liska  
>>
>> * gcc.dg/dbg-cnt-1.c: New test.
>> ---
>>  gcc/dbgcnt.c | 11 ---
>>  gcc/doc/invoke.texi  |  8 ++--
>>  gcc/testsuite/gcc.dg/dbg-cnt-1.c |  6 ++
>>  3 files changed, 20 insertions(+), 5 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/dbg-cnt-1.c
>>



Re: [PATCH 1/3] Fix multiple values for -fdbg-cnt.

2019-03-28 Thread Richard Biener
On Thu, Mar 28, 2019 at 2:05 PM Martin Liška  wrote:
>
> On 3/28/19 1:51 PM, Richard Biener wrote:
> > On Wed, Mar 27, 2019 at 10:52 AM marxin  wrote:
> >>
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-03-27  Martin Liska  
> >>
> >> * dbgcnt.c (dbg_cnt_process_single_pair): Fix GNU coding style.
> >> (dbg_cnt_process_opt): Parse first tokens aas
> >
> > as
>
> Better than 'ass' :)
>
> >
> > so what was the issue you fixed?
>
> $ gcc -c -fdbg-cnt=dce:0,vect_slp:3 main.c
> dbg_cnt 'dce' set to 0-3
>
> while correct output is:
>
> $ gcc-8 -c -fdbg-cnt=dce:0,vect_slp:3 main.c
> dbg_cnt 'dce' set to 0
> dbg_cnt 'vect_slp' set to 3

Ah.

OK.
Richard.

> Martin
>
> >
> >> dbg_cnt_process_single_pair is also using strtok.
> >> ---
> >>  gcc/dbgcnt.c | 25 +++--
> >>  1 file changed, 15 insertions(+), 10 deletions(-)
> >>
>


[v3 PATCH] Don't revisit a variant we are already visiting.

2019-03-28 Thread Ville Voutilainen
This shaves off some unnecessary codegen. In the assignment
operators and swap, we are already visiting the rhs, so we shouldn't
visit it again to get to its guts, we already have them in-hand.

2019-03-28  Ville Voutilainen  

Don't revisit a variant we are already visiting.
* include/std/variant (__variant_construct_single): New.
(__variant_construct): Use it.
(_M_destructive_move): Likewise, turn into a template.
(_M_destructive_copy): Likewise.
(_Copy_assign_base::operator=): Adjust.
(_Move_assign_base::operator=): Likewise.
(swap): Likewise.
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 3932a8a..003cc15 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -428,20 +428,25 @@ namespace __variant
 using _Variant_storage_alias =
 	_Variant_storage<_Traits<_Types...>::_S_trivial_dtor, _Types...>;
 
+  template
+  void __variant_construct_single(_Tp&& __lhs, _Up&& __rhs_mem)
+{
+  void* __storage = std::addressof(__lhs._M_u);
+  using _Type = remove_reference_t;
+  if constexpr (!is_same_v<_Type, __variant_cookie>)
+::new (__storage)
+	  _Type(std::forward(__rhs_mem));
+}
+
   template
 void __variant_construct(_Tp&& __lhs, _Up&& __rhs)
 {
   __lhs._M_index = __rhs._M_index;
-  void* __storage = std::addressof(__lhs._M_u);
-  __do_visit([__storage](auto&& __rhs_mem) mutable
+  __do_visit([&__lhs](auto&& __rhs_mem) mutable
 		 -> __detail::__variant::__variant_cookie
 {
-	  using _Type = remove_reference_t;
-	  if constexpr (is_same_v<__remove_cvref_t,
-			  remove_cv_t<_Type>>
-			&& !is_same_v<_Type, __variant_cookie>)
-	::new (__storage)
-	  _Type(std::forward(__rhs_mem));
+	  __variant_construct_single(std::forward<_Tp>(__lhs),
+ std::forward(__rhs_mem));
 	  return {};
 	}, __variant_cast<_Types...>(std::forward(__rhs)));
 }
@@ -490,34 +495,38 @@ namespace __variant
 	  std::forward<_Move_ctor_base>(__rhs));
   }
 
-  void _M_destructive_move(_Move_ctor_base&& __rhs)
-  {
-	this->_M_reset();
-	__try
-	  {
-	__variant_construct<_Types...>(*this,
-	  std::forward<_Move_ctor_base>(__rhs));
-	  }
-	__catch (...)
-	  {
-	this->_M_index = variant_npos;
-	__throw_exception_again;
-	  }
-  }
+  template
+void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __try
+	{
+	  __variant_construct_single(*this,
+	 std::forward<_Up>(__rhs));
+	}
+	  __catch (...)
+	{
+	  this->_M_index = variant_npos;
+	  __throw_exception_again;
+	}
+	}
 
-  void _M_destructive_copy(const _Move_ctor_base& __rhs)
-  {
-	this->_M_reset();
-	__try
-	  {
-	__variant_construct<_Types...>(*this, __rhs);
-	  }
-	__catch (...)
-	  {
-	this->_M_index = variant_npos;
-	__throw_exception_again;
-	  }
-  }
+  template
+void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __try
+	{
+	  __variant_construct_single(*this, __rhs);
+	}
+	  __catch (...)
+	{
+	  this->_M_index = variant_npos;
+	  __throw_exception_again;
+	}
+	}
 
   _Move_ctor_base(const _Move_ctor_base&) = default;
   _Move_ctor_base& operator=(const _Move_ctor_base&) = default;
@@ -530,17 +539,21 @@ namespace __variant
   using _Base = _Copy_ctor_alias<_Types...>;
   using _Base::_Base;
 
-  void _M_destructive_move(_Move_ctor_base&& __rhs)
-  {
-	this->_M_reset();
-	__variant_construct<_Types...>(*this,
-  std::forward<_Move_ctor_base>(__rhs));
-  }
-  void _M_destructive_copy(const _Move_ctor_base& __rhs)
-  {
-	this->_M_reset();
-	__variant_construct<_Types...>(*this, __rhs);
-  }
+  template
+void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __variant_construct_single(*this,
+ std::forward<_Up>(__rhs));
+	}
+  template
+void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs)
+{
+	  this->_M_reset();
+	  this->_M_index = __rhs_index;
+	  __variant_construct_single(*this, __rhs);
+	}
 };
 
   template
@@ -580,12 +593,13 @@ namespace __variant
 		if constexpr (is_nothrow_copy_constructible_v<__rhs_type>
 		  || !is_nothrow_move_constructible_v<__rhs_type>)
 		  {
-			this->_M_destructive_copy(__rhs);
+			this->_M_destructive_copy(__rhs._M_index, __rhs_mem);
 		  }
 		else
 		  {
-			_Copy_assign_base __tmp(__rhs);
-			this->_M_destructive_move(std::move(__tmp));
+			auto __tmp(__rhs_mem);
+			this->_M_destructive_move(__rhs._M_index,
+		  std::move(__tmp));
 		  }
 		  }
 		else
@@ -641,8 +655,8 @@ namespace __variant
 	  }
 	else
 	  {
-		_Move_assign_base __tmp(std::move(__rhs));
-		t

Re: [PATCH 1/3] Fix multiple values for -fdbg-cnt.

2019-03-28 Thread Martin Liška
On 3/28/19 1:51 PM, Richard Biener wrote:
> On Wed, Mar 27, 2019 at 10:52 AM marxin  wrote:
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-03-27  Martin Liska  
>>
>> * dbgcnt.c (dbg_cnt_process_single_pair): Fix GNU coding style.
>> (dbg_cnt_process_opt): Parse first tokens aas
> 
> as

Better than 'ass' :)

> 
> so what was the issue you fixed?

$ gcc -c -fdbg-cnt=dce:0,vect_slp:3 main.c
dbg_cnt 'dce' set to 0-3

while correct output is:

$ gcc-8 -c -fdbg-cnt=dce:0,vect_slp:3 main.c
dbg_cnt 'dce' set to 0
dbg_cnt 'vect_slp' set to 3

Martin

> 
>> dbg_cnt_process_single_pair is also using strtok.
>> ---
>>  gcc/dbgcnt.c | 25 +++--
>>  1 file changed, 15 insertions(+), 10 deletions(-)
>>



Re: [PATCH][Tree-optimization/PR89730]grant always_inline when -flive-patching=inline-only-static

2019-03-28 Thread Richard Biener
On Tue, Mar 26, 2019 at 5:25 PM Qing Zhao  wrote:
>
> Hi, Richard,
>
> thanks for the suggestion.
>
> I tried it yesterday, but it did not work.
>
> the reason is:
>
> inside “can_inline_edge_by_limits_p”,  the “allowance for always_inline” is 
> guarded by the following condition:
>
> else if “caller_tree != callee_tree”
>
> this condition is ONLY true when when callee has different optimization level 
> than the caller.
>
> So, I think that the correct spot for the checking of 
> live-patching=inline-only-static should still be inside “can_inline_edge_p”.
>
> so my previous patch for this bug should be fine.
>
> what’s your opinion?

You could still move it after the following?

  /* Check if caller growth allows the inlining.  */
  if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
  && !disregard_limits
  && !lookup_attribute ("flatten",
 DECL_ATTRIBUTES (caller->decl))
  && !caller_growth_limits (e))
inlinable = false;

that said, looking at DECL_DISREGARD_INLINE_LIMITS in
can_inline_edge_p looks wrong.

Richard.

> thanks.
>
> Qing
>
> > On Mar 25, 2019, at 7:23 AM, Richard Biener  
> > wrote:
> >
> > On Wed, Mar 20, 2019 at 4:16 PM Qing Zhao  wrote:
> >>
> >> Hi,
> >>
> >> there is a bug in the current support for 
> >> -flive-patching=inline-only-static:
> >>
> >> it rejects inlining of external always_inline routine, therefore triggers 
> >> a compilation time error.
> >>
> >> we should always inline “always_inline” routines.
> >>
> >> please review the following simple patch, it has been bootstrapped and 
> >> regression tested on aarch64.
> >>
> >> Okay for committing?
> >
> > To me this seems like can_inline_edge_p was the wrong spot to disable 
> > inlining
> > for -flive-patching=inline-only-static and instead it should have been in
> > can_inline_edge_by_limits_p which already has proper allowance for
> > always-inline.
> >
> > So can you move the check there, like after
> >
> >  /* gcc.dg/pr43564.c.  Apply user-forced inline even at -O0.  */
> >  else if (always_inline)
> >;
> >
> > ?
> >
> >> thanks.
> >>
> >> Qing.
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-03-20  qing zhao  
> >>
> >>PR tree-optimization/89730
> >>   * ipa-inline.c (can_inline_edge_p): Grant always_inline even when
> >>   -flive-patching=inline-only-static.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-03-20  qing zhao  
> >>
> >>* gcc.dg/live-patching-4.c: New test.
> >>
>


Re: [PATCH 2/3] Extend format of -fdbg-cnt: add aux_base filter.

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 10:52 AM marxin  wrote:
>

Hmm, I don't think we want this - it looks too much like a hack to me.

I guess for an easier way to figure out the min/max ranges (do we
handle anti-ranges?) would be to specify a function name the
debug counter would be enabled for.  That might also help in your
situation.

What I've usually done is do the bisection via dbg-counter on
a single file, manually link the spec binary and then
do a runcpu w/o recompiling the binary.

Richard.

> gcc/ChangeLog:
>
> 2019-03-27  Martin Liska  
>
> * dbgcnt.c (dbg_cnt_set_limit_by_name): Add new argument
> aux_base and filter based on aux_base_name.
> (dbg_cnt_process_single_pair): Parse aux_base.
> * doc/invoke.texi: Document new extended format.
>
> gcc/testsuite/ChangeLog:
>
> 2019-03-27  Martin Liska  
>
> * gcc.dg/dbg-cnt-1.c: New test.
> ---
>  gcc/dbgcnt.c | 11 ---
>  gcc/doc/invoke.texi  |  8 ++--
>  gcc/testsuite/gcc.dg/dbg-cnt-1.c |  6 ++
>  3 files changed, 20 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/dbg-cnt-1.c
>


Re: [PATCH 1/3] Fix multiple values for -fdbg-cnt.

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 10:52 AM marxin  wrote:
>
>
> gcc/ChangeLog:
>
> 2019-03-27  Martin Liska  
>
> * dbgcnt.c (dbg_cnt_process_single_pair): Fix GNU coding style.
> (dbg_cnt_process_opt): Parse first tokens aas

as

so what was the issue you fixed?

> dbg_cnt_process_single_pair is also using strtok.
> ---
>  gcc/dbgcnt.c | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)
>


Re: [PATCH 3/3] Dump -fdbg-cnt limit reach also to stderr stream.

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 10:52 AM marxin  wrote:
>

Hmm, I guess I can see how this is useful, thus OK.

Richard.

> gcc/ChangeLog:
>
> 2019-03-27  Martin Liska  
>
> * dbgcnt.c (print_limit_reach): New function.
> (dbg_cnt): Use it.
> ---
>  gcc/dbgcnt.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
>


Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c

2019-03-28 Thread Alan Modra
On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote:
> Also, using a register move cost of 2 for for power9 direct moves
> gives these fails, even with the .md file tweaks:
> +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler-not mtvsrwz
> +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler vspltisb
> +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler xxspltib
> +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler-not mtvsrwz
> +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler vspltisw
> +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler-not mtvsrwz
> +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler vspltish
> +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler xxspltib
> +FAIL: gcc.target/powerpc/vsx-himode3.c scan-assembler lxsihzx
> +FAIL: gcc.target/powerpc/vsx-qimode3.c scan-assembler lxsibzx
> These can be all be fixed by removing "?"s disparaging vector
> alternatives in movsi_internal1 and mov_internal.

Like this.  Bootstrapped and regression tested powerpc64le-linux.
OK for stage1?

* config/rs6000/rs6000.c (rs6000_register_move_cost): Reduce
power9 direct move cost.
* config/rs6000/rs6000.md (movsi_internal1): Don't disparage
vector alternatives.
(mov_internal): Likewise, excepting alternative that
will be split.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fbc16c65de4..a9e27b356df 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -35010,7 +35010,7 @@ rs6000_register_move_cost (machine_mode mode,
  if (rs6000_tune == PROCESSOR_POWER8)
ret = 4 * hard_regno_nregs (0, mode);
  else
-   ret = 3 * hard_regno_nregs (0, mode);
+   ret = 2 * hard_regno_nregs (0, mode);
  /* SFmode requires a conversion when moving between gprs
 and vsx.  */
  if (mode == SFmode)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d383504c600..2fbab973907 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6832,10 +6832,10 @@ (define_insn "movsi_low"
 ;; MF%1 MT%0 NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-   "=r, r,   r,   ?wI, ?wH,
-m,  ?Z,  ?Z,  r,   r,
-r,  ?wIwH,   ?wJwK,   ?wJwK,   ?wu,
-?wJwK,  ?wH, ?wK, ?wIwH,   ?r,
+   "=r, r,   r,   wI,  wH,
+m,  Z,   Z,   r,   r,
+r,  wIwH,wJwK,wJwK,wu,
+wJwK,   wH,  wK,  wIwH,r,
 r,  *h,  *h")
 
(match_operand:SI 1 "input_operand"
@@ -7106,13 +7106,13 @@ (define_expand "mov"
 ;; MTVSRWZ MF%1   MT%1   NOP
 (define_insn "*mov_internal"
   [(set (match_operand:QHI 0 "nonimmediate_operand"
-   "=r,r, ?wJwK, m, Z, r,
-?wJwK, ?wJwK, ?wJwK, ?wK,   ?wK,   r,
-?wJwK, r, *c*l,  *h")
+   "=r,r, wJwK,  m, Z, r,
+wJwK,  wJwK,  wJwK,  wK,?wK,   r,
+wJwK,  r, *c*l,  *h")
 
(match_operand:QHI 1 "input_operand"
"r, m, Z, r, wJwK,  i,
-wJwK,  O, wM,wB,wS,?wJwK,
+wJwK,  O, wM,wB,wS,wJwK,
 r, *h,r, 0"))]
 
   "gpc_reg_operand (operands[0], mode)


-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Small cleanup in rtl.h

2019-03-28 Thread Richard Biener
On Wed, 27 Mar 2019, Jakub Jelinek wrote:

> Hi!
> 
> This cleans up something I found ugly already several times.
> NONDEBUG_INSN_P(X) used to be defined as
> ((GET_CODE (X) == INSN || GET_CODE (X) == DEBUG_INSN
>   || GET_CODE (X) == JUMP_INSN || GET_CODE (X) == CALL_INSN)
>  && GET_CODE (X) != DEBUG_INSN)
> rather than the simpler
> (GET_CODE (X) == INSN || GET_CODE (X) == JUMP_INSN || GET_CODE (X) == 
> CALL_INSN)
> 
> INSN_P is defined the same as before, just with DEBUG_INSN test at the end.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2019-03-27  Jakub Jelinek  
> 
>   * rtl.h (NONDEBUG_INSN_P): Define as NONJUMP_INSN_P or JUMP_P
>   or CALL_P instead of INSN_P && !DEBUG_INSN_P.
>   (INSN_P): Define using NONDEBUG_INSN_P or DEBUG_INSN_P.
> 
> --- gcc/rtl.h.jj  2019-01-25 23:46:08.058166588 +0100
> +++ gcc/rtl.h 2019-03-27 17:35:39.348562954 +0100
> @@ -840,7 +840,7 @@ struct GTY(()) rtvec_def {
>  #define DEBUG_INSN_P(X) (GET_CODE (X) == DEBUG_INSN)
>  
>  /* Predicate yielding nonzero iff X is an insn that is not a debug insn.  */
> -#define NONDEBUG_INSN_P(X) (INSN_P (X) && !DEBUG_INSN_P (X))
> +#define NONDEBUG_INSN_P(X) (NONJUMP_INSN_P (X) || JUMP_P (X) || CALL_P (X))
>  
>  /* Nonzero if DEBUG_MARKER_INSN_P may possibly hold.  */
>  #define MAY_HAVE_DEBUG_MARKER_INSNS debug_nonbind_markers_p
> @@ -851,8 +851,7 @@ struct GTY(()) rtvec_def {
>(MAY_HAVE_DEBUG_MARKER_INSNS || MAY_HAVE_DEBUG_BIND_INSNS)
>  
>  /* Predicate yielding nonzero iff X is a real insn.  */
> -#define INSN_P(X) \
> -  (NONJUMP_INSN_P (X) || DEBUG_INSN_P (X) || JUMP_P (X) || CALL_P (X))
> +#define INSN_P(X) (NONDEBUG_INSN_P (X) || DEBUG_INSN_P (X))
>  
>  /* Predicate yielding nonzero iff X is a note insn.  */
>  #define NOTE_P(X) (GET_CODE (X) == NOTE)
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 4:26 PM Jeff Law  wrote:
>
> On 3/27/19 8:36 AM, Jakub Jelinek wrote:
> > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote:
> >> However, I'm increasingly of the opinion that MIPS targets need to drop
> >> off the priority platform list.  Given the trajectory I see for MIPS
> >> based processors in industry, it's really hard to justify spending this
> >> much time on them, particularly for low priority code quality issues.
> >
> > Besides what has been discussed on IRC for the PR89826 fix, that we really
> > need a df_analyze before processing the first block, because otherwise we
> > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I
> > admit I don't know much about df nor regcprop.
> RIght.  I plan to commit that today along with the test reordering you
> pointed out.
>
> >
> > 1) the df_analyze () after every (successful) processing of a basic block
> > is IMHO way too expensive, I would be very surprised if df_analyze () isn't
> > quadratic in number of basic blocks and so one could construct testcases
> > with millions of basic blocks and at least one regcprop change in each bb
> > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the
> > 95% bbs you said won't have any changes at all)
> I'm going to look this further today.

Look at 
https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest
and you'll see multiple testcases with 'hard reg cprop' >10% compile-time.
It's indeed a hog for no good reason.

Richard.

> >
> > 2) another thing I've noticed today, we queue up the debug insn changes,
> > but if we iterate the second time for any bb, we overwrite and thus lose any
> > of the debug insn queued changes from the first iteration, so those changes
> > aren't applied (wrong-debug?)
> This is actually what worries me, both with the current bits and with
> any bits that change to a worklist of blocks to reprocess.  As is I
> think we preserve the debug stuff across the iterations which should
> keep debug info correct.  Managing that in a world where we queue up the
> iteration step is going to be tougher.
>
> Queuing the iteration step feels like gcc-10 to me, but we'll see if it
> falls out easier than expected.
>
> >
> > 3) as mentioned on IRC, the order of tests in copyprop_hardreg_forward_1
> > conditional doesn't start from the cheapest to most expensive one
> That's going to be fixed in today's commit since it's been tested.
>
> Jeff


Re: GCC 7 backport

2019-03-28 Thread Martin Liška
Hi.

Backporting one documentation fix.

Martin
>From 7bcca48f559a3fefaf37b177b3c72e78d73d73ba Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 28 Mar 2019 09:51:06 +0100
Subject: [PATCH] Backport r265786

gcc/ChangeLog:

2018-11-05  Martin Liska  

	PR web/87829
	* doc/invoke.texi: Remove options that are
	not disabled with -Os.
---
 gcc/doc/invoke.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a41d275e89d..8f279e454b0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7140,8 +7140,7 @@ do not typically increase code size.
 
 @option{-Os} disables the following optimization flags:
 @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
--falign-labels  -freorder-blocks  -freorder-blocks-algorithm=stc @gol
--freorder-blocks-and-partition  -fprefetch-loop-arrays}
+-falign-labels  -fprefetch-loop-arrays}
 
 It also enables @option{-finline-functions}, causes the compiler to tune for
 code size rather than execution speed, and performs further optimizations
-- 
2.21.0



Re: GCC 8 backports

2019-03-28 Thread Martin Liška
Hi.

I'm backporting a documentation fix.

Martin
>From 1055e2c2787be337c33787608473a7a4bcbe82b8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 28 Mar 2019 09:47:27 +0100
Subject: [PATCH] Backport r265786

gcc/ChangeLog:

2018-11-05  Martin Liska  

	PR web/87829
	* doc/invoke.texi: Remove options that are
	not disabled with -Os.
---
 gcc/doc/invoke.texi | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4ee24c3ab78..de86d7a1e01 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7680,8 +7680,7 @@ do not typically increase code size.
 
 @option{-Os} disables the following optimization flags:
 @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
--falign-labels  -freorder-blocks  -freorder-blocks-algorithm=stc @gol
--freorder-blocks-and-partition  -fprefetch-loop-arrays}
+-falign-labels  -fprefetch-loop-arrays}
 
 It also enables @option{-finline-functions}, causes the compiler to tune for
 code size rather than execution speed, and performs further optimizations
-- 
2.21.0



Re: [PATCH] Revert r254150 (PR bootstrap/89829).

2019-03-28 Thread Richard Biener
On Wed, Mar 27, 2019 at 11:02 AM Martin Liška  wrote:
>
> Hi.
>
> The revision should be reverted as prev-* is the location
> where stagetrain stores all *.gcda files.
>
> I've tested that with PGO on x86_64-linux-gnu.
>
> Ready for trunk?

OK.

> Thanks,
> Martin
>
> ChangeLog:
>
> 2019-03-27  Martin Liska  
>
> PR bootstrap/89829
> * Makefile.in: Revert r254150.
> * Makefile.tpl: Likewise.
> ---
>  Makefile.in  | 4 ++--
>  Makefile.tpl | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
>


Re: Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/897803)

2019-03-28 Thread Uros Bizjak
On Thu, Mar 28, 2019 at 7:47 AM Hongtao Liu  wrote:
>
> Hi Uros:
>   would you help to review this patch?

This is AVX512F patch, you will need the approval from the maintainer
first. I have no plans to maintain AVX512 beyond rubber-stamping OK
dead obvious regression from a reputable contributors. It is simply
too much involvment for me. If the appointed maintainer doesn't
respond anymore, then I suggest you raise the issue with GCC steering
committe.

Uros.

> Regards,
> Hongtao.
>
> On Sun, Mar 24, 2019 at 8:13 PM Hongtao Liu  wrote:
> >
> > Hi:
> >   The following patch adds forgotten avx512f fpclass instrinsics for
> > masked scalar operations.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512),
> > ok for trunk?
> >
> > Index: ChangeLog
> > ===
> > --- ChangeLog (revision 269894)
> > +++ ChangeLog (working copy)
> > @@ -1,3 +1,16 @@
> > +2019-03-24 Hongtao Liu 
> > +
> > + PR target/89803
> > + * config/i386/avx512dqintrin.h
> > + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask):
> > + New intrinsics.
> > + * config/i386/i386-builtin.def
> > + (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask):
> > + New builtins.
> > + * config/i386/sse.md
> > + (define_insn "avx512dq_vmfpclass):
> > + Modified with mask.
> > +
> >  2019-03-23  Segher Boessenkool  
> >
> >   * config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit
> > Index: config/i386/avx512dqintrin.h
> > ===
> > --- config/i386/avx512dqintrin.h (revision 269894)
> > +++ config/i386/avx512dqintrin.h (working copy)
> > @@ -1372,6 +1372,20 @@
> >return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm);
> >  }
> >
> > +extern __inline __mmask8
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm)
> > +{
> > +  return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, 
> > __U);
> > +}
> > +
> > +extern __inline __mmask8
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm)
> > +{
> > +  return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, 
> > __U);
> > +}
> > +
> >  extern __inline __m512i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R)
> > @@ -2623,6 +2637,12 @@
> >  #define _mm_fpclass_sd_mask(X, C) \
> >((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) 
> > (C))) \
> >
> > +#define _mm_mask_fpclass_ss_mask(X, C, U) \
> > +  ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X),
> > (int) (C)), (__mmask8) (U))
> > +
> > +#define _mm_mask_fpclass_sd_mask(X, C, U) \
> > +  ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X),
> > (int) (C)), (__mmask8) (U))
> > +
> >  #define _mm512_mask_fpclass_pd_mask(u, X, C)\
> >((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \
> >   (int) (C), (__mmask8)(u)))
> > Index: config/i386/i386-builtin.def
> > ===
> > --- config/i386/i386-builtin.def (revision 269894)
> > +++ config/i386/i386-builtin.def (working copy)
> > @@ -2082,9 +2082,11 @@
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv4df_mask,
> > "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256,
> > UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv2df_mask,
> > "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128,
> > UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df,
> > "__builtin_ia32_fpclasssd", IX86_BUILTIN_FPCLASSSD, UNKNOWN, (int)
> > QI_FTYPE_V2DF_INT)
> > +BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
> > CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask",
> > IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv8sf_mask,
> > "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256,
> > UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512dq_fpclassv4sf_mask,
> > "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128,
> > UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI)
> >  BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv4sf,
> > "__builtin_ia32_fpcla", IX86_BUILTIN_FPCLA, UNKNOWN, (int)
> > QI_FTYPE_V4SF_INT)
> > +BDESC (OPTION_MASK_ISA_AVX512DQ, 0,
> > CODE_FOR_avx512dq_vmfpclassv4sf_mask, "__builtin_ia32_fpcla_mask",
> > IX86_BUILTIN_FPCLA_MASK, UNKNOWN, (int)