Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c
On Thu, Mar 28, 2019 at 01:08:55PM -0500, Segher Boessenkool wrote: > > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order > > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS > > as an allocno class. It is similar to the aarch64 version but without > > any selection by regno mode if the best class is a union class. > > It would be nice if we could do without such hacks. Alas. Yeah. We have some serious problems with register pressure calculations in sched1 and ira. This is merely a workaround for the regression. I intend to poke a little more at the scheduler to see whether I can figure out a proper fix, but for now this is the best I have. > > OK assuming Pat's latest spec test run doesn't contain any nasty > > surprises? > > Not for stage 4, no. Sorry. But it should be great in stage 1 :-) Good. I'm happy to leave this until next stage 1. > > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h > > index 9fb36e41e7d..20c59f89c8f 100644 > > --- a/gcc/config/rs6000/darwin.h > > +++ b/gcc/config/rs6000/darwin.h > > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands; > >&& reg_class_subset_p (BASE_REGS, (CLASS))) \ > > ? BASE_REGS \ > > : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT\ > > - && (CLASS) == GEN_OR_FLOAT_REGS) \ > > + && ((CLASS) == GEN_OR_FLOAT_REGS \ > > + || (CLASS) == GEN_OR_VSX_REGS)) \ > > ? GENERAL_REGS \ > > : (CLASS)) > > Darwin doesn't do VSX at all... But maybe there is something that can get > allocated to both FPRs and VRs, sure. And GPRs. > > This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places > where we should care about this change for correctness. For sure there are other places. A new union register class trips all those places where union classes fail. For example, ira.c:setup_class_translate_array doesn't give useful answers for GEN_OR_VSX_REGS, or GEN_OR_FLOAT_REGS for that matter. That makes uses of ira_allocno_class_translate and ira_pressure_class_translate for pseudos suspect whenever one of the union classes is involved. rs6000_ira_change_pseudo_allocno_class helps of course, in that GEN_OR_VSX_REGS mostly won't be used. > > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum > > reg_class rclass) > >return NO_REGS; > > } > > > > - if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS) > > + if (GET_MODE_CLASS (mode) == MODE_INT > > + && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS)) > > return GENERAL_REGS; > > Maybe do this whenever rclass contains the GPRs? Well, you caught me out here. Like the darwin.h change I made this change early on in the process of fixing register_move_cost, on the grounds that whatever we did for GEN_OR_FLOAT_REGS probably should be done for GEN_OR_VSX_REGS. That's not a really good reason particularly since this code is so old (git a99459e46d, svn r35162). Maybe it's just a reload hack? So you might be better questioning the need for this change at all. I'll see how the test results look if I remove it. Int modes can live in VSX regs, after all. > > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode, > >reg_class_t from, reg_class_t to) > > { > >int ret; > > + reg_class_t rclass; > > > >if (TARGET_DEBUG_COST) > > dbg_cost_ctrl++; > > > > + /* If we have VSX, we can easily move between FPR or Altivec registers, > > + otherwise we can only easily move within classes. > > + Do this first so we give best-case answers for union classes > > + containing both gprs and vsx regs. */ > > + HARD_REG_SET to_vsx, from_vsx; > > + COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]); > > + AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]); > > + COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]); > > + AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]); > > This is a bit expensive to run at every call of rs6000_register_move_cost. > Can it be precomputed easily? register_move_cost tends to be cached by callers of this function, so I'm inclined to go for simple and correct rather than fast and complicated. > > + { > > + if (TARGET_DIRECT_MOVE > > + && (rs6000_tune == PROCESSOR_POWER8 > > + || rs6000_tune == PROCESSOR_POWER9)) > > TARGET_DIRECT_MOVE is always on for these CPUs. Should this also use the > m*vsr* cost with say -mcpu=power7 -mtune=power9? No, because if we don't generate m*vsr*, and we shouldn't, then that would be telling a lie. > This list makes me think... Should there be a GEN_OR_ALTIVEC as well? That might make sense for pre-vsx processors, if you can find a testcase where the GENERAL_REGS cost is
[PR89862] Fix ARM lto bootstrap
Hi All, LTO bootstrap for ARM fails with the commit commit 67c18bce7054934528ff5930cca283b4ac967dca * combine.c (record_dead_and_set_regs_1): Record the source unmodified for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target. It fails with an internal compiler error: in operator+=, at profile-count.h:792. With the commit now we are not generating gen_lowpart for CONST_INT as in (set (subreg:SI (reg:QI 1434) 0) (const_int 224 [0xe0])) and likes. As discussed in the PR, attached patch fixes this and fixes the bootstrap failure. I am not able to create a reduced testcase for this. However, it is being tested with LTO bootstrap for ARM. I therefore believe that it is OK. I have also tested the patch with x86_64-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan diff --git a/gcc/rtl.h b/gcc/rtl.h index f991919..52ecd5a 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -4401,6 +4401,7 @@ word_register_operation_p (const_rtx x) { switch (GET_CODE (x)) { +case CONST_INT: case ROTATE: case ROTATERT: case SIGN_EXTRACT: log Description: Binary data
Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c
> On 28 Mar 2019, at 18:08, Segher Boessenkool > wrote: > >> diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h >> index 9fb36e41e7d..20c59f89c8f 100644 >> --- a/gcc/config/rs6000/darwin.h >> +++ b/gcc/config/rs6000/darwin.h >> @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands; >> && reg_class_subset_p (BASE_REGS, (CLASS)))\ >>? BASE_REGS \ >>: (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT \ >> - && (CLASS) == GEN_OR_FLOAT_REGS) \ >> + && ((CLASS) == GEN_OR_FLOAT_REGS \ >> + || (CLASS) == GEN_OR_VSX_REGS)) \ >>? GENERAL_REGS\ >>: (CLASS)) > > Darwin doesn't do VSX at all… Well.. Darwin doesn’t currently run on any CPU with VSX hardware… However, in their wisdom, the folks who were implementing it way back when made Darwin have a soft implementation of V2DF and V2DI (in case that matters, which seems unlikely in this context). > But maybe there is something that can get > allocated to both FPRs and VRs, sure. And GPRs. not sure what is being asked or stated here - is there a question about ABI, or just the assertion that some quantities could be placed in GPRs, FPRs or VRs? (the latter seems reasonable to me, the former I’d need to think some more about). Iain
Re: [PATCH] [ARC] PR 88409: miscompilation due to missing cc clobber in longlong.h macros
On 3/28/19 5:07 PM, Marc Glisse wrote: > On Thu, 28 Mar 2019, Vineet Gupta wrote: > >> simple test such as below was failing. >> >> | void main(int argc, char *argv[]) >> | { >> |size_t total_time = 115424; // expected 115.424 >> |double secs = (double)total_time/(double)1000; >> |printf("%s %d %lf\n", "secs", total_time, secs); // prints 113.504 >> |printf("%d\n", (size_t)secs); >> | } >> >> The printf eventually called into glibc stdlib/divrem.c:__mpn_divrem() >> which uses the __arc__ specific inline asm macros from longlong.h which >> were causing miscompilation. > Hello, > > do you intend to post similar patches for glibc and gmp, which both have a > similar longlong.h? Yeah, glibc typically "syncs" longlong.h from gcc so once gcc patch is merged, I'll post a sync patch to glibc. Good tip about gmp, I wasn't aware of that. I suppose I could post there too once dust settles on gcc side. Thx, -Vinet
Re: [PATCH] [ARC] PR 88409: miscompilation due to missing cc clobber in longlong.h macros
On Thu, 28 Mar 2019, Vineet Gupta wrote: simple test such as below was failing. | void main(int argc, char *argv[]) | { |size_t total_time = 115424; // expected 115.424 |double secs = (double)total_time/(double)1000; |printf("%s %d %lf\n", "secs", total_time, secs); // prints 113.504 |printf("%d\n", (size_t)secs); | } The printf eventually called into glibc stdlib/divrem.c:__mpn_divrem() which uses the __arc__ specific inline asm macros from longlong.h which were causing miscompilation. Hello, do you intend to post similar patches for glibc and gmp, which both have a similar longlong.h? -- Marc Glisse
Re: [C++ debug PATCH] [PR88534] accept VAR_DECL in class literal template parms
On 3/28/19 11:45 AM, Jason Merrill wrote: On 3/27/19 6:56 PM, Martin Sebor wrote: On 3/27/19 3:11 PM, Martin Sebor wrote: On 3/27/19 4:44 AM, Jonathan Wakely wrote: On 21/03/19 15:03 -0400, Jason Merrill wrote: On 3/20/19 6:06 PM, Marek Polacek wrote: On Wed, Mar 20, 2019 at 10:58:32PM +0100, Jakub Jelinek wrote: On Wed, Mar 20, 2019 at 05:55:04PM -0400, Marek Polacek wrote: On Wed, Mar 20, 2019 at 04:56:33PM -0300, Alexandre Oliva wrote: On Mar 20, 2019, Marek Polacek wrote: This test fails with pr88534.C:58:1: sorry, unimplemented: string literal in function template signature Interesting... gcc-8 rejected it with an error message rejecting the template parameter, but my latest trunk build (dated Mar 13, r269641) compiles it all right. Was there a subsequent fix, maybe? I didn't realize it was supposed to be rejected. Ah, that problem only started with r269814, namely this hunk: Maybe this is done too early and should be postponed to genericization (perhaps except for TREE_STATIC vars)? Or skip when DECL is template_parm_object_p. Or handle it in mangle.c. I don't see any reason we shouldn't accept struct A { char c[4]; }; template struct B { }; B b; Probably we should use the same mangling whether the initializer for c was spelled as a string literal or list of integers. The thing we still don't want to allow is mangling the *address* of a string literal. Will that help PR 47488 as well? What I have (attached) accepts all three test cases from PR 47488 (comment #0, #1, #2, and #5) but with different mangling. The difference in the mangling of the function in the test case in comment #0 is Clang: _Z1gIiEDTcl1fcvT__ELA1_KcEEERKS0_ GCC 9: _Z1gIiEDTcl1fcvT__ELKc0EEERKS0_ I'm not very familiar with the C++ ABI mangling but from what I can tell, Clang treats the type as a literal array of 1 const char element (LA1_Kc) without actually encoding its value, while with my patch GCC encodes it as a constant literal initializer consisting of 1 null char (LKc0). In other words, Clang would mangle these two the same for the same T: template < typename T > decltype (f (T(), "123")) g (const T&); template < typename T > decltype (f (T(), "abc")) g (const T&); while GCC would mangle them differently. Which is correct? Clang's seems correct in this case but would it also be correct to mangle Jason's B the same as B? After some poking around I found the issue below that seems to answer the question at least for Jason's example: https://github.com/itanium-cxx-abi/cxx-abi/issues/64 But this is an open issue that leaves some other questions unanswered, mainly that of the exact encoding of the literal (the length of the directly encoded subsequence and the hash algorithm to compute the hash value of the string). In any event, using the same encoding as for initializer lists (i.e., (elt-type, elt-value)...) doesn't follow this approach. Should I prototype the solution outlined in the issue for GCC 9? I think let's leave that for the future, and use the aggregate encoding for GCC 9. I suppose there's still the question of compatibility between initializer lists and string literals, namely whether this B is supposed to mangle the same as B The ABI issue only talks about string literals and not braced initializer lists. I see in Revision History [150204] Add mangling for braced initializer lists, so presumably there is a difference, but I can't find what it is. The issue for mangling template arguments of class type is https://github.com/itanium-cxx-abi/cxx-abi/issues/63 Whoops. Off-by-one error. Thanks! The mangling of braced-init-lists is # type {expr-list}, conversion with braced-init-list argument ::= tl * E # {expr-list}, braced-init-list in any other context ::= il * E Your patch doesn't use this mangling. Here's a variant of my example that actually causes these to show up in the mangling: struct A { char c[4]; }; template struct B { }; void f(B) {} void g(B) {} Right, that's also what I'm testing. There are few interesting variations. Given struct A { char c[5]; }; all of these are equivalent and so should mangle the same: void f (B) {} void f (B) {} void f (B) {} void f (B) {} void f (B) {} Interior nuls need to be mangled but the trailing ones, either explicit or implicit, don't. That also needs to be specified in the ABI. I believe we also need to handle these the same, both during overloading and mangling: void f (B) {} void f (B) {} void f (B) {} void f (B) {} GCC currently treats them as distinct irrespective of A's member's type. I opened bug 89878 for this since I don't expect to deal with the overloading part in this patch. Martin
[PATCH] [ARC] PR 88409: miscompilation due to missing cc clobber in longlong.h macros
simple test such as below was failing. | void main(int argc, char *argv[]) | { |size_t total_time = 115424; // expected 115.424 |double secs = (double)total_time/(double)1000; |printf("%s %d %lf\n", "secs", total_time, secs); // prints 113.504 |printf("%d\n", (size_t)secs); | } The printf eventually called into glibc stdlib/divrem.c:__mpn_divrem() which uses the __arc__ specific inline asm macros from longlong.h which were causing miscompilation. include/ 2019-03-28 Vineet Gupta PR 89877 * longlong.h [__arc__] (add_ss): Add cc clobber (sub_ddmmss): Likewise. Signed-off-by: Vineet Gupta --- include/ChangeLog | 7 +++ include/longlong.h | 6 -- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/ChangeLog b/include/ChangeLog index be08141deeb9..96d967d10a52 100644 --- a/include/ChangeLog +++ b/include/ChangeLog @@ -1,3 +1,10 @@ +2019-03-28 Vineet Gupta + + PR 89877 + + * longlong.h [__arc__] (add_ss): Add cc clobber + (sub_ddmmss): Likewise. + 2019-02-11 Philippe Waroquiers * splay-tree.h (splay_tree_delete_key_fn): Update comment. diff --git a/include/longlong.h b/include/longlong.h index 3dd8dc3aa80c..1f0ce4204255 100644 --- a/include/longlong.h +++ b/include/longlong.h @@ -199,7 +199,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, UDItype); : "%r" ((USItype) (ah)), \ "rICal" ((USItype) (bh)), \ "%r" ((USItype) (al)), \ -"rICal" ((USItype) (bl))) +"rICal" ((USItype) (bl)) \ + : "cc") #define sub_ddmmss(sh, sl, ah, al, bh, bl) \ __asm__ ("sub.f %1, %4, %5\n\tsbc %0, %2, %3" \ : "=r" ((USItype) (sh)), \ @@ -207,7 +208,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, UDItype); : "r" ((USItype) (ah)), \ "rICal" ((USItype) (bh)), \ "r" ((USItype) (al)), \ -"rICal" ((USItype) (bl))) +"rICal" ((USItype) (bl)) \ + : "cc") #define __umulsidi3(u,v) ((UDItype)(USItype)u*(USItype)v) #ifdef __ARC_NORM__ -- 2.7.4
[committed] Fix omp lowering VLA ICE (PR middle-end/89621)
Hi! The following testcase ICEs, because remap_type remaps all VLA types, even those that shouldn't be and e.g. the Fortran FE then asserts TYPE_MAIN_VARIANT equality, which is not true because of that spurious remapping. During normal inlining and most of other remap_type uses, id->copy_decl is copy_decl_no_change or something that calls it unconditionally and so it is guaranteed we really need to remap all VLA types. Similarly, during OpenMP lowering of certain constructs like parallel, task, target we really better remap all VLA types, while the id->copy_decl hook doesn't guarantee returning a new decl, it is usually just for for global vars and global vars typically aren't used in VLA types after gimplification, we want to record the value at certain point rather than change the bounds of the type if the global var changes. In other OpenMP constructs, such as worksharing, distribute, simd etc., we return new decls only for decls that have explicit clauses on the constructs, everything else is shared and just uses decls from outer context. If we remap_type some type in such context, we create a new type, but all remap_decls return the decl passed to it, so we end up with multiple VLA types which aren't compatible, but have the same dimensions etc. The following patch makes sure we don't really remap such types. In order not to slow down normal inlining etc., there is an extra copy_body_data flag to request this extra behavior and we then first check if we need to actually remap and only remap if we need to. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2019-03-28 Jakub Jelinek PR middle-end/89621 * tree-inline.h (struct copy_body_data): Add dont_remap_vla_if_no_change flag. * tree-inline.c (remap_type_3, remap_type_2): New functions. (remap_type): Don't remap vla types if id->dont_remap_vla_if_no_change and remap_type_2 returns false. * omp-low.c (new_omp_context): Set ctx->cb.dont_remap_vla_if_no_change. Move ctx->cb.adjust_array_error_bounds setting to the outermost ctx only from where it is copied to nested contexts. * gfortran.dg/gomp/pr89621.f90: New test. --- gcc/tree-inline.h.jj2019-03-27 18:13:40.446862001 +0100 +++ gcc/tree-inline.h 2019-03-27 23:09:20.576018070 +0100 @@ -123,6 +123,13 @@ struct copy_body_data an uninitialized VAR_DECL temporary. */ bool adjust_array_error_bounds; + /* Usually copy_decl callback always creates new decls, in that case + we want to remap all variably_modified_type_p types. If this flag + is set, remap_type will do further checks to see if remap_decl + of any decls mentioned in the type will remap to anything but itself + and only in that case will actually remap the type. */ + bool dont_remap_vla_if_no_change; + /* A function to be called when duplicating BLOCK nodes. */ void (*transform_lang_insert_block) (tree); --- gcc/tree-inline.c.jj2019-03-27 18:13:40.408862615 +0100 +++ gcc/tree-inline.c 2019-03-28 14:11:14.421419730 +0100 @@ -598,6 +598,92 @@ remap_type_1 (tree type, copy_body_data return new_tree; } +/* Helper function for remap_type_2, called through walk_tree. */ + +static tree +remap_type_3 (tree *tp, int *walk_subtrees, void *data) +{ + copy_body_data *id = (copy_body_data *) data; + + if (TYPE_P (*tp)) +*walk_subtrees = 0; + + else if (DECL_P (*tp) && remap_decl (*tp, id) != *tp) +return *tp; + + return NULL_TREE; +} + +/* Return true if TYPE needs to be remapped because remap_decl on any + needed embedded decl returns something other than that decl. */ + +static bool +remap_type_2 (tree type, copy_body_data *id) +{ + tree t; + +#define RETURN_TRUE_IF_VAR(T) \ + do \ +{ \ + tree _t = (T); \ + if (_t) \ + { \ + if (DECL_P (_t) && remap_decl (_t, id) != _t) \ + return true;\ + if (!TYPE_SIZES_GIMPLIFIED (type) \ + && walk_tree (&_t, remap_type_3, id, NULL)) \ + return true;\ + } \ +} \ + while (0) + + switch (TREE_CODE (type)) +{ +case POINTER_TYPE: +case REFERENCE_TYPE: +case FUNCTION_TYPE: +case METHOD_TYPE: + return remap_type_2 (TREE_TYPE (type), id); + +case INTEGER_TYPE: +case REAL_TYPE: +case FIXED_POINT_TYPE: +case ENUMERAL_TYPE: +case BOOLEAN_TYPE: + RETURN_TRUE_IF_VAR (TYPE_MIN_VALUE (type)); + RETURN_TRUE_IF_VAR (TYPE_MAX_VALUE (type
Re: [PATCH] Fix PR71598, aliasing between enums and compatible types
On 3/26/19 4:40 AM, Richard Biener wrote: On Mon, 18 Mar 2019, Richard Biener wrote: On Fri, 15 Mar 2019, Jason Merrill wrote: On 3/15/19 9:33 AM, Richard Biener wrote: The following is an attempt to fix PR71598 where C (and C++?) have an implementation-defined compatible integer type for each enum and the TBAA rules mandate that accesses using a compatible type are allowed. This does not apply to C++; an enum does not alias its underlying type. Thus the following different patch, introducing c_get_alias_set and only doing the special handling for C family frontends (I assume that at least ObjC is also affected). Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK? Ping. Also consider the additional testcase below to be added. Richard. 2019-03-18 Richard Biener PR c/71598 * gimple.c: Include langhooks.h. (gimple_get_alias_set): Treat enumeral types as the underlying integer type. Won't this affect all languages? Jason
Re: C++ PATCH for c++/89612 - ICE with member friend template with noexcept
On 3/28/19 2:59 PM, Marek Polacek wrote: On Thu, Mar 21, 2019 at 04:00:41PM -0400, Jason Merrill wrote: On 3/19/19 11:45 AM, Marek Polacek wrote: On Thu, Mar 14, 2019 at 04:22:41PM -0400, Jason Merrill wrote: On 3/7/19 4:52 PM, Marek Polacek wrote: This was one of those PRs where the more you poke, the more ICEs turn up. This patch fixes the ones I could find. The original problem was that maybe_instantiate_noexcept got a TEMPLATE_DECL created for the member friend template in do_friend. Its noexcept-specification was deferred, so we went to the block with push_access_scope, but that crashes on a TEMPLATE_DECL. One approach could be to somehow not defer noexcept-specs for friend templates, I guess, but I didn't want to do that. How does it make sense to instantiate the noexcept-specifier of a template? We should only get there for fully-instantiated function decls. Hmm, but duplicate_decls calls check_redeclaration_exception_specification even for DECL_FUNCTION_TEMPLATE_Ps. ... Note the crash happens in tsubst_friend_function. I wouldn't know when to check the noexcept-specifier of such a TEMPLATE_DECL for a template friend if not there. Hmm, true, I guess we do need to do a partial instantiation of the noexcept-specifier in order to compare it. *nod* That broke in register_parameter_specializations but we don't need this code anyway, so let's do away with it -- the current_class_{ref,ptr} code is enough to fix the PR that register_parameter_specializations was introduced for. What about uses of non-'this' parameters in the noexcept-specification? template struct C { template friend void foo(T t) noexcept(sizeof(decltype(t)) > 1); }; template void foo(int i) noexcept { } C c; Still works. I extended the test to see if we detect the scenario when the noexcept-specifiers don't match, and we do. It's noexcept39.C. Lastly, I found an invalid testcase that was breaking because a template code leaked to constexpr functions. This I fixed similarly to the recent explicit PR fix (r269131). This spot should probably also use build_converted_constant_expr. Ok, I'll address this. I'm finding this repeated pattern awkward. Earlier you changed check_narrowing to use maybe_constant_value instead of fold_non_dependent_expr, but perhaps whatever that fixed should have been fixed instead with a processing_template_decl_sentinel in the enclosing code that already instantiated the expression. That ought to avoid any need to change this spot or r269131. So this also came up in the other patch. Why don't I drop this part (and the noexcept1.C test) and open a new PR for this issue, so that we don't conflate two problems? The following patch fixes the original issue. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. Jason 2019-03-28 Marek Polacek PR c++/89612 - ICE with member friend template with noexcept. * pt.c (maybe_instantiate_noexcept): For function templates, use their template result (function decl). Don't set up local specializations. Temporarily turn on processing_template_decl. Update the template type too. * g++.dg/cpp0x/noexcept38.C: New test. * g++.dg/cpp0x/noexcept39.C: New test. * g++.dg/cpp1z/noexcept-type21.C: New test. diff --git gcc/cp/pt.c gcc/cp/pt.c index 05d5371d8a6..fa30a7f00c8 100644 --- gcc/cp/pt.c +++ gcc/cp/pt.c @@ -24192,6 +24192,17 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain) if (DECL_CLONED_FUNCTION_P (fn)) fn = DECL_CLONED_FUNCTION (fn); + + tree orig_fn = NULL_TREE; + /* For a member friend template we can get a TEMPLATE_DECL. Let's use + its FUNCTION_DECL for the rest of this function -- push_access_scope + doesn't accept TEMPLATE_DECLs. */ + if (DECL_FUNCTION_TEMPLATE_P (fn)) +{ + orig_fn = fn; + fn = DECL_TEMPLATE_RESULT (fn); +} + fntype = TREE_TYPE (fn); spec = TYPE_RAISES_EXCEPTIONS (fntype); @@ -24228,37 +24239,41 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain) push_deferring_access_checks (dk_no_deferred); input_location = DECL_SOURCE_LOCATION (fn); - /* A new stack interferes with pop_access_scope. */ - { - /* Set up the list of local specializations. */ - local_specialization_stack lss (lss_copy); - - tree save_ccp = current_class_ptr; - tree save_ccr = current_class_ref; - /* If needed, set current_class_ptr for the benefit of - tsubst_copy/PARM_DECL. */ - tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn)); - if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl)) - { - tree this_parm = DECL_ARGUMENTS (tdecl); - current_class_ptr = NULL_TREE; - current_class_ref = cp_build_fold_indirect_ref (this_parm); - current_class_ptr = this_parm; - } + tree save_
Re: C++ PATCH for c++/87145 - bogus error converting class type in template argument list
On 3/20/19 4:12 PM, Marek Polacek wrote: The fix for 77656 caused us to call convert_nontype_argument even for value-dependent arguments, to perform the conversion in order to avoid a bogus warning. In this case, the argument is Pod{N}. The call to build_converted_constant_expr in convert_nontype_argument produces Pod::operator Enum(&{N}). It doesn't crash because we're in a template and build_address no longer crashes on CONSTRUCTORs in a template. Yeah, we shouldn't be preserving lower level codes like this ADDR_EXPR; we should probably return an IMPLICIT_CONV_EXPR rather than make the call explicit. Jason
Re: [PATCH, RFC] Wrong warning message fix for gcc 9
On 3/28/19 11:49 AM, Roman Zhuykov wrote: Ping A week ago I decided it is so minor that I can report here with a patch without creating bugzilla PR. Technically it is a "9 regression" in new functionality added back in summer. Patch was succesfully bootstrapped and regtested on x86_64. Ok for trunk? Thanks for the patch! The change makes sense to me. Can you please also add a test case to the test suite? I can't approve patches, even obvious ones, so please wait for an approval from a maintainer before committing it. Martin чт, 21 мар. 2019 г. в 18:53, Roman Zhuykov : Hello, I have found a minor diagnostic issue. Since r262910 we got a little odd error messages in cases like this: $ gcc -flto-compression-level=2-O2 -c empty.c gcc: error: argument to '-flto-compression-level=' is not between 0 and 9 $ g++ -fabi-version=2-O2 -c empty.cpp cc1plus: error: '-fabi-version=1' is no longer supported Old messages were more correct: $ gcc-8 -flto-compression-level=2-O2 -c empty.c gcc: error: argument to '-flto-compression-level=' should be a non-negative integer $ g++-8 -fabi-version=2-O2 -c empty.cpp g++: error: argument to '-fabi-version=' should be a non-negative integer When UInteger option value string is not a number, but it's first char is a digit, integral_argument function returns -1 without setting *err. Proposed untested patch attached. -- Best Regards, Roman Zhuykov gcc/ChangeLog: 2019-03-21 Roman Zhuykov * opts-common.c (integral_argument): Set errno properly in one case. diff --git a/gcc/opts-common.c b/gcc/opts-common.c --- a/gcc/opts-common.c +++ b/gcc/opts-common.c @@ -205,8 +205,10 @@ integral_argument (const char *arg, int *err, bool byte_size_suffix) value = strtoull (arg, &end, 0); if (*end) { - /* errno is most likely EINVAL here. */ - *err = errno; + if (errno) + *err = errno; + else + *err = EINVAL; return -1; }
[PATCH, i386]: Correct RMW operation with LEA peephole (PR 89865)
Attached patch corrects RMW operation with LEA peephole pattern. The mode of the LEA is either SImode (for QImode, HImode or SImode operation) or DImode. 2019-03-28 Uroš Bizjak PR target/89865 * config/i386/i386.md (RMW operation with LEA peephole): Use LEAMODE mode attribute instead of SWI mode iterator for LEA pattern. The patch triggers FAIL: gcc.target/i386/pr49095.c scan-assembler-times ), % 45 testsuite failure also for x86_64-linux-gnu. The adjusted number of found patterns was wrong from the beginning and hid the uncovered problem with LEA operation. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 269995) +++ config/i386/i386.md (working copy) @@ -18684,14 +18684,16 @@ (define_peephole2 [(set (match_operand:SWI 0 "register_operand") (match_operand:SWI 1 "memory_operand")) - (set (match_operand:SWI 3 "register_operand") - (plus:SWI (match_dup 0) - (match_operand:SWI 2 ""))) - (set (match_dup 1) (match_dup 3)) - (set (reg FLAGS_REG) (compare (match_dup 3) (const_int 0)))] + (set (match_operand: 3 "register_operand") + (plus: (match_operand: 4 "register_operand") + (match_operand: 2 ""))) + (set (match_dup 1) (match_operand:SWI 5 "register_operand")) + (set (reg FLAGS_REG) (compare (match_dup 5) (const_int 0)))] "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ()) + && REGNO (operands[4]) == REGNO (operands[0]) + && REGNO (operands[5]) == REGNO (operands[3]) && peep2_reg_dead_p (4, operands[3]) - && (rtx_equal_p (operands[0], operands[3]) + && ((REGNO (operands[0]) == REGNO (operands[3])) || peep2_reg_dead_p (2, operands[0])) && !reg_overlap_mentioned_p (operands[0], operands[1]) && !reg_overlap_mentioned_p (operands[3], operands[1]) @@ -18700,17 +18702,17 @@ || immediate_operand (operands[2], QImode) || any_QIreg_operand (operands[2], QImode)) && ix86_match_ccmode (peep2_next_insn (3), CCGOCmode)" - [(parallel [(set (match_dup 4) (match_dup 6)) - (set (match_dup 1) (match_dup 5))])] + [(parallel [(set (match_dup 6) (match_dup 8)) + (set (match_dup 1) (match_dup 7))])] { - operands[4] = SET_DEST (PATTERN (peep2_next_insn (3))); - operands[5] + operands[6] = SET_DEST (PATTERN (peep2_next_insn (3))); + operands[7] = gen_rtx_PLUS (mode, copy_rtx (operands[1]), - operands[2]); - operands[6] -= gen_rtx_COMPARE (GET_MODE (operands[4]), - copy_rtx (operands[5]), + gen_lowpart (mode, operands[2])); + operands[8] += gen_rtx_COMPARE (GET_MODE (operands[6]), + copy_rtx (operands[7]), const0_rtx); })
Re: [PATCH v3] luoxhu - backport r250477, r255555, r257253 and r258137
Hi Xiong, Sorry this took so long. On Mon, Mar 04, 2019 at 07:15:31PM -0600, luo...@linux.ibm.com wrote: > This is a backport of r250477, r25, r257253 and r258137 from trunk to > gcc-7-branch to support built-in functions: > vec_extract_fp_from_shorth, vec_extract_fp_from_shortl, > vec_extract_fp32_from_shorth and vec_extract_fp32_from_shortl, etc. > The patches were on trunk before GCC 8 forked already. r257253 and r258137 > are dependent testcases require vsx support need merge to avoid regression. > > The discussion for the patch r250477 that went into trunk is: > https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00624.html > The discussion for the patch r25 that went into trunk is: > https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00394.html > VSX support for patch r257253 and r258137: > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02391.html > https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01506.html > > Regression-tested on Linux POWER8 LE. Okay for GCC 7. Thank you! Segher
[PATCH, i386]: Fix PR89848, ICE: in convert_op
On Tue, Mar 26, 2019 at 8:05 PM Uros Bizjak wrote: > > Attached patch fixes a corner case in STV pass where the shift operand > register equals shift count register. The specialization for shift > insns marked register as processed, but didn't process shift input > operand, leaving an unprocessed DImode register. There is another instance of the same problem in make_vector_copies. 2019-03-28 Uroš Bizjak PR target/89848 * config/i386/i386.c (dimode_scalar_chain::make_vector_copies): Also process XEXP (src, 0) of a shift insn. testsuite/ChangeLog: 2019-03-28 Uroš Bizjak PR target/89848 * gcc.target/i386/pr89848.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline SVN, will be backported to gcc-8 branch. Uros. Index: testsuite/gcc.target/i386/pr89848.c === --- testsuite/gcc.target/i386/pr89848.c (nonexistent) +++ testsuite/gcc.target/i386/pr89848.c (revision 270003) @@ -0,0 +1,11 @@ +/* PR target/89848 */ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2 -msse2 -mtune=pentium3m" } */ + +long long +foo (long long x) +{ + x >>= 3; + x <<= x; + return x; +} Index: config/i386/i386.c === --- config/i386/i386.c (revision 270002) +++ config/i386/i386.c (revision 270003) @@ -1901,7 +1901,10 @@ || GET_CODE (src) == LSHIFTRT) && !CONST_INT_P (XEXP (src, 1)) && reg_or_subregno (XEXP (src, 1)) == regno) - XEXP (src, 1) = vreg; + { + XEXP (src, 0) = replace_with_subreg (XEXP (src, 0), reg, reg); + XEXP (src, 1) = vreg; + } } else replace_with_subreg_in_insn (insn, reg, vreg);
Re: C++ PATCH for c++/89612 - ICE with member friend template with noexcept
On Thu, Mar 21, 2019 at 04:00:41PM -0400, Jason Merrill wrote: > On 3/19/19 11:45 AM, Marek Polacek wrote: > > On Thu, Mar 14, 2019 at 04:22:41PM -0400, Jason Merrill wrote: > > > On 3/7/19 4:52 PM, Marek Polacek wrote: > > > > This was one of those PRs where the more you poke, the more ICEs turn > > > > up. > > > > This patch fixes the ones I could find. The original problem was that > > > > maybe_instantiate_noexcept got a TEMPLATE_DECL created for the member > > > > friend template in do_friend. Its noexcept-specification was deferred, > > > > so we went to the block with push_access_scope, but that crashes on a > > > > TEMPLATE_DECL. One approach could be to somehow not defer > > > > noexcept-specs > > > > for friend templates, I guess, but I didn't want to do that. > > > > How does it make sense to instantiate the noexcept-specifier of a > > > template? > > > We should only get there for fully-instantiated function decls. > > > > Hmm, but duplicate_decls calls check_redeclaration_exception_specification > > even > > for DECL_FUNCTION_TEMPLATE_Ps. > > ... > > Note the crash happens in tsubst_friend_function. I wouldn't know when to > > check the noexcept-specifier of such a TEMPLATE_DECL for a template friend > > if not there. > > Hmm, true, I guess we do need to do a partial instantiation of the > noexcept-specifier in order to compare it. *nod* > > That broke in register_parameter_specializations but we don't need this > > code anyway, so let's do away with it -- the current_class_{ref,ptr} > > code is enough to fix the PR that register_parameter_specializations was > > introduced for. > > What about uses of non-'this' parameters in the noexcept-specification? > > template > struct C { > template > friend void foo(T t) noexcept(sizeof(decltype(t)) > 1); > }; > > template > void foo(int i) noexcept { } > > C c; Still works. I extended the test to see if we detect the scenario when the noexcept-specifiers don't match, and we do. It's noexcept39.C. > > > > Lastly, I found an invalid testcase that was breaking because a > > > > template code > > > > leaked to constexpr functions. This I fixed similarly to the recent > > > > explicit > > > > PR fix (r269131). > > > > > > This spot should probably also use build_converted_constant_expr. > > > > Ok, I'll address this. > > I'm finding this repeated pattern awkward. Earlier you changed > check_narrowing to use maybe_constant_value instead of > fold_non_dependent_expr, but perhaps whatever that fixed should have been > fixed instead with a processing_template_decl_sentinel in the enclosing code > that already instantiated the expression. That ought to avoid any need to > change this spot or r269131. So this also came up in the other patch. Why don't I drop this part (and the noexcept1.C test) and open a new PR for this issue, so that we don't conflate two problems? The following patch fixes the original issue. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2019-03-28 Marek Polacek PR c++/89612 - ICE with member friend template with noexcept. * pt.c (maybe_instantiate_noexcept): For function templates, use their template result (function decl). Don't set up local specializations. Temporarily turn on processing_template_decl. Update the template type too. * g++.dg/cpp0x/noexcept38.C: New test. * g++.dg/cpp0x/noexcept39.C: New test. * g++.dg/cpp1z/noexcept-type21.C: New test. diff --git gcc/cp/pt.c gcc/cp/pt.c index 05d5371d8a6..fa30a7f00c8 100644 --- gcc/cp/pt.c +++ gcc/cp/pt.c @@ -24192,6 +24192,17 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain) if (DECL_CLONED_FUNCTION_P (fn)) fn = DECL_CLONED_FUNCTION (fn); + + tree orig_fn = NULL_TREE; + /* For a member friend template we can get a TEMPLATE_DECL. Let's use + its FUNCTION_DECL for the rest of this function -- push_access_scope + doesn't accept TEMPLATE_DECLs. */ + if (DECL_FUNCTION_TEMPLATE_P (fn)) +{ + orig_fn = fn; + fn = DECL_TEMPLATE_RESULT (fn); +} + fntype = TREE_TYPE (fn); spec = TYPE_RAISES_EXCEPTIONS (fntype); @@ -24228,37 +24239,41 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain) push_deferring_access_checks (dk_no_deferred); input_location = DECL_SOURCE_LOCATION (fn); - /* A new stack interferes with pop_access_scope. */ - { - /* Set up the list of local specializations. */ - local_specialization_stack lss (lss_copy); - - tree save_ccp = current_class_ptr; - tree save_ccr = current_class_ref; - /* If needed, set current_class_ptr for the benefit of - tsubst_copy/PARM_DECL. */ - tree tdecl = DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (fn)); - if (DECL_NONSTATIC_MEMBER_FUNCTION_P (tdecl)) - { - tree this_parm = DECL_ARGUMENTS (tdecl); - curren
Re: C++ PATCH for c++/89852 - ICE with C++11 functional cast with { }
On 3/27/19 5:45 PM, Marek Polacek wrote: Here we have a non-dependent constructor in a template: { VIEW_CONVERT_EXPR(j) } In digest_init we call massage_init_elt, which calls digest_init_r on the element. We convert the element, but we're in a template, so perform_implicit_conversion added an IMPLICIT_CONV_EXPR around it. And then massage_init_elt calls maybe_constant_init on the element and the usual sadness ensues. Only after fold_non_dependent_expr. Perhaps we want a fold_non_dependent_init? Jason
Re: C++ PATCH for c++/89836 - bool constant expression and explicit conversions
On Thu, Mar 28, 2019 at 02:02:47PM -0400, Jason Merrill wrote: > On 3/27/19 3:08 PM, Marek Polacek wrote: > > I noticed that this test doesn't compile because > > build_converted_constant_expr > > failed to consider explicit conversion functions. That's wrong: while Core > > Issue 1981 specifies that explicit conversion functions are not considered > > for > > contextual conversions, they are considered in contextual conversions to > > bool, > > as defined in Core 2039. > > > > Fixed by adding a new wrapper that uses LOOKUP_NORMAL instead of > > LOOKUP_IMPLICIT. > > > > Besides [dcl.fct.spec], these: [except.spec], [dcl.dcl]/6 (static_assert), > > [stmt.if] (constexpr if) all talk about "a contextually converted constant > > expression of type bool", so it would seem to make sense to use > > build_converted_constant_bool_expr for them also. E.g. use that instead of > > perform_implicit_conversion_flags in build_noexcept_spec. But that doesn't > > work for this test: > > > > int e(); > > int fn() noexcept(e); > > > > because build_converted_constant_expr would issue a conversion error. We're > > converting "e" to "bool". We have a ck_lvalue conversion from "int f()" to > > "int (*f)()" and then a ck_std conversion from "int (*f)()" to bool. Those > > should work fine but we issue an error for the ck_std conversion. > > Converting a pointer to bool is a "boolean conversion", which is not allowed > under the rules for a converted constant expression ([expr.const]p7). So we > should reject that testcase. Oh, right (though clang also accepts it). I'll see if there's anything else that breaks if I use build_converted_constant_expr instead of perform_implicit_conversion. > > Also build_converted_constant_expr doesn't have the processing_template_decl > > handling that perform_implicit_conversion_flags does, which (I believe) > > lead me > > to changing check_narrowing to use maybe_constant_value instead of > > fold_non_dependent_expr. > > As I was saying elsewhere, I think that change was probably a mistake, and > that we should have fixed that bug by changing the caller instead. I need to address that. Unfortunately adding a sentinel in the caller broke so much stuff :(. > > Anyway, this patch should be safe. > > > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > OK. Thanks, committed. Marek
Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c
On Thu, Mar 28, 2019 at 09:15:54PM +1030, Alan Modra wrote: > On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote: > > Also, using a register move cost of 2 for for power9 direct moves > > gives these fails, even with the .md file tweaks: > > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler-not mtvsrwz > > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler vspltisb > > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler xxspltib > > +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler-not mtvsrwz > > +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler vspltisw > > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler-not mtvsrwz > > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler vspltish > > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler xxspltib > > +FAIL: gcc.target/powerpc/vsx-himode3.c scan-assembler lxsihzx > > +FAIL: gcc.target/powerpc/vsx-qimode3.c scan-assembler lxsibzx > > These can be all be fixed by removing "?"s disparaging vector > > alternatives in movsi_internal1 and mov_internal. > > Like this. Bootstrapped and regression tested powerpc64le-linux. > OK for stage1? Yup, together with the previous patch. Thanks, Segher
Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c
Hi! On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote: > This patch makes a number of corrections to rs6000_register_move_cost, > adds a new register union class, GEN_OR_VSX_REGS, and adjusts insn > alternative to suit. Cool beans. [ snip various great explanations, thanks! ] > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS > as an allocno class. It is similar to the aarch64 version but without > any selection by regno mode if the best class is a union class. It would be nice if we could do without such hacks. Alas. > OK assuming Pat's latest spec test run doesn't contain any nasty > surprises? Not for stage 4, no. Sorry. But it should be great in stage 1 :-) > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h > index 9fb36e41e7d..20c59f89c8f 100644 > --- a/gcc/config/rs6000/darwin.h > +++ b/gcc/config/rs6000/darwin.h > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands; >&& reg_class_subset_p (BASE_REGS, (CLASS)))\ > ? BASE_REGS \ > : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT \ > - && (CLASS) == GEN_OR_FLOAT_REGS) \ > + && ((CLASS) == GEN_OR_FLOAT_REGS \ > + || (CLASS) == GEN_OR_VSX_REGS)) \ > ? GENERAL_REGS\ > : (CLASS)) Darwin doesn't do VSX at all... But maybe there is something that can get allocated to both FPRs and VRs, sure. And GPRs. This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places where we should care about this change for correctness. > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_class > rclass) >return NO_REGS; > } > > - if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS) > + if (GET_MODE_CLASS (mode) == MODE_INT > + && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS)) > return GENERAL_REGS; Maybe do this whenever rclass contains the GPRs? > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode, > reg_class_t from, reg_class_t to) > { >int ret; > + reg_class_t rclass; > >if (TARGET_DEBUG_COST) > dbg_cost_ctrl++; > > + /* If we have VSX, we can easily move between FPR or Altivec registers, > + otherwise we can only easily move within classes. > + Do this first so we give best-case answers for union classes > + containing both gprs and vsx regs. */ > + HARD_REG_SET to_vsx, from_vsx; > + COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]); > + AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]); > + COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]); > + AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]); This is a bit expensive to run at every call of rs6000_register_move_cost. Can it be precomputed easily? > + { > + if (TARGET_DIRECT_MOVE > + && (rs6000_tune == PROCESSOR_POWER8 > + || rs6000_tune == PROCESSOR_POWER9)) TARGET_DIRECT_MOVE is always on for these CPUs. Should this also use the m*vsr* cost with say -mcpu=power7 -mtune=power9? If so you can simplify this condition. Or maybe it give problems elsewhere? It's not really worth spending to much time on, separate -mtune isn't used much at all. > +static reg_class_t > +rs6000_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED, > + reg_class_t allocno_class, > + reg_class_t best_class) > +{ > + switch (allocno_class) > +{ > +default: > + break; Please put default cases at the end. > + gcc_checking_assert (best_class == GEN_OR_VSX_REGS > +|| best_class == GEN_OR_FLOAT_REGS > +|| best_class == VSX_REGS > +|| best_class == ALTIVEC_REGS > +|| best_class == FLOAT_REGS > +|| best_class == GENERAL_REGS > +|| best_class == BASE_REGS); This list makes me think... Should there be a GEN_OR_ALTIVEC as well? Segher
Re: C++ PATCH for c++/89214 - ICE when initializing aggregates with bases
On Thu, Mar 28, 2019 at 01:30:46PM -0400, Jason Merrill wrote: > On 3/26/19 11:06 AM, Marek Polacek wrote: > > On Tue, Mar 26, 2019 at 12:34:03PM +0100, Andreas Schwab wrote: > > > I don't see any of the warnings in the tests on ia64. > > > > > > FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 33) > > > FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 34) > > > FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 35) > > > FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 45) > > > FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 46) > > > FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 47) > > > PASS: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for excess errors) > > > > > > FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 22) > > > FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 23) > > > FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 31) > > > FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 32) > > > PASS: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for excess errors) > > > > I don't have access to an ia64 box -- I see none on Compile Farm. But if > > it doesn't ICE, can we just add ! { target ia64-*-* } to the dg-warnings? > > Hmm, this behavior shouldn't be dependent on the target. True. But I can't reproduce the missing warning on e.g. ppc64. Andreas, could you please find out why we're not hitting this code in digest_init_r: 1210 tree elt = CONSTRUCTOR_ELT (stripped_init, 0)->value; 1211 if (reference_related_p (type, TREE_TYPE (elt))) or if we are, is the reference_related_p condition not true? Marek
Re: C++ PATCH for c++/89836 - bool constant expression and explicit conversions
On 3/27/19 3:08 PM, Marek Polacek wrote: I noticed that this test doesn't compile because build_converted_constant_expr failed to consider explicit conversion functions. That's wrong: while Core Issue 1981 specifies that explicit conversion functions are not considered for contextual conversions, they are considered in contextual conversions to bool, as defined in Core 2039. Fixed by adding a new wrapper that uses LOOKUP_NORMAL instead of LOOKUP_IMPLICIT. Besides [dcl.fct.spec], these: [except.spec], [dcl.dcl]/6 (static_assert), [stmt.if] (constexpr if) all talk about "a contextually converted constant expression of type bool", so it would seem to make sense to use build_converted_constant_bool_expr for them also. E.g. use that instead of perform_implicit_conversion_flags in build_noexcept_spec. But that doesn't work for this test: int e(); int fn() noexcept(e); because build_converted_constant_expr would issue a conversion error. We're converting "e" to "bool". We have a ck_lvalue conversion from "int f()" to "int (*f)()" and then a ck_std conversion from "int (*f)()" to bool. Those should work fine but we issue an error for the ck_std conversion. Converting a pointer to bool is a "boolean conversion", which is not allowed under the rules for a converted constant expression ([expr.const]p7). So we should reject that testcase. Also build_converted_constant_expr doesn't have the processing_template_decl handling that perform_implicit_conversion_flags does, which (I believe) lead me to changing check_narrowing to use maybe_constant_value instead of fold_non_dependent_expr. As I was saying elsewhere, I think that change was probably a mistake, and that we should have fixed that bug by changing the caller instead. Anyway, this patch should be safe. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. Jason
Re: [PATCH][Tree-optimization/PR89730]grant always_inline when -flive-patching=inline-only-static
> On Mar 28, 2019, at 11:30 AM, Qing Zhao wrote: > > Thanks. > > I updated ipa-inline.c as you suggested, and tested the gcc with all > live-patching-*.c testing cases, all were fine. > bootstrapped on aarch64, and now the regression testing is running. the regression test passed without any issue. qing > > the new patch is as following: > > Okay for trunk? > > thanks. > > Qing > > gcc/ChangeLog: > > 2019-03-28 qing zhao > > PR tree-optimization/89730 > * ipa-inline.c (can_inline_edge_p): Delete the checking for > -flive-patching=inline-only-static. > (can_inline_edge_by_limits_p): Add the checking for > -flive-patching=inline-only-static and grant always_inline > even when -flive-patching=inline-only-static is specified. > > > gcc/testsuite/ChangeLog: > > 2019-03-28 qing zhao > > PR tree-optimization/89730 > * gcc.dg/live-patching-4.c: New test. > >
Re: [PATCH, RFC] Wrong warning message fix for gcc 9
Ping A week ago I decided it is so minor that I can report here with a patch without creating bugzilla PR. Technically it is a "9 regression" in new functionality added back in summer. Patch was succesfully bootstrapped and regtested on x86_64. Ok for trunk? чт, 21 мар. 2019 г. в 18:53, Roman Zhuykov : > > Hello, I have found a minor diagnostic issue. > > Since r262910 we got a little odd error messages in cases like this: > $ gcc -flto-compression-level=2-O2 -c empty.c > gcc: error: argument to '-flto-compression-level=' is not between 0 and 9 > $ g++ -fabi-version=2-O2 -c empty.cpp > cc1plus: error: '-fabi-version=1' is no longer supported > > Old messages were more correct: > $ gcc-8 -flto-compression-level=2-O2 -c empty.c > gcc: error: argument to '-flto-compression-level=' should be a > non-negative integer > $ g++-8 -fabi-version=2-O2 -c empty.cpp > g++: error: argument to '-fabi-version=' should be a non-negative integer > > When UInteger option value string is not a number, but it's first char > is a digit, integral_argument function returns -1 without setting > *err. > > Proposed untested patch attached. > > -- > Best Regards, > Roman Zhuykov > > gcc/ChangeLog: > > 2019-03-21 Roman Zhuykov > > * opts-common.c (integral_argument): Set errno properly in one case. > > diff --git a/gcc/opts-common.c b/gcc/opts-common.c > --- a/gcc/opts-common.c > +++ b/gcc/opts-common.c > @@ -205,8 +205,10 @@ integral_argument (const char *arg, int *err, > bool byte_size_suffix) > value = strtoull (arg, &end, 0); > if (*end) > { > - /* errno is most likely EINVAL here. */ > - *err = errno; > + if (errno) > + *err = errno; > + else > + *err = EINVAL; > return -1; > }
Re: [C++ debug PATCH] [PR88534] accept VAR_DECL in class literal template parms
On 3/27/19 6:56 PM, Martin Sebor wrote: On 3/27/19 3:11 PM, Martin Sebor wrote: On 3/27/19 4:44 AM, Jonathan Wakely wrote: On 21/03/19 15:03 -0400, Jason Merrill wrote: On 3/20/19 6:06 PM, Marek Polacek wrote: On Wed, Mar 20, 2019 at 10:58:32PM +0100, Jakub Jelinek wrote: On Wed, Mar 20, 2019 at 05:55:04PM -0400, Marek Polacek wrote: On Wed, Mar 20, 2019 at 04:56:33PM -0300, Alexandre Oliva wrote: On Mar 20, 2019, Marek Polacek wrote: This test fails with pr88534.C:58:1: sorry, unimplemented: string literal in function template signature Interesting... gcc-8 rejected it with an error message rejecting the template parameter, but my latest trunk build (dated Mar 13, r269641) compiles it all right. Was there a subsequent fix, maybe? I didn't realize it was supposed to be rejected. Ah, that problem only started with r269814, namely this hunk: Maybe this is done too early and should be postponed to genericization (perhaps except for TREE_STATIC vars)? Or skip when DECL is template_parm_object_p. Or handle it in mangle.c. I don't see any reason we shouldn't accept struct A { char c[4]; }; template struct B { }; B b; Probably we should use the same mangling whether the initializer for c was spelled as a string literal or list of integers. The thing we still don't want to allow is mangling the *address* of a string literal. Will that help PR 47488 as well? What I have (attached) accepts all three test cases from PR 47488 (comment #0, #1, #2, and #5) but with different mangling. The difference in the mangling of the function in the test case in comment #0 is Clang: _Z1gIiEDTcl1fcvT__ELA1_KcEEERKS0_ GCC 9: _Z1gIiEDTcl1fcvT__ELKc0EEERKS0_ I'm not very familiar with the C++ ABI mangling but from what I can tell, Clang treats the type as a literal array of 1 const char element (LA1_Kc) without actually encoding its value, while with my patch GCC encodes it as a constant literal initializer consisting of 1 null char (LKc0). In other words, Clang would mangle these two the same for the same T: template < typename T > decltype (f (T(), "123")) g (const T&); template < typename T > decltype (f (T(), "abc")) g (const T&); while GCC would mangle them differently. Which is correct? Clang's seems correct in this case but would it also be correct to mangle Jason's B the same as B? After some poking around I found the issue below that seems to answer the question at least for Jason's example: https://github.com/itanium-cxx-abi/cxx-abi/issues/64 But this is an open issue that leaves some other questions unanswered, mainly that of the exact encoding of the literal (the length of the directly encoded subsequence and the hash algorithm to compute the hash value of the string). In any event, using the same encoding as for initializer lists (i.e., (elt-type, elt-value)...) doesn't follow this approach. Should I prototype the solution outlined in the issue for GCC 9? I think let's leave that for the future, and use the aggregate encoding for GCC 9. I suppose there's still the question of compatibility between initializer lists and string literals, namely whether this B is supposed to mangle the same as B The ABI issue only talks about string literals and not braced initializer lists. I see in Revision History [150204] Add mangling for braced initializer lists, so presumably there is a difference, but I can't find what it is. The issue for mangling template arguments of class type is https://github.com/itanium-cxx-abi/cxx-abi/issues/63 The mangling of braced-init-lists is # type {expr-list}, conversion with braced-init-list argument ::= tl * E # {expr-list}, braced-init-list in any other context ::= il * E Your patch doesn't use this mangling. Here's a variant of my example that actually causes these to show up in the mangling: struct A { char c[4]; }; template struct B { }; void f(B) {} void g(B) {} Jason
Re: C++ PATCH for c++/89214 - ICE when initializing aggregates with bases
On 3/26/19 11:06 AM, Marek Polacek wrote: On Tue, Mar 26, 2019 at 12:34:03PM +0100, Andreas Schwab wrote: I don't see any of the warnings in the tests on ia64. FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 33) FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 34) FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 35) FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 45) FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 46) FAIL: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for warnings, line 47) PASS: g++.dg/cpp1z/aggr-base8.C -std=c++17 (test for excess errors) FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 22) FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 23) FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 31) FAIL: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for warnings, line 32) PASS: g++.dg/cpp1z/aggr-base9.C -std=c++17 (test for excess errors) I don't have access to an ia64 box -- I see none on Compile Farm. But if it doesn't ICE, can we just add ! { target ia64-*-* } to the dg-warnings? Hmm, this behavior shouldn't be dependent on the target. Jason
Re: [PATCH][Tree-optimization/PR89730]grant always_inline when -flive-patching=inline-only-static
Thanks. I updated ipa-inline.c as you suggested, and tested the gcc with all live-patching-*.c testing cases, all were fine. bootstrapped on aarch64, and now the regression testing is running. the new patch is as following: Okay for trunk? thanks. Qing gcc/ChangeLog: 2019-03-28 qing zhao PR tree-optimization/89730 * ipa-inline.c (can_inline_edge_p): Delete the checking for -flive-patching=inline-only-static. (can_inline_edge_by_limits_p): Add the checking for -flive-patching=inline-only-static and grant always_inline even when -flive-patching=inline-only-static is specified. gcc/testsuite/ChangeLog: 2019-03-28 qing zhao PR tree-optimization/89730 * gcc.dg/live-patching-4.c: New test. fixing-PR89730.patch Description: Binary data
Re: [C++ PATCH] Fix SWITCH_STMT handling in potential_constant_expression_1 (PR c++/89785)
On 3/26/19 6:28 PM, Jakub Jelinek wrote: Hi! As the testcase shows, the SWITCH_STMT handling in potential_constant_expression_1 is quite conservative, it doesn't recurse on the body of the switch stmt, because at least for the case where the switch condition isn't some easily determinable constant, we'd need to try all possible values of the switch expression (or start walking from all possible case labels, see PR for details), but right now potential_constant_expression_1 doesn't have the careful stmt skipping logic cxx_eval_* has anyway. So, the following patch still doesn't recurse on SWITCH_STMT_BODY using potential_constant_expression_1, but instead checks if the body contains a return or continue stmt (the latter only if not nested in some loop body inside of the switch body). If there is no return nor continue, assuming there is no endless loop (which wouldn't be constant expression due to the ops limit) the switch body must fall through to the code after it. If there is a return or continue, we set *jump_target to it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Jason
Re: [PATCH] optinfo-emit-json.cc: don't call get_fnname_from_decl (PR middle-end/89725)
On Thu, 2019-03-28 at 15:14 +0100, Jakub Jelinek wrote: > On Thu, Mar 28, 2019 at 11:02:52AM -0400, David Malcolm wrote: > > optrecord_json_writer::optinfo_to_json can in theory be called from > > any > > optimization pass, but currently uses get_fnname_from_decl, which > > is RTL-specific, which can lead to an ICE (PR middle-end/89725). > > The ICE is a separate problem. The problem with DECL_RTL early is > that it is undesirable to create RTL for decls during gimple passes, > we then need to make sure to clear it when versioning the function > etc. > Ah, my bad; thanks for the clarification. Dave
[PATCH] Updated patch for PR84101
This is an update from last years attempt to tame down vectorization when it runs in to ABI inefficiencies at function return. I've updated the patch to look for multi-reg returns instead of !vector ones (due to word_mode vectorization) and handle a bit more returns, esp. the common pattern of ret = vector; D.1234 = ret; ret = {CLOBBER}; return D.1234; Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? I know this isn't the ultimate solution but we keep getting duplicates of the PR. Richard. 2019-03-28 Richard Biener PR tree-optimization/84101 * tree-vect-stmts.c: Include explow.h for hard_function_value, regs.h for hard_regno_nregs. (cfun_returns): New helper. (vect_model_store_cost): When vectorizing a store to a decl we return and the function ABI returns in a multi-reg location account for the possible spilling that will happen. * gcc.target/i386/pr84101.c: New testcase. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 269987) +++ gcc/tree-vect-stmts.c (working copy) @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. #include "tree-cfg.h" #include "tree-ssa-loop-manip.h" #include "cfgloop.h" +#include "explow.h" #include "tree-ssa-loop.h" #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" @@ -52,6 +53,7 @@ along with GCC; see the file COPYING3. #include "vec-perm-indices.h" #include "tree-ssa-loop-niter.h" #include "gimple-fold.h" +#include "regs.h" /* For lang_hooks.types.type_for_mode. */ #include "langhooks.h" @@ -948,6 +950,37 @@ vect_model_promotion_demotion_cost (stmt "prologue_cost = %d .\n", inside_cost, prologue_cost); } +/* Returns true if the current function returns DECL. */ + +static bool +cfun_returns (tree decl) +{ + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) +{ + greturn *ret = safe_dyn_cast (last_stmt (e->src)); + if (!ret) + continue; + if (gimple_return_retval (ret) == decl) + return true; + /* We often end up with an aggregate copy to the result decl, + handle that case as well. First skip intermediate clobbers +though. */ + gimple *def = ret; + do + { + def = SSA_NAME_DEF_STMT (gimple_vuse (def)); + } + while (gimple_clobber_p (def)); + if (is_a (def) + && gimple_assign_lhs (def) == gimple_return_retval (ret) + && gimple_assign_rhs1 (def) == decl) + return true; +} + return false; +} + /* Function vect_model_store_cost Models cost for stores. In the case of grouped accesses, one access @@ -1032,6 +1065,37 @@ vect_model_store_cost (stmt_vec_info stm vec_to_scalar, stmt_info, 0, vect_body); } + /* When vectorizing a store into the function result assign + a penalty if the function returns in a multi-register location. + In this case we assume we'll end up with having to spill the + vector result and do piecewise loads. */ + tree base = get_base_address (STMT_VINFO_DATA_REF (stmt_info)->ref); + if (base + && (TREE_CODE (base) == RESULT_DECL + || (DECL_P (base) && cfun_returns (base))) + && !aggregate_value_p (base, cfun->decl)) +{ + rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1); + /* ??? Handle PARALLEL in some way. */ + if (REG_P (reg)) + { + int nregs = hard_regno_nregs (REGNO (reg), GET_MODE (reg)); + /* Assume that a reg-reg move is possible and cheap, +do not account for vector to gp register move cost. */ + if (nregs > 1) + { + /* Spill. */ + prologue_cost += record_stmt_cost (cost_vec, ncopies, +vector_store, +stmt_info, 0, vect_epilogue); + /* Loads. */ + prologue_cost += record_stmt_cost (cost_vec, ncopies * nregs, +scalar_load, +stmt_info, 0, vect_epilogue); + } + } +} + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_model_store_cost: inside_cost = %d, " Index: gcc/testsuite/gcc.target/i386/pr84101.c === --- gcc/testsuite/gcc.target/i386/pr84101.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr84101.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-slp2-details" } */ + +typedef struct uint64_pair uint64_pair_t ; +struct uint64_pair +{ + unsigned long w0 ; + unsigned long w1 ; +} ; + +uint64_pair_t pair(int num) +{ + uint64_pair_t p ; + + p.w0 = num << 1 ;
Re: [PATCH] Follow-up regcprop fixes
On 3/27/19 4:24 PM, Jakub Jelinek wrote: > On Wed, Mar 27, 2019 at 04:32:15PM +0100, Jakub Jelinek wrote: >> On Wed, Mar 27, 2019 at 09:24:46AM -0600, Jeff Law wrote: 2) another thing I've noticed today, we queue up the debug insn changes, but if we iterate the second time for any bb, we overwrite and thus lose any of the debug insn queued changes from the first iteration, so those changes aren't applied (wrong-debug?) >>> This is actually what worries me, both with the current bits and with >>> any bits that change to a worklist of blocks to reprocess. As is I >>> think we preserve the debug stuff across the iterations which should >>> keep debug info correct. Managing that in a world where we queue up the >>> iteration step is going to be tougher. >> >> The patch I've posted would do df_analyze, all bbs once, then df_analyze, >> process queued debug changes, and if anything needs to be repeated (just >> once), redo the bbs from worklist and repeat the above. >> That seems to me the easiest way to get rid of the per-bb df_analyze calls >> as well as fixing the debug stuff. Because otherwise, we'd need to somehow >> merge the queued debug insns from the first and second pass and figure out >> how to deal with that. > > Here is everything in patch on top of current trunk which contains your > regcprop.c changes. > > In addition to the previously mentioned changes, as you've requested it > clears the visited sbitmap between the two passes, fixes clearing of the > all_vd[bb->index].e[regno].debug_insn_changes pointers after > cprop_hardreg_debug processes them and fixes up the n_debug_insn_changes > updates, so that the invariant that it always is the sum of length of the > debug_insn_changes chains for all hard regs in the bb is true (before that > n_debug_insn_changes could be higher, and in the final debug insn processing > case wasn't actually decremented at all, so the == 0 check was useless > there). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if it > succeeds on some other targets too? It's run through my tester without any issues. > > 2019-03-27 Jakub Jelinek > > * regcprop.c (copyprop_hardreg_forward_1): Remove redundant INSN_P > test. > (cprop_hardreg_bb, cprop_hardreg_debug): New functions. > (pass_cprop_hardreg::execute): Use those. Don't repeat bb processing > immediately after first one with df_analyze in between, but rather > process all bbs, queueing ones that need second pass in a worklist, > df_analyze, process queued debug insn changes and if second pass is > needed, process bbs from worklist, df_analyze, process queued debug > insns again. OK. Jeff
Re: [PATCH] optinfo-emit-json.cc: don't call get_fnname_from_decl (PR middle-end/89725)
On Thu, Mar 28, 2019 at 11:02:52AM -0400, David Malcolm wrote: > optrecord_json_writer::optinfo_to_json can in theory be called from any > optimization pass, but currently uses get_fnname_from_decl, which > is RTL-specific, which can lead to an ICE (PR middle-end/89725). The ICE is a separate problem. The problem with DECL_RTL early is that it is undesirable to create RTL for decls during gimple passes, we then need to make sure to clear it when versioning the function etc. > > In that PR, Jakub suggested using either DECL_ASSEMBLER_NAME or the > "printable name" (via current_function_name). > > This patch makes it use DECL_ASSEMBLER_NAME. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > OK for trunk? > > gcc/ChangeLog: > PR middle-end/89725 > * optinfo-emit-json.cc (optrecord_json_writer::optinfo_to_json): > Use DECL_ASSEMBLER_NAME rather than get_fnname_from_decl. Ok, thanks. Jakub
[PATCH] optinfo-emit-json.cc: don't call get_fnname_from_decl (PR middle-end/89725)
optrecord_json_writer::optinfo_to_json can in theory be called from any optimization pass, but currently uses get_fnname_from_decl, which is RTL-specific, which can lead to an ICE (PR middle-end/89725). In that PR, Jakub suggested using either DECL_ASSEMBLER_NAME or the "printable name" (via current_function_name). This patch makes it use DECL_ASSEMBLER_NAME. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: PR middle-end/89725 * optinfo-emit-json.cc (optrecord_json_writer::optinfo_to_json): Use DECL_ASSEMBLER_NAME rather than get_fnname_from_decl. --- gcc/optinfo-emit-json.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc index 814446b..1cfcdfe 100644 --- a/gcc/optinfo-emit-json.cc +++ b/gcc/optinfo-emit-json.cc @@ -411,7 +411,8 @@ optrecord_json_writer::optinfo_to_json (const optinfo *optinfo) if (current_function_decl) { - const char *fnname = get_fnname_from_decl (current_function_decl); + const char *fnname + = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (current_function_decl)); obj->set ("function", new json::string (fnname)); } -- 1.8.5.3
Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities
On Thu, Mar 28, 2019 at 09:55:46AM +0100, Richard Biener wrote: > On Wed, Mar 27, 2019 at 4:26 PM Jeff Law wrote: > > > > On 3/27/19 8:36 AM, Jakub Jelinek wrote: > > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: > > >> However, I'm increasingly of the opinion that MIPS targets need to drop > > >> off the priority platform list. Given the trajectory I see for MIPS > > >> based processors in industry, it's really hard to justify spending this > > >> much time on them, particularly for low priority code quality issues. > > > > > > Besides what has been discussed on IRC for the PR89826 fix, that we really > > > need a df_analyze before processing the first block, because otherwise we > > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, > > > but I > > > admit I don't know much about df nor regcprop. > > RIght. I plan to commit that today along with the test reordering you > > pointed out. > > > > > > > > 1) the df_analyze () after every (successful) processing of a basic block > > > is IMHO way too expensive, I would be very surprised if df_analyze () > > > isn't > > > quadratic in number of basic blocks and so one could construct testcases > > > with millions of basic blocks and at least one regcprop change in each bb > > > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the > > > 95% bbs you said won't have any changes at all) > > I'm going to look this further today. > > Look at > https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest > and you'll see multiple testcases with 'hard reg cprop' >10% compile-time. > It's indeed a hog for no good reason. I've tried https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071#c1 in --enable-checking=yes,rtl,extra bootstrapped cc1 at -O2, without and with the patch. The important times in -ftime-report with vanilla trunk: phase opt and generate : 250.76 (100%) 2.00 ( 96%) 253.36 (100%) 768860 kB ( 99%) df live regs : 19.95 ( 8%) 0.03 ( 1%) 19.39 ( 8%) 0 kB ( 0%) df live&initialized regs : 20.29 ( 8%) 0.05 ( 2%) 19.73 ( 8%) 0 kB ( 0%) df reg dead/unused notes : 158.66 ( 63%) 0.02 ( 1%) 160.12 ( 63%) 4665 kB ( 1%) hard reg cprop : 21.03 ( 8%) 0.01 ( 0%) 21.39 ( 8%) 509 kB ( 0%) TOTAL : 250.85 2.09253.57 776940 kB (ignoring everything <2% in the first % column). Configure with --enable-checking=release to disable checks. With the https://gcc.gnu.org/ml/gcc-patches/2019-03/msg01335.html patch the same testcase with -O2 -ftime-report results in identical assembly, but: phase opt and generate : 28.92 (100%) 1.82 ( 95%) 30.85 ( 99%) 768882 kB ( 99%) CFG verifier : 1.66 ( 6%) 0.02 ( 1%) 1.69 ( 5%) 0 kB ( 0%) df live regs : 0.63 ( 2%) 0.00 ( 0%) 0.61 ( 2%) 0 kB ( 0%) df live&initialized regs : 1.01 ( 3%) 0.03 ( 2%) 1.00 ( 3%) 0 kB ( 0%) df must-initialized regs : 1.51 ( 5%) 0.93 ( 48%) 2.46 ( 8%) 0 kB ( 0%) tree SSA verifier : 2.79 ( 10%) 0.01 ( 1%) 2.78 ( 9%) 0 kB ( 0%) tree STMT verifier : 2.00 ( 7%) 0.00 ( 0%) 1.99 ( 6%) 0 kB ( 0%) dominance computation : 0.61 ( 2%) 0.00 ( 0%) 0.59 ( 2%) 0 kB ( 0%) out of ssa : 0.61 ( 2%) 0.04 ( 2%) 0.65 ( 2%) 1 kB ( 0%) loop init : 0.58 ( 2%) 0.00 ( 0%) 0.63 ( 2%) 38 kB ( 0%) combiner : 0.44 ( 2%) 0.02 ( 1%) 0.47 ( 2%) 17926 kB ( 2%) integrated RA : 2.24 ( 8%) 0.08 ( 4%) 2.35 ( 8%) 205177 kB ( 26%) LRA non-specific : 1.46 ( 5%) 0.05 ( 3%) 1.50 ( 5%) 19172 kB ( 2%) LRA create live ranges : 1.23 ( 4%) 0.00 ( 0%) 1.23 ( 4%) 2589 kB ( 0%) reload CSE regs: 0.54 ( 2%) 0.00 ( 0%) 0.51 ( 2%) 8456 kB ( 1%) scheduling 2 : 0.73 ( 3%) 0.09 ( 5%) 0.81 ( 3%) 2715 kB ( 0%) verify RTL sharing : 1.19 ( 4%) 0.00 ( 0%) 1.15 ( 4%) 0 kB ( 0%) TOTAL : 29.02 1.92 31.07 776962 kB So 8.5x usr time speedup with that patch. Jakub
Re: [v3 PATCH] Don't revisit a variant we are already visiting.
On 28/03/19 15:21 +0200, Ville Voutilainen wrote: On Thu, 28 Mar 2019 at 15:07, Ville Voutilainen wrote: This shaves off some unnecessary codegen. In the assignment operators and swap, we are already visiting the rhs, so we shouldn't visit it again to get to its guts, we already have them in-hand. 2019-03-28 Ville Voutilainen Don't revisit a variant we are already visiting. * include/std/variant (__variant_construct_single): New. (__variant_construct): Use it. (_M_destructive_move): Likewise, turn into a template. (_M_destructive_copy): Likewise. (_Copy_assign_base::operator=): Adjust. (_Move_assign_base::operator=): Likewise. (swap): Likewise. Minor adjustment, use direct-init in all cases: -+ auto __tmp = std::move(__rhs_mem); ++ auto __tmp(std::move(__rhs_mem)); diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 3932a8a..fe96cc8 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -428,20 +428,25 @@ namespace __variant using _Variant_storage_alias = _Variant_storage<_Traits<_Types...>::_S_trivial_dtor, _Types...>; + template + void __variant_construct_single(_Tp&& __lhs, _Up&& __rhs_mem) Please indent this line to match the opening brace, not the template-head. @@ -490,34 +495,38 @@ namespace __variant std::forward<_Move_ctor_base>(__rhs)); } - void _M_destructive_move(_Move_ctor_base&& __rhs) - { - this->_M_reset(); - __try - { - __variant_construct<_Types...>(*this, - std::forward<_Move_ctor_base>(__rhs)); - } - __catch (...) - { - this->_M_index = variant_npos; - __throw_exception_again; - } - } + template +void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs) Does __rhs_index need to be a template parameter in these function templates? It seems to always be passed size_t, but the value will always fit in unsigned short. Could it just use size_t? Or even unsigned, since we don't really need to pass 64 bits? I think typename _Base::__index_type is strictly correct, but that's a lot to type and to read. OK with the indentation tweak, and I'll leave changing the _Idx or not to your judgment.
Re: [PATCH] PR c/79022 fix mismatch parameter order in declaratio
On Thu, Mar 28, 2019 at 2:28 PM Jonathan Wakely wrote: > > The declaration of create_nested_ptr_option in the header has the 'from' > and 'to' parameters in the opposite order from the definition in > gengtype.c: > > /* Return an options structure for a "nested_ptr" option. */ > options_p > create_nested_ptr_option (options_p next, type_p t, > const char *to, const char *from) > > and the only caller in gengtype-parse.c: > > return create_nested_ptr_option (prev, ty, to, from); > > This patch swaps the parameter names in the declaration. > > PR c/79022 > * gengtype.h (create_nested_ptr_option): Fix parameter names to match > definition. > > > I've rebuilt the compiler, but not run the tests, because this doesn't > change the meaning of any code and seems almost obvious. > > OK for trunk? OK. > >
[PATCH] PR c/79022 fix mismatch parameter order in declaratio
The declaration of create_nested_ptr_option in the header has the 'from' and 'to' parameters in the opposite order from the definition in gengtype.c: /* Return an options structure for a "nested_ptr" option. */ options_p create_nested_ptr_option (options_p next, type_p t, const char *to, const char *from) and the only caller in gengtype-parse.c: return create_nested_ptr_option (prev, ty, to, from); This patch swaps the parameter names in the declaration. PR c/79022 * gengtype.h (create_nested_ptr_option): Fix parameter names to match definition. I've rebuilt the compiler, but not run the tests, because this doesn't change the meaning of any code and seems almost obvious. OK for trunk? commit 434ffd58b7ecb9e750d6cb2d955567ba1c0f4af2 Author: Jonathan Wakely Date: Thu Mar 28 13:07:30 2019 + PR c/79022 fix mismatch parameter order in declaratio The declaration of create_nested_ptr_option in the header has the 'from' and 'to' parameters in the opposite order from the definition in gengtype.c: /* Return an options structure for a "nested_ptr" option. */ options_p create_nested_ptr_option (options_p next, type_p t, const char *to, const char *from) and the only caller in gengtype-parse.c: return create_nested_ptr_option (prev, ty, to, from); This patch swaps the parameter names in the declaration. PR c/79022 * gengtype.h (create_nested_ptr_option): Fix parameter names to match definition. diff --git a/gcc/gengtype.h b/gcc/gengtype.h index db9cb0f401a..02be0c16b55 100644 --- a/gcc/gengtype.h +++ b/gcc/gengtype.h @@ -203,8 +203,8 @@ options_p create_nested_option (options_p next, const char* name, struct nested_ptr_data* info); /* Create a nested pointer option. */ -options_p create_nested_ptr_option (options_p, type_p t, -const char *from, const char *to); +options_p create_nested_ptr_option (options_p next, type_p t, + const char *to, const char *from); /* A name and a type. */ struct pair {
Re: [v3 PATCH] Don't revisit a variant we are already visiting.
On Thu, 28 Mar 2019 at 15:07, Ville Voutilainen wrote: > > This shaves off some unnecessary codegen. In the assignment > operators and swap, we are already visiting the rhs, so we shouldn't > visit it again to get to its guts, we already have them in-hand. > > 2019-03-28 Ville Voutilainen > > Don't revisit a variant we are already visiting. > * include/std/variant (__variant_construct_single): New. > (__variant_construct): Use it. > (_M_destructive_move): Likewise, turn into a template. > (_M_destructive_copy): Likewise. > (_Copy_assign_base::operator=): Adjust. > (_Move_assign_base::operator=): Likewise. > (swap): Likewise. Minor adjustment, use direct-init in all cases: -+ auto __tmp = std::move(__rhs_mem); ++ auto __tmp(std::move(__rhs_mem)); diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 3932a8a..fe96cc8 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -428,20 +428,25 @@ namespace __variant using _Variant_storage_alias = _Variant_storage<_Traits<_Types...>::_S_trivial_dtor, _Types...>; + template + void __variant_construct_single(_Tp&& __lhs, _Up&& __rhs_mem) +{ + void* __storage = std::addressof(__lhs._M_u); + using _Type = remove_reference_t; + if constexpr (!is_same_v<_Type, __variant_cookie>) +::new (__storage) + _Type(std::forward(__rhs_mem)); +} + template void __variant_construct(_Tp&& __lhs, _Up&& __rhs) { __lhs._M_index = __rhs._M_index; - void* __storage = std::addressof(__lhs._M_u); - __do_visit([__storage](auto&& __rhs_mem) mutable + __do_visit([&__lhs](auto&& __rhs_mem) mutable -> __detail::__variant::__variant_cookie { - using _Type = remove_reference_t; - if constexpr (is_same_v<__remove_cvref_t, - remove_cv_t<_Type>> - && !is_same_v<_Type, __variant_cookie>) - ::new (__storage) - _Type(std::forward(__rhs_mem)); + __variant_construct_single(std::forward<_Tp>(__lhs), + std::forward(__rhs_mem)); return {}; }, __variant_cast<_Types...>(std::forward(__rhs))); } @@ -490,34 +495,38 @@ namespace __variant std::forward<_Move_ctor_base>(__rhs)); } - void _M_destructive_move(_Move_ctor_base&& __rhs) - { - this->_M_reset(); - __try - { - __variant_construct<_Types...>(*this, - std::forward<_Move_ctor_base>(__rhs)); - } - __catch (...) - { - this->_M_index = variant_npos; - __throw_exception_again; - } - } + template +void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __try + { + __variant_construct_single(*this, + std::forward<_Up>(__rhs)); + } + __catch (...) + { + this->_M_index = variant_npos; + __throw_exception_again; + } + } - void _M_destructive_copy(const _Move_ctor_base& __rhs) - { - this->_M_reset(); - __try - { - __variant_construct<_Types...>(*this, __rhs); - } - __catch (...) - { - this->_M_index = variant_npos; - __throw_exception_again; - } - } + template +void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __try + { + __variant_construct_single(*this, __rhs); + } + __catch (...) + { + this->_M_index = variant_npos; + __throw_exception_again; + } + } _Move_ctor_base(const _Move_ctor_base&) = default; _Move_ctor_base& operator=(const _Move_ctor_base&) = default; @@ -530,17 +539,21 @@ namespace __variant using _Base = _Copy_ctor_alias<_Types...>; using _Base::_Base; - void _M_destructive_move(_Move_ctor_base&& __rhs) - { - this->_M_reset(); - __variant_construct<_Types...>(*this, - std::forward<_Move_ctor_base>(__rhs)); - } - void _M_destructive_copy(const _Move_ctor_base& __rhs) - { - this->_M_reset(); - __variant_construct<_Types...>(*this, __rhs); - } + template +void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __variant_construct_single(*this, + std::forward<_Up>(__rhs)); + } + template +void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __variant_construct_single(*this, __rhs); + } }; template @@ -580,12 +593,13 @@ namespace __variant if constexpr (is_nothrow_copy_constructible_v<__rhs_type> || !is_nothrow_move_constructible_v<__rhs_type>) { - this->_M_destructive_copy(__rhs); + this->_M_destructive_copy(__rhs._M_index, __rhs_mem); } else { - _Copy_assign_base __tmp(__rhs); - this->_M_destructive_move(std::move(__tmp)); + a
Re: [PATCH 2/3] Extend format of -fdbg-cnt: add aux_base filter.
On 3/28/19 1:53 PM, Richard Biener wrote: > On Wed, Mar 27, 2019 at 10:52 AM marxin wrote: >> > > Hmm, I don't think we want this - it looks too much like a hack to me. > > I guess for an easier way to figure out the min/max ranges (do we > handle anti-ranges?) would be to specify a function name the > debug counter would be enabled for. That might also help in your > situation. Well, it's a bit hack, but a useful one. In my case I had a couple files which triggered the debug counter limit. Function filter will work, but note that you have also an IPA debug counters. Here function context can be not clear. > > What I've usually done is do the bisection via dbg-counter on > a single file, manually link the spec binary and then > do a runcpu w/o recompiling the binary. Having a complex build system, adjusting just -dbg-cnt is much more convenient than doing a manual setup. Martin > > Richard. > >> gcc/ChangeLog: >> >> 2019-03-27 Martin Liska >> >> * dbgcnt.c (dbg_cnt_set_limit_by_name): Add new argument >> aux_base and filter based on aux_base_name. >> (dbg_cnt_process_single_pair): Parse aux_base. >> * doc/invoke.texi: Document new extended format. >> >> gcc/testsuite/ChangeLog: >> >> 2019-03-27 Martin Liska >> >> * gcc.dg/dbg-cnt-1.c: New test. >> --- >> gcc/dbgcnt.c | 11 --- >> gcc/doc/invoke.texi | 8 ++-- >> gcc/testsuite/gcc.dg/dbg-cnt-1.c | 6 ++ >> 3 files changed, 20 insertions(+), 5 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/dbg-cnt-1.c >>
Re: [PATCH 1/3] Fix multiple values for -fdbg-cnt.
On Thu, Mar 28, 2019 at 2:05 PM Martin Liška wrote: > > On 3/28/19 1:51 PM, Richard Biener wrote: > > On Wed, Mar 27, 2019 at 10:52 AM marxin wrote: > >> > >> > >> gcc/ChangeLog: > >> > >> 2019-03-27 Martin Liska > >> > >> * dbgcnt.c (dbg_cnt_process_single_pair): Fix GNU coding style. > >> (dbg_cnt_process_opt): Parse first tokens aas > > > > as > > Better than 'ass' :) > > > > > so what was the issue you fixed? > > $ gcc -c -fdbg-cnt=dce:0,vect_slp:3 main.c > dbg_cnt 'dce' set to 0-3 > > while correct output is: > > $ gcc-8 -c -fdbg-cnt=dce:0,vect_slp:3 main.c > dbg_cnt 'dce' set to 0 > dbg_cnt 'vect_slp' set to 3 Ah. OK. Richard. > Martin > > > > >> dbg_cnt_process_single_pair is also using strtok. > >> --- > >> gcc/dbgcnt.c | 25 +++-- > >> 1 file changed, 15 insertions(+), 10 deletions(-) > >> >
[v3 PATCH] Don't revisit a variant we are already visiting.
This shaves off some unnecessary codegen. In the assignment operators and swap, we are already visiting the rhs, so we shouldn't visit it again to get to its guts, we already have them in-hand. 2019-03-28 Ville Voutilainen Don't revisit a variant we are already visiting. * include/std/variant (__variant_construct_single): New. (__variant_construct): Use it. (_M_destructive_move): Likewise, turn into a template. (_M_destructive_copy): Likewise. (_Copy_assign_base::operator=): Adjust. (_Move_assign_base::operator=): Likewise. (swap): Likewise. diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 3932a8a..003cc15 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -428,20 +428,25 @@ namespace __variant using _Variant_storage_alias = _Variant_storage<_Traits<_Types...>::_S_trivial_dtor, _Types...>; + template + void __variant_construct_single(_Tp&& __lhs, _Up&& __rhs_mem) +{ + void* __storage = std::addressof(__lhs._M_u); + using _Type = remove_reference_t; + if constexpr (!is_same_v<_Type, __variant_cookie>) +::new (__storage) + _Type(std::forward(__rhs_mem)); +} + template void __variant_construct(_Tp&& __lhs, _Up&& __rhs) { __lhs._M_index = __rhs._M_index; - void* __storage = std::addressof(__lhs._M_u); - __do_visit([__storage](auto&& __rhs_mem) mutable + __do_visit([&__lhs](auto&& __rhs_mem) mutable -> __detail::__variant::__variant_cookie { - using _Type = remove_reference_t; - if constexpr (is_same_v<__remove_cvref_t, - remove_cv_t<_Type>> - && !is_same_v<_Type, __variant_cookie>) - ::new (__storage) - _Type(std::forward(__rhs_mem)); + __variant_construct_single(std::forward<_Tp>(__lhs), + std::forward(__rhs_mem)); return {}; }, __variant_cast<_Types...>(std::forward(__rhs))); } @@ -490,34 +495,38 @@ namespace __variant std::forward<_Move_ctor_base>(__rhs)); } - void _M_destructive_move(_Move_ctor_base&& __rhs) - { - this->_M_reset(); - __try - { - __variant_construct<_Types...>(*this, - std::forward<_Move_ctor_base>(__rhs)); - } - __catch (...) - { - this->_M_index = variant_npos; - __throw_exception_again; - } - } + template +void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __try + { + __variant_construct_single(*this, + std::forward<_Up>(__rhs)); + } + __catch (...) + { + this->_M_index = variant_npos; + __throw_exception_again; + } + } - void _M_destructive_copy(const _Move_ctor_base& __rhs) - { - this->_M_reset(); - __try - { - __variant_construct<_Types...>(*this, __rhs); - } - __catch (...) - { - this->_M_index = variant_npos; - __throw_exception_again; - } - } + template +void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __try + { + __variant_construct_single(*this, __rhs); + } + __catch (...) + { + this->_M_index = variant_npos; + __throw_exception_again; + } + } _Move_ctor_base(const _Move_ctor_base&) = default; _Move_ctor_base& operator=(const _Move_ctor_base&) = default; @@ -530,17 +539,21 @@ namespace __variant using _Base = _Copy_ctor_alias<_Types...>; using _Base::_Base; - void _M_destructive_move(_Move_ctor_base&& __rhs) - { - this->_M_reset(); - __variant_construct<_Types...>(*this, - std::forward<_Move_ctor_base>(__rhs)); - } - void _M_destructive_copy(const _Move_ctor_base& __rhs) - { - this->_M_reset(); - __variant_construct<_Types...>(*this, __rhs); - } + template +void _M_destructive_move(_Idx __rhs_index, _Up&& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __variant_construct_single(*this, + std::forward<_Up>(__rhs)); + } + template +void _M_destructive_copy(_Idx __rhs_index, const _Up& __rhs) +{ + this->_M_reset(); + this->_M_index = __rhs_index; + __variant_construct_single(*this, __rhs); + } }; template @@ -580,12 +593,13 @@ namespace __variant if constexpr (is_nothrow_copy_constructible_v<__rhs_type> || !is_nothrow_move_constructible_v<__rhs_type>) { - this->_M_destructive_copy(__rhs); + this->_M_destructive_copy(__rhs._M_index, __rhs_mem); } else { - _Copy_assign_base __tmp(__rhs); - this->_M_destructive_move(std::move(__tmp)); + auto __tmp(__rhs_mem); + this->_M_destructive_move(__rhs._M_index, + std::move(__tmp)); } } else @@ -641,8 +655,8 @@ namespace __variant } else { - _Move_assign_base __tmp(std::move(__rhs)); - t
Re: [PATCH 1/3] Fix multiple values for -fdbg-cnt.
On 3/28/19 1:51 PM, Richard Biener wrote: > On Wed, Mar 27, 2019 at 10:52 AM marxin wrote: >> >> >> gcc/ChangeLog: >> >> 2019-03-27 Martin Liska >> >> * dbgcnt.c (dbg_cnt_process_single_pair): Fix GNU coding style. >> (dbg_cnt_process_opt): Parse first tokens aas > > as Better than 'ass' :) > > so what was the issue you fixed? $ gcc -c -fdbg-cnt=dce:0,vect_slp:3 main.c dbg_cnt 'dce' set to 0-3 while correct output is: $ gcc-8 -c -fdbg-cnt=dce:0,vect_slp:3 main.c dbg_cnt 'dce' set to 0 dbg_cnt 'vect_slp' set to 3 Martin > >> dbg_cnt_process_single_pair is also using strtok. >> --- >> gcc/dbgcnt.c | 25 +++-- >> 1 file changed, 15 insertions(+), 10 deletions(-) >>
Re: [PATCH][Tree-optimization/PR89730]grant always_inline when -flive-patching=inline-only-static
On Tue, Mar 26, 2019 at 5:25 PM Qing Zhao wrote: > > Hi, Richard, > > thanks for the suggestion. > > I tried it yesterday, but it did not work. > > the reason is: > > inside “can_inline_edge_by_limits_p”, the “allowance for always_inline” is > guarded by the following condition: > > else if “caller_tree != callee_tree” > > this condition is ONLY true when when callee has different optimization level > than the caller. > > So, I think that the correct spot for the checking of > live-patching=inline-only-static should still be inside “can_inline_edge_p”. > > so my previous patch for this bug should be fine. > > what’s your opinion? You could still move it after the following? /* Check if caller growth allows the inlining. */ if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl) && !disregard_limits && !lookup_attribute ("flatten", DECL_ATTRIBUTES (caller->decl)) && !caller_growth_limits (e)) inlinable = false; that said, looking at DECL_DISREGARD_INLINE_LIMITS in can_inline_edge_p looks wrong. Richard. > thanks. > > Qing > > > On Mar 25, 2019, at 7:23 AM, Richard Biener > > wrote: > > > > On Wed, Mar 20, 2019 at 4:16 PM Qing Zhao wrote: > >> > >> Hi, > >> > >> there is a bug in the current support for > >> -flive-patching=inline-only-static: > >> > >> it rejects inlining of external always_inline routine, therefore triggers > >> a compilation time error. > >> > >> we should always inline “always_inline” routines. > >> > >> please review the following simple patch, it has been bootstrapped and > >> regression tested on aarch64. > >> > >> Okay for committing? > > > > To me this seems like can_inline_edge_p was the wrong spot to disable > > inlining > > for -flive-patching=inline-only-static and instead it should have been in > > can_inline_edge_by_limits_p which already has proper allowance for > > always-inline. > > > > So can you move the check there, like after > > > > /* gcc.dg/pr43564.c. Apply user-forced inline even at -O0. */ > > else if (always_inline) > >; > > > > ? > > > >> thanks. > >> > >> Qing. > >> > >> gcc/ChangeLog: > >> > >> 2019-03-20 qing zhao > >> > >>PR tree-optimization/89730 > >> * ipa-inline.c (can_inline_edge_p): Grant always_inline even when > >> -flive-patching=inline-only-static. > >> > >> gcc/testsuite/ChangeLog: > >> > >> 2019-03-20 qing zhao > >> > >>* gcc.dg/live-patching-4.c: New test. > >> >
Re: [PATCH 2/3] Extend format of -fdbg-cnt: add aux_base filter.
On Wed, Mar 27, 2019 at 10:52 AM marxin wrote: > Hmm, I don't think we want this - it looks too much like a hack to me. I guess for an easier way to figure out the min/max ranges (do we handle anti-ranges?) would be to specify a function name the debug counter would be enabled for. That might also help in your situation. What I've usually done is do the bisection via dbg-counter on a single file, manually link the spec binary and then do a runcpu w/o recompiling the binary. Richard. > gcc/ChangeLog: > > 2019-03-27 Martin Liska > > * dbgcnt.c (dbg_cnt_set_limit_by_name): Add new argument > aux_base and filter based on aux_base_name. > (dbg_cnt_process_single_pair): Parse aux_base. > * doc/invoke.texi: Document new extended format. > > gcc/testsuite/ChangeLog: > > 2019-03-27 Martin Liska > > * gcc.dg/dbg-cnt-1.c: New test. > --- > gcc/dbgcnt.c | 11 --- > gcc/doc/invoke.texi | 8 ++-- > gcc/testsuite/gcc.dg/dbg-cnt-1.c | 6 ++ > 3 files changed, 20 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/dbg-cnt-1.c >
Re: [PATCH 1/3] Fix multiple values for -fdbg-cnt.
On Wed, Mar 27, 2019 at 10:52 AM marxin wrote: > > > gcc/ChangeLog: > > 2019-03-27 Martin Liska > > * dbgcnt.c (dbg_cnt_process_single_pair): Fix GNU coding style. > (dbg_cnt_process_opt): Parse first tokens aas as so what was the issue you fixed? > dbg_cnt_process_single_pair is also using strtok. > --- > gcc/dbgcnt.c | 25 +++-- > 1 file changed, 15 insertions(+), 10 deletions(-) >
Re: [PATCH 3/3] Dump -fdbg-cnt limit reach also to stderr stream.
On Wed, Mar 27, 2019 at 10:52 AM marxin wrote: > Hmm, I guess I can see how this is useful, thus OK. Richard. > gcc/ChangeLog: > > 2019-03-27 Martin Liska > > * dbgcnt.c (print_limit_reach): New function. > (dbg_cnt): Use it. > --- > gcc/dbgcnt.c | 26 -- > 1 file changed, 16 insertions(+), 10 deletions(-) >
Re: [RS6000] PR89271, gcc.target/powerpc/vsx-simode2.c
On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote: > Also, using a register move cost of 2 for for power9 direct moves > gives these fails, even with the .md file tweaks: > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler-not mtvsrwz > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler vspltisb > +FAIL: gcc.target/powerpc/vec-set-char.c scan-assembler xxspltib > +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler-not mtvsrwz > +FAIL: gcc.target/powerpc/vec-set-int.c scan-assembler vspltisw > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler-not mtvsrwz > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler vspltish > +FAIL: gcc.target/powerpc/vec-set-short.c scan-assembler xxspltib > +FAIL: gcc.target/powerpc/vsx-himode3.c scan-assembler lxsihzx > +FAIL: gcc.target/powerpc/vsx-qimode3.c scan-assembler lxsibzx > These can be all be fixed by removing "?"s disparaging vector > alternatives in movsi_internal1 and mov_internal. Like this. Bootstrapped and regression tested powerpc64le-linux. OK for stage1? * config/rs6000/rs6000.c (rs6000_register_move_cost): Reduce power9 direct move cost. * config/rs6000/rs6000.md (movsi_internal1): Don't disparage vector alternatives. (mov_internal): Likewise, excepting alternative that will be split. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index fbc16c65de4..a9e27b356df 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -35010,7 +35010,7 @@ rs6000_register_move_cost (machine_mode mode, if (rs6000_tune == PROCESSOR_POWER8) ret = 4 * hard_regno_nregs (0, mode); else - ret = 3 * hard_regno_nregs (0, mode); + ret = 2 * hard_regno_nregs (0, mode); /* SFmode requires a conversion when moving between gprs and vsx. */ if (mode == SFmode) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index d383504c600..2fbab973907 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -6832,10 +6832,10 @@ (define_insn "movsi_low" ;; MF%1 MT%0 NOP (define_insn "*movsi_internal1" [(set (match_operand:SI 0 "nonimmediate_operand" - "=r, r, r, ?wI, ?wH, -m, ?Z, ?Z, r, r, -r, ?wIwH, ?wJwK, ?wJwK, ?wu, -?wJwK, ?wH, ?wK, ?wIwH, ?r, + "=r, r, r, wI, wH, +m, Z, Z, r, r, +r, wIwH,wJwK,wJwK,wu, +wJwK, wH, wK, wIwH,r, r, *h, *h") (match_operand:SI 1 "input_operand" @@ -7106,13 +7106,13 @@ (define_expand "mov" ;; MTVSRWZ MF%1 MT%1 NOP (define_insn "*mov_internal" [(set (match_operand:QHI 0 "nonimmediate_operand" - "=r,r, ?wJwK, m, Z, r, -?wJwK, ?wJwK, ?wJwK, ?wK, ?wK, r, -?wJwK, r, *c*l, *h") + "=r,r, wJwK, m, Z, r, +wJwK, wJwK, wJwK, wK,?wK, r, +wJwK, r, *c*l, *h") (match_operand:QHI 1 "input_operand" "r, m, Z, r, wJwK, i, -wJwK, O, wM,wB,wS,?wJwK, +wJwK, O, wM,wB,wS,wJwK, r, *h,r, 0"))] "gpc_reg_operand (operands[0], mode) -- Alan Modra Australia Development Lab, IBM
Re: [PATCH] Small cleanup in rtl.h
On Wed, 27 Mar 2019, Jakub Jelinek wrote: > Hi! > > This cleans up something I found ugly already several times. > NONDEBUG_INSN_P(X) used to be defined as > ((GET_CODE (X) == INSN || GET_CODE (X) == DEBUG_INSN > || GET_CODE (X) == JUMP_INSN || GET_CODE (X) == CALL_INSN) > && GET_CODE (X) != DEBUG_INSN) > rather than the simpler > (GET_CODE (X) == INSN || GET_CODE (X) == JUMP_INSN || GET_CODE (X) == > CALL_INSN) > > INSN_P is defined the same as before, just with DEBUG_INSN test at the end. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2019-03-27 Jakub Jelinek > > * rtl.h (NONDEBUG_INSN_P): Define as NONJUMP_INSN_P or JUMP_P > or CALL_P instead of INSN_P && !DEBUG_INSN_P. > (INSN_P): Define using NONDEBUG_INSN_P or DEBUG_INSN_P. > > --- gcc/rtl.h.jj 2019-01-25 23:46:08.058166588 +0100 > +++ gcc/rtl.h 2019-03-27 17:35:39.348562954 +0100 > @@ -840,7 +840,7 @@ struct GTY(()) rtvec_def { > #define DEBUG_INSN_P(X) (GET_CODE (X) == DEBUG_INSN) > > /* Predicate yielding nonzero iff X is an insn that is not a debug insn. */ > -#define NONDEBUG_INSN_P(X) (INSN_P (X) && !DEBUG_INSN_P (X)) > +#define NONDEBUG_INSN_P(X) (NONJUMP_INSN_P (X) || JUMP_P (X) || CALL_P (X)) > > /* Nonzero if DEBUG_MARKER_INSN_P may possibly hold. */ > #define MAY_HAVE_DEBUG_MARKER_INSNS debug_nonbind_markers_p > @@ -851,8 +851,7 @@ struct GTY(()) rtvec_def { >(MAY_HAVE_DEBUG_MARKER_INSNS || MAY_HAVE_DEBUG_BIND_INSNS) > > /* Predicate yielding nonzero iff X is a real insn. */ > -#define INSN_P(X) \ > - (NONJUMP_INSN_P (X) || DEBUG_INSN_P (X) || JUMP_P (X) || CALL_P (X)) > +#define INSN_P(X) (NONDEBUG_INSN_P (X) || DEBUG_INSN_P (X)) > > /* Predicate yielding nonzero iff X is a note insn. */ > #define NOTE_P(X) (GET_CODE (X) == NOTE) > > Jakub > -- Richard Biener SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities
On Wed, Mar 27, 2019 at 4:26 PM Jeff Law wrote: > > On 3/27/19 8:36 AM, Jakub Jelinek wrote: > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote: > >> However, I'm increasingly of the opinion that MIPS targets need to drop > >> off the priority platform list. Given the trajectory I see for MIPS > >> based processors in industry, it's really hard to justify spending this > >> much time on them, particularly for low priority code quality issues. > > > > Besides what has been discussed on IRC for the PR89826 fix, that we really > > need a df_analyze before processing the first block, because otherwise we > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I > > admit I don't know much about df nor regcprop. > RIght. I plan to commit that today along with the test reordering you > pointed out. > > > > > 1) the df_analyze () after every (successful) processing of a basic block > > is IMHO way too expensive, I would be very surprised if df_analyze () isn't > > quadratic in number of basic blocks and so one could construct testcases > > with millions of basic blocks and at least one regcprop change in each bb > > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the > > 95% bbs you said won't have any changes at all) > I'm going to look this further today. Look at https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest and you'll see multiple testcases with 'hard reg cprop' >10% compile-time. It's indeed a hog for no good reason. Richard. > > > > 2) another thing I've noticed today, we queue up the debug insn changes, > > but if we iterate the second time for any bb, we overwrite and thus lose any > > of the debug insn queued changes from the first iteration, so those changes > > aren't applied (wrong-debug?) > This is actually what worries me, both with the current bits and with > any bits that change to a worklist of blocks to reprocess. As is I > think we preserve the debug stuff across the iterations which should > keep debug info correct. Managing that in a world where we queue up the > iteration step is going to be tougher. > > Queuing the iteration step feels like gcc-10 to me, but we'll see if it > falls out easier than expected. > > > > > 3) as mentioned on IRC, the order of tests in copyprop_hardreg_forward_1 > > conditional doesn't start from the cheapest to most expensive one > That's going to be fixed in today's commit since it's been tested. > > Jeff
Re: GCC 7 backport
Hi. Backporting one documentation fix. Martin >From 7bcca48f559a3fefaf37b177b3c72e78d73d73ba Mon Sep 17 00:00:00 2001 From: marxin Date: Thu, 28 Mar 2019 09:51:06 +0100 Subject: [PATCH] Backport r265786 gcc/ChangeLog: 2018-11-05 Martin Liska PR web/87829 * doc/invoke.texi: Remove options that are not disabled with -Os. --- gcc/doc/invoke.texi | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index a41d275e89d..8f279e454b0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -7140,8 +7140,7 @@ do not typically increase code size. @option{-Os} disables the following optimization flags: @gccoptlist{-falign-functions -falign-jumps -falign-loops @gol --falign-labels -freorder-blocks -freorder-blocks-algorithm=stc @gol --freorder-blocks-and-partition -fprefetch-loop-arrays} +-falign-labels -fprefetch-loop-arrays} It also enables @option{-finline-functions}, causes the compiler to tune for code size rather than execution speed, and performs further optimizations -- 2.21.0
Re: GCC 8 backports
Hi. I'm backporting a documentation fix. Martin >From 1055e2c2787be337c33787608473a7a4bcbe82b8 Mon Sep 17 00:00:00 2001 From: marxin Date: Thu, 28 Mar 2019 09:47:27 +0100 Subject: [PATCH] Backport r265786 gcc/ChangeLog: 2018-11-05 Martin Liska PR web/87829 * doc/invoke.texi: Remove options that are not disabled with -Os. --- gcc/doc/invoke.texi | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 4ee24c3ab78..de86d7a1e01 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -7680,8 +7680,7 @@ do not typically increase code size. @option{-Os} disables the following optimization flags: @gccoptlist{-falign-functions -falign-jumps -falign-loops @gol --falign-labels -freorder-blocks -freorder-blocks-algorithm=stc @gol --freorder-blocks-and-partition -fprefetch-loop-arrays} +-falign-labels -fprefetch-loop-arrays} It also enables @option{-finline-functions}, causes the compiler to tune for code size rather than execution speed, and performs further optimizations -- 2.21.0
Re: [PATCH] Revert r254150 (PR bootstrap/89829).
On Wed, Mar 27, 2019 at 11:02 AM Martin Liška wrote: > > Hi. > > The revision should be reverted as prev-* is the location > where stagetrain stores all *.gcda files. > > I've tested that with PGO on x86_64-linux-gnu. > > Ready for trunk? OK. > Thanks, > Martin > > ChangeLog: > > 2019-03-27 Martin Liska > > PR bootstrap/89829 > * Makefile.in: Revert r254150. > * Makefile.tpl: Likewise. > --- > Makefile.in | 4 ++-- > Makefile.tpl | 4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > >
Re: Ping Re: [PATCH] Add missing avx512dqintrin.h _mm_mask_fpclass_s[sd]_mask (PR target/897803)
On Thu, Mar 28, 2019 at 7:47 AM Hongtao Liu wrote: > > Hi Uros: > would you help to review this patch? This is AVX512F patch, you will need the approval from the maintainer first. I have no plans to maintain AVX512 beyond rubber-stamping OK dead obvious regression from a reputable contributors. It is simply too much involvment for me. If the appointed maintainer doesn't respond anymore, then I suggest you raise the issue with GCC steering committe. Uros. > Regards, > Hongtao. > > On Sun, Mar 24, 2019 at 8:13 PM Hongtao Liu wrote: > > > > Hi: > > The following patch adds forgotten avx512f fpclass instrinsics for > > masked scalar operations. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux (on skylake-avx512), > > ok for trunk? > > > > Index: ChangeLog > > === > > --- ChangeLog (revision 269894) > > +++ ChangeLog (working copy) > > @@ -1,3 +1,16 @@ > > +2019-03-24 Hongtao Liu > > + > > + PR target/89803 > > + * config/i386/avx512dqintrin.h > > + (_mm_mask_fpclass_ss_mask,_mm_mask_fpclass_sd_mask): > > + New intrinsics. > > + * config/i386/i386-builtin.def > > + (__builtin_ia32_fpcla_mask, _builtin_ia32_fpclasssd_mask): > > + New builtins. > > + * config/i386/sse.md > > + (define_insn "avx512dq_vmfpclass): > > + Modified with mask. > > + > > 2019-03-23 Segher Boessenkool > > > > * config/rs6000/xmmintrin.h (_mm_movemask_pi8): Implement for 32-bit > > Index: config/i386/avx512dqintrin.h > > === > > --- config/i386/avx512dqintrin.h (revision 269894) > > +++ config/i386/avx512dqintrin.h (working copy) > > @@ -1372,6 +1372,20 @@ > >return (__mmask8) __builtin_ia32_fpclasssd ((__v2df) __A, __imm); > > } > > > > +extern __inline __mmask8 > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > > +_mm_mask_fpclass_ss_mask (__mmask8 __U, __m128 __A, const int __imm) > > +{ > > + return (__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) __A, __imm, > > __U); > > +} > > + > > +extern __inline __mmask8 > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > > +_mm_mask_fpclass_sd_mask (__mmask8 __U, __m128d __A, const int __imm) > > +{ > > + return (__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) __A, __imm, > > __U); > > +} > > + > > extern __inline __m512i > > __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > > _mm512_cvtt_roundpd_epi64 (__m512d __A, const int __R) > > @@ -2623,6 +2637,12 @@ > > #define _mm_fpclass_sd_mask(X, C) \ > >((__mmask8) __builtin_ia32_fpclasssd ((__v2df) (__m128d) (X), (int) > > (C))) \ > > > > +#define _mm_mask_fpclass_ss_mask(X, C, U) \ > > + ((__mmask8) __builtin_ia32_fpcla_mask ((__v4sf) (__m128) (X), > > (int) (C)), (__mmask8) (U)) > > + > > +#define _mm_mask_fpclass_sd_mask(X, C, U) \ > > + ((__mmask8) __builtin_ia32_fpclasssd_mask ((__v2df) (__m128d) (X), > > (int) (C)), (__mmask8) (U)) > > + > > #define _mm512_mask_fpclass_pd_mask(u, X, C)\ > >((__mmask8) __builtin_ia32_fpclasspd512_mask ((__v8df) (__m512d) (X), \ > > (int) (C), (__mmask8)(u))) > > Index: config/i386/i386-builtin.def > > === > > --- config/i386/i386-builtin.def (revision 269894) > > +++ config/i386/i386-builtin.def (working copy) > > @@ -2082,9 +2082,11 @@ > > BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, > > CODE_FOR_avx512dq_fpclassv4df_mask, > > "__builtin_ia32_fpclasspd256_mask", IX86_BUILTIN_FPCLASSPD256, > > UNKNOWN, (int) QI_FTYPE_V4DF_INT_UQI) > > BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, > > CODE_FOR_avx512dq_fpclassv2df_mask, > > "__builtin_ia32_fpclasspd128_mask", IX86_BUILTIN_FPCLASSPD128, > > UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI) > > BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv2df, > > "__builtin_ia32_fpclasssd", IX86_BUILTIN_FPCLASSSD, UNKNOWN, (int) > > QI_FTYPE_V2DF_INT) > > +BDESC (OPTION_MASK_ISA_AVX512DQ, 0, > > CODE_FOR_avx512dq_vmfpclassv2df_mask, "__builtin_ia32_fpclasssd_mask", > > IX86_BUILTIN_FPCLASSSD_MASK, UNKNOWN, (int) QI_FTYPE_V2DF_INT_UQI) > > BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, > > CODE_FOR_avx512dq_fpclassv8sf_mask, > > "__builtin_ia32_fpclassps256_mask", IX86_BUILTIN_FPCLASSPS256, > > UNKNOWN, (int) QI_FTYPE_V8SF_INT_UQI) > > BDESC (OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512VL, 0, > > CODE_FOR_avx512dq_fpclassv4sf_mask, > > "__builtin_ia32_fpclassps128_mask", IX86_BUILTIN_FPCLASSPS128, > > UNKNOWN, (int) QI_FTYPE_V4SF_INT_UQI) > > BDESC (OPTION_MASK_ISA_AVX512DQ, 0, CODE_FOR_avx512dq_vmfpclassv4sf, > > "__builtin_ia32_fpcla", IX86_BUILTIN_FPCLA, UNKNOWN, (int) > > QI_FTYPE_V4SF_INT) > > +BDESC (OPTION_MASK_ISA_AVX512DQ, 0, > > CODE_FOR_avx512dq_vmfpclassv4sf_mask, "__builtin_ia32_fpcla_mask", > > IX86_BUILTIN_FPCLA_MASK, UNKNOWN, (int)