[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|4.5.4                       |4.6.0

--- Comment #29 from Richard Guenther 2012-07-02 10:27:51 UTC ---
Fixed in 4.6.0, the 4.5 branch is being closed.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.3                       |4.5.4

--- Comment #28 from Richard Guenther 2011-04-28 14:51:41 UTC ---
GCC 4.5.3 is being released, adjusting target milestone.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
      Known to fail|                            |4.5.2
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.5.3

--- Comment #27 from Richard Guenther 2010-12-28 14:57:37 UTC ---
(In reply to comment #24)
> VCE is often very expensive though (often a memory store followed by memory
> load into a different register, etc.), so 0 unconditionally is IMHO wrong.
> Perhaps for some TYPE_MODE combinations at most.

I think assuming VCE is zero-cost on the tree level makes sense though, as
they usually go away (that is, when they appear in regular code, not as a
result of weird type punning).
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #26 from Jan Hubicka 2010-12-21 10:39:42 UTC ---
Hi,
I read the comment only after committing the patch. We generally believe
conversions to be free even if this is not always the case. FP->int
conversions tend to be expensive, too. I don't think it is a serious problem,
since the conversions tend to be dominated by real work elsewhere and there is
a good chance for conversions to combine and optimize when code is duplicated
by inlining or peeling or so.

For non-registers, V_C_Es are already counted, as all non-register accesses
are believed to be reads/writes. So all we get wrong are those int<->fp
V_C_Es. I don't think they are terribly common, and it is very target specific
how expensive they really are...

SSE intrinsics and SRA are nowadays both quite good sources of V_C_Es that are
cheap, so I would guess that the vast majority of them are cheap anyway.

Honza
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #25 from Jan Hubicka 2010-12-21 10:30:36 UTC ---
Author: hubicka
Date: Tue Dec 21 10:30:33 2010
New Revision: 168108

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168108
Log:
    PR middle-end/47000
    * tree-inline.c (estimate_operator_cost): Handle VIEW_CONVERT_EXPR.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-inline.c
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #24 from Jakub Jelinek 2010-12-20 08:32:10 UTC ---
VCE is often very expensive though (often a memory store followed by memory
load into a different register, etc.), so 0 unconditionally is IMHO wrong.
Perhaps for some TYPE_MODE combinations at most.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #23 from Jan Hubicka 2010-12-19 11:58:35 UTC ---
sha256_4way.c:287:78: warning: called from here
sha256_4way.c:50:23: warning: inlining failed in call to ‘ROTR’: --param inline-unit-growth limit reached

so you could also work around this with --param inline-unit-growth=. Otherwise
H.J.'s proposed backport seems like the most sane way to solve the problem. I
guess it can be backported.

I am testing

Index: tree-inline.c
===
--- tree-inline.c (revision 168047)
+++ tree-inline.c (working copy)
@@ -3281,6 +3281,7 @@ estimate_operator_cost (enum tree_code c
     CASE_CONVERT:
     case COMPLEX_EXPR:
     case PAREN_EXPR:
+    case VIEW_CONVERT_EXPR:
       return 0;

     /* Assign cost of 1 to usual operations.

to solve the V_C_E problems.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #22 from Jan Hubicka 2010-12-19 11:53:32 UTC ---
  freq: 8000 size: 2 time: 2   D.13088_5480 = VIEW_CONVERT_EXPR(D.8004_729);
  freq: 8000 size: 2 time: 2   D.13087_5481 = VIEW_CONVERT_EXPR(a_5271);
  freq: 8000 size: 7 time: 16  D.13086_5482 = __builtin_ia32_paddd128 (D.13087_5481, D.13088_5480);

Obviously we should also count V_C_E as free, like other conversions. I will
test a patch for that.
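
For readers unfamiliar with where such V_C_Es come from, here is a minimal sketch, not code from the bug report. It assumes GCC-style vector extensions; the names v2di, v4su, add_epi32_sketch and add4 are hypothetical. It mimics the shape of an SSE add intrinsic: the arguments arrive as a two-lane 64-bit vector, are cast to a four-lane 32-bit vector for the add, and the result is cast back. Casts like these between same-sized vector types only reinterpret bits, and they are the kind of statement that can show up as VIEW_CONVERT_EXPR in front of the builtin call in a dump like the one above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the vector types used by the SSE headers. */
typedef long long v2di __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Add four 32-bit lanes.  The two casts on the operands and the cast on
   the result reinterpret the same 16 bytes; no value conversion happens.  */
static inline v2di
add_epi32_sketch (v2di a, v2di b)
{
  return (v2di) ((v4su) a + (v4su) b);
}

/* Small wrapper so the sketch can be exercised with plain arrays.  */
static void
add4 (const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
  v2di va, vb, vr;
  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  vr = add_epi32_sketch (va, vb);
  memcpy (out, &vr, sizeof vr);
}
```

If each cast is charged like a regular operation, a wrapper this small looks roughly three times as large to the size estimate as the single add it contains, which is consistent with the size: 2 entries in the dump.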
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #21 from Jan Hubicka 2010-12-19 11:49:53 UTC ---
> I'd like to wait for Honza's opinion before we just start trying random
> patches.

Well, if H.J.'s proposed backport of the builtin cost sizes helps, I guess it
is a sane way to fix this. I will take a look at why the main inliner doesn't
do the job when the early inliner ignores the call.

I am not sure how much of the heuristics changes it makes sense to backport to
4.5. That depends on the importance of the regression, I guess.

Honza
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #20 from Jeff Garzik 2010-12-18 21:25:46 UTC ---
(In reply to comment #16)
> I don't think it is a good idea to change inliner heuristics in 4.5 at this
> point. If it is always a good idea to inline that function, it should be
> __attribute__((always_inline)).

I confirm that replacing 'inline' with '__attribute__((always_inline))' also
resolves the regression.

It is a bit disappointing to leave such a major performance diff (-26%!) in
the latest stable compiler release without resolution (if the decision is to
leave the inliner alone).
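
A minimal sketch of the always_inline workaround discussed here, assuming an x86 target with SSE2. The function names rotr4 and rotr4_store are hypothetical, and the body is only a plausible stand-in for the ROTR helper named in the warning, not the actual code from sha256_4way.c. The sketch keeps the inline keyword next to the attribute, which is the commonly used spelling.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 only */

/* Rotate each of the four 32-bit lanes right by n bits (0 < n < 32).
   always_inline asks GCC to inline the call regardless of its size
   heuristics, so limits such as inline-unit-growth do not apply.  */
static inline __attribute__ ((always_inline)) __m128i
rotr4 (__m128i x, int n)
{
  return _mm_or_si128 (_mm_srli_epi32 (x, n), _mm_slli_epi32 (x, 32 - n));
}

/* Small wrapper so the sketch can be exercised with plain arrays.  */
static void
rotr4_store (const uint32_t in[4], int n, uint32_t out[4])
{
  __m128i v = _mm_loadu_si128 ((const __m128i *) in);
  _mm_storeu_si128 ((__m128i *) out, rotr4 (v, n));
}
```

The trade-off is the one raised in comment #16: the attribute moves the decision from the compiler's heuristics to the source, so it is only appropriate where inlining is always wanted.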
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #19 from Jeff Garzik 2010-12-18 21:17:09 UTC ---
(In reply to comment #14)
> Created attachment 22813 [details]
> A new patch
>
> Try this.

This patch successfully fixes the performance regression in 4.5.1. Thanks!
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #18 from Jeff Garzik 2010-12-18 21:16:31 UTC ---
argh, please ignore comment #17. misquote.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #17 from Jeff Garzik 2010-12-18 21:15:28 UTC ---
(In reply to comment #8)
> -        if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_MD)
> +        /* Do not special case builtins where we see the body.
> +           This just confuse inliner.  */
> +        if (!decl || cgraph_node (decl)->analyzed)
> +          ;
> +        else if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_MD)
>            cost = weights->target_builtin_call_cost;
>          else
>            cost = weights->call_cost;

This patch successfully fixes the performance regression in 4.5.1.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek 2010-12-18 20:26:29 UTC ---
I don't think it is a good idea to change inliner heuristics in 4.5 at this
point. If it is always a good idea to inline that function, it should be
__attribute__((always_inline)).
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #15 from H.J. Lu 2010-12-18 19:38:47 UTC ---
(In reply to comment #13)
> I'd like to wait for Honza's opinion before we just start trying random
> patches.
>
> But if you feel like trying some other things, perhaps you can see if
> backporting all changes of
> http://gcc.gnu.org/viewcvs?view=revision&revision=166517 helps.

This checkin depends on is_simple_builtin and is_inexpensive_builtin, which
are new in 4.6.
[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #14 from H.J. Lu 2010-12-18 19:35:24 UTC ---
Created attachment 22813
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22813
A new patch

Try this.