[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2012-07-02 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED
   Target Milestone|4.5.4   |4.6.0

--- Comment #29 from Richard Guenther 2012-07-02 10:27:51 UTC ---
Fixed in 4.6.0, the 4.5 branch is being closed.


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2011-04-28 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther  changed:

   What|Removed |Added

   Target Milestone|4.5.3   |4.5.4

--- Comment #28 from Richard Guenther 2011-04-28 14:51:41 UTC ---
GCC 4.5.3 is being released, adjusting target milestone.


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2011-03-08 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to fail||4.5.2


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-28 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Richard Guenther  changed:

   What|Removed |Added

   Target Milestone|--- |4.5.3

--- Comment #27 from Richard Guenther 2010-12-28 14:57:37 UTC ---
(In reply to comment #24)
> VCE is often very expensive though (often a memory store followed by memory
> load into a different register, etc.), so 0 unconditionally is IMHO wrong.
> Perhaps for some TYPE_MODE combinations at most.

I think assuming VCE is zero-cost on the tree level makes sense though,
as they usually tend to go away (that is, when they appear in regular
code, not as a result of weird type punning).


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-21 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #26 from Jan Hubicka 2010-12-21 10:39:42 UTC ---
Hi,
I read the comment only after committing the patch.  We generally believe
conversions to be free even if this is not always the case.  FP->int conversions
tend to be expensive, too.  I don't think it is a serious problem since the
conversions tend to be dominated by real work elsewhere and there is a good
chance for conversions to combine and optimize when code is duplicated by
inlining or peeling or so.

For non-registers V_C_Es are already counted, as all non-register accesses are
believed to be reads/writes.  So all we get wrong are those int<->fp V_C_Es.  I
don't think they are terribly common and it is very target specific how
expensive they really are...  SSE intrinsics and SRA are nowadays both quite good
sources of V_C_Es that are cheap, so I would guess that the vast majority of them
are cheap anyway.

Honza
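
To make the two kinds of reinterpretation discussed above concrete, here is a
minimal sketch in C.  It assumes the GCC/Clang vector extension and IEEE
single-precision floats; the typedef and function names are illustrative only
and do not come from the bug report.  How GCC represents each function
internally depends on version and flags, so this shows the kind of source
that gives rise to V_C_Es, not a claim about specific tree dumps.

```c
#include <stdint.h>
#include <string.h>

/* Two 128-bit vector types with different element layouts
   (illustrative names, GCC/Clang vector extension).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));
typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Same-size vector cast: the bits are reinterpreted, not converted.
   Both values normally live in the same kind of register, so this
   is the cheap case.  */
static inline v4si
as_v4si (v2di x)
{
  return (v4si) x;
}

/* float -> integer bit reinterpretation: still a pure bit copy, but
   on some targets the value has to move between register files,
   which is the potentially expensive case.  */
static inline uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

Calling as_v4si on a v2di holding { -1, 0 } yields the lanes { -1, -1, 0, 0 },
and float_bits (1.0f) yields 0x3f800000: in both cases no value conversion
takes place.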


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-21 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #25 from Jan Hubicka 2010-12-21 10:30:36 UTC ---
Author: hubicka
Date: Tue Dec 21 10:30:33 2010
New Revision: 168108

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=168108
Log:

PR middle-end/47000
* tree-inline.c (estimate_operator_cost): Handle VIEW_CONVERT_EXPR.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-inline.c


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-20 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #24 from Jakub Jelinek 2010-12-20 08:32:10 UTC ---
VCE is often very expensive though (often a memory store followed by memory
load into a different register, etc.), so 0 unconditionally is IMHO wrong.
Perhaps for some TYPE_MODE combinations at most.


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-19 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #23 from Jan Hubicka 2010-12-19 11:58:35 UTC ---
sha256_4way.c:287:78: warning: called from here
sha256_4way.c:50:23: warning: inlining failed in call to ‘ROTR’: --param
inline-unit-growth limit reached

so you could also work around it with --param inline-unit-growth=.
Otherwise H.J.'s proposed backport seems like the sanest way to solve the
problem.  I guess it can be backported.


I am testing
Index: tree-inline.c
===================================================================
--- tree-inline.c   (revision 168047)
+++ tree-inline.c   (working copy)
@@ -3281,6 +3281,7 @@ estimate_operator_cost (enum tree_code c
     CASE_CONVERT:
     case COMPLEX_EXPR:
     case PAREN_EXPR:
+    case VIEW_CONVERT_EXPR:
       return 0;

     /* Assign cost of 1 to usual operations.

to solve the V_C_E problems.
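
The effect of the one-line change can be sketched with a toy cost model.
This is not GCC code: the enum, the function names and the vce_is_free switch
are hypothetical, and the only numbers taken from the thread are the sizes
2 and 7 shown in the dump in comment #22.  The real estimator combines more
inputs than this.

```c
/* Hypothetical, much-simplified stand-ins for GCC's operator codes.  */
enum toy_code
{
  TOY_CONVERT,
  TOY_VIEW_CONVERT,
  TOY_PLUS,
  TOY_BUILTIN_CALL
};

/* Size estimate for one statement.  vce_is_free selects between the
   behaviour before the patch (0) and after it (1).  */
static int
toy_stmt_size (enum toy_code code, int vce_is_free)
{
  switch (code)
    {
    case TOY_CONVERT:
      return 0;                      /* ordinary conversions are free */
    case TOY_VIEW_CONVERT:
      return vce_is_free ? 0 : 2;    /* 2 is the size in comment #22 */
    case TOY_BUILTIN_CALL:
      return 7;                      /* size of the paddd128 call there */
    default:
      return 1;                      /* usual operations cost 1 */
    }
}

/* Sum the estimates over a function body.  */
static int
toy_body_size (const enum toy_code *stmts, int n, int vce_is_free)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    total += toy_stmt_size (stmts[i], vce_is_free);
  return total;
}
```

For a wrapper body made of two V_C_Es and one builtin call, as in the dump,
this toy model gives 11 before the change and 7 after it, which is the kind
of difference that decides whether a unit-growth limit is reached.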


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-19 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #22 from Jan Hubicka 2010-12-19 11:53:32 UTC ---
  freq:  8000 size:  2 time:  2 D.13088_5480 = VIEW_CONVERT_EXPR(D.8004_729);
  freq:  8000 size:  2 time:  2 D.13087_5481 = VIEW_CONVERT_EXPR(a_5271);
  freq:  8000 size:  7 time: 16 D.13086_5482 = __builtin_ia32_paddd128
(D.13087_5481, D.13088_5480);

Obviously we should also count V_C_E as free like other conversions.  Will test
a patch for that.


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-19 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #21 from Jan Hubicka 2010-12-19 11:49:53 UTC ---
> I'd like to wait for Honza's opinion before we just start trying random
> patches. 
Well, if H.J.'s proposed backport of the builtin cost sizes helps, I guess it
is a sane way to fix this.  I will take a look at why the main inliner doesn't
do the job when the early inliner ignores the call.

I am not sure how many of the heuristics changes make sense to backport to 4.5.
Depends on the importance of the regression, I guess.

Honza


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #20 from Jeff Garzik 2010-12-18 21:25:46 UTC ---
(In reply to comment #16)
> I don't think it is a good idea to change inliner heuristics in 4.5 at this
> point.  If it is always a good idea to inline that function, it should be
> __attribute__((always_inline)).

I confirm that replacing 'inline' with '__attribute__((always_inline))' also
resolves the regression.

It is a bit disappointing to leave such a major performance diff (-26%!) in
the latest stable compiler release without resolution (if the decision is to
leave the inliner alone).


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #19 from Jeff Garzik 2010-12-18 21:17:09 UTC ---
(In reply to comment #14)
> Created attachment 22813 [details]
> A new patch
> 
> Try this.

This patch successfully fixes the performance regression in 4.5.1.

Thanks!


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #18 from Jeff Garzik 2010-12-18 21:16:31 UTC ---
argh, please ignore comment #17.  misquote.


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jgarzik at pobox dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #17 from Jeff Garzik 2010-12-18 21:15:28 UTC ---
(In reply to comment #8)
> -    if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_MD)
> +    /* Do not special case builtins where we see the body.
> +       This just confuse inliner.  */
> +    if (!decl || cgraph_node (decl)->analyzed)
> +      ;
> +    else if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_MD)
>        cost = weights->target_builtin_call_cost;
>      else
>        cost = weights->call_cost;

This patch successfully fixes the performance regression in 4.5.1.


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek 2010-12-18 20:26:29 UTC ---
I don't think it is a good idea to change inliner heuristics in 4.5 at this
point.  If it is always a good idea to inline that function, it should be
__attribute__((always_inline)).
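
As a rough illustration of that suggestion, a small rotate helper can be
marked so that the inliner's size heuristics no longer apply to it.  This is
a sketch only: it uses the GCC/Clang vector extension instead of the SSE
intrinsics from sha256_4way.c, and the names v4su and rotr4 are made up for
the example.  The inline keyword is kept next to the attribute, since GCC
may warn about always_inline functions that are not also declared inline.

```c
#include <stdint.h>

/* Four unsigned 32-bit lanes (illustrative name, GCC/Clang vector
   extension).  */
typedef uint32_t v4su __attribute__ ((vector_size (16)));

/* Rotate each 32-bit lane right by n bits, for 0 < n < 32.
   always_inline asks the compiler to inline every call regardless
   of its size estimates.  */
static inline __attribute__ ((always_inline)) v4su
rotr4 (v4su x, int n)
{
  return (x >> n) | (x << (32 - n));
}
```

For example, rotr4 applied to { 1, 2, 0x80000000, 0 } with n = 1 gives
{ 0x80000000, 1, 0x40000000, 0 }.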


[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #15 from H.J. Lu 2010-12-18 19:38:47 UTC ---
(In reply to comment #13)
> I'd like to wait for Honza's opinion before we just start trying random
> patches. 
> 
> But if you feel like trying some other things, perhaps you can see if
> backporting all changes of
> http://gcc.gnu.org/viewcvs?view=revision&revision=166517 helps.

This checkin depends on is_simple_builtin and is_inexpensive_builtin,
which are new in 4.6.



[Bug target/47000] [4.5 Regression] Failure to inline SSE intrinsics

2010-12-18 Thread hjl.tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47000

--- Comment #14 from H.J. Lu 2010-12-18 19:35:24 UTC ---
Created attachment 22813
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22813
A new patch

Try this.