Re: [RFC] Context sensitive inline analysis

2011-10-03 Thread Richard Sandiford
Richard Sandiford  writes:
> Jan Hubicka  writes:
>> the problem is signed overflow in the time computation. Time should be
>> capped by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE * 2.
>> This happens to be >2^31 and <2^32, so we overflow here because of the
>> use of signed arithmetic.
>>
>> Index: ipa-inline-analysis.c
>> ===
>> --- ipa-inline-analysis.c    (revision 179266)
>> +++ ipa-inline-analysis.c    (working copy)
>> @@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
>>  /* Estimate runtime of function can easilly run into huge numbers with many
>>     nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
>>     For anything larger we use gcov_type.  */
>> -#define MAX_TIME 1000000
>> +#define MAX_TIME 500000
>>  
>>  /* Number of bits in integer, but we really want to be stable across different
>>     hosts.  */
>
> Could you update the comment too?  ("time * INLINE_SIZE_SCALE * 2")

OK, I did it myself.  Tested on x86_64-linux-gnu and applied as obvious.

Richard


gcc/
* ipa-inline-analysis.c (MAX_TIME): Update comment.

Index: gcc/ipa-inline-analysis.c
===
--- gcc/ipa-inline-analysis.c   2011-10-03 09:10:21.0 +0100
+++ gcc/ipa-inline-analysis.c   2011-10-03 09:10:55.633044417 +0100
@@ -90,8 +90,8 @@ Software Foundation; either version 3, o
 #include "alloc-pool.h"
 
 /* Estimate runtime of function can easilly run into huge numbers with many
-   nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
-   For anything larger we use gcov_type.  */
+   nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE * 2 in an
+   integer.  For anything larger we use gcov_type.  */
 #define MAX_TIME 500000
 
 /* Number of bits in integer, but we really want to be stable across different


Re: [RFC] Context sensitive inline analysis

2011-09-28 Thread Richard Sandiford
Jan Hubicka  writes:
> the problem is signed overflow in the time computation. Time should be
> capped by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE * 2.
> This happens to be >2^31 and <2^32, so we overflow here because of the
> use of signed arithmetic.
>
> Index: ipa-inline-analysis.c
> ===
> --- ipa-inline-analysis.c (revision 179266)
> +++ ipa-inline-analysis.c (working copy)
> @@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
>  /* Estimate runtime of function can easilly run into huge numbers with many
>     nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
>     For anything larger we use gcov_type.  */
> -#define MAX_TIME 1000000
> +#define MAX_TIME 500000
>  
>  /* Number of bits in integer, but we really want to be stable across different
>     hosts.  */

Could you update the comment too?  ("time * INLINE_SIZE_SCALE * 2")

Richard


Re: [RFC] Context sensitive inline analysis

2011-09-27 Thread Jan Hubicka
> > This caused:
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49179
> >
> >
> 
> This also caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49091
> 

Hi,
the problem is signed overflow in the time computation.  Time should be capped
by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE * 2.  This happens to
be >2^31 and <2^32, so we overflow here because of the use of signed
arithmetic.

Hopefully the following is enough.  Floating point arithmetic would make
things easier, since loop structure can scale times up a lot and their
relative comparisons matter for the benefit computation.  Not sure if
switching it to our software floats is the coolest idea, however.

Will commit it after testing on x86_64-linux.

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 179266)
+++ ipa-inline-analysis.c   (working copy)
@@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
 /* Estimate runtime of function can easilly run into huge numbers with many
    nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
    For anything larger we use gcov_type.  */
-#define MAX_TIME 1000000
+#define MAX_TIME 500000
 
 /* Number of bits in integer, but we really want to be stable across different
    hosts.  */
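
For reference, the bad case can be reproduced in isolation.  A minimal
standalone sketch, in which SCALE is an illustrative stand-in for the combined
scaling the real code applies, not a value taken from the GCC source:

  #include <stdio.h>

  #define OLD_MAX_TIME 1000000
  #define NEW_MAX_TIME 500000
  #define SCALE        2000

  int main (void)
  {
    /* 1000000 * 2000 * 2 is about 4e9 > 2^31, so with plain int operands
       the signed multiplication overflows; halving the cap keeps the worst
       case below 2^31.  Widening first (as gcov_type does) also avoids it.  */
    long long old_worst = (long long) OLD_MAX_TIME * SCALE * 2;
    long long new_worst = (long long) NEW_MAX_TIME * SCALE * 2;
    printf ("old worst case: %lld\n", old_worst);
    printf ("new worst case: %lld\n", new_worst);
    return 0;
  }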


Re: [RFC] Context sensitive inline analysis

2011-05-26 Thread H.J. Lu
On Thu, May 26, 2011 at 9:53 PM, H.J. Lu  wrote:
> On Fri, Apr 22, 2011 at 5:35 AM, Jan Hubicka  wrote:
>> Hi,
>> this patch implements infrastructure to summarize function body size&time in
>> a way that is sensitive to function context.  At the moment context means
>>
>>  1) whether the function is inline or offline
>>  2) whether some of the parameters are known compile-time constants.
>>
>> but we should handle more later.
>>
>> The analysis is implemented by introducing the notion of predicates, which
>> are simple logical formulas in conjunctive-disjunctive form over conditions.
>> Conditions are simple tests like "function is not inlined", "op0 is not
>> constant", "op0 > 6".  That is, one can express things like "this statement
>> will execute if op1 > 6 or op0 is not constant".
>>
>> The patch implements simple infrastructure to represent the predicates.
>> There are hard limits on everything - i.e. there are no more than 32
>> different conditions and no more than 8 clauses.  This makes it possible to
>> test clauses by simple logical operations on integers and to represent
>> predicates as an array of 8 integers, which is very cheap.  The
>> implementation details are quite contained, so we might relax the limits,
>> but I don't really see a need for that.
>>
>> The main point of designing this was to allow an efficient way of evaluating
>> those predicates for a given context, since this happens many times during
>> inlining, and to not make WPA memory usage grow too much.  At the same time
>> I wanted the infrastructure to be flexible enough to allow adding more
>> tricks in the future.  For example, we might award extra inlining points if
>> the callee uses an argument to drive the number of iterations of a loop, or
>> when the caller passes a pointer to a static variable that might be SRAed
>> after inlining, etc.
>>
>> At the moment the only consumer of predicates is the size/time vector, which
>> is a vector of simple entries consisting of size, time and predicate.
>> Function size/time is then computed as the sum of all entries whose
>> predicate might be true in the given context, plus the size/time of all call
>> edges (call edges are kept separate because they can disappear under
>> different conditions or be turned into constants).
>>
>> I plan to add more uses of predicates in the near future - i.e. attaching
>> predicates to edges so we know what calls will be optimized out at WPA time.
>> Also I plan to use the analysis to drive function cloning (i.e. partial
>> specialization): when a function is called from several places with the same
>> context and the context makes a difference to the expected runtime, clone
>> the function.
>>
>> The analysis part deciding on predicates is currently very simple, kind of a
>> proof of concept:
>>
>>  1) Every BB gets assigned a predicate for when it is reachable.  At the
>>     moment this happens only if all predecessors of the BB are conditionals
>>     that test a function parameter.  Obviously we will need to propagate
>>     this info for sane results.
>>
>>  2) Every statement gets assigned a predicate for when it will become
>>     constant.  Again it is very simple; only statements using only function
>>     arguments are considered.  Simple propagation across the SSA graph will
>>     do better.
>>
>>  3) Finally the statement is accounted under a predicate that is the
>>     conjunction of both of the above.
>>  All call statements are accounted unconditionally because we don't
>>  predicate edges, yet.
>>
>> While computing function sizes is fast, it is not as speedy as the original
>> "time-benefit".  The small function inliner is quite insane about querying
>> the sizes/times again and again while it keeps its queue up to date.  For
>> this purpose I added a cache, the same way as we already cache function
>> growths.  That said, I do plan to make the inliner & badness computation
>> more sensible here soon.
>> So far I did not want to touch the actual heuristics part of the inliner and
>> hope to do it after getting the infrastructure to the point I want it to be
>> for 4.7.
>>
>> The patch bootstraps & regtests.  I tested that the compile time impact on
>> tramp3d is positive (because of the caching; without it the inliner grows
>> from 1.1% to 4% of compile time).  I also tested SPECs and there are no
>> great changes, which is not a bad result given the stupidity of the
>> analysis ;).
>>
>> I will look into Mozilla, even though I plan to look into solving the
>> scalability problems of the inliner as a followup instead of snowballing
>> this.
>>
>> I plan to work on the patch a little further during the weekend, in
>> particular to make the dumps more readable, since they got a bit convoluted
>> by random formatting.  But I am sending the patch for comments and I plan to
>> get it finished by next week.
>>
>> Honza
>>
>>        * gengtype.c (open_base_files): Add ipa-inline.h include.
>>        * ipa-cp.c (ipcp_get_lattice, ipcp_lattice_from_jfunc): Move to
>>        ipa-prop.c; update all uses.
>>        * ipa-prop.c: (ipa_get_lattice, ipa_lattice_from_jfunc): ... here.
>>        * ipa-inline-transform.c (inline_call): Use inline_merge_summary to
>>        merge summary of inlined function into former caller.

Re: [RFC] Context sensitive inline analysis

2011-05-26 Thread H.J. Lu
On Fri, Apr 22, 2011 at 5:35 AM, Jan Hubicka  wrote:
> Hi,
> this patch implements infrastructure to summarize function body size&time in
> a way that is sensitive to function context.  At the moment context means
>
>  1) whether the function is inline or offline
>  2) whether some of the parameters are known compile-time constants.
>
> but we should handle more later.
>
> The analysis is implemented by introducing the notion of predicates, which
> are simple logical formulas in conjunctive-disjunctive form over conditions.
> Conditions are simple tests like "function is not inlined", "op0 is not
> constant", "op0 > 6".  That is, one can express things like "this statement
> will execute if op1 > 6 or op0 is not constant".
>
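For illustration, a hypothetical function where such a condition carries
almost all of the cost: the loop below would be accounted under the predicate
"op0 > 6" and contributes nothing to the estimate in a context where n is
known to be small (the function and names are illustrative, not from the
patch):

  int f (int n)
  {
    int s = 0;
    if (n > 6)                    /* condition "op0 > 6" */
      for (int i = 0; i < n; i++)
        s += i;                   /* accounted under "op0 > 6" */
    return s;
  }
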
> The patch implements simple infrastructure to represent the predicates.
> There are hard limits on everything - i.e. there are no more than 32
> different conditions and no more than 8 clauses.  This makes it possible to
> test clauses by simple logical operations on integers and to represent
> predicates as an array of 8 integers, which is very cheap.  The
> implementation details are quite contained, so we might relax the limits,
> but I don't really see a need for that.
>
> The main point of designing this was to allow an efficient way of evaluating
> those predicates for a given context, since this happens many times during
> inlining, and to not make WPA memory usage grow too much.  At the same time
> I wanted the infrastructure to be flexible enough to allow adding more
> tricks in the future.  For example, we might award extra inlining points if
> the callee uses an argument to drive the number of iterations of a loop, or
> when the caller passes a pointer to a static variable that might be SRAed
> after inlining, etc.
>
> At the moment the only consumer of predicates is the size/time vector, which
> is a vector of simple entries consisting of size, time and predicate.
> Function size/time is then computed as the sum of all entries whose
> predicate might be true in the given context, plus the size/time of all call
> edges (call edges are kept separate because they can disappear under
> different conditions or be turned into constants).
>
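A minimal sketch of such a representation and its evaluation (hypothetical
names and types, not the actual ipa-inline-analysis.c code): each clause is a
32-bit mask of conditions, a clause holds if any of its conditions may be
true, and the predicate holds if every clause holds, so evaluation against a
context - itself just a mask of possibly-true conditions - is a handful of
AND operations:

  #include <stdint.h>
  #include <stdbool.h>

  #define MAX_CLAUSES 8

  /* Conjunction of up to MAX_CLAUSES clauses; a zero mask ends the list.  */
  struct predicate { uint32_t clause[MAX_CLAUSES]; };

  struct size_time_entry { int size; int time; struct predicate pred; };

  static bool
  predicate_may_be_true (const struct predicate *p, uint32_t possible_truths)
  {
    int i;
    for (i = 0; i < MAX_CLAUSES && p->clause[i]; i++)
      if (!(p->clause[i] & possible_truths))
        return false;  /* Some clause has no possibly-true condition.  */
    return true;
  }

  /* Sum the entries whose predicate may be true in this context; call
     edges would be added separately, as described above.  */
  static void
  estimate_size_time (const struct size_time_entry *e, int n,
                      uint32_t possible_truths, int *size, int *time)
  {
    int i;
    *size = *time = 0;
    for (i = 0; i < n; i++)
      if (predicate_may_be_true (&e[i].pred, possible_truths))
        {
          *size += e[i].size;
          *time += e[i].time;
        }
  }
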
> I plan to add more uses of predicates in the near future - i.e. attaching
> predicates to edges so we know what calls will be optimized out at WPA time.
> Also I plan to use the analysis to drive function cloning (i.e. partial
> specialization): when a function is called from several places with the same
> context and the context makes a difference to the expected runtime, clone
> the function.
>
> The analysis part deciding on predicates is currently very simple, kind of a
> proof of concept:
>
>  1) Every BB gets assigned a predicate for when it is reachable.  At the
>     moment this happens only if all predecessors of the BB are conditionals
>     that test a function parameter.  Obviously we will need to propagate
>     this info for sane results.
>
>  2) Every statement gets assigned a predicate for when it will become
>     constant.  Again it is very simple; only statements using only function
>     arguments are considered.  Simple propagation across the SSA graph will
>     do better.
>
>  3) Finally the statement is accounted under a predicate that is the
>     conjunction of both of the above.
>  All call statements are accounted unconditionally because we don't
>  predicate edges, yet.
>
> While computing function sizes is fast, it is not as speedy as the original
> "time-benefit".  The small function inliner is quite insane about querying
> the sizes/times again and again while it keeps its queue up to date.  For
> this purpose I added a cache, the same way as we already cache function
> growths.  That said, I do plan to make the inliner & badness computation
> more sensible here soon.
> So far I did not want to touch the actual heuristics part of the inliner and
> hope to do it after getting the infrastructure to the point I want it to be
> for 4.7.
>
> The patch bootstraps & regtests.  I tested that the compile time impact on
> tramp3d is positive (because of the caching; without it the inliner grows
> from 1.1% to 4% of compile time).  I also tested SPECs and there are no
> great changes, which is not a bad result given the stupidity of the
> analysis ;).
>
> I will look into Mozilla, even though I plan to look into solving the
> scalability problems of the inliner as a followup instead of snowballing
> this.
>
> I plan to work on the patch a little further during the weekend, in
> particular to make the dumps more readable, since they got a bit convoluted
> by random formatting.  But I am sending the patch for comments and I plan to
> get it finished by next week.
>
> Honza
>
>        * gengtype.c (open_base_files): Add ipa-inline.h include.
>        * ipa-cp.c (ipcp_get_lattice, ipcp_lattice_from_jfunc): Move to
>        ipa-prop.c; update all uses.
>        * ipa-prop.c: (ipa_get_lattice, ipa_lattice_from_jfunc): ... here.
>        * ipa-inline-transform.c (inline_call): Use inline_merge_summary to
>        merge summary of inlined function into former caller.
>        * ipa-inline.c (max_benefit): Remove.
>        (edge_badness): Compensate for rem

Re: [RFC] Context sensitive inline analysis

2011-04-30 Thread David Edelsohn
Honza,

This patch appears to fix the failure on AIX: my build progressed past
libstdc++.

Thanks, David

2011/4/30 Jan Hubicka :
>> On Thu, Apr 28, 2011 at 9:27 AM, Jan Hubicka  wrote:
>> >> Honza,
>> >>
>> >> I continue to receive an ICE:
>> >>
>> >> /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
>> >> /tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
>> >> internal compiler error: vector VEC(tree,base) index domain error, in
>> >> evaluate_conditions_for_edge at ipa-inline-analysis.c:537
>> >>
>> >> I was able to bootstrap with GCC just prior to your patch on Friday.
>> > Hi,
>> > can I have a preprocessed testcase?  The one attached to the HP PR doesn't
>> > seem to reproduce for me.  Perhaps we just need a bounds check here,
>> > though I think we should catch all "evil" edges with can_inline_edge_p and
>> > never try to propagate across those.
>>
>> The failure currently occurs when building stdc++.h.gch with -O2.
>
> Duh, this sounds scary.
>
> I am attaching a fix for the HP failure.  Hopefully it will fix yours, too.
> Reproducing the gch ICE in a cross compiler would be more fun (and of course,
> the gch should not make a difference, so it seems that we have some latent
> problem here too).
>>
>> Apparently this does not reproduce on PPC Linux using the original TOC
>> model (cmodel=small).  Note that GCC on AIX still defaults to 32 bit
>> application and GCC on PPC Linux is 64 bit, so that might contribute
>> to the difference.  Or the different process data layout of Linux vs
>> AIX avoiding failure from memory corruption.
>
> The problem on HP is a weird interaction of ipa-cp, the early inliner and the
> constructor merging pass.  It needs a !have_ctors/dtors target to reproduce
> and you really need to be lucky to get this to happen.  So I hope it is your
> problem, too.  At least your testcase looks almost identical to the HP one
> and works for me now, too.
>
> Martin, this is an example of why we probably should update jump functions to
> represent the program after the ipa-cp transform.  In this case we simply
> construct a new direct call into the clone and that one gets misanalyzed.
>
> Bootstrapped/regtested x86_64-linux, committed.
>
>        PR middle-end/48752
>        * ipa-inline.c (early_inliner): Disable when doing late
>        addition of function.
> Index: ipa-inline.c
> ===
> *** ipa-inline.c        (revision 173189)
> --- ipa-inline.c        (working copy)
> *** early_inliner (void)
> *** 1663,1668 ****
> --- 1663,1676 ----
>    if (seen_error ())
>      return 0;
>
> +   /* Do nothing if datastructures for ipa-inliner are already computed.
> +      This happens when some pass decides to construct new function and
> +      cgraph_add_new_function calls lowering passes and early optimization
> +      on it.  This may confuse ourself when early inliner decide to inline
> +      call to function clone, because function clones don't have parameter
> +      list in ipa-prop matching their signature.  */
> +   if (ipa_node_params_vector)
> +     return 0;
> +
>  #ifdef ENABLE_CHECKING
>    verify_cgraph_node (node);
>  #endif
>


Re: [RFC] Context sensitive inline analysis

2011-04-30 Thread Jan Hubicka
> On Thu, Apr 28, 2011 at 9:27 AM, Jan Hubicka  wrote:
> >> Honza,
> >>
> >> I continue to receive an ICE:
> >>
> >> /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
> >> /tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
> >> internal compiler error: vector VEC(tree,base) index domain error, in
> >> evaluate_conditions_for_edge at ipa-inline-analysis.c:537
> >>
> >> I was able to bootstrap with GCC just prior to your patch on Friday.
> > Hi,
> > can I have a preprocessed testcase?  The one attached to the HP PR doesn't
> > seem to reproduce for me.  Perhaps we just need a bounds check here, though
> > I think we should catch all "evil" edges with can_inline_edge_p and never
> > try to propagate across those.
> 
> The failure currently occurs when building stdc++.h.gch with -O2.

Duh, this sounds scary. 

I am attaching a fix for the HP failure.  Hopefully it will fix yours, too.
Reproducing the gch ICE in a cross compiler would be more fun (and of course,
the gch should not make a difference, so it seems that we have some latent
problem here too).
> 
> Apparently this does not reproduce on PPC Linux using the original TOC
> model (cmodel=small).  Note that GCC on AIX still defaults to 32 bit
> application and GCC on PPC Linux is 64 bit, so that might contribute
> to the difference.  Or the different process data layout of Linux vs
> AIX avoiding failure from memory corruption.

The problem on HP is a weird interaction of ipa-cp, the early inliner and the
constructor merging pass.  It needs a !have_ctors/dtors target to reproduce
and you really need to be lucky to get this to happen.  So I hope it is your
problem, too.  At least your testcase looks almost identical to the HP one
and works for me now, too.

Martin, this is an example of why we probably should update jump functions to
represent the program after the ipa-cp transform.  In this case we simply
construct a new direct call into the clone and that one gets misanalyzed.

Bootstrapped/regtested x86_64-linux, committed.

PR middle-end/48752 
* ipa-inline.c (early_inliner): Disable when doing late
addition of function.
Index: ipa-inline.c
===
*** ipa-inline.c        (revision 173189)
--- ipa-inline.c        (working copy)
*** early_inliner (void)
*** 1663,1668 ****
--- 1663,1676 ----
if (seen_error ())
  return 0;
  
+   /* Do nothing if datastructures for ipa-inliner are already computed.
+      This happens when some pass decides to construct new function and
+      cgraph_add_new_function calls lowering passes and early optimization
+      on it.  This may confuse ourself when early inliner decide to inline
+      call to function clone, because function clones don't have parameter
+      list in ipa-prop matching their signature.  */
+   if (ipa_node_params_vector)
+     return 0;
+ 
  #ifdef ENABLE_CHECKING
verify_cgraph_node (node);
  #endif


Re: [RFC] Context sensitive inline analysis

2011-04-28 Thread Jan Hubicka
> Honza,
> 
> I continue to receive an ICE:
> 
> /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
> /tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
> internal compiler error: vector VEC(tree,base) index domain error, in
> evaluate_conditions_for_edge at ipa-inline-analysis.c:537
> 
> I was able to bootstrap with GCC just prior to your patch on Friday.
Hi,
can I have a preprocessed testcase?  The one attached to the HP PR doesn't
seem to reproduce for me.  Perhaps we just need a bounds check here, though I
think we should catch all "evil" edges with can_inline_edge_p and never try
to propagate across those.

Honza


Re: [RFC] Context sensitive inline analysis

2011-04-28 Thread David Edelsohn
Honza,

I continue to receive an ICE:

/farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
/tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
internal compiler error: vector VEC(tree,base) index domain error, in
evaluate_conditions_for_edge at ipa-inline-analysis.c:537

I was able to bootstrap with GCC just prior to your patch on Friday.

- David

On Wed, Apr 27, 2011 at 10:44 AM, Jan Hubicka  wrote:
>>
>> This may have caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48791
>
> Oops, yes, it is mine.  The insertion hook at expansion time is incorrectly
> called after the function is expanded, not before.
> ipa-prop should deregister itself earlier, but that can be done incrementally.
> I am testing the following and will commit if testing succeeds.
>
> Index: cgraphunit.c
> ===
> --- cgraphunit.c        (revision 173025)
> +++ cgraphunit.c        (working copy)
> @@ -233,6 +233,7 @@ cgraph_process_new_functions (void)
>          cgraph_finalize_function (fndecl, false);
>          cgraph_mark_reachable_node (node);
>          output = true;
> +          cgraph_call_function_insertion_hooks (node);
>          break;
>
>        case CGRAPH_STATE_IPA:
> @@ -258,12 +259,14 @@ cgraph_process_new_functions (void)
>          free_dominance_info (CDI_DOMINATORS);
>          pop_cfun ();
>          current_function_decl = NULL;
> +          cgraph_call_function_insertion_hooks (node);
>          break;
>
>        case CGRAPH_STATE_EXPANSION:
>          /* Functions created during expansion shall be compiled
>             directly.  */
>          node->process = 0;
> +          cgraph_call_function_insertion_hooks (node);
>          cgraph_expand_function (node);
>          break;
>
> @@ -271,7 +274,6 @@ cgraph_process_new_functions (void)
>          gcc_unreachable ();
>          break;
>        }
> -      cgraph_call_function_insertion_hooks (node);
>       varpool_analyze_pending_decls ();
>     }
>   return output;
>


Re: [RFC] Context sensitive inline analysis

2011-04-27 Thread Jan Hubicka
> 
> This may have caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48791

Oops, yes, it is mine.  The insertion hook at expansion time is incorrectly
called after the function is expanded, not before.
ipa-prop should deregister itself earlier, but that can be done incrementally.
I am testing the following and will commit if testing succeeds.

Index: cgraphunit.c
===
--- cgraphunit.c        (revision 173025)
+++ cgraphunit.c        (working copy)
@@ -233,6 +233,7 @@ cgraph_process_new_functions (void)
  cgraph_finalize_function (fndecl, false);
  cgraph_mark_reachable_node (node);
  output = true;
+  cgraph_call_function_insertion_hooks (node);
  break;
 
case CGRAPH_STATE_IPA:
@@ -258,12 +259,14 @@ cgraph_process_new_functions (void)
  free_dominance_info (CDI_DOMINATORS);
  pop_cfun ();
  current_function_decl = NULL;
+  cgraph_call_function_insertion_hooks (node);
  break;
 
case CGRAPH_STATE_EXPANSION:
  /* Functions created during expansion shall be compiled
 directly.  */
  node->process = 0;
+  cgraph_call_function_insertion_hooks (node);
  cgraph_expand_function (node);
  break;
 
@@ -271,7 +274,6 @@ cgraph_process_new_functions (void)
  gcc_unreachable ();
  break;
}
-  cgraph_call_function_insertion_hooks (node);
   varpool_analyze_pending_decls ();
 }
   return output;


Re: [RFC] Context sensitive inline analysis

2011-04-27 Thread H.J. Lu
On Wed, Apr 27, 2011 at 5:16 AM, Jan Hubicka  wrote:
> Hi,
> I don't really have a testcase for the HP or AIX ICE; however, I can
> reproduce the same ICE when I hack x86 not to use ctors/dtors.  This patch
> fixes it - the problem is that ipa-prop ignores newly added functions (the
> built global ctor) while ipa-inline does not, and ipa-inline does use
> ipa-prop for its analysis.  Fixed by adding the corresponding hook to
> ipa-prop; regtested & bootstrapped x86_64-linux with the hack and committed.
> Let me know if it fixes your problem or not.
>
> Honza
>
>        * ipa-prop.c (function_insertion_hook_holder): New holder.
>        (ipa_add_new_function): New function.
>        (ipa_register_cgraph_hooks, ipa_unregister_cgraph_hooks):
>        Register/deregister holder.

This may have caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48791


-- 
H.J.


Re: [RFC] Context sensitive inline analysis

2011-04-27 Thread Jan Hubicka
Hi,
I don't really have a testcase for the HP or AIX ICE; however, I can reproduce
the same ICE when I hack x86 not to use ctors/dtors.  This patch fixes it -
the problem is that ipa-prop ignores newly added functions (the built global
ctor) while ipa-inline does not, and ipa-inline does use ipa-prop for its
analysis.  Fixed by adding the corresponding hook to ipa-prop; regtested &
bootstrapped x86_64-linux with the hack and committed.  Let me know if it
fixes your problem or not.

Honza

* ipa-prop.c (function_insertion_hook_holder): New holder.
(ipa_add_new_function): New function.
(ipa_register_cgraph_hooks, ipa_unregister_cgraph_hooks):
Register/deregister holder.
Index: ipa-prop.c
===
--- ipa-prop.c  (revision 172989)
+++ ipa-prop.c  (working copy)
@@ -63,6 +63,7 @@ static struct cgraph_edge_hook_list *edg
 static struct cgraph_node_hook_list *node_removal_hook_holder;
 static struct cgraph_2edge_hook_list *edge_duplication_hook_holder;
 static struct cgraph_2node_hook_list *node_duplication_hook_holder;
+static struct cgraph_node_hook_list *function_insertion_hook_holder;
 
 /* Add cgraph NODE described by INFO to the worklist WL regardless of whether
it is in one or not.  It should almost never be used directly, as opposed to
@@ -2058,6 +2059,15 @@ ipa_node_duplication_hook (struct cgraph
   new_info->node_enqueued = old_info->node_enqueued;
 }
 
+
+/* Analyze newly added function into callgraph.  */
+
+static void
+ipa_add_new_function (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
+{
+  ipa_analyze_node (node);
+}
+
 /* Register our cgraph hooks if they are not already there.  */
 
 void
@@ -2075,6 +2085,8 @@ ipa_register_cgraph_hooks (void)
   if (!node_duplication_hook_holder)
 node_duplication_hook_holder =
   cgraph_add_node_duplication_hook (&ipa_node_duplication_hook, NULL);
+  function_insertion_hook_holder =
+  cgraph_add_function_insertion_hook (&ipa_add_new_function, NULL);
 }
 
 /* Unregister our cgraph hooks if they are not already there.  */
@@ -2090,6 +2102,8 @@ ipa_unregister_cgraph_hooks (void)
   edge_duplication_hook_holder = NULL;
   cgraph_remove_node_duplication_hook (node_duplication_hook_holder);
   node_duplication_hook_holder = NULL;
+  cgraph_remove_function_insertion_hook (function_insertion_hook_holder);
+  function_insertion_hook_holder = NULL;
 }
 
 /* Allocate all necessary data structures necessary for indirect inlining.  */


Re: [RFC] Context sensitive inline analysis

2011-04-26 Thread Jan Hubicka
> Honza,
> 
> This patch causes a bootstrap failure when building libstdc++ on AIX:
> 
> In file included from
> /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
> /tmp/20110423/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
> internal compiler error: vector VEC(tree,base) index domain error, in
> evaulate_conditions_for_edge at ipa-inline-analysis.c:466

Hi,
a similar error was reported for HP, too.  I will look into it now.  I hoped
it was the same as Toon's problem (the hack I removed caused quite bad
propagation across uninitialized data structures).

Yesterday I analyzed the last problem I reproduced with Mozilla, and those are
due to the fact that we don't do type compatibility checking when doing
indirect inlining and in LTO type merging.  So it is different from this one.
> 
> I do not know if this is related to the WPA failure reported by Toon.
> 
> Also, I think you mean "evaluate" not "evaulate" in the description
> and new function names.
Duh, will fix that!
Honza
> 
> Thanks, David


Re: [RFC] Context sensitive inline analysis

2011-04-25 Thread David Edelsohn
Honza,

This patch causes a bootstrap failure when building libstdc++ on AIX:

In file included from
/farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
/tmp/20110423/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
internal compiler error: vector VEC(tree,base) index domain error, in
evaulate_conditions_for_edge at ipa-inline-analysis.c:466

I do not know if this is related to the WPA failure reported by Toon.

Also, I think you mean "evaluate" not "evaulate" in the description
and new function names.

Thanks, David


Re: [RFC] Context sensitive inline analysis

2011-04-23 Thread Jan Hubicka
> > The problem is that cgraph_node->uid will be sparse after merging.  I
> > wonder if we want to solve this by adding new uids to the analyzed nodes
> > that will be denser?  Most of the summaries are actually attached to the
> > analyzed nodes only.
> 
> Can't we re-number the UIDs after merging?

Well, at the moment we read unmerged summaries and merge them later, so it
would mean shuffling the per-pass data (that is not a big deal to do) but it
also won't save peak memory use.  The analyzed function uids are by nature
always denser and would have the additional advantage of staying more or less
dense after merging (i.e. only comdat functions will cause holes).  Those
will be easily used by the inline clones.

Well, I guess I could defer this for later - once I get a class hierarchy on
the callgraph/varpool, I might pretty much want to have analyzed nodes
inherit from unanalyzed ones so we don't have useless data on unanalyzed
nodes in general.
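
For illustration, a minimal sketch of that idea (hypothetical names, not the
actual cgraph code): analyzed nodes get their own dense id stream, so summary
vectors indexed by it stay compact even when node uids end up sparse after
merging:

  #include <stdbool.h>

  struct my_node { int uid; int summary_uid; bool analyzed; };

  static int next_summary_uid;

  /* Assign a dense id when a node gains a body; summaries would then be
     indexed by summary_uid instead of uid.  */
  static void
  assign_summary_uid (struct my_node *node)
  {
    node->summary_uid = node->analyzed ? next_summary_uid++ : -1;
  }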

Honza
> 
> Richard.
> 
> > Sadly libxul won't build again due to an apparent problem:
> > [Leaving LTRANS /abuild/jh/tmp//ccIgav2O.args]
> > [Leaving LTRANS libxul.so.ltrans.out]
> > g++: warning: -pipe ignored because -save-temps specified
> > Reading command line options: libxul.so.ltrans0.olto1: error: ELF section name out of range
> >
> > It seems that for some irrational reason we now decide to stream everything
> > into a single partition, which is a bad idea, but still our ELF
> > infrastructure should not give up.
> > The .o file seems wrong:
> > jh@evans:/abuild/jh/build-mozilla-new11-lto-noelfhackO3/toolkit/library> objdump -h libxul.so.ltrans0.o
> > BFD: libxul.so.ltrans0.o: invalid string offset 4088662 >= 348 for section `.shstrtab'
> > BFD: libxul.so.ltrans0.o: invalid string offset 407 >= 348 for section `(null)'
> > objdump: libxul.so.ltrans0.o: File format not recognized
> >
> >
> > Honza
> >


Re: [RFC] Context sensitive inline analysis

2011-04-23 Thread Richard Guenther
On Sat, Apr 23, 2011 at 1:00 AM, Jan Hubicka  wrote:
> Hi,
> the patch also solves inliner compile time problems for mozilla:
>  garbage collection    :  15.88 ( 4%) usr   0.00 ( 0%) sys  15.89 ( 4%) wall       0 kB ( 0%) ggc
>  callgraph optimization:   3.10 ( 1%) usr   0.00 ( 0%) sys   3.09 ( 1%) wall   15604 kB ( 1%) ggc
>  varpool construction  :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.69 ( 0%) wall   51621 kB ( 3%) ggc
>  ipa cp                :   1.99 ( 1%) usr   0.08 ( 1%) sys   2.06 ( 1%) wall  123497 kB ( 8%) ggc
>  ipa lto gimple in     :   0.04 ( 0%) usr   0.02 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto gimple out    :  11.70 ( 3%) usr   0.58 ( 8%) sys  12.29 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto decl in       : 318.89 (81%) usr   3.73 (53%) sys 323.19 (80%) wall  722318 kB (47%) ggc
>  ipa lto decl out      :  10.45 ( 3%) usr   0.23 ( 3%) sys  10.67 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto decl init I/O :   0.13 ( 0%) usr   0.04 ( 1%) sys   0.16 ( 0%) wall      31 kB ( 0%) ggc
>  ipa lto cgraph I/O    :   1.88 ( 0%) usr   0.26 ( 4%) sys   2.14 ( 1%) wall  433578 kB (28%) ggc
>  ipa lto decl merge    :  20.51 ( 5%) usr   0.14 ( 2%) sys  20.65 ( 5%) wall     962 kB ( 0%) ggc
>  ipa lto cgraph merge  :   2.43 ( 1%) usr   0.00 ( 0%) sys   2.43 ( 1%) wall   14538 kB ( 1%) ggc
>  whopr wpa             :   0.59 ( 0%) usr   0.02 ( 0%) sys   0.62 ( 0%) wall       1 kB ( 0%) ggc
>  whopr wpa I/O         :   0.61 ( 0%) usr   1.75 (25%) sys   2.38 ( 1%) wall       0 kB ( 0%) ggc
>  ipa reference         :   1.02 ( 0%) usr   0.00 ( 0%) sys   1.02 ( 0%) wall       0 kB ( 0%) ggc
>  ipa profile           :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const        :   0.85 ( 0%) usr   0.02 ( 0%) sys   0.89 ( 0%) wall       0 kB ( 0%) ggc
>  parser                :   0.66 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall   10372 kB ( 1%) ggc
>  inline heuristics     :   1.22 ( 0%) usr   0.07 ( 1%) sys   1.28 ( 0%) wall  159368 kB (10%) ggc
>  callgraph verifier    :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
>  varconst              :   0.02 ( 0%) usr   0.03 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
>  unaccounted todo      :   0.74 ( 0%) usr   0.00 ( 0%) sys   0.76 ( 0%) wall       0 kB ( 0%) ggc
>  TOTAL                 : 394.08             7.10           401.76            1533113 kB
>
> one second for inlining seems acceptable.  There is however growth from 20MB
> to 159MB of inliner GGC usage.  It is because of moving the inline_summary
> vector into GGC memory.  The ipa-cp summaries seem to have a similar
> footprint, as seen above.
>
> The problem is that cgraph_node->uid will be sparse after merging.  I wonder
> if we want to solve this by adding new uids to the analyzed nodes that will
> be denser?  Most of the summaries are actually attached to the analyzed
> nodes only.

Can't we re-number the UIDs after merging?

Richard.

> Sadly libxul won't build again due to an apparent problem:
> [Leaving LTRANS /abuild/jh/tmp//ccIgav2O.args]
> [Leaving LTRANS libxul.so.ltrans.out]
> g++: warning: -pipe ignored because -save-temps specified
> Reading command line options: libxul.so.ltrans0.olto1: error: ELF section name out of range
>
> It seems that for some irrational reason we now decide to stream everything
> into a single partition, which is a bad idea, but still our ELF
> infrastructure should not give up.
> The .o file seems wrong:
> jh@evans:/abuild/jh/build-mozilla-new11-lto-noelfhackO3/toolkit/library> objdump -h libxul.so.ltrans0.o
> BFD: libxul.so.ltrans0.o: invalid string offset 4088662 >= 348 for section `.shstrtab'
> BFD: libxul.so.ltrans0.o: invalid string offset 407 >= 348 for section `(null)'
> objdump: libxul.so.ltrans0.o: File format not recognized
>
>
> Honza
>


Re: [RFC] Context sensitive inline analysis

2011-04-22 Thread Jan Hubicka
Hi,
the patch also solves inliner compile time problems for mozilla:
 garbage collection    :  15.88 ( 4%) usr   0.00 ( 0%) sys  15.89 ( 4%) wall       0 kB ( 0%) ggc
 callgraph optimization:   3.10 ( 1%) usr   0.00 ( 0%) sys   3.09 ( 1%) wall   15604 kB ( 1%) ggc
 varpool construction  :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.69 ( 0%) wall   51621 kB ( 3%) ggc
 ipa cp                :   1.99 ( 1%) usr   0.08 ( 1%) sys   2.06 ( 1%) wall  123497 kB ( 8%) ggc
 ipa lto gimple in     :   0.04 ( 0%) usr   0.02 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out    :  11.70 ( 3%) usr   0.58 ( 8%) sys  12.29 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto decl in       : 318.89 (81%) usr   3.73 (53%) sys 323.19 (80%) wall  722318 kB (47%) ggc
 ipa lto decl out      :  10.45 ( 3%) usr   0.23 ( 3%) sys  10.67 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto decl init I/O :   0.13 ( 0%) usr   0.04 ( 1%) sys   0.16 ( 0%) wall      31 kB ( 0%) ggc
 ipa lto cgraph I/O    :   1.88 ( 0%) usr   0.26 ( 4%) sys   2.14 ( 1%) wall  433578 kB (28%) ggc
 ipa lto decl merge    :  20.51 ( 5%) usr   0.14 ( 2%) sys  20.65 ( 5%) wall     962 kB ( 0%) ggc
 ipa lto cgraph merge  :   2.43 ( 1%) usr   0.00 ( 0%) sys   2.43 ( 1%) wall   14538 kB ( 1%) ggc
 whopr wpa             :   0.59 ( 0%) usr   0.02 ( 0%) sys   0.62 ( 0%) wall       1 kB ( 0%) ggc
 whopr wpa I/O         :   0.61 ( 0%) usr   1.75 (25%) sys   2.38 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference         :   1.02 ( 0%) usr   0.00 ( 0%) sys   1.02 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile           :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.85 ( 0%) usr   0.02 ( 0%) sys   0.89 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.66 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall   10372 kB ( 1%) ggc
 inline heuristics     :   1.22 ( 0%) usr   0.07 ( 1%) sys   1.28 ( 0%) wall  159368 kB (10%) ggc
 callgraph verifier    :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 varconst              :   0.02 ( 0%) usr   0.03 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo      :   0.74 ( 0%) usr   0.00 ( 0%) sys   0.76 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 394.08             7.10           401.76            1533113 kB

one second for inlining seems acceptable.  There is however growth from 20MB to
159MB of inliner GGC usage.  It is because of moving the inline_summary vector
into GGC memory.  The ipa-cp summaries seem to have a similar footprint, as
seen above.

The problem is that cgraph_node->uid will be sparse after merging.  I wonder if
we want to solve this by adding new uids to the analyzed nodes that will be
denser?  Most of the summaries are actually attached to the analyzed nodes
only.

Sadly libxul won't build again due to an apparent problem:
[Leaving LTRANS /abuild/jh/tmp//ccIgav2O.args]
[Leaving LTRANS libxul.so.ltrans.out]
g++: warning: -pipe ignored because -save-temps specified
Reading command line options: libxul.so.ltrans0.olto1: error: ELF section name out of range

It seems that for some irrational reason we now decide to stream everything
into a single partition, which is a bad idea, but still our ELF infrastructure
should not give up.
The .o file seems wrong:
jh@evans:/abuild/jh/build-mozilla-new11-lto-noelfhackO3/toolkit/library> objdump -h libxul.so.ltrans0.o
BFD: libxul.so.ltrans0.o: invalid string offset 4088662 >= 348 for section `.shstrtab'
BFD: libxul.so.ltrans0.o: invalid string offset 407 >= 348 for section `(null)'
objdump: libxul.so.ltrans0.o: File format not recognized


Honza