Re: [RFC] Context sensitive inline analysis

2011-10-03 Thread Richard Sandiford
Richard Sandiford richard.sandif...@linaro.org writes:
 Jan Hubicka hubi...@ucw.cz writes:
 the problem is sign overflow in the time computation.  Time should be
 capped by MAX_TIME, and we compute MAX_TIME * INLINE_SIZE_SCALE * 2.
 This happens to be between 2^31 and 2^32, so we overflow here because
 of the use of signed arithmetic.

 Index: ipa-inline-analysis.c
 ===
 --- ipa-inline-analysis.c(revision 179266)
 +++ ipa-inline-analysis.c(working copy)
 @@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
  /* Estimate runtime of function can easilly run into huge numbers with many
     nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
     For anything larger we use gcov_type.  */
 -#define MAX_TIME 1000000000
 +#define MAX_TIME 500000000
  
  /* Number of bits in integer, but we really want to be stable across different
     hosts.  */

 Could you update the comment too?  (time * INLINE_SIZE_SCALE * 2)

OK, I did it myself.  Tested on x86_64-linux-gnu and applied as obvious.

Richard


gcc/
* ipa-inline-analysis.c (MAX_TIME): Update comment.

Index: gcc/ipa-inline-analysis.c
===
--- gcc/ipa-inline-analysis.c   2011-10-03 09:10:21.0 +0100
+++ gcc/ipa-inline-analysis.c   2011-10-03 09:10:55.633044417 +0100
@@ -90,8 +90,8 @@ Software Foundation; either version 3, o
 #include "alloc-pool.h"
 
 /* Estimate runtime of function can easilly run into huge numbers with many
-   nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
-   For anything larger we use gcov_type.  */
+   nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE * 2 in an
+   integer.  For anything larger we use gcov_type.  */
 #define MAX_TIME 500000000
 
 /* Number of bits in integer, but we really want to be stable across different


Re: [RFC] Context sensitive inline analysis

2011-09-28 Thread Richard Sandiford
Jan Hubicka hubi...@ucw.cz writes:
 the problem is sign overflow in the time computation.  Time should be
 capped by MAX_TIME, and we compute MAX_TIME * INLINE_SIZE_SCALE * 2.
 This happens to be between 2^31 and 2^32, so we overflow here because
 of the use of signed arithmetic.

 Index: ipa-inline-analysis.c
 ===
 --- ipa-inline-analysis.c (revision 179266)
 +++ ipa-inline-analysis.c (working copy)
 @@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
  /* Estimate runtime of function can easilly run into huge numbers with many
 nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
 For anything larger we use gcov_type.  */
 -#define MAX_TIME 1000000000
 +#define MAX_TIME 500000000
  
  /* Number of bits in integer, but we really want to be stable across different
     hosts.  */

Could you update the comment too?  (time * INLINE_SIZE_SCALE * 2)

Richard


Re: [RFC] Context sensitive inline analysis

2011-09-27 Thread Jan Hubicka
  This caused:
 
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49179
 
 
 
 This also caused:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49091
 

Hi,
the problem is sign overflow in the time computation.  Time should be capped
by MAX_TIME, and we compute MAX_TIME * INLINE_SIZE_SCALE * 2.  This happens
to be between 2^31 and 2^32, so we overflow here because of the use of
signed arithmetic.

Hopefully the following is enough.  Floating-point arithmetic would make
things easier, since loop structure can scale times up a lot and their
relative comparisons matter for the benefit computation.  Not sure whether
switching it to our software floats is the coolest idea, however.

Will commit it after testing on x86_64-linux.

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 179266)
+++ ipa-inline-analysis.c   (working copy)
@@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
 /* Estimate runtime of function can easilly run into huge numbers with many
nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
For anything larger we use gcov_type.  */
-#define MAX_TIME 1000000000
+#define MAX_TIME 500000000
 
 /* Number of bits in integer, but we really want to be stable across different
hosts.  */


Re: [RFC] Context sensitive inline analysis

2011-05-26 Thread H.J. Lu
On Thu, May 26, 2011 at 9:53 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Apr 22, 2011 at 5:35 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this patch implements infrastructure to summarize function body size/time
 in a way that is sensitive to the function context.  At the moment context
 means

  1) if function is inline or offline
  2) if some of parameters are known compile time constants.

 but we should handle more later.

 The analysis is implemented by introducing the notion of predicates, which
 are simple logical formulas in conjunctive-disjunctive form over conditions.
 Conditions are simple tests like "function is not inlined", "op0 is not
 constant" or "op0 > 6".  That is, one can express things like "this
 statement will execute if op1 > 6 or op0 is not constant".

 The patch implements simple infrastructure to represent the predicates.
 There are hard limits on everything - i.e. there are no more than 32
 different conditions and no more than 8 clauses.  This makes it possible to
 test clauses by simple logical operations on integers and to represent
 predicates as an array of 8 integers, which is very cheap.  The
 implementation details are quite contained, so we might relax the limits,
 but I don't really see a need for that.

 The main point of designing this was to allow an effective way of evaluating
 those predicates for a given context, since this happens many times during
 inlining, and to not make WPA memory usage grow too much.  At the same time
 I wanted the infrastructure to be flexible enough to allow adding more
 tricks in the future.  For example, we might consider extra inlining points
 if the callee uses an argument to drive the number of iterations of a loop,
 or when the caller passes a pointer to a static variable that might be SRAed
 after inlining, etc.

 At the moment the only consumer of predicates is the size/time vector, a
 vector of simple entries consisting of size, time and predicate.  Function
 size/time is then computed as the sum of all entries whose predicate might
 be true in the given context, plus the size/time of all call edges (this is
 because call edges can disappear under different conditions or be turned
 into constants).

 I plan to add more uses of predicates in the near future - e.g. attaching
 predicates to edges so we know what calls will be optimized out at WPA time.
 Also I plan to use the analysis to drive function cloning (i.e. partial
 specialization): when a function is called from several places with the same
 context and the context makes a difference to the expected runtime, clone
 the function.

 The analysis part deciding on predicates is currently very simple, a kind of
 proof of concept:

  1) Every BB gets assigned a predicate for when it is reachable.  At the
     moment this happens only if all predecessors of the BB are conditionals
     that test a function parameter.  Obviously we will need to propagate
     this info for sane results.

  2) Every statement gets assigned a predicate for when it will become
     constant.  Again it is very simple; only statements using only function
     arguments are considered.  Simple propagation across the SSA graph will
     do better.

  3) Finally the statement is accounted at a predicate that is the
     conjunction of both of the above.

 All call statements are accounted unconditionally because we don't predicate
 edges, yet.

 While computing function sizes is fast, it is not as speedy as the original
 time-benefit computation.  The small function inliner is quite insane about
 querying the sizes/times again and again while it keeps its queue up to
 date.  For this purpose I added a cache, the same way as we already cache
 function growths.  Not that I don't plan to make the inliner badness
 computation more sensible here soon.
 So far I did not want to touch the actual heuristics part of the inliner,
 and hope to do that after getting the infrastructure to the point I want it
 to be at for 4.7.

 The patch bootstraps & regtests.  I tested that the compile time impact on
 tramp3d is positive (because of the caching; without it the inliner grows
 from 1.1% to 4% of compile time).  I also tested SPECs and there are no
 great changes, which is not a bad result given the stupidity of the
 analysis ;).

 I will look into Mozilla, even though I plan to look into solving the
 scalability problems of the inliner as a followup instead of snowballing
 this.

 I plan to work on the patch a little further during the weekend, in
 particular to make the dumps more readable, since they got a bit convoluted
 by random formatting.  But I am sending the patch for comments and plan to
 get it finished by next week.

 Honza

         * gengtype.c (open_base_files): Add ipa-inline.h include.
         * ipa-cp.c (ipcp_get_lattice, ipcp_lattice_from_jfunc): Move to
         ipa-prop.c; update all uses.
         * ipa-prop.c (ipa_get_lattice, ipa_lattice_from_jfunc): ... here.
         * ipa-inline-transform.c (inline_call): Use inline_merge_summary
         to merge summary of inlined function into former caller.
         * ipa-inline.c (max_benefit): Remove.
         (edge_badness): Compensate for removal of 

Re: [RFC] Context sensitive inline analysis

2011-04-30 Thread Jan Hubicka
 On Thu, Apr 28, 2011 at 9:27 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Honza,
 
  I continue to receive an ICE:
 
  /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
  /tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
  internal compiler error: vector VEC(tree,base) index domain error, in
  evaluate_conditions_for_edge at ipa-inline-analysis.c:537
 
  I was able to bootstrap with GCC just prior to your patch on Friday.
  Hi,
  can I have a preprocessed testcase?  The one attached to the HP PR doesn't
  seem to reproduce for me.  Perhaps we just need a bounds check here, though
  I think we should catch all evil edges with can_inline_edge_p and never try
  to propagate across those.
 
 The failure currently occurs when building stdc++.h.gch with -O2.

Duh, this sounds scary. 

I am attaching a fix for the HP failure.  Hopefully it will fix yours, too.
Reproducing the gch ICE in a cross would be more fun (and of course, the GCH
should not make a difference, so it seems that we have some latent problem
here too).
 
 Apparently this does not reproduce on PPC Linux using the original TOC
 model (cmodel=small).  Note that GCC on AIX still defaults to 32 bit
 application and GCC on PPC Linux is 64 bit, so that might contribute
 to the difference.  Or the different process data layout of Linux vs
 AIX avoiding failure from memory corruption.

The problem on HP is a weird interaction of ipa-cp, the early inliner and the
constructor merging pass.  It needs a !have_ctors/dtors target to reproduce,
and you really need to be lucky to get this to happen.  So I hope it is your
problem, too.  At least your testcase looks almost identical to the HP one
and works for me now, too.

Martin, this is an example of why we probably should update jump functions to
represent the program after the ipa-cp transform.  In this case we simply
construct a new direct call into the clone, and that one gets misanalyzed.

Bootstrapped/regtested x86_64-linux, committed.

PR middle-end/48752 
* ipa-inline.c (early_inliner): Disable when doing late
addition of function.
Index: ipa-inline.c
===
*** ipa-inline.c(revision 173189)
--- ipa-inline.c(working copy)
*** early_inliner (void)
*** 1663,1668 
--- 1663,1676 
if (seen_error ())
  return 0;
  
+   /* Do nothing if datastructures for ipa-inliner are already computed.  This
+      happens when some pass decides to construct new function and
+      cgraph_add_new_function calls lowering passes and early optimization on
+      it.  This may confuse ourself when early inliner decide to inline call
+      to function clone, because function clones don't have parameter list
+      in ipa-prop matching their signature.  */
+   if (ipa_node_params_vector)
+ return 0;
+ 
  #ifdef ENABLE_CHECKING
verify_cgraph_node (node);
  #endif


Re: [RFC] Context sensitive inline analysis

2011-04-30 Thread David Edelsohn
Honza,

This patch appears to fix the failure on AIX: my build progressed past
libstdc++.

Thanks, David

2011/4/30 Jan Hubicka hubi...@ucw.cz:
 On Thu, Apr 28, 2011 at 9:27 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Honza,
 
  I continue to receive an ICE:
 
  /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
  /tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
  internal compiler error: vector VEC(tree,base) index domain error, in
  evaluate_conditions_for_edge at ipa-inline-analysis.c:537
 
  I was able to bootstrap with GCC just prior to your patch on Friday.
  Hi,
  can I have a preprocessed testcase? The one attached to HP PR don't seem
  to reproduce for me.  Perhaps we just need a bounds check here though I 
  think
  we should catch all evil edges with can_inline_edge_p and never try to 
  propagate
  across those.

 The failure currently occurs when building stdc++.h.gch with -O2.

 Duh, this sounds scary.

 I am attaching a fix for the HP failure.  Hopefully it will fix yours, too.
 Reproducing the gch ICE in a cross would be more fun (and of course, the GCH
 should not make a difference, so it seems that we have some latent problem
 here too).

 Apparently this does not reproduce on PPC Linux using the original TOC
 model (cmodel=small).  Note that GCC on AIX still defaults to 32 bit
 application and GCC on PPC Linux is 64 bit, so that might contribute
 to the difference.  Or the different process data layout of Linux vs
 AIX avoiding failure from memory corruption.

 The problem on HP is a weird interaction of ipa-cp, the early inliner and
 the constructor merging pass.  It needs a !have_ctors/dtors target to
 reproduce, and you really need to be lucky to get this to happen.  So I hope
 it is your problem, too.  At least your testcase looks almost identical to
 the HP one and works for me now, too.

 Martin, this is an example of why we probably should update jump functions
 to represent the program after the ipa-cp transform.  In this case we simply
 construct a new direct call into the clone, and that one gets misanalyzed.

 Bootstrapped/regtested x86_64-linux, committed.

        PR middle-end/48752
        * ipa-inline.c (early_inliner): Disable when doing late
        addition of function.
 Index: ipa-inline.c
 ===
 *** ipa-inline.c        (revision 173189)
 --- ipa-inline.c        (working copy)
 *** early_inliner (void)
 *** 1663,1668 
 --- 1663,1676 
    if (seen_error ())
      return 0;

 +   /* Do nothing if datastructures for ipa-inliner are already computed.
 +      This happens when some pass decides to construct new function and
 +      cgraph_add_new_function calls lowering passes and early optimization
 +      on it.  This may confuse ourself when early inliner decide to inline
 +      call to function clone, because function clones don't have parameter
 +      list in ipa-prop matching their signature.  */
 +   if (ipa_node_params_vector)
 +     return 0;
 +
  #ifdef ENABLE_CHECKING
    verify_cgraph_node (node);
  #endif



Re: [RFC] Context sensitive inline analysis

2011-04-28 Thread David Edelsohn
Honza,

I continue to receive an ICE:

/farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
/tmp/20110427/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
internal compiler error: vector VEC(tree,base) index domain error, in
evaluate_conditions_for_edge at ipa-inline-analysis.c:537

I was able to bootstrap with GCC just prior to your patch on Friday.

- David

On Wed, Apr 27, 2011 at 10:44 AM, Jan Hubicka hubi...@ucw.cz wrote:

 This may have caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48791

 Oops, yes, it is mine.  The insertion hook at expansion time is incorrectly
 called after the function is expanded, not before.
 ipa-prop should deregister itself earlier, but that can be done
 incrementally.  I am testing the following and will commit if testing
 succeeds.

 Index: cgraphunit.c
 ===
 --- cgraphunit.c        (revision 173025)
 +++ cgraphunit.c        (working copy)
 @@ -233,6 +233,7 @@ cgraph_process_new_functions (void)
          cgraph_finalize_function (fndecl, false);
          cgraph_mark_reachable_node (node);
          output = true;
 +          cgraph_call_function_insertion_hooks (node);
          break;

        case CGRAPH_STATE_IPA:
 @@ -258,12 +259,14 @@ cgraph_process_new_functions (void)
          free_dominance_info (CDI_DOMINATORS);
          pop_cfun ();
          current_function_decl = NULL;
 +          cgraph_call_function_insertion_hooks (node);
          break;

        case CGRAPH_STATE_EXPANSION:
          /* Functions created during expansion shall be compiled
             directly.  */
          node->process = 0;
 +          cgraph_call_function_insertion_hooks (node);
          cgraph_expand_function (node);
          break;

 @@ -271,7 +274,6 @@ cgraph_process_new_functions (void)
          gcc_unreachable ();
          break;
        }
 -      cgraph_call_function_insertion_hooks (node);
       varpool_analyze_pending_decls ();
     }
   return output;



Re: [RFC] Context sensitive inline analysis

2011-04-27 Thread H.J. Lu
On Wed, Apr 27, 2011 at 5:16 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 I don't really have a testcase for the HP or AIX ICE, however I can
 reproduce the same ICE when I hack x86 to not use ctors/dtors.  This patch
 fixes it - the problem is that ipa-prop ignores newly added functions (the
 global ctor built) while ipa-inline does not, and ipa-inline does use
 ipa-prop for its analysis.  Fixed by adding the corresponding hook to
 ipa-prop; regtested & bootstrapped x86_64-linux with the hack and committed.
 Let me know if it fixes your problem or not.

 Honza

        * ipa-prop.c (function_insertion_hook_holder): New holder.
        (ipa_add_new_function): New function.
        (ipa_register_cgraph_hooks, ipa_unregister_cgraph_hooks): 
 Register/deregister
        holder.

This may have caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48791


-- 
H.J.


Re: [RFC] Context sensitive inline analysis

2011-04-27 Thread Jan Hubicka
 
 This may have caused:
 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48791

Oops, yes, it is mine.  The insertion hook at expansion time is incorrectly
called after the function is expanded, not before.
ipa-prop should deregister itself earlier, but that can be done incrementally.
I am testing the following and will commit if testing succeeds.

Index: cgraphunit.c
===
--- cgraphunit.c(revision 173025)
+++ cgraphunit.c(working copy)
@@ -233,6 +233,7 @@ cgraph_process_new_functions (void)
  cgraph_finalize_function (fndecl, false);
  cgraph_mark_reachable_node (node);
  output = true;
+  cgraph_call_function_insertion_hooks (node);
  break;
 
case CGRAPH_STATE_IPA:
@@ -258,12 +259,14 @@ cgraph_process_new_functions (void)
  free_dominance_info (CDI_DOMINATORS);
  pop_cfun ();
  current_function_decl = NULL;
+  cgraph_call_function_insertion_hooks (node);
  break;
 
case CGRAPH_STATE_EXPANSION:
  /* Functions created during expansion shall be compiled
 directly.  */
  node->process = 0;
+  cgraph_call_function_insertion_hooks (node);
  cgraph_expand_function (node);
  break;
 
@@ -271,7 +274,6 @@ cgraph_process_new_functions (void)
  gcc_unreachable ();
  break;
}
-  cgraph_call_function_insertion_hooks (node);
   varpool_analyze_pending_decls ();
 }
   return output;


Re: [RFC] Context sensitive inline analysis

2011-04-26 Thread Jan Hubicka
 Honza,
 
 This patch causes a bootstrap failure when building libstdc++ on AIX:
 
 In file included from
 /farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
 /tmp/20110423/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
 internal compiler error: vector VEC(tree,base) index domain error, in
 evaulate_conditions_for_edge at ipa-inline-analysis.c:466

Hi,
similar error was reported for HP, too.  I will look into it now.  I hoped it
was the same as Toon's problem (the hack I removed caused quite bad
propagation across uninitialized data structures).

Yesterday I analyzed the last problem I reproduced with Mozilla, and those
are due to the fact that we don't do type compatibility checking when doing
indirect inlining and in LTO type merging.  So it is different from this one.
 
 I do not know if this is related to the WPA failure reported by Toon.
 
 Also, I think you mean "evaluate" not "evaulate" in the description
 and new function names.
Duh, will fix that!
Honza
 
 Thanks, David


Re: [RFC] Context sensitive inline analysis

2011-04-25 Thread David Edelsohn
Honza,

This patch causes a bootstrap failure when building libstdc++ on AIX:

In file included from
/farm/dje/src/src/libstdc++-v3/include/precompiled/stdc++.h:94:0:
/tmp/20110423/powerpc-ibm-aix5.3.0.0/libstdc++-v3/include/valarray:1163:1:
internal compiler error: vector VEC(tree,base) index domain error, in
evaulate_conditions_for_edge at ipa-inline-analysis.c:466

I do not know if this is related to the WPA failure reported by Toon.

Also, I think you mean "evaluate" not "evaulate" in the description
and new function names.

Thanks, David


Re: [RFC] Context sensitive inline analysis

2011-04-23 Thread Jan Hubicka
  The problem is that cgraph_node->uid will be sparse after merging.  I
  wonder if we want to solve this by adding new uids to the analyzed nodes
  that will be denser?  Most of the summaries are actually attached to the
  analyzed nodes only.
 
 Can't we re-number the UIDs after merging?

Well, at the moment we read unmerged summaries and merge them later, so it
would mean shuffling the per-pass data (which is not a big deal to do) but
also won't save peak memory use.  The analyzed function uids are by nature
always denser, and would have the additional advantage of staying more or
less dense after merging (i.e. only comdat functions will cause holes).
Those will be easily used by the inline clones.

Well, I guess I could defer this for later - once I get the class hierarchy
on the callgraph/varpool, I might pretty much want to have analyzed nodes
inherit from unanalyzed ones so we don't have useless data on unanalyzed
nodes in general.

Honza
 
 Richard.
 
  Sadly libxul won't build again due to an apparent problem:
  [Leaving LTRANS /abuild/jh/tmp//ccIgav2O.args]
  [Leaving LTRANS libxul.so.ltrans.out]
  g++: warning: -pipe ignored because -save-temps specified
  Reading command line options: libxul.so.ltrans0.o:lto1: error: ELF section
  name out of range

  It seems that for some irrational reason we now decide to stream everything
  into a single partition, which is a bad idea, but still our ELF
  infrastructure should not give up.
  The .o file seems wrong:
  jh@evans:/abuild/jh/build-mozilla-new11-lto-noelfhackO3/toolkit/library
  objdump -h libxul.so.ltrans0.o
  BFD: libxul.so.ltrans0.o: invalid string offset 4088662 >= 348 for section
  `.shstrtab'
  BFD: libxul.so.ltrans0.o: invalid string offset 407 >= 348 for section
  `(null)'
  objdump: libxul.so.ltrans0.o: File format not recognized
 
 
  Honza
 


Re: [RFC] Context sensitive inline analysis

2011-04-22 Thread Jan Hubicka
Hi,
the patch also solves inliner compile time problems for mozilla:
 garbage collection    :  15.88 ( 4%) usr   0.00 ( 0%) sys  15.89 ( 4%) wall       0 kB ( 0%) ggc
 callgraph optimization:   3.10 ( 1%) usr   0.00 ( 0%) sys   3.09 ( 1%) wall   15604 kB ( 1%) ggc
 varpool construction  :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.69 ( 0%) wall   51621 kB ( 3%) ggc
 ipa cp                :   1.99 ( 1%) usr   0.08 ( 1%) sys   2.06 ( 1%) wall  123497 kB ( 8%) ggc
 ipa lto gimple in     :   0.04 ( 0%) usr   0.02 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out    :  11.70 ( 3%) usr   0.58 ( 8%) sys  12.29 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto decl in       : 318.89 (81%) usr   3.73 (53%) sys 323.19 (80%) wall  722318 kB (47%) ggc
 ipa lto decl out      :  10.45 ( 3%) usr   0.23 ( 3%) sys  10.67 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto decl init I/O :   0.13 ( 0%) usr   0.04 ( 1%) sys   0.16 ( 0%) wall      31 kB ( 0%) ggc
 ipa lto cgraph I/O    :   1.88 ( 0%) usr   0.26 ( 4%) sys   2.14 ( 1%) wall  433578 kB (28%) ggc
 ipa lto decl merge    :  20.51 ( 5%) usr   0.14 ( 2%) sys  20.65 ( 5%) wall     962 kB ( 0%) ggc
 ipa lto cgraph merge  :   2.43 ( 1%) usr   0.00 ( 0%) sys   2.43 ( 1%) wall   14538 kB ( 1%) ggc
 whopr wpa             :   0.59 ( 0%) usr   0.02 ( 0%) sys   0.62 ( 0%) wall       1 kB ( 0%) ggc
 whopr wpa I/O         :   0.61 ( 0%) usr   1.75 (25%) sys   2.38 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference         :   1.02 ( 0%) usr   0.00 ( 0%) sys   1.02 ( 0%) wall       0 kB ( 0%) ggc
 ipa profile           :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.85 ( 0%) usr   0.02 ( 0%) sys   0.89 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.66 ( 0%) usr   0.00 ( 0%) sys   0.66 ( 0%) wall   10372 kB ( 1%) ggc
 inline heuristics     :   1.22 ( 0%) usr   0.07 ( 1%) sys   1.28 ( 0%) wall  159368 kB (10%) ggc
 callgraph verifier    :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 varconst              :   0.02 ( 0%) usr   0.03 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo      :   0.74 ( 0%) usr   0.00 ( 0%) sys   0.76 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 394.08          7.10         401.76           1533113 kB

One second for inlining seems acceptable.  There is, however, growth from
20MB to 159MB of inliner GGC usage.  It is because of moving the
inline_summary vector into GGC memory.  The ipa-cp summaries seem to have a
similar footprint, as seen above.

The problem is that cgraph_node->uid will be sparse after merging.  I wonder
if we want to solve this by adding new uids to the analyzed nodes that will
be denser?  Most of the summaries are actually attached to the analyzed
nodes only.

Sadly libxul won't build again due to an apparent problem:
[Leaving LTRANS /abuild/jh/tmp//ccIgav2O.args]
[Leaving LTRANS libxul.so.ltrans.out]
g++: warning: -pipe ignored because -save-temps specified
Reading command line options: libxul.so.ltrans0.o:lto1: error: ELF section
name out of range

It seems that for some irrational reason we now decide to stream everything
into a single partition, which is a bad idea, but still our ELF
infrastructure should not give up.
The .o file seems wrong:
jh@evans:/abuild/jh/build-mozilla-new11-lto-noelfhackO3/toolkit/library
objdump -h libxul.so.ltrans0.o
BFD: libxul.so.ltrans0.o: invalid string offset 4088662 >= 348 for section
`.shstrtab'
BFD: libxul.so.ltrans0.o: invalid string offset 407 >= 348 for section `(null)'
objdump: libxul.so.ltrans0.o: File format not recognized


Honza