Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging

2013-06-14 Thread Richard Biener
On Fri, 14 Jun 2013, Jan Hubicka wrote:

   
   Ok, not streaming and comparing TREE_USED gets it improved to
  
  I will try to gather better data tomorrow. My Mozilla build died on disk space,
  but according to stats we are now at about 7GB of GGC memory after merging.
  I was playing with the following patch that implements testing whether types
  are the same under my (probably naive and wrong) understanding of the ODR rule in C++.
 
 So I can confirm that we now need 3GB of TMP space instead of 8GB with the earlier
 version of the patch.  I will compare to mainline tomorrow, but I think it is
 about the same.
  phase opt and generate  :  96.39 ( 9%) usr  40.45 (45%) sys 136.91 (12%) wall  271042 kB ( 7%) ggc
  phase stream in         : 457.87 (43%) usr   8.38 ( 9%) sys 466.44 (40%) wall 3798844 kB (93%) ggc
  phase stream out        : 509.39 (48%) usr  40.82 (46%) sys 550.88 (48%) wall    7149 kB ( 0%) ggc
  ipa cp                  :  13.62 ( 1%) usr   5.00 ( 6%) sys  18.61 ( 2%) wall  425204 kB (10%) ggc
  ipa inlining heuristics :  60.52 ( 6%) usr  36.15 (40%) sys  96.71 ( 8%) wall 1353370 kB (33%) ggc
  ipa lto decl in         : 346.94 (33%) usr   5.49 ( 6%) sys 352.60 (31%) wall    7042 kB ( 0%) ggc
  ipa lto decl out        : 481.19 (45%) usr  23.28 (26%) sys 504.68 (44%) wall       0 kB ( 0%) ggc
  TOTAL                   :1063.67            89.65          1154.26            4078436 kB
 
 So we are still bound by streaming. I am running -flto-report overnight.
 
 My ODR patch finds 36377 matches and also some weird-looking mismatches, such as:
  record_type 0x7fbd30d46dc8 sockaddr_storage BLK
 size integer_cst 0x7fbd416bc1e0 type integer_type 0x7fbd415660a8 
 bitsizetype constant 1024
 unit size integer_cst 0x7fbd416bc700 type integer_type 0x7fbd41566000 
 sizetype constant 128
 align 64 symtab 0 alias set -1 canonical type 0x7fbd30f0bc78
 fields field_decl 0x7fbd30e99ed8 ss_family
 type integer_type 0x7fbd3b98c000 sa_family_t public unsigned HI
 size integer_cst 0x7fbd41555fe0 constant 16
 unit size integer_cst 0x7fbd4156a000 constant 2
 align 16 symtab 0 alias set -1 canonical type 0x7fbd41566540 
 precision 16 min integer_cst 0x7fbd4156a020 0 max integer_cst 
 0x7fbd41555fc0 65535
 unsigned nonlocal HI file /usr/include/bits/socket.h line 189 col 0 
 size integer_cst 0x7fbd41555fe0 16 unit size integer_cst 0x7fbd4156a000 2
 align 16 offset_align 128
 offset integer_cst 0x7fbd41555d60 constant 0
 bit offset integer_cst 0x7fbd41555de0 constant 0 context 
 record_type 0x7fbd30d46dc8 sockaddr_storage
 chain field_decl 0x7fbd30e99000 __ss_align type integer_type 
 0x7fbd415667e0 long unsigned int
 unsigned nonlocal DI file /usr/include/bits/socket.h line 190 col  0
 size integer_cst 0x7fbd41555d20 constant 64
 unit size integer_cst 0x7fbd41555d40 constant 8
 align 64 offset_align 128 offset integer_cst 0x7fbd41555d60 0 
 bit offset integer_cst 0x7fbd41555d20 64 context record_type 
 0x7fbd30d46dc8 sockaddr_storage chain field_decl 0x7fbd30e99e40 
 __ss_padding context translation_unit_decl 0x7fbd30cbc2e0 D.967968
 chain type_decl 0x7fbd30d47da8 sockaddr_storage
  record_type 0x7fbd30f0bc78 sockaddr_storage BLK
 size integer_cst 0x7fbd416bc1e0 type integer_type 0x7fbd415660a8 
 bitsizetype constant 1024
 unit size integer_cst 0x7fbd416bc700 type integer_type 0x7fbd41566000 
 sizetype constant 128
 align 64 symtab 0 alias set -1 canonical type 0x7fbd30f0bc78
 fields field_decl 0x7fbd30ef9558 ss_family
 type integer_type 0x7fbd3b98c000 sa_family_t public unsigned HI
 size integer_cst 0x7fbd41555fe0 constant 16
 unit size integer_cst 0x7fbd4156a000 constant 2
 align 16 symtab 0 alias set -1 canonical type 0x7fbd41566540 
 precision 16 min integer_cst 0x7fbd4156a020 0 max integer_cst 
 0x7fbd41555fc0 65535
 unsigned HI file /usr/include/bits/socket.h line 189 col 0 size 
 integer_cst 0x7fbd41555fe0 16 unit size integer_cst 0x7fbd4156a000 2
 align 16 offset_align 128
 offset integer_cst 0x7fbd41555d60 constant 0
 bit offset integer_cst 0x7fbd41555de0 constant 0 context 
 record_type 0x7fbd30f0bc78 sockaddr_storage
 chain field_decl 0x7fbd30ef94c0 __ss_align type integer_type 
 0x7fbd415667e0 long unsigned int
 unsigned DI file /usr/include/bits/socket.h line 190 col 0
 size integer_cst 0x7fbd41555d20 constant 64
 unit size integer_cst 0x7fbd41555d40 constant 8
 align 64 offset_align 128 offset integer_cst 0x7fbd41555d60 0 
 bit offset integer_cst 0x7fbd41555d20 64 context record_type 
 0x7fbd30f0bc78 sockaddr_storage chain field_decl 0x7fbd30ef9428 
 __ss_padding context translation_unit_decl 0x7fbd30ea9f18 D.936417
 pointer_to_this pointer_type 0x7fbd30f0bd20 chain type_decl 
 0x7fbd30ea9398 D.938243
 
 that 

Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging

2013-06-14 Thread Jan Hubicka

Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging

2013-06-13 Thread Richard Biener
On Wed, 12 Jun 2013, Richard Biener wrote:

 
 The following patch re-writes LTO type merging completely with the
 goal to move as much work as possible to the compile step, away
 from WPA time.  At the same time the merging itself gets very
 conservative but also more general - it now merges arbitrary trees,
 not only types, but only if they are bit-identical and have the
 same outgoing tree pointers.
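The conservative merge criterion can be pictured with a toy model. This is a minimal sketch, not the streamer's code: `struct toy_tree`, `TOY_BITS`, `TOY_REFS` and `toy_trees_mergeable_p` are hypothetical, and outgoing pointers are assumed to already be indices of prevailing nodes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a streamed tree body: BITS holds the flags
   and scalar payload, REFS the outgoing pointers, expressed as indices
   of already-prevailing nodes (-1 for a NULL pointer).  */
#define TOY_BITS 8
#define TOY_REFS 3

struct toy_tree
{
  unsigned char bits[TOY_BITS];
  int refs[TOY_REFS];
};

/* Two bodies may be merged only if they are bit-identical and every
   outgoing pointer refers to the very same node.  No structural
   equivalence (e.g. "compatible" types) is attempted.  */
static bool
toy_trees_mergeable_p (const struct toy_tree *a, const struct toy_tree *b)
{
  if (memcmp (a->bits, b->bits, sizeof a->bits) != 0)
    return false;
  for (int i = 0; i < TOY_REFS; i++)
    if (a->refs[i] != b->refs[i])
      return false;
  return true;
}
```

Because pointers are compared by identity, a node that points back into its own cycle cannot be decided in isolation, which is why whole SCCs have to be compared as a unit.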
 
 Especially the latter means that we now have to merge SCCs of trees
 together and either take the whole SCC as prevailing or throw it
 away.  Moving work to the compile step means that we compute
 SCCs and their hashes there, re-organizing streaming to stream
 tree bodies as SCC blocks together with the computed hash.
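Collecting SCCs during a DFS walk is the classic job of Tarjan's algorithm. The sketch below assumes that approach on a toy graph with fixed-size arrays; `struct graph`, `struct tarjan_state`, `find_sccs` and the additive member hash are hypothetical illustrations, and GCC's streamer may organize this differently (a real hash would, for instance, also fold in references to other SCCs). A useful property: Tarjan completes an SCC only after every SCC reachable from it, so emitting SCCs in completion order means every outgoing reference points to an already-streamed tree.

```c
#include <stdint.h>

/* Toy graph: node i has up to MAX_OUT outgoing "tree pointers".  */
#define MAX_NODES 16
#define MAX_OUT 4

struct graph
{
  int n;
  int nout[MAX_NODES];
  int out[MAX_NODES][MAX_OUT];
  uint32_t bits[MAX_NODES];     /* Stand-in for the streamed bits.  */
};

struct tarjan_state
{
  const struct graph *g;
  int index[MAX_NODES], low[MAX_NODES], on_stack[MAX_NODES];
  int stack[MAX_NODES], sp, next_index;
  int scc_of[MAX_NODES];        /* SCC id per node, in completion order.  */
  int nscc;
  uint32_t scc_hash[MAX_NODES]; /* One hash per SCC.  */
};

static void
strongconnect (struct tarjan_state *s, int v)
{
  s->index[v] = s->low[v] = s->next_index++;
  s->stack[s->sp++] = v;
  s->on_stack[v] = 1;

  for (int k = 0; k < s->g->nout[v]; k++)
    {
      int w = s->g->out[v][k];
      if (s->index[w] < 0)
        {
          strongconnect (s, w);
          if (s->low[w] < s->low[v])
            s->low[v] = s->low[w];
        }
      else if (s->on_stack[w] && s->index[w] < s->low[v])
        s->low[v] = s->index[w];
    }

  if (s->low[v] == s->index[v])
    {
      /* V is the root of an SCC: pop its members and hash them with an
         order-independent sum (a simplification for this sketch).  */
      uint32_t h = 0;
      int w;
      do
        {
          w = s->stack[--s->sp];
          s->on_stack[w] = 0;
          s->scc_of[w] = s->nscc;
          h += s->g->bits[w] * 2654435761u;
        }
      while (w != v);
      s->scc_hash[s->nscc++] = h;
    }
}

static void
find_sccs (const struct graph *g, struct tarjan_state *s)
{
  s->g = g;
  s->sp = s->next_index = s->nscc = 0;
  for (int i = 0; i < g->n; i++)
    {
      s->index[i] = -1;
      s->on_stack[i] = 0;
    }
  for (int i = 0; i < g->n; i++)
    if (s->index[i] < 0)
      strongconnect (s, i);
}
```

For the edges 0->1, 1->0, 1->2 and 3->0, the SCCs complete in the order {2}, {0,1}, {3}: each SCC is finished before anything that refers to it.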
 
 When we ask the streamer to output a tree T, it now has to
 DFS-walk all tree pointers, collecting SCCs of not-yet-streamed
 trees, and output them as follows:
 
  { LTO_tree_scc, N, hash, entry_len,
      { header1, header2, ... headerN },
      { bits1, refs1, bits2, refs2, ... bitsN, refsN } }
  { LTO_tree_scc, 1, hash, header, bits, refs }
  { LTO_tree_scc, M, hash, entry_len,
      { header1, header2, ... headerM },
      { bits1, refs1, bits2, refs2, ... bitsM, refsM } }
  LTO_tree_pickle_reference to T
 
 with tree references in refsN always being LTO_tree_pickle_references
 instead of starting a new tree inline.  That results in at most
 N extra LTO_tree_pickle_references for N streamed trees; together
 with the LTO_tree_scc wrapping overhead, this causes a slight
 increase in LTO object size (around 10% last time I measured, which
 was before some additional optimization went in).
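To make the block layout concrete, here is a minimal sketch of writing one SCC block into a word buffer. The tag values, `struct toy_body` and `toy_emit_scc` are hypothetical (the real LTO tags and encoding are not reproduced), `entry_len` is simply passed through without modelling its meaning, and a single-tree SCC omits `entry_len` as in the layout shown earlier. Every outgoing reference is written as a pickle reference (tag plus index), never as an inline tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values, for illustration only.  */
enum toy_tag { TOY_TREE_SCC = 1, TOY_TREE_PICKLE_REFERENCE = 2 };

struct toy_body
{
  uint32_t header;        /* e.g. the tree code */
  uint32_t bits;          /* packed flags */
  int nrefs;
  const uint32_t *refs;   /* indices of already-streamed trees */
};

/* Append { SCC, N, hash, [entry_len,] {headers}, {bits_i, refs_i ...} }
   to OUT (capacity is not checked in this sketch) and return the number
   of words written.  */
static size_t
toy_emit_scc (uint32_t *out, uint32_t hash, uint32_t entry_len,
              const struct toy_body *b, int n)
{
  size_t p = 0;
  out[p++] = TOY_TREE_SCC;
  out[p++] = (uint32_t) n;
  out[p++] = hash;
  if (n > 1)
    out[p++] = entry_len;
  for (int i = 0; i < n; i++)
    out[p++] = b[i].header;
  for (int i = 0; i < n; i++)
    {
      out[p++] = b[i].bits;
      for (int r = 0; r < b[i].nrefs; r++)
        {
          out[p++] = TOY_TREE_PICKLE_REFERENCE;
          out[p++] = b[i].refs[r];
        }
    }
  return p;
}
```

Writing all headers first lets a reader allocate every node of the SCC before filling in bits and references, so intra-SCC pointers can be resolved regardless of order.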
 
 The overhead also happens on the LTRANS file producing side,
 which now has to do the DFS walk and stream the extra data.
 It doesn't do the hashing, though, as no merging is performed
 on the LTRANS consumer side.
 
 The patch preserves the core of the old merging code to compare
 with the new code and output some statistics.  That means that
 if you build with -flto-report[-wpa] you get an additional
 compile-time and memory overhead.
 
 For reference here are the stats when LTO bootstrapping for
 stage2 cc1:
 
 WPA statistics
 [WPA] read 2494507 SCCs of average size 2.380067
 [WPA] 5937095 tree bodies read in total
 [WPA] tree SCC table: size 524287, 286280 elements, collision ratio: 0.806376
 [WPA] tree SCC max chain length 11 (size 1)
 [WPA] Compared 403361 SCCs, 6226 collisions (0.015435)
 [WPA] Merged 399980 SCCs
 [WPA] Merged 2438250 tree bodies
 [WPA] Merged 192475 types
 [WPA] 195422 types prevailed
 [WPA] Old merging code merges an additional 54582 types of which 21083 are in the same SCC with their prevailing variant
 
 This says that we streamed in 5937095 tree bodies in
 2494507 SCCs (so the average SCC size is small); of those,
 we were able to immediately ggc_free 399980 SCCs because they
 already existed in identical form (16% of the SCCs, 41% of the trees
 and 49% of the types).  The old merging code forced the merge
 of an additional 54582 types, but 21083 of them it merged with
 a type that is in the same SCC; that is, it changed the shape
 of the SCC and collapsed parts of it, which is
 suspicious.
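The percentages above follow directly from the [WPA] counters. A small sketch to reproduce them, assuming the "read" totals are the denominators and that merged plus prevailing types make up all types seen; `merge_share` and `print_merge_shares` are hypothetical helpers.

```c
#include <stdio.h>

/* Percentage of PART in WHOLE.  */
static double
merge_share (double part, double whole)
{
  return 100.0 * part / whole;
}

/* Counters copied from the [WPA] statistics (stage2 cc1).  */
static const double sccs_read = 2494507, trees_read = 5937095;
static const double sccs_merged = 399980, trees_merged = 2438250;
static const double types_merged = 192475, types_prevailed = 195422;

static void
print_merge_shares (void)
{
  printf ("SCCs:  %.1f%%\n", merge_share (sccs_merged, sccs_read));
  printf ("trees: %.1f%%\n", merge_share (trees_merged, trees_read));
  printf ("types: %.1f%%\n",
          merge_share (types_merged, types_merged + types_prevailed));
  printf ("average SCC size: %.6f\n", trees_read / sccs_read);
}
```

The shares come out at about 16%, 41% and 49.6%, and trees_read / sccs_read reproduces the reported average SCC size of 2.380067.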
 
 The patch was LTO bootstrapped (testing currently running) on
 x86_64-unknown-linux-gnu and I've built SPEC2k6 with -Ofast -g -flto
 and did a test run of the binaries, which shows that
 currently 471.omnetpp, 483.xalancbmk and 447.dealII fail
 (471.omnetpp segfaults in __cxxabiv1::__dynamic_cast); these
 failures were introduced quite recently, likely due to the improved
 FUNCTION_DECL and VAR_DECL merging and the cgraph fixup Honza did.

The following incremental patch fixes that.

Index: trunk/gcc/lto-symtab.c
===================================================================
--- trunk.orig/gcc/lto-symtab.c 2013-06-12 16:47:38.0 +0200
+++ trunk/gcc/lto-symtab.c  2013-06-12 17:00:12.664126423 +0200
@@ -96,9 +96,6 @@ lto_varpool_replace_node (struct varpool
 
   ipa_clone_referring ((symtab_node)prevailing_node, &vnode->symbol.ref_list);
 
-  /* Be sure we can garbage collect the initializer.  */
-  if (DECL_INITIAL (vnode->symbol.decl))
-    DECL_INITIAL (vnode->symbol.decl) = error_mark_node;
   /* Finally remove the replaced node.  */
   varpool_remove_node (vnode);
 }
Index: trunk/gcc/varpool.c
===================================================================
--- trunk.orig/gcc/varpool.c 2013-06-12 13:13:06.0 +0200
+++ trunk/gcc/varpool.c 2013-06-12 17:01:46.088248807 +0200
@@ -77,15 +77,8 @@ varpool_remove_node (struct varpool_node
 
 /* Renove node initializer when it is no longer needed.  */
 void
-varpool_remove_initializer (struct varpool_node *node)
+varpool_remove_initializer (struct varpool_node *)
 {
-  if (DECL_INITIAL (node->symbol.decl)
-      && !DECL_IN_CONSTANT_POOL (node->symbol.decl)
-      /* Keep vtables for BINFO folding.  */
-      && !DECL_VIRTUAL_P (node->symbol.decl)
-      /* FIXME: http://gcc.gnu.org/PR55395 */
-   

Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging

2013-06-13 Thread Jan Hubicka
 
 Ok, not streaming and comparing TREE_USED gets it improved to

I will try to gather better data tomorrow. My Mozilla build died on disk space,
but according to stats we are now at about 7GB of GGC memory after merging.
I was playing with the following patch that implements testing whether types
are the same under my (probably naive and wrong) understanding of the ODR rule in C++.

It prints type pairs that seem to be the same and then verifies that they have
the same names and are in the same namespaces and records. On JavaScript there are
5000 types that the devirtualization code finds to be the same this way but that do
not have the same TYPE_MAIN_VARIANT.

I guess those trees may be a good starting point for you to look at why they are
not merged.

I suppose that once we have a maintainable code base we can move to more
aggressive merging in special cases.  Requiring trees to be exactly the same is a
good default behaviour.  We may, however, take advantage of extra knowledge: the FE
may tag types/decls that are subject to the ODR rule, and for those we can reduce
the hash to be based only on name+context and can even output sane
diagnostics on mismatches.
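A hash that looks only at the name and the chain of enclosing contexts could look roughly like this. It is a sketch under assumptions: `struct odr_node` is a hypothetical flattened view of TYPE_NAME/DECL_CONTEXT, the walk stops at a nameless outermost scope (much as decls_same_for_odr accepts TRANSLATION_UNIT_DECL), and FNV-1a is an arbitrary choice of string hash.

```c
#include <stdint.h>

/* Hypothetical model of a named entity and its enclosing scopes
   (namespace, class, ...).  A NULL name marks the outermost scope.  */
struct odr_node
{
  const char *name;
  const struct odr_node *context;
};

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* Continue an FNV-1a hash H over the bytes of S.  */
static uint64_t
fnv1a_str (uint64_t h, const char *s)
{
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= FNV_PRIME;
    }
  return h;
}

/* Hash a type by its qualified name only: its own name followed by the
   names of all enclosing contexts.  A 0xff separator byte after each
   component makes "ab::c" and "a::bc" hash different byte sequences.  */
static uint64_t
odr_name_hash (const struct odr_node *n)
{
  uint64_t h = FNV_OFFSET;
  for (; n && n->name; n = n->context)
    {
      h = fnv1a_str (h, n->name);
      h ^= 0xff;
      h *= FNV_PRIME;
    }
  return h;
}
```

Two separately streamed copies of std::string hash identically here even if their tree representations differ elsewhere, which is exactly what makes such a hash unsafe for types not tagged as ODR-governed.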

Similarly, I think it would help a lot if we proactively merged !can_prevail_p
decls with matching types into those that can prevail, by hashing PUBLIC decls
only by their assembler name.  Merging those should subsequently allow
collapsing types that are otherwise kept separate just because the associated
vtables differ in the EXTERNAL and PUBLIC flags on the methods and
such.
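One way to picture that proposal: key PUBLIC decls by assembler name and redirect a decl that cannot prevail to a prevailing decl with the same name and a matching type. This is a hedged sketch only; `struct toy_decl`, its fields (the integer `type` id in particular) and `merge_by_asm_name` are hypothetical, and a quadratic scan stands in for the hash table the text suggests.

```c
#include <stdbool.h>
#include <string.h>

struct toy_decl
{
  const char *asm_name;
  bool is_public;
  bool can_prevail;   /* stand-in for the can_prevail_p check */
  int type;           /* hypothetical id of the (already merged) type */
  int prevailing;     /* output: index of the decl this one maps to */
};

/* Map every PUBLIC decl that cannot prevail onto a PUBLIC decl that can,
   provided assembler name and type match.  Everything else maps to
   itself.  */
static void
merge_by_asm_name (struct toy_decl *d, int n)
{
  for (int i = 0; i < n; i++)
    d[i].prevailing = i;
  for (int i = 0; i < n; i++)
    {
      if (!d[i].is_public || d[i].can_prevail)
        continue;
      for (int j = 0; j < n; j++)
        if (j != i && d[j].is_public && d[j].can_prevail
            && d[j].type == d[i].type
            && strcmp (d[j].asm_name, d[i].asm_name) == 0)
          {
            d[i].prevailing = j;
            break;
          }
    }
}
```

Requiring the types to match keeps the redirection conservative; a decl whose type differs stays separate rather than being forced together.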

Index: tree.c
===================================================================
--- tree.c  (revision 200064)
+++ tree.c  (working copy)
@@ -11618,6 +11711,91 @@ lhd_gcc_personality (void)
   return gcc_eh_personality_decl;
 }
 
+/* For languages with One Definition Rule, work out if
+   decls are actually the same even if the tree representation
+   differs.  This handles only decls appearing in TYPE_NAME
+   and TYPE_CONTEXT.  That is NAMESPACE_DECL, TYPE_DECL,
+   RECORD_TYPE and IDENTIFIER_NODE.  */
+
+static bool
+decls_same_for_odr (tree decl1, tree decl2)
+{
+  if (decl1 == decl2)
+    return true;
+  if (!decl1 || !decl2)
+    {
+      fprintf (stderr, "Nesting mismatch\n");
+      debug_tree (decl1);
+      debug_tree (decl2);
+      return false;
+    }
+  if (TREE_CODE (decl1) != TREE_CODE (decl2))
+    {
+      fprintf (stderr, "Code mismatch\n");
+      debug_tree (decl1);
+      debug_tree (decl2);
+      return false;
+    }
+  if (TREE_CODE (decl1) == TRANSLATION_UNIT_DECL)
+    return true;
+  if (TREE_CODE (decl1) != NAMESPACE_DECL
+      && TREE_CODE (decl1) != RECORD_TYPE
+      && TREE_CODE (decl1) != TYPE_DECL)
+    {
+      fprintf (stderr, "Decl type mismatch\n");
+      debug_tree (decl1);
+      return false;
+    }
+  if (!DECL_NAME (decl1))
+    {
+      fprintf (stderr, "Anonymous; name mismatch\n");
+      debug_tree (decl1);
+      return false;
+    }
+  if (!decls_same_for_odr (DECL_NAME (decl1), DECL_NAME (decl2)))
+    return false;
+  return decls_same_for_odr (DECL_CONTEXT (decl1),
+                             DECL_CONTEXT (decl2));
+}
+
+/* For languages with One Definition Rule, work out if
+   types are same even if the tree representation differs.
+   This is non-trivial for LTO where minor differences in
+   the type representation may have prevented type merging
+   from merging two copies of an otherwise equivalent type.  */
+
+static bool
+types_same_for_odr (tree type1, tree type2)
+{
+  type1 = TYPE_MAIN_VARIANT (type1);
+  type2 = TYPE_MAIN_VARIANT (type2);
+  if (type1 == type2)
+    return true;
+  if (!type1 || !type2)
+    return false;
+
+  /* If types are not structurally the same, do not bother to continue.
+     A match in the remainder of the code would mean an ODR violation.  */
+  if (!types_compatible_p (type1, type2))
+    return false;
+
+  debug_tree (type1);
+  debug_tree (type2);
+  if (!TYPE_NAME (type1))
+    {
+      fprintf (stderr, "Anonymous; name mismatch\n");
+      return false;
+    }
+  if (!decls_same_for_odr (TYPE_NAME (type1), TYPE_NAME (type2)))
+    return false;
+  if (!decls_same_for_odr (TYPE_CONTEXT (type1), TYPE_CONTEXT (type2)))
+    return false;
+  fprintf (stderr, "type match!\n");
+  gcc_assert (in_lto_p);
+
+  return true;
+}
+
 /* Try to find a base info of BINFO that would have its field decl at offset
OFFSET within the BINFO type and which is of EXPECTED_TYPE.  If it can be
found, return, otherwise return NULL_TREE.  */
@@ -11633,8 +11811,8 @@ get_binfo_at_offset (tree binfo, HOST_WI
   tree fld;
   int i;
 
-  if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (expected_type))
-    return binfo;
+  if (types_same_for_odr (type, expected_type))
+    return binfo;
   if (offset < 0)
     return NULL_TREE;
 
@@ -11663,7 +11841,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
     {
       tree base_binfo, found_binfo = NULL_TREE;
       for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
-   

Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging

2013-06-13 Thread Jan Hubicka
  
  Ok, not streaming and comparing TREE_USED gets it improved to
 
 I will try to gather better data tomorrow. My Mozilla build died on disk space,
 but according to stats we are now at about 7GB of GGC memory after merging.
 I was playing with the following patch that implements testing whether types
 are the same under my (probably naive and wrong) understanding of the ODR rule in C++.

So I can confirm that we now need 3GB of TMP space instead of 8GB with the earlier
version of the patch.  I will compare to mainline tomorrow, but I think it is
about the same.
 phase opt and generate  :  96.39 ( 9%) usr  40.45 (45%) sys 136.91 (12%) wall  271042 kB ( 7%) ggc
 phase stream in         : 457.87 (43%) usr   8.38 ( 9%) sys 466.44 (40%) wall 3798844 kB (93%) ggc
 phase stream out        : 509.39 (48%) usr  40.82 (46%) sys 550.88 (48%) wall    7149 kB ( 0%) ggc
 ipa cp                  :  13.62 ( 1%) usr   5.00 ( 6%) sys  18.61 ( 2%) wall  425204 kB (10%) ggc
 ipa inlining heuristics :  60.52 ( 6%) usr  36.15 (40%) sys  96.71 ( 8%) wall 1353370 kB (33%) ggc
 ipa lto decl in         : 346.94 (33%) usr   5.49 ( 6%) sys 352.60 (31%) wall    7042 kB ( 0%) ggc
 ipa lto decl out        : 481.19 (45%) usr  23.28 (26%) sys 504.68 (44%) wall       0 kB ( 0%) ggc
 TOTAL                   :1063.67            89.65          1154.26            4078436 kB

So we are still bound by streaming. I am running -flto-report overnight.
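The "bound by streaming" conclusion can be checked from the three top-level phase rows of the report above, which add up (to rounding) to the TOTAL wall time. A hedged sketch: `struct phase_time`, `streaming_wall_share` and `wpa_phases` are hypothetical, with the numbers copied from that report.

```c
#include <string.h>

struct phase_time
{
  const char *name;
  double wall;   /* seconds */
};

/* Fraction of the summed wall time spent in phases whose name starts
   with "phase stream" (stream in + stream out).  */
static double
streaming_wall_share (const struct phase_time *p, int n)
{
  double total = 0, stream = 0;
  for (int i = 0; i < n; i++)
    {
      total += p[i].wall;
      if (strncmp (p[i].name, "phase stream", strlen ("phase stream")) == 0)
        stream += p[i].wall;
    }
  return total > 0 ? stream / total : 0;
}

/* The three top-level WPA phases from the -ftime-report excerpt.  */
static const struct phase_time wpa_phases[] = {
  { "phase opt and generate", 136.91 },
  { "phase stream in", 466.44 },
  { "phase stream out", 550.88 },
};
```

With these rows, streaming_wall_share (wpa_phases, 3) is about 0.88: roughly 88% of the WPA wall time goes to streaming in and out.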

My ODR patch finds 36377 matches and also some weird-looking mismatches, such as:
 record_type 0x7fbd30d46dc8 sockaddr_storage BLK
size integer_cst 0x7fbd416bc1e0 type integer_type 0x7fbd415660a8 
bitsizetype constant 1024
unit size integer_cst 0x7fbd416bc700 type integer_type 0x7fbd41566000 
sizetype constant 128
align 64 symtab 0 alias set -1 canonical type 0x7fbd30f0bc78
fields field_decl 0x7fbd30e99ed8 ss_family
type integer_type 0x7fbd3b98c000 sa_family_t public unsigned HI
size integer_cst 0x7fbd41555fe0 constant 16
unit size integer_cst 0x7fbd4156a000 constant 2
align 16 symtab 0 alias set -1 canonical type 0x7fbd41566540 
precision 16 min integer_cst 0x7fbd4156a020 0 max integer_cst 0x7fbd41555fc0 
65535
unsigned nonlocal HI file /usr/include/bits/socket.h line 189 col 0 
size integer_cst 0x7fbd41555fe0 16 unit size integer_cst 0x7fbd4156a000 2
align 16 offset_align 128
offset integer_cst 0x7fbd41555d60 constant 0
bit offset integer_cst 0x7fbd41555de0 constant 0 context record_type 
0x7fbd30d46dc8 sockaddr_storage
chain field_decl 0x7fbd30e99000 __ss_align type integer_type 
0x7fbd415667e0 long unsigned int
unsigned nonlocal DI file /usr/include/bits/socket.h line 190 col 0
size integer_cst 0x7fbd41555d20 constant 64
unit size integer_cst 0x7fbd41555d40 constant 8
align 64 offset_align 128 offset integer_cst 0x7fbd41555d60 0 bit 
offset integer_cst 0x7fbd41555d20 64 context record_type 0x7fbd30d46dc8 
sockaddr_storage chain field_decl 0x7fbd30e99e40 __ss_padding context 
translation_unit_decl 0x7fbd30cbc2e0 D.967968
chain type_decl 0x7fbd30d47da8 sockaddr_storage
 record_type 0x7fbd30f0bc78 sockaddr_storage BLK
size integer_cst 0x7fbd416bc1e0 type integer_type 0x7fbd415660a8 
bitsizetype constant 1024
unit size integer_cst 0x7fbd416bc700 type integer_type 0x7fbd41566000 
sizetype constant 128
align 64 symtab 0 alias set -1 canonical type 0x7fbd30f0bc78
fields field_decl 0x7fbd30ef9558 ss_family
type integer_type 0x7fbd3b98c000 sa_family_t public unsigned HI
size integer_cst 0x7fbd41555fe0 constant 16
unit size integer_cst 0x7fbd4156a000 constant 2
align 16 symtab 0 alias set -1 canonical type 0x7fbd41566540 
precision 16 min integer_cst 0x7fbd4156a020 0 max integer_cst 0x7fbd41555fc0 
65535
unsigned HI file /usr/include/bits/socket.h line 189 col 0 size 
integer_cst 0x7fbd41555fe0 16 unit size integer_cst 0x7fbd4156a000 2
align 16 offset_align 128
offset integer_cst 0x7fbd41555d60 constant 0
bit offset integer_cst 0x7fbd41555de0 constant 0 context record_type 
0x7fbd30f0bc78 sockaddr_storage
chain field_decl 0x7fbd30ef94c0 __ss_align type integer_type 
0x7fbd415667e0 long unsigned int
unsigned DI file /usr/include/bits/socket.h line 190 col 0
size integer_cst 0x7fbd41555d20 constant 64
unit size integer_cst 0x7fbd41555d40 constant 8
align 64 offset_align 128 offset integer_cst 0x7fbd41555d60 0 bit 
offset integer_cst 0x7fbd41555d20 64 context record_type 0x7fbd30f0bc78 
sockaddr_storage chain field_decl 0x7fbd30ef9428 __ss_padding context 
translation_unit_decl 0x7fbd30ea9f18 D.936417
pointer_to_this pointer_type 0x7fbd30f0bd20 chain type_decl 
0x7fbd30ea9398 D.938243

that mismatch because we run into the following difference:
 type_decl 0x7fbd30d47da8 sockaddr_storage
type record_type 0x7fbd30d46dc8 

Re: [PATCH][RFC] Re-write LTO type merging again, do tree merging

2013-06-13 Thread Jan Hubicka
   
   Ok, not streaming and comparing TREE_USED gets it improved to
  
  I will try to gather better data tomorrow. My Mozilla build died on disk space,
  but according to stats we are now at about 7GB of GGC memory after merging.
  I was playing with the following patch that implements testing whether types
  are the same under my (probably naive and wrong) understanding of the ODR rule in C++.
 
 So I can confirm that we now need 3GB of TMP space instead of 8GB with the earlier
 version of the patch.  I will compare to mainline tomorrow, but I think it is
 about the same.
  phase opt and generate  :  96.39 ( 9%) usr  40.45 (45%) sys 136.91 (12%) wall  271042 kB ( 7%) ggc
  phase stream in         : 457.87 (43%) usr   8.38 ( 9%) sys 466.44 (40%) wall 3798844 kB (93%) ggc
  phase stream out        : 509.39 (48%) usr  40.82 (46%) sys 550.88 (48%) wall    7149 kB ( 0%) ggc
  ipa cp                  :  13.62 ( 1%) usr   5.00 ( 6%) sys  18.61 ( 2%) wall  425204 kB (10%) ggc
  ipa inlining heuristics :  60.52 ( 6%) usr  36.15 (40%) sys  96.71 ( 8%) wall 1353370 kB (33%) ggc
  ipa lto decl in         : 346.94 (33%) usr   5.49 ( 6%) sys 352.60 (31%) wall    7042 kB ( 0%) ggc
  ipa lto decl out        : 481.19 (45%) usr  23.28 (26%) sys 504.68 (44%) wall       0 kB ( 0%) ggc
  TOTAL                   :1063.67            89.65          1154.26            4078436 kB
 
 So we are still bound by streaming. I am running -flto-report overnight.
[WPA] read 43363300 SCCs of average size 2.264113
[WPA] 98179403 tree bodies read in total
[WPA] tree SCC table: size 16777213, 6422251 elements, collision ratio: 0.811639
[WPA] tree SCC max chain length 88 (size 1)
[WPA] Compared 16544560 SCCs, 275298 collisions (0.016640)
[WPA] Merged 16458553 SCCs
[WPA] Merged 46453870 tree bodies
[WPA] Merged 9535385 types
[WPA] 6771259 types prevailed (21348860 associated trees)
[WPA] Old merging code merges an additional 1759918 types of which 379059 are in the same SCC with their prevailing variant (19696849 and 15301625 associated trees)
[WPA] GIMPLE canonical type table: size 131071, 77875 elements, 6771394 searches, 1528380 collisions (ratio: 0.225711)
[WPA] GIMPLE canonical type hash table: size 16777213, 6771339 elements, 23174504 searches, 21075518 collisions (ratio: 0.909427)

[LTRANS] read 228296 SCCs of average size 11.882460
[LTRANS] 2712718 tree bodies read in total
[LTRANS] GIMPLE canonical type table: size 16381, 7025 elements, 704670 searches, 24040 collisions (ratio: 0.034115)
[LTRANS] GIMPLE canonical type hash table: size 1048573, 704613 elements, 2269381 searches, 2021919 collisions (ratio: 0.890956)

We managed to get stuck in one of the LTRANS units on LRA:
 LRA hard reg assignment : 476.07 (44%) usr   0.03 ( 0%) sys 476.08 (44%) wall       0 kB ( 0%) ggc

28607    12.1151  lto1            alloc_page(unsigned int)
3564      1.5094  lto1            record_reg_classes(int, int, rtx_def**, machine_mode*, char const**, rtx_def*, reg_class*)
3235      1.3700  libc-2.11.1.so  _int_malloc
3056      1.2942  lto1            ggc_set_mark(void const*)
2646      1.1206  lto1            gt_ggc_mx_lang_tree_node(void*)
2539      1.0753  lto1            bitmap_set_bit(bitmap_head_def*, int)
2333      0.9880  opreport        /usr/bin/opreport
2210      0.9359  lto1            for_each_rtx_1(rtx_def*, int, int (*)(rtx_def**, void*), void*)
2133      0.9033  lto1            constrain_operands(int)
2128      0.9012  lto1            lookup_page_table_entry(void const*)
1586      0.6717  lto1            preprocess_constraints()

While GGC memory is now under 7GB after type streaming and we GGC just once in
WPA, the peak usage (as seen in top) still goes to about 12GB.

With the ODR patch, 424 devirtualizations happen during WPA, plus some
extra ones during LTRANS (I do not have stats for those).

Honza