Re: Make ipa-ref somewhat less stupid

2014-06-17 Thread Martin Liška


On 06/16/2014 10:01 AM, Jan Hubicka wrote:
> > On 06/10/2014 08:34 AM, Jan Hubicka wrote:
> > > Hi,
> > > ipa-reference is somewhat stupid and builds its data sets for all variables,
> > > including addressable and public ones, just to prune them out after all
> > > bitmaps are constructed.
> > > This used to make sense when the profile generation happened at compile time,
> > > but since the ipa_ref data structure was introduced this is nonsense.
> > >
> > > Martin: It may be interesting to check if this solves the memory use issues with
> > > chrome.  We also may be able to re-enable ipa-ref with profile-generate as
> > > I think all the data structures are considered to have their address taken.
> >
> > Hi,
> > here is a link to the Chromium stats:
> > https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing
> >
> > Both compilations were run with '-flto=6', where the upper graph adds
> > '-fprofile-generate'. The memory footprint is IMHO acceptable, but the
> > compilation process takes twice as long with profile generation. Yeah,
> > Chromium has a really big code base :)
>
> Yep, I wonder why WPA takes so much longer. Do you think you can build lto1
> with --enable-gather-detailed-mem-stats and relink with -fpre-ipa-mem-report
> -fpost-ipa-mem-report -fmem-report -Q and send me the output?  It would be nice
> to push Chromium under 4GB of WPA :)

Here is the report you requested:
https://drive.google.com/file/d/0B0pisUJ80pO1RlRRTVBxUG5vSlE/edit?usp=sharing
It was produced with -fno-profile-generate. With -fprofile-generate enabled, the
WPA stage does not fit into 24GB of memory when memory stats are enabled.

Martin

> Thanks a lot!
> Honza




Re: Make ipa-ref somewhat less stupid

2014-06-16 Thread Jan Hubicka
> On 06/10/2014 08:34 AM, Jan Hubicka wrote:
> > Hi,
> > ipa-reference is somewhat stupid and builds its data sets for all variables,
> > including addressable and public ones, just to prune them out after all
> > bitmaps are constructed.
> > This used to make sense when the profile generation happened at compile time,
> > but since the ipa_ref data structure was introduced this is nonsense.
> >
> > Martin: It may be interesting to check if this solves the memory use issues with
> > chrome.  We also may be able to re-enable ipa-ref with profile-generate as
> > I think all the data structures are considered to have their address taken.
>
> Hi,
> here is a link to the Chromium stats:
> https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing
>
> Both compilations were run with '-flto=6', where the upper graph adds
> '-fprofile-generate'. The memory footprint is IMHO acceptable, but the
> compilation process takes twice as long with profile generation. Yeah,
> Chromium has a really big code base :)

Yep, I wonder why WPA takes so much longer. Do you think you can build lto1
with --enable-gather-detailed-mem-stats and relink with -fpre-ipa-mem-report
-fpost-ipa-mem-report -fmem-report -Q and send me the output?  It would be nice
to push Chromium under 4GB of WPA :)

Thanks a lot!
Honza


Re: Make ipa-ref somewhat less stupid

2014-06-13 Thread Martin Liška

On 06/10/2014 08:34 AM, Jan Hubicka wrote:
> Hi,
> ipa-reference is somewhat stupid and builds its data sets for all variables,
> including addressable and public ones, just to prune them out after all
> bitmaps are constructed.
> This used to make sense when the profile generation happened at compile time,
> but since the ipa_ref data structure was introduced this is nonsense.
>
> Martin: It may be interesting to check if this solves the memory use issues with
> chrome.  We also may be able to re-enable ipa-ref with profile-generate as
> I think all the data structures are considered to have their address taken.

Hi,
here is a link to the Chromium stats:
https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing

Both compilations were run with '-flto=6', where the upper graph adds
'-fprofile-generate'. The memory footprint is IMHO acceptable, but the
compilation process takes twice as long with profile generation. Yeah,
Chromium has a really big code base :)

Martin








Make ipa-ref somewhat less stupid

2014-06-10 Thread Jan Hubicka
Hi,
ipa-reference is somewhat stupid and builds its data sets for all variables,
including addressable and public ones, just to prune them out after all
bitmaps are constructed.
This used to make sense when the profile generation happened at compile time,
but since the ipa_ref data structure was introduced this is nonsense.

Martin: It may be interesting to check if this solves the memory use issues with
chrome.  We also may be able to re-enable ipa-ref with profile-generate as
I think all the data structures are considered to have their address taken.

Honza

Bootstrapped/regtested x86_64-linux.

* ipa-reference.c (is_proper_for_analysis): Exclude addressable and
public vars.
(intersect_static_var_sets): Remove.
(propagate): Do not prune local statics.
Index: ipa-reference.c
===================================================================
--- ipa-reference.c (revision 211364)
+++ ipa-reference.c (working copy)
@@ -243,6 +243,17 @@ is_proper_for_analysis (tree t)
   if (TREE_READONLY (t))
     return false;
 
+  /* We can not track variables with address taken.  */
+  if (TREE_ADDRESSABLE (t))
+    return false;
+
+  /* TODO: We could track public variables that are not addressable, but currently
+     frontends don't give us those.  */
+  if (TREE_PUBLIC (t))
+    return false;
+
+  /* TODO: Check aliases.  */
+
   /* This is a variable we care about.  Check if we have seen it
      before, and if not add it the set of variables we care about.  */
   if (all_module_statics
@@ -312,26 +323,6 @@ union_static_var_sets (bitmap &x, bitmap
   return x == all_module_statics;
 }
 
-/* Compute X &= Y, taking into account the possibility that
-   X may become the maximum set.  */
-
-static bool
-intersect_static_var_sets (bitmap &x, bitmap y)
-{
-  if (x != all_module_statics)
-    {
-      bitmap_and_into (x, y);
-      /* As with union_static_var_sets, reducing to the maximum
-         set as early as possible is an overall win.  */
-      if (bitmap_equal_p (x, all_module_statics))
-        {
-          BITMAP_FREE (x);
-          x = all_module_statics;
-        }
-    }
-  return x == all_module_statics;
-}
-
 /* Return a copy of SET on the bitmap obstack containing SET.
    But if SET is NULL or the maximum set, return that instead.  */
 
@@ -669,7 +660,6 @@ static unsigned int
 propagate (void)
 {
   struct cgraph_node *node;
-  varpool_node *vnode;
   struct cgraph_node **order =
     XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   int order_pos;
@@ -681,25 +671,6 @@ propagate (void)
   ipa_discover_readonly_nonaddressable_vars ();
   generate_summary ();
 
-  /* Now we know what vars are really statics; prune out those that aren't.  */
-  FOR_EACH_VARIABLE (vnode)
-    if (vnode->externally_visible
-        || TREE_ADDRESSABLE (vnode->decl)
-        || TREE_READONLY (vnode->decl)
-        || !is_proper_for_analysis (vnode->decl)
-        || !vnode->definition)
-      bitmap_clear_bit (all_module_statics, DECL_UID (vnode->decl));
-
-  /* Forget info we collected just for fun on variables that turned out to be
-     non-local.  */
-  FOR_EACH_DEFINED_FUNCTION (node)
-    {
-      ipa_reference_local_vars_info_t node_l;
-      node_l = &get_reference_vars_info (node)->local;
-      intersect_static_var_sets (node_l->statics_read, all_module_statics);
-      intersect_static_var_sets (node_l->statics_written, all_module_statics);
-    }
-
   /* Propagate the local information through the call graph to produce
      the global information.  All the nodes within a cycle will have
      the same info so we collapse cycles first.  Then we can do the
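
The effect of the patch can be illustrated with a minimal standalone sketch. This is not GCC code: struct var, its flag fields and collect_tracked are hypothetical stand-ins for the tree predicates (TREE_READONLY, TREE_ADDRESSABLE, TREE_PUBLIC) and for the all_module_statics bitmap. It only shows the idea of rejecting unsuitable variables up front, so that no later pruning pass over the sets is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a variable declaration.  The real pass
   queries GCC trees instead of plain flags.  */
struct var
{
  const char *name;
  bool readonly;     /* models TREE_READONLY */
  bool addressable;  /* models TREE_ADDRESSABLE */
  bool is_public;    /* models TREE_PUBLIC */
};

/* Same order of checks as the patched is_proper_for_analysis:
   a variable is tracked only if it is writable, never has its
   address taken, and is not public.  */
static bool
is_proper_for_analysis (const struct var *v)
{
  if (v->readonly)
    return false;
  if (v->addressable)
    return false;
  if (v->is_public)
    return false;
  return true;
}

/* Build the set of tracked variables as a bitmask (bit I stands for
   VARS[I]).  Unsuitable variables never enter the set, so there is
   nothing to prune afterwards.  Assumes N <= 32 for this sketch.  */
static unsigned
collect_tracked (const struct var *vars, unsigned n)
{
  unsigned set = 0;
  assert (n <= 32);
  for (unsigned i = 0; i < n; i++)
    if (is_proper_for_analysis (&vars[i]))
      set |= 1u << i;
  return set;
}
```

Given four variables of which only the first is a writable, non-addressable, non-public static, collect_tracked returns a set with only bit 0 set; the other three are filtered out before any per-function read/write set would be built for them.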