Re: Make ipa-ref somewhat less stupid
On 06/16/2014 10:01 AM, Jan Hubicka wrote:
>> On 06/10/2014 08:34 AM, Jan Hubicka wrote:
>>> Hi,
>>> ipa-reference is somewhat stupid and builds its data sets for all
>>> variables, including addressable and public ones, just to prune them
>>> out after all bitmaps are constructed. This used to make sense when
>>> the profile generation happened at compile time, but since the
>>> ipa_ref data structure was introduced this is nonsense.
>>>
>>> Martin: It may be interesting to check if this solves the memory use
>>> issues with chrome. We also may be able to re-enable ipa-ref with
>>> profile-generate as I think all the data structures are considered
>>> to have their address taken.
>>
>> Hi,
>> there is a link to chromium stats:
>> https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing
>>
>> Both compilations were run with '-flto=6', where the upper graph adds
>> '-fprofile-generate'. Memory footprint is IMHO acceptable, but the
>> compilation process takes twice as long with profile generation.
>> Yeah, chromium contains a really big code base :)
>
> Yep, I wonder why WPA takes so much longer. Do you think you can build
> lto1 with --enable-gather-detailed-mem-stats and relink with
> -fpre-ipa-mem-report -fpost-ipa-mem-report -fmem-report -Q and send me
> the output?
>
> It would be nice to push Chromium under 4GB of WPA :)

Here is the report you requested:
https://drive.google.com/file/d/0B0pisUJ80pO1RlRRTVBxUG5vSlE/edit?usp=sharing ,
produced with -fno-profile-generate. With -fprofile-generate enabled,
the WPA stage does not fit in 24GB of memory when memory stats are
enabled.

Martin

> Thanks a lot!
> Honza
Re: Make ipa-ref somewhat less stupid
> On 06/10/2014 08:34 AM, Jan Hubicka wrote:
>> Hi,
>> ipa-reference is somewhat stupid and builds its data sets for all
>> variables, including addressable and public ones, just to prune them
>> out after all bitmaps are constructed. This used to make sense when
>> the profile generation happened at compile time, but since the
>> ipa_ref data structure was introduced this is nonsense.
>>
>> Martin: It may be interesting to check if this solves the memory use
>> issues with chrome. We also may be able to re-enable ipa-ref with
>> profile-generate as I think all the data structures are considered
>> to have their address taken.
>
> Hi,
> there is a link to chromium stats:
> https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing
>
> Both compilations were run with '-flto=6', where the upper graph adds
> '-fprofile-generate'. Memory footprint is IMHO acceptable, but the
> compilation process takes twice as long with profile generation.
> Yeah, chromium contains a really big code base :)

Yep, I wonder why WPA takes so much longer. Do you think you can build
lto1 with --enable-gather-detailed-mem-stats and relink with
-fpre-ipa-mem-report -fpost-ipa-mem-report -fmem-report -Q and send me
the output?

It would be nice to push Chromium under 4GB of WPA :)

Thanks a lot!
Honza
Re: Make ipa-ref somewhat less stupid
On 06/10/2014 08:34 AM, Jan Hubicka wrote:
> Hi,
> ipa-reference is somewhat stupid and builds its data sets for all
> variables, including addressable and public ones, just to prune them
> out after all bitmaps are constructed. This used to make sense when
> the profile generation happened at compile time, but since the
> ipa_ref data structure was introduced this is nonsense.
>
> Martin: It may be interesting to check if this solves the memory use
> issues with chrome. We also may be able to re-enable ipa-ref with
> profile-generate as I think all the data structures are considered
> to have their address taken.

Hi,
there is a link to chromium stats:
https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing

Both compilations were run with '-flto=6', where the upper graph adds
'-fprofile-generate'. Memory footprint is IMHO acceptable, but the
compilation process takes twice as long with profile generation.
Yeah, chromium contains a really big code base :)

Martin

> Honza
>
> Bootstrapped/regtested x86_64-linux.
>
>         * ipa-reference.c (is_proper_for_analysis): Exclude addressable and
>         public vars.
>         (intersect_static_var_sets): Remove.
>         (propagate): Do not prune local statics.
>
> Index: ipa-reference.c
> ===================================================================
> --- ipa-reference.c     (revision 211364)
> +++ ipa-reference.c     (working copy)
> @@ -243,6 +243,17 @@ is_proper_for_analysis (tree t)
>    if (TREE_READONLY (t))
>      return false;
>  
> +  /* We can not track variables with address taken.  */
> +  if (TREE_ADDRESSABLE (t))
> +    return false;
> +
> +  /* TODO: We could track public variables that are not addressable, but currently
> +     frontends don't give us those.  */
> +  if (TREE_PUBLIC (t))
> +    return false;
> +
> +  /* TODO: Check aliases.  */
> +
>    /* This is a variable we care about.  Check if we have seen it
>       before, and if not add it the set of variables we care about.  */
>    if (all_module_statics
> @@ -312,26 +323,6 @@ union_static_var_sets (bitmap &x, bitmap
>    return x == all_module_statics;
>  }
>  
> -/* Compute X &= Y, taking into account the possibility that
> -   X may become the maximum set.  */
> -
> -static bool
> -intersect_static_var_sets (bitmap &x, bitmap y)
> -{
> -  if (x != all_module_statics)
> -    {
> -      bitmap_and_into (x, y);
> -      /* As with union_static_var_sets, reducing to the maximum
> -         set as early as possible is an overall win.  */
> -      if (bitmap_equal_p (x, all_module_statics))
> -        {
> -          BITMAP_FREE (x);
> -          x = all_module_statics;
> -        }
> -    }
> -  return x == all_module_statics;
> -}
> -
>  /* Return a copy of SET on the bitmap obstack containing SET.
>     But if SET is NULL or the maximum set, return that instead.  */
>  
> @@ -669,7 +660,6 @@ static unsigned int
>  propagate (void)
>  {
>    struct cgraph_node *node;
> -  varpool_node *vnode;
>    struct cgraph_node **order =
>      XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
>    int order_pos;
> @@ -681,25 +671,6 @@ propagate (void)
>    ipa_discover_readonly_nonaddressable_vars ();
>    generate_summary ();
>  
> -  /* Now we know what vars are really statics; prune out those that aren't.  */
> -  FOR_EACH_VARIABLE (vnode)
> -    if (vnode->externally_visible
> -        || TREE_ADDRESSABLE (vnode->decl)
> -        || TREE_READONLY (vnode->decl)
> -        || !is_proper_for_analysis (vnode->decl)
> -        || !vnode->definition)
> -      bitmap_clear_bit (all_module_statics, DECL_UID (vnode->decl));
> -
> -  /* Forget info we collected just for fun on variables that turned out to be
> -     non-local.  */
> -  FOR_EACH_DEFINED_FUNCTION (node)
> -    {
> -      ipa_reference_local_vars_info_t node_l;
> -      node_l = &get_reference_vars_info (node)->local;
> -      intersect_static_var_sets (node_l->statics_read, all_module_statics);
> -      intersect_static_var_sets (node_l->statics_written, all_module_statics);
> -    }
> -
>    /* Propagate the local information through the call graph to produce
>       the global information.  All the nodes within a cycle will have
>       the same info so we collapse cycles first.  Then we can do the
Make ipa-ref somewhat less stupid
Hi,
ipa-reference is somewhat stupid and builds its data sets for all
variables, including addressable and public ones, just to prune them
out after all bitmaps are constructed. This used to make sense when
the profile generation happened at compile time, but since the
ipa_ref data structure was introduced this is nonsense.

Martin: It may be interesting to check if this solves the memory use
issues with chrome. We also may be able to re-enable ipa-ref with
profile-generate as I think all the data structures are considered
to have their address taken.

Honza

Bootstrapped/regtested x86_64-linux.

        * ipa-reference.c (is_proper_for_analysis): Exclude addressable and
        public vars.
        (intersect_static_var_sets): Remove.
        (propagate): Do not prune local statics.

Index: ipa-reference.c
===================================================================
--- ipa-reference.c     (revision 211364)
+++ ipa-reference.c     (working copy)
@@ -243,6 +243,17 @@ is_proper_for_analysis (tree t)
   if (TREE_READONLY (t))
     return false;
 
+  /* We can not track variables with address taken.  */
+  if (TREE_ADDRESSABLE (t))
+    return false;
+
+  /* TODO: We could track public variables that are not addressable, but currently
+     frontends don't give us those.  */
+  if (TREE_PUBLIC (t))
+    return false;
+
+  /* TODO: Check aliases.  */
+
   /* This is a variable we care about.  Check if we have seen it
      before, and if not add it the set of variables we care about.  */
   if (all_module_statics
@@ -312,26 +323,6 @@ union_static_var_sets (bitmap &x, bitmap
   return x == all_module_statics;
 }
 
-/* Compute X &= Y, taking into account the possibility that
-   X may become the maximum set.  */
-
-static bool
-intersect_static_var_sets (bitmap &x, bitmap y)
-{
-  if (x != all_module_statics)
-    {
-      bitmap_and_into (x, y);
-      /* As with union_static_var_sets, reducing to the maximum
-         set as early as possible is an overall win.  */
-      if (bitmap_equal_p (x, all_module_statics))
-        {
-          BITMAP_FREE (x);
-          x = all_module_statics;
-        }
-    }
-  return x == all_module_statics;
-}
-
 /* Return a copy of SET on the bitmap obstack containing SET.
    But if SET is NULL or the maximum set, return that instead.  */
 
@@ -669,7 +660,6 @@ static unsigned int
 propagate (void)
 {
   struct cgraph_node *node;
-  varpool_node *vnode;
   struct cgraph_node **order =
     XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   int order_pos;
@@ -681,25 +671,6 @@ propagate (void)
   ipa_discover_readonly_nonaddressable_vars ();
   generate_summary ();
 
-  /* Now we know what vars are really statics; prune out those that aren't.  */
-  FOR_EACH_VARIABLE (vnode)
-    if (vnode->externally_visible
-        || TREE_ADDRESSABLE (vnode->decl)
-        || TREE_READONLY (vnode->decl)
-        || !is_proper_for_analysis (vnode->decl)
-        || !vnode->definition)
-      bitmap_clear_bit (all_module_statics, DECL_UID (vnode->decl));
-
-  /* Forget info we collected just for fun on variables that turned out to be
-     non-local.  */
-  FOR_EACH_DEFINED_FUNCTION (node)
-    {
-      ipa_reference_local_vars_info_t node_l;
-      node_l = &get_reference_vars_info (node)->local;
-      intersect_static_var_sets (node_l->statics_read, all_module_statics);
-      intersect_static_var_sets (node_l->statics_written, all_module_statics);
-    }
-
   /* Propagate the local information through the call graph to produce
      the global information.  All the nodes within a cycle will have
      the same info so we collapse cycles first.  Then we can do the
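
The idea behind the patch (reject untrackable variables while collecting, rather than collecting everything and pruning afterwards) can be sketched in isolation. The following is only a toy model, not GCC code: struct var, trackable_p, collect_then_prune and collect_filtered are hypothetical names made up for illustration, and a single 64-bit word stands in for GCC's bitmap type. Both schemes should end with the same set; the difference is how many variables the set holds at its largest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a variable declaration; not GCC's tree type.  */
struct var
{
  unsigned uid;      /* plays the role of DECL_UID; must be < 64 here */
  bool readonly;     /* plays the role of TREE_READONLY */
  bool addressable;  /* plays the role of TREE_ADDRESSABLE */
  bool is_public;    /* plays the role of TREE_PUBLIC */
};

/* Count the bits set in a 64-bit set.  */
static unsigned
count_bits (uint64_t set)
{
  unsigned n = 0;
  for (; set; set &= set - 1)
    n++;
  return n;
}

/* The three rejections the patched predicate performs, in the same order.  */
static bool
trackable_p (const struct var *v)
{
  return !v->readonly && !v->addressable && !v->is_public;
}

/* Old scheme: record every non-readonly variable, then prune the ones
   that turn out not to be trackable.  *PEAK receives the largest number
   of variables the set ever held.  */
static uint64_t
collect_then_prune (const struct var *vars, size_t n, unsigned *peak)
{
  uint64_t set = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (vars[i].uid < 64);
      if (!vars[i].readonly)
        set |= UINT64_C (1) << vars[i].uid;
    }
  *peak = count_bits (set);
  for (size_t i = 0; i < n; i++)
    if (!trackable_p (&vars[i]))
      set &= ~(UINT64_C (1) << vars[i].uid);
  return set;
}

/* New scheme: apply the full predicate up front, so nothing needs to be
   pruned later and the set never grows beyond its final size.  */
static uint64_t
collect_filtered (const struct var *vars, size_t n, unsigned *peak)
{
  uint64_t set = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (vars[i].uid < 64);
      if (trackable_p (&vars[i]))
        set |= UINT64_C (1) << vars[i].uid;
    }
  *peak = count_bits (set);
  return set;
}
```

In this model, calling both functions on the same array gives identical results, while the peak reported by collect_then_prune also counts the addressable and public variables. In the real pass the same effect would apply to every per-function read/written bitmap, which is presumably where the memory saving comes from.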