[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #19 from Martin Jambor jamborm at gcc dot gnu.org ---
Author: jamborm
Date: Fri May 23 15:52:20 2014
New Revision: 210864

URL: http://gcc.gnu.org/viewcvs?rev=210864&root=gcc&view=rev
Log:
2014-05-23  Martin Jambor  mjam...@suse.cz

	PR tree-optimization/53787
	* params.def (PARAM_IPA_MAX_AA_STEPS): New param.
	* ipa-prop.h (ipa_node_params): Rename uses_analysis_done to
	analysis_done, update all uses.
	* ipa-prop.c: Include domwalk.h
	(param_analysis_info): Removed.
	(param_aa_status): New type.
	(ipa_bb_info): Likewise.
	(func_body_info): Likewise.
	(ipa_get_bb_info): New function.
	(aa_overwalked): Likewise.
	(find_dominating_aa_status): Likewise.
	(parm_bb_aa_status_for_bb): Likewise.
	(parm_preserved_before_stmt_p): Changed to use new param AA info.
	(load_from_unmodified_param): Accept func_body_info as a parameter
	instead of parms_ainfo.
	(parm_ref_data_preserved_p): Changed to use new param AA info.
	(parm_ref_data_pass_through_p): Likewise.
	(ipa_load_from_parm_agg_1): Likewise.  Update callers.
	(compute_complex_assign_jump_func): Changed to use new param AA info.
	(compute_complex_ancestor_jump_func): Likewise.
	(ipa_compute_jump_functions_for_edge): Likewise.
	(ipa_compute_jump_functions): Removed.
	(ipa_compute_jump_functions_for_bb): New function.
	(ipa_analyze_indirect_call_uses): Likewise, moved variable
	declarations down.
	(ipa_analyze_virtual_call_uses): Accept func_body_info instead of node
	and info, moved variable declarations down.
	(ipa_analyze_call_uses): Accept and pass on func_body_info instead of
	node and info.
	(ipa_analyze_stmt_uses): Likewise.
	(ipa_analyze_params_uses): Removed.
	(ipa_analyze_params_uses_in_bb): New function.
	(ipa_analyze_controlled_uses): Likewise.
	(free_ipa_bb_info): Likewise.
	(analysis_dom_walker): New class.
	(ipa_analyze_node): Handle node-specific forbidden analysis,
	initialize and free func_body_info, use dominator walker.
	(ipcp_modif_dom_walker): New class.
	(ipcp_transform_function): Create and free func_body_info, use
	ipcp_modif_dom_walker, moved a lot of functionality there.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/ipa-prop.c
    trunk/gcc/ipa-prop.h
    trunk/gcc/params.def
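To make the idea behind the dominator-order analysis and the new PARAM_IPA_MAX_AA_STEPS budget more concrete, here is a toy model in C. It is an assumption-laden sketch, not GCC's code. GCC's analysis works on individual statements and records per-block, per-parameter status (param_aa_status). Here, each block simply carries a hypothetical `clobbers` flag. The sketch tries to show two ideas visible in the ChangeLog. First, blocks are visited in dominator order so that a result already computed for a dominator can be reused, loosely in the spirit of find_dominating_aa_status. Second, all walking draws on one shared step budget, after which the analysis conservatively assumes the data was modified (compare aa_overwalked). The names `struct cfg`, `modified_before` and `analyze_in_dom_order` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BBS 16

/* Hypothetical toy CFG.  Blocks are numbered so that each block's immediate
   dominator has a smaller index (a dominator-order numbering); block 0 is
   the entry.  clobbers[b] stands in for "some statement in b may write the
   memory the parameter points to". */
struct cfg {
    int nblocks;
    int idom[MAX_BBS];             /* immediate dominator, -1 for the entry */
    int npreds[MAX_BBS];
    int preds[MAX_BBS][MAX_BBS];
    bool clobbers[MAX_BBS];
};

enum aa_status { UNKNOWN, PRESERVED, MODIFIED };

/* Push block b onto the walk stack unless it was already visited. */
static void push_once(int b, bool *seen, int *stack, int *top)
{
    if (!seen[b]) {
        seen[b] = true;
        stack[(*top)++] = b;
    }
}

/* May the parameter's data be modified on some path from the entry to the
   start of block bb?  A dominator already known to be MODIFIED answers the
   question without walking.  Otherwise predecessors are walked backwards,
   each visited block costing one step of the shared budget; when the budget
   runs out the answer is conservatively MODIFIED. */
static enum aa_status modified_before(const struct cfg *g, enum aa_status *st,
                                      int bb, int *budget)
{
    if (st[bb] != UNKNOWN)
        return st[bb];
    for (int d = g->idom[bb]; d >= 0; d = g->idom[d])
        if (st[d] == MODIFIED)
            return st[bb] = MODIFIED;

    bool seen[MAX_BBS] = { false };
    int stack[MAX_BBS], top = 0;
    enum aa_status result = PRESERVED;

    for (int i = 0; i < g->npreds[bb]; i++)
        push_once(g->preds[bb][i], seen, stack, &top);
    while (top > 0) {
        int b = stack[--top];
        if (--*budget < 0 || g->clobbers[b]) {
            result = MODIFIED;
            break;
        }
        for (int i = 0; i < g->npreds[b]; i++)
            push_once(g->preds[b][i], seen, stack, &top);
    }
    return st[bb] = result;
}

/* Analyze every block in dominator order with one shared step budget. */
void analyze_in_dom_order(const struct cfg *g, enum aa_status *st, int budget)
{
    for (int bb = 0; bb < g->nblocks; bb++)
        st[bb] = UNKNOWN;
    for (int bb = 0; bb < g->nblocks; bb++)
        modified_before(g, st, bb, &budget);
}
```

Only a MODIFIED dominator is trusted as a shortcut. If something may write the data on a path into a dominator, every path to the dominated block passes through that dominator, so the write may precede the dominated block too. A PRESERVED dominator proves nothing about writes in non-dominating predecessors, such as one arm of a diamond. That is why the backward walk is still needed in that case.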
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #18 from Igor Zamyatin izamyatin at gmail dot com ---
Martin, I checked the patch and can confirm it gives the necessary speedup on the test (UMTmk_1.1). Thanks!
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #17 from Martin Jambor jamborm at gcc dot gnu.org ---
Created attachment 32136
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=32136&action=edit
Patch doing ipa-prop function body analysis in dominator order

Yuri, this patch should make the requested propagation happen even in the benchmark attached to comment #14. Can you please verify it works for you? Does it speed anything up for you? Thanks.
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #16 from Martin Jambor jamborm at gcc dot gnu.org 2013-01-25 18:32:39 UTC ---
I do have a caller of the clone (in the WPA dump):

init_.constprop.2/71 (init_.constprop.2) @0x7f10180f06f0
  Type: function
  ...
  Clone of init_/41
  ...
  Called by: driver_.constprop.1/70 (1.00 per call)
  Calls: memcpy/49 (1.00 per call)

so that is not the problem. The problem is that the pass-through jump function for npart does not have the agg_preserved flag set. I do not yet know why that is the case; nevertheless, it means the value is not propagated to init. I will have a detailed look. Thanks a lot for the testcase.
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787 --- Comment #14 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-01-22 15:32:06 UTC --- Created attachment 29250 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29250 testcase in F90 Reproducer for IPA_CP
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Yuri Rumyantsev ysrumyan at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com

--- Comment #15 from Yuri Rumyantsev ysrumyan at gmail dot com 2013-01-22 15:33:33 UTC ---
We checked that IPA-CP is done for the attached simple testcase, but it does not work for the real benchmark, UMTmk_1.1. In this benchmark we have the following chain of statements:

UMTmk:
    npart = 16
    call driver(Size, Geom, npart, storePsi)
driver:
    call init(Size, Geom, npart, storePsi, next,omega,abdym,sigvol,qc, TPSIC,PSIC,PSIB,PSIFP,CUREZ)

and we did not see that the value 16 for npart has been propagated (if it were, the innermost loops with npart as the upper bound would be completely unrolled). If we look at the call graph for init, we see that it does not have a caller in the graph:

init_.constprop.2/72 (init_.constprop.2) @0x7f0874ee3b90
  Type: function
  Visibility: used_from_other_partition public visibility_specified visibility:hidden
  References:
  Referring:
  Read from file: /tmp/ccGZySlu.ltrans2.o
  Clone of init_.2535/55
  Function flags: analyzed local finalized
  Called by: ...

I have attached the whole benchmark for investigation.
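The chain described above can be modelled in C, since by-reference Fortran arguments behave like pointer parameters. This is a hypothetical sketch. The function bodies and the `init_iters` counter are invented for illustration; only the shape of the calls follows the comment.

For 16 to reach init's loop bound, IPA-CP has to know the contents of the memory npart points to, not just the pointer itself. The intermediate driver must also forward the pointer without possibly modifying *npart first. Martin's comment #16 traces the failure to the pass-through jump function for npart lacking the agg_preserved flag.

```c
#include <assert.h>

/* Hypothetical C model of the UMTmk_1.1 call chain; names follow the
   Fortran routines, bodies are invented for illustration. */
static int init_iters;  /* counts inner-loop iterations, for checking */

void init(int *npart)
{
    /* Innermost loop whose upper bound is *npart: if the constant 16 were
       propagated here, the loop could be fully unrolled. */
    for (int p = 0; p < *npart; p++)
        init_iters++;
}

void driver(int *npart)
{
    /* Pure pass-through: driver forwards the pointer and does not store to
       *npart before the call, which is the situation in which the
       pass-through jump function can carry the agg_preserved flag. */
    init(npart);
}

void umtmk(void)
{
    int npart = 16;   /* the constant lives in memory, not in the call */
    driver(&npart);
}
```

Suppose driver stored to *npart before calling init, or passed npart to a function that might write through it. In either case, the pass-through could no longer be treated as preserving the pointed-to value, and the constant could not be forwarded to init.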
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Martin Jambor jamborm at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #13 from Martin Jambor jamborm at gcc dot gnu.org 2012-11-08 14:43:41 UTC ---
So, this now works as expected; the testcase is even in the testsuite. The creation of aggregate jump functions is still quite rudimentary, so it is possible that in more complex scenarios the propagation might not take place (testcases welcome), and even in the propagation phase there are still a few things wanting. Nevertheless, those potential shortcomings should be the subject of separate requests/PRs/whatever. Thanks for reporting and for the testcase.
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #12 from Martin Jambor jamborm at gcc dot gnu.org 2012-11-07 15:56:00 UTC ---
Author: jamborm
Date: Wed Nov 7 15:55:54 2012
New Revision: 193298

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193298
Log:
2012-11-07  Martin Jambor  mjam...@suse.cz

	PR tree-optimization/53787
	* ipa-cp.c (ipcp_value_source): New field offset.
	(ipcp_agg_lattice): New type.
	(ipcp_param_lattices): Likewise, move virt_call from ipcp_lattice here.
	(ipcp_agg_lattice_pool): New variable.
	(ipa_get_parm_lattices): New function.
	(ipa_get_lattice): Turned into ipa_get_scalar_lat, use the above.
	Adjusted all callers.
	(print_lattice): New function.
	(print_all_lattices): Use the above, also print aggregate lattices.
	(set_agg_lats_to_bottom): New function.
	(set_agg_lats_contain_variable): Likewise.
	(set_all_contains_variable): Likewise.
	(initialize_node_lattices): Also handle aggregate lattices, set virt_call
	in ipcp_param_lattices.
	(add_value_source): Handle offsets.
	(add_value_to_lattice): Likewise.
	(add_scalar_value_to_lattice): New function.
	(propagate_vals_accross_pass_through): Use add_scalar_value_to_lattice.
	(propagate_vals_accross_ancestor): Likewise.
	(propagate_accross_jump_function): Renamed to
	propagate_scalar_accross_jump_function, use add_scalar_value_to_lattice.
	(set_check_aggs_by_ref): New function.
	(merge_agg_lats_step): Likewise.
	(set_chain_of_aglats_contains_variable): Likewise.
	(merge_aggregate_lattices): Likewise.
	(propagate_constants_accross_call): Also handle aggregate lattices.
	(hint_time_bonus): New function.
	(context_independent_aggregate_values): Likewise.
	(gather_context_independent_values): Also handle aggregate values.
	(agg_jmp_p_vec_for_t_vec): New function.
	(estimate_local_effects): Also handle aggregate values.
	(add_all_node_vals_to_toposort): Likewise.
	(ipcp_propagate_stage): Use struct ipcp_param_lattices.
	(get_clone_agg_value): New function.
	(cgraph_edge_brings_value_p): Also handle aggregate values.
	(create_specialized_node): Likewise.
	(find_more_values_for_callers_subset): Rename to
	find_more_scalar_values_for_callers_subset.  Modify dump.
	(copy_plats_to_inter): New function.
	(intersect_with_plats): Likewise.
	(agg_replacements_to_vector): Likewise.
	(intersect_with_agg_replacements): Likewise.
	(find_aggregate_values_for_callers_subset): Likewise.
	(known_aggs_to_agg_replacement_list): Likewise.
	(cgraph_edge_brings_all_scalars_for_node): Likewise.
	(cgraph_edge_brings_all_agg_vals_for_node): Likewise.
	(perhaps_add_new_callers): Old functionality moved to
	cgraph_edge_brings_all_scalars_for_node, call it and
	cgraph_edge_brings_all_agg_vals_for_node.
	(ipcp_val_in_agg_replacements_p): New function.
	(decide_about_value): New function.
	(decide_whether_version_node): A lot of functionality moved to
	decide_about_value.  Also handle aggregate values.
	(ipcp_driver): Also allocate ipcp_agg_lattice_pool.
	(pass_ipa_cp): Fill in new entries.
	* ipa-prop.c (ipa_node_agg_replacements): New variable.
	(free_parms_ainfo): New function.
	(ipa_analyze_node): Use free_parms_ainfo to free stuff.
	(ipa_find_agg_cst_for_param): Do not rely on offset ordering.
	(ipa_set_node_agg_value_chain): New function.
	(ipa_node_removal_hook): Also handle ipa_node_agg_replacements.
	(ipa_node_duplication_hook): Likewise.
	(ipa_free_all_structures_after_ipa_cp): Also free ipcp_agg_lattice_pool.
	(ipa_free_all_structures_after_iinln): Likewise.
	(ipa_dump_agg_replacement_values): New function.
	(write_agg_replacement_chain): Likewise.
	(read_agg_replacement_chain): Likewise.
	(ipa_prop_write_all_agg_replacement): Likewise.
	(read_replacements_section): Likewise.
	(ipa_prop_read_all_agg_replacement): Likewise.
	(adjust_agg_replacement_values): Likewise.
	(ipcp_transform_function): Likewise.
	* ipa-prop.h: Also define heap vector of ipa_agg_jf_item_t and of
	ipa_agg_jump_function_t.
	(ipa_node_params): Make lattices an array of ipcp_param_lattices.
	(ipa_agg_replacement_value): New type and its vector.
	(ipa_set_node_agg_value_chain): Declare.
	(ipa_node_agg_replacements): Likewise.
	(ipa_get_agg_replacements_for_node): New function.
	(ipcp_agg_lattice_pool): Declare.
	(ipa_dump_agg_replacement_values): Likewise.
	(ipa_prop_write_all_agg_replacement): Likewise.
	(ipa_prop_read_all_agg_replacement): Likewise.
	(ipcp_transform_function): Likewise.
	* ipa-inline-analysis.c (estimate_ipcp_clone_size_and_time): Pass
	around known aggregates and hints.
	* ipa-inline.h: Include ipa-prop.h.
	(estimate_ipcp_clone_size_and_time): Adjust declaration.
	*
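At the level of detail of this ChangeLog, the core new idea is this. A lattice or jump function for a pointer parameter can hold a set of known (offset, constant) pairs describing the pointed-to aggregate. Values arriving from different callers are then combined by intersection (compare intersect_with_agg_replacements).

The C below is a simplified, hypothetical model. `struct agg_item`, `agg_intersect` and `agg_find_cst` are illustrative names, not GCC's types or functions. The real structures record more; set_check_aggs_by_ref, for example, suggests they also track whether the aggregate is passed by reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of known aggregate contents: (offset, constant) pairs
   describing the memory an argument points to.  Not GCC's data structure. */
struct agg_item {
    long offset;   /* position of the known value within the aggregate */
    long value;    /* constant known to be stored at that position */
};

/* Intersect the known contents coming from two call sites: an item survives
   only if both callers pass the same constant at the same offset.  Items
   within each list are assumed to have distinct offsets, and no ordering is
   assumed.  out must have room for na items; returns the number written. */
size_t agg_intersect(const struct agg_item *a, size_t na,
                     const struct agg_item *b, size_t nb,
                     struct agg_item *out)
{
    size_t n = 0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i].offset == b[j].offset && a[i].value == b[j].value) {
                out[n++] = a[i];
                break;
            }
    return n;
}

/* Look up the constant known at a given offset, the kind of query a
   specialized clone would make when replacing a load from the parameter.
   Returns 1 and sets *val on success, 0 if nothing is known there. */
int agg_find_cst(const struct agg_item *items, size_t n, long offset,
                 long *val)
{
    for (size_t i = 0; i < n; i++)
        if (items[i].offset == offset) {
            *val = items[i].value;
            return 1;
        }
    return 0;
}
```

Matching on both offset and value without assuming any ordering mirrors the ChangeLog's note that ipa_find_agg_cst_for_param no longer relies on offset ordering. The quadratic loop is acceptable for short lists in this model.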
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #11 from Martin Jambor jamborm at gcc dot gnu.org 2012-08-30 15:58:40 UTC ---
The aggregate jump functions and their use in the inlining and IPA-CP heuristics are in. At least with my PHI predicate computing patch, which I re-submitted today, we even get a predicate for known loop iterations for function init. This means that even today the function in your application should be much more likely to be inlined. In order to propagate values without inlining, IPA-CP must be enhanced, which is something I am still working on.
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #10 from Martin Jambor jamborm at gcc dot gnu.org 2012-07-27 09:34:41 UTC ---
(In reply to comment #9)
> Shouldn't IPA-CP be able to do this already? It does appear to handle
> CONST_DECLs already...

Only if it finds them in the call statement itself; it relies on early constant propagation to get the constants there. But (AFAIK) nothing propagates even scalar constants through variables that are not GIMPLE registers, and n is not a register because it has its address taken.
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Steven Bosscher steven at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #9 from Steven Bosscher steven at gcc dot gnu.org 2012-07-26 22:49:16 UTC ---
(In reply to comment #8)
> Now if we could somehow propagate 10 into the actual argument of the call
> statement, IPA-CP should pick it up and propagate it into the caller.
> Another alternative is to construct an aggregate jump function for it when
> we have them. I'll keep this testcase in mind when working on them.

Shouldn't IPA-CP be able to do this already? It does appear to handle CONST_DECLs already...
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Martin Jambor jamborm at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jamborm at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #8 from Martin Jambor jamborm at gcc dot gnu.org 2012-07-20 19:59:02 UTC ---
(In reply to comment #6)
> This has nothing to do with LTO - with a single compilation unit you can use
> -fwhole-program. The issue is that Fortran passes parameters by reference
> and our interprocedural constant-propagation pass does not know how to deal
> with that. The IPA SRA pass which is supposed to fix that decides that init
> cannot have its signature changed. Martin, can you check why?
>
> I think we ought to optimize this with -O3 -fwhole-program -fno-inline.

IPA-SRA is not really an IPA pass, and even with -fwhole-program it cannot change the signatures of functions which might be called from other compilation units (without creating clones).

In the testcase, init_ is called by MAIN in the following way:

  integer(kind=4) n;

  <bb 2>:
  n = 10;
  init_ (x, n);

Now if we could somehow propagate 10 into the actual argument of the call statement, IPA-CP should pick it up and propagate it into the caller. Another alternative is to construct an aggregate jump function for it when we have them. I'll keep this testcase in mind when working on them.
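A C analogue of the testcase shows why the constant is invisible to scalar IPA-CP. Fortran passes n by reference, which in C corresponds to passing &n to a pointer parameter.

The function bodies below are hypothetical, since the body of init in the testcase is not shown here. `init_constprop` is a hand-written picture of what a clone specialized through an aggregate jump function would amount to; it is not compiler output.

```c
#include <assert.h>

/* Hypothetical C analogue of the Fortran testcase: the caller passes n by
   reference, so init_ receives a pointer, not the value 10. */
void init_(double *x, int *n)
{
    /* The loop bound is a load through a pointer parameter.  A scalar jump
       function for this argument can only describe the pointer itself,
       not the integer it points to. */
    for (int i = 0; i < *n; i++)
        x[i] = 2.0 * i;
}

/* Hand-written picture of a clone specialized with the known aggregate
   contents: the load *n is replaced by the constant 10, which gives the
   loop a known trip count. */
void init_constprop(double *x)
{
    for (int i = 0; i < 10; i++)
        x[i] = 2.0 * i;
}

/* Caller mirroring "n = 10; init_ (x, n);" from the dump.  n has its
   address taken, so it is not a GIMPLE register and the constant 10 never
   appears as an actual argument of the call. */
void caller(double *x)
{
    int n = 10;
    init_(x, &n);
}
```

The value 10 only exists as a store to memory in the caller. IPA-CP can use it only if the jump function for the call records what the pointer points to, not just the pointer.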
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787 --- Comment #7 from Igor Zamyatin izamyatin at gmail dot com 2012-07-19 19:09:49 UTC --- Any thoughts here?
[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |jamborm at gcc dot gnu.org
          Component|lto                         |tree-optimization
            Summary|Possible lto improvement    |Possible IPA-SRA / IPA-CP
                   |                            |improvement

--- Comment #6 from Richard Guenther rguenth at gcc dot gnu.org 2012-06-28 10:08:13 UTC ---
This has nothing to do with LTO - with a single compilation unit you can use -fwhole-program. The issue is that Fortran passes parameters by reference and our interprocedural constant-propagation pass does not know how to deal with that. The IPA SRA pass which is supposed to fix that decides that init cannot have its signature changed. Martin, can you check why?

I think we ought to optimize this with -O3 -fwhole-program -fno-inline.