[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2014-05-23 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #19 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Author: jamborm
Date: Fri May 23 15:52:20 2014
New Revision: 210864

URL: http://gcc.gnu.org/viewcvs?rev=210864&root=gcc&view=rev
Log:
2014-05-23  Martin Jambor  <mjam...@suse.cz>

PR tree-optimization/53787
* params.def (PARAM_IPA_MAX_AA_STEPS): New param.
* ipa-prop.h (ipa_node_params): Rename uses_analysis_done to
analysis_done, update all uses.
* ipa-prop.c: Include domwalk.h
(param_analysis_info): Removed.
(param_aa_status): New type.
(ipa_bb_info): Likewise.
(func_body_info): Likewise.
(ipa_get_bb_info): New function.
(aa_overwalked): Likewise.
(find_dominating_aa_status): Likewise.
(parm_bb_aa_status_for_bb): Likewise.
(parm_preserved_before_stmt_p): Changed to use new param AA info.
(load_from_unmodified_param): Accept func_body_info as a parameter
instead of parms_ainfo.
(parm_ref_data_preserved_p): Changed to use new param AA info.
(parm_ref_data_pass_through_p): Likewise.
(ipa_load_from_parm_agg_1): Likewise.  Update callers.
(compute_complex_assign_jump_func): Changed to use new param AA info.
(compute_complex_ancestor_jump_func): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.
(ipa_compute_jump_functions): Removed.
(ipa_compute_jump_functions_for_bb): New function.
(ipa_analyze_indirect_call_uses): Likewise, moved variable
declarations down.
(ipa_analyze_virtual_call_uses): Accept func_body_info instead of node
and info, moved variable declarations down.
(ipa_analyze_call_uses): Accept and pass on func_body_info instead of
node and info.
(ipa_analyze_stmt_uses): Likewise.
(ipa_analyze_params_uses): Removed.
(ipa_analyze_params_uses_in_bb): New function.
(ipa_analyze_controlled_uses): Likewise.
(free_ipa_bb_info): Likewise.
(analysis_dom_walker): New class.
(ipa_analyze_node): Handle node-specific forbidden analysis,
initialize and free func_body_info, use dominator walker.
(ipcp_modif_dom_walker): New class.
(ipcp_transform_function): Create and free func_body_info, use
ipcp_modif_dom_walker, moved a lot of functionality there.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/invoke.texi
trunk/gcc/ipa-prop.c
trunk/gcc/ipa-prop.h
trunk/gcc/params.def


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2014-02-28 Thread izamyatin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #18 from Igor Zamyatin <izamyatin at gmail dot com> ---
Martin,

I checked the patch and can confirm it gives the necessary speedup on the test
(UMTmk_1.1).
Thanks!


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2014-02-14 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #17 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Created attachment 32136
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32136&action=edit
Patch doing ipa-prop function body analysis in dominator order

Yuri, this patch should make the requested propagation happen even in
the benchmark attached to comment #14.  Can you please verify it works
for you?  Does it speed up anything for you?  Thanks.
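
For readers unfamiliar with the idea, the following is a rough toy model in Python of why walking blocks in dominator order helps. It is not GCC code: the function names, the data layout, and the way the step budget is counted are all assumptions made for illustration. The only facts taken from the commit log are that the analysis is done in dominator order, that a status can be found in a dominating block, and that there is a limit on alias-analysis steps. The sketch inherits only the "may be modified" fact from the immediate dominator, because every path to a block passes through its dominators. "Not modified" is never inherited, since a non-dominating predecessor could still clobber the memory.

```python
from typing import Dict, Iterator, List, Set, Tuple


def walk_dominator_tree(idom: Dict[str, str], root: str) -> Iterator[str]:
    """Yield blocks so that each block comes after its immediate dominator."""
    children: Dict[str, List[str]] = {}
    for block, dom in idom.items():
        children.setdefault(dom, []).append(block)
    stack = [root]
    while stack:
        block = stack.pop()
        yield block
        stack.extend(reversed(children.get(block, [])))


def param_modified_per_block(
    idom: Dict[str, str],          # block -> immediate dominator (root absent)
    root: str,
    preds: Dict[str, List[str]],   # block -> CFG predecessors
    clobbers: Set[str],            # blocks that may write the parameter's memory
    max_steps: int,                # hypothetical budget, counted in visited blocks
) -> Tuple[Dict[str, bool], int]:
    """Return (may-be-modified flag per block, number of walk steps spent)."""
    modified: Dict[str, bool] = {}
    steps = 0
    for block in walk_dominator_tree(idom, root):
        dom = idom.get(block)
        if dom is not None and modified[dom]:
            # A dominating block already saw a clobber: reuse that, no walk.
            modified[block] = True
            continue
        # Otherwise walk backwards over every block that can reach this one.
        seen: Set[str] = set()
        work = [block]
        found = False
        while work and not found:
            cur = work.pop()
            if cur in seen:
                continue
            seen.add(cur)
            steps += 1
            if steps > max_steps:
                found = True       # budget exhausted: give up conservatively
            elif cur in clobbers:
                found = True
            else:
                work.extend(preds.get(cur, []))
        modified[block] = found
    return modified, steps
```

In this toy, once a clobber is found in block A, every block dominated by A gets its answer without any further walking, which is the kind of saving a dominator-ordered analysis can offer.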


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2013-01-25 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #16 from Martin Jambor <jamborm at gcc dot gnu.org> 2013-01-25 18:32:39 UTC ---
I do have a caller of the clone (in the WPA dump):

init_.constprop.2/71 (init_.constprop.2) @0x7f10180f06f0
  Type: function
  ...
  Clone of init_/41
  ...
  Called by: driver_.constprop.1/70 (1.00 per call)
  Calls: memcpy/49 (1.00 per call)

so that is not the problem.  The problem is that the pass-through jump
function for npart does not have the agg_preserved flag set.  I do not
yet know why that is the case, nevertheless it means the value is not
propagated to init.  I will have a detailed look, thanks a lot for
the testcase.
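
The effect of the missing flag can be illustrated with a small toy model in Python. This is only a sketch, not GCC's data structures: PassThrough, value_reaching_callee and driver_known are hypothetical names, and the parameter index 2 is an assumption based on npart being the third argument in the calls quoted in comment #15.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PassThrough:
    """Toy pass-through jump function: the caller forwards one of its own
    parameters unchanged to the callee."""
    formal_index: int    # which caller parameter is forwarded
    agg_preserved: bool  # True if the pointed-to memory is known to be
                         # unmodified between function entry and the call


def value_reaching_callee(caller_known: Dict[int, int],
                          jf: PassThrough) -> Optional[int]:
    """Constant stored behind the forwarded pointer as seen by the callee,
    or None when nothing can be assumed."""
    if not jf.agg_preserved:
        # The memory might have been overwritten, so the constant known at
        # the caller's entry cannot be trusted at the call site.
        return None
    return caller_known.get(jf.formal_index)


# driver is known to receive a pointer to npart == 16 as parameter index 2.
driver_known = {2: 16}
```

With agg_preserved unset, value_reaching_callee(driver_known, PassThrough(2, False)) yields None, which corresponds to the constant being lost on the way from driver to init. With the flag set, the same call yields 16.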


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2013-01-22 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #14 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-01-22 15:32:06 UTC ---
Created attachment 29250
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29250
testcase in F90

Reproducer for IPA_CP


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2013-01-22 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Yuri Rumyantsev <ysrumyan at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com

--- Comment #15 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-01-22 15:33:33 UTC ---
We checked that IPA_CP is done for the attached simple test-case, but it does
not work for the real bench UMTmk_1.1. In this bench we have the following
chain of stmts:

UMTmk:
npart = 16
call driver(Size, Geom, npart, storePsi)

driver:
   call init(Size, Geom, npart, storePsi,
 next,omega,abdym,sigvol,qc,
 TPSIC,PSIC,PSIB,PSIFP,CUREZ)

and we did not see that the value 16 for npart has been propagated (if it
were, the innermost loops with npart as upper bound would be completely
unrolled).

If we look at the call graph for init we see that it does not have a caller
in the graph:

init_.constprop.2/72 (init_.constprop.2) @0x7f0874ee3b90
  Type: function
  Visibility: used_from_other_partition public visibility_specified
visibility:hidden
  References:
  Referring:
  Read from file: /tmp/ccGZySlu.ltrans2.o
  Clone of init_.2535/55
  Function flags: analyzed local finalized
  Called by:
...
I put the whole bench into the attachment for investigation.


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-11-08 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #13 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-11-08 14:43:41 UTC ---
So, this now works as expected, the testcase is even in the testsuite.
The creation of aggregate jump functions is still quite rudimentary so
it is possible that in more complex scenarios, the propagation might
not take place (testcases welcome) and even in the propagation phase
there are still a few things wanting.  Nevertheless, those potential
shortcomings should be the subject of separate requests/PRs/whatever.

Thanks for reporting and for the testcase.
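
As a rough illustration of what propagating aggregate values involves, here is a toy sketch in Python of one ingredient: keeping only those (offset, constant) pairs on which all callers of a function agree. The function name and the dictionary representation are hypothetical and chosen for this example only. The real propagation in ipa-cp.c works on lattices and is considerably more involved.

```python
from typing import Dict, Iterable


def common_aggregate_values(
    per_caller: Iterable[Dict[int, int]],
) -> Dict[int, int]:
    """Intersect the aggregate contents known at each call site.

    Each dictionary maps a byte offset within the pointed-to aggregate to
    the constant known to be stored there at that call site.  An entry
    survives only if every caller provides the same constant at the same
    offset.
    """
    callers = list(per_caller)
    if not callers:
        return {}
    common = dict(callers[0])
    for known in callers[1:]:
        common = {off: val for off, val in common.items()
                  if known.get(off) == val}
    return common
```

For example, if one caller passes {0: 16, 8: 3} and another passes {0: 16, 8: 4}, only the entry at offset 0 is common to both, so only that constant could be used for a clone shared by the two callers.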


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-11-07 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #12 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-11-07 15:56:00 UTC ---
Author: jamborm
Date: Wed Nov  7 15:55:54 2012
New Revision: 193298

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193298
Log:
2012-11-07  Martin Jambor  <mjam...@suse.cz>

PR tree-optimization/53787
* ipa-cp.c (ipcp_value_source): New field offset.
(ipcp_agg_lattice): New type.
(ipcp_param_lattices): Likewise, move virt_call from ipcp_lattice here.
(ipcp_agg_lattice_pool): New variable.
(ipa_get_parm_lattices): New function.
(ipa_get_lattice): Turned into ipa_get_scalar_lat, use the above.
Adjusted all callers.
(print_lattice): New function.
(print_all_lattices): Use the above, also print aggregate lattices.
(set_agg_lats_to_bottom): New function.
(set_agg_lats_contain_variable): Likewise.
(set_all_contains_variable): Likewise.
(initialize_node_lattices): Also handle aggregate lattices, set
virt_call in ipcp_param_lattices.
(add_value_source): Handle offsets.
(add_value_to_lattice): Likewise.
(add_scalar_value_to_lattice): New function.
(propagate_vals_accross_pass_through): Use add_scalar_value_to_lattice.
(propagate_vals_accross_ancestor): Likewise.
(propagate_accross_jump_function): Renamed to
propagate_scalar_accross_jump_function, use
add_scalar_value_to_lattice.
(set_check_aggs_by_ref): New function.
(merge_agg_lats_step): Likewise.
(set_chain_of_aglats_contains_variable): Likewise.
(merge_aggregate_lattices): Likewise.
(propagate_constants_accross_call): Also handle aggregate lattices.
(hint_time_bonus): New function.
(context_independent_aggregate_values): Likewise.
(gather_context_independent_values): Also handle aggregate values.
(agg_jmp_p_vec_for_t_vec): New function.
(estimate_local_effects): Also handle aggregate values.
(add_all_node_vals_to_toposort): Likewise.
(ipcp_propagate_stage): Use struct ipcp_param_lattices.
(get_clone_agg_value): New function.
(cgraph_edge_brings_value_p): Also handle aggregate values.
(create_specialized_node): Likewise.
(find_more_values_for_callers_subset): Rename to
find_more_scalar_values_for_callers_subset.  Modify dump.
(copy_plats_to_inter): New function.
(intersect_with_plats): Likewise.
(agg_replacements_to_vector): Likewise.
(intersect_with_agg_replacements): Likewise.
(find_aggregate_values_for_callers_subset): Likewise.
(known_aggs_to_agg_replacement_list): Likewise.
(cgraph_edge_brings_all_scalars_for_node): Likewise.
(cgraph_edge_brings_all_agg_vals_for_node): Likewise.
(perhaps_add_new_callers): Old functionality moved to
cgraph_edge_brings_all_scalars_for_node, call it and
cgraph_edge_brings_all_agg_vals_for_node.
(ipcp_val_in_agg_replacements_p): New function.
(decide_about_value): New function.
(decide_whether_version_node): A lot of functionality moved to
decide_about_value.  Also handle aggregate values.
(ipcp_driver): Also allocate ipcp_agg_lattice_pool.
(pass_ipa_cp): Fill in new entries.
* ipa-prop.c (ipa_node_agg_replacements): New variable.
(free_parms_ainfo): New function.
(ipa_analyze_node): Use free_parms_ainfo to free stuff.
(ipa_find_agg_cst_for_param): Do not rely on offset ordering.
(ipa_set_node_agg_value_chain): New function.
(ipa_node_removal_hook): Also handle ipa_node_agg_replacements.
(ipa_node_duplication_hook): Likewise.
(ipa_free_all_structures_after_ipa_cp): Also free ipcp_agg_lattice_pool.
(ipa_free_all_structures_after_iinln): Likewise.
(ipa_dump_agg_replacement_values): New function.
(write_agg_replacement_chain): Likewise.
(read_agg_replacement_chain): Likewise.
(ipa_prop_write_all_agg_replacement): Likewise.
(read_replacements_section): Likewise.
(ipa_prop_read_all_agg_replacement): Likewise.
(adjust_agg_replacement_values): Likewise.
(ipcp_transform_function): Likewise.
* ipa-prop.h: Also define heap vector of ipa_agg_jf_item_t and of
ipa_agg_jump_function_t.
(ipa_node_params): Make lattices an array of ipcp_param_lattices.
(ipa_agg_replacement_value): New type and its vector.
(ipa_set_node_agg_value_chain): Declare.
(ipa_node_agg_replacements): Likewise.
(ipa_get_agg_replacements_for_node): New function.
(ipcp_agg_lattice_pool): Declare.
(ipa_dump_agg_replacement_values): Likewise.
(ipa_prop_write_all_agg_replacement): Likewise.
(ipa_prop_read_all_agg_replacement): Likewise.
(ipcp_transform_function): Likewise.
* ipa-inline-analysis.c (estimate_ipcp_clone_size_and_time): Pass around
known aggregates and hints.
* ipa-inline.h: include ipa-prop.h.
(estimate_ipcp_clone_size_and_time): Adjust declaration.

* 

[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-08-30 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #11 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-08-30 15:58:40 UTC ---
The aggregate jump functions and their use in the inlining/IPA-CP
heuristics are in.  At least with my PHI predicate computing patch,
which I re-submitted today, we even get a predicate for known loop
iterations for function init.  This means that even today the function
in your app should be much more likely to be inlined.  In order to
propagate stuff without inlining, IPA-CP must be enhanced, which is
something I am still working on.


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-07-27 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #10 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-07-27 09:34:41 UTC ---
(In reply to comment #9)
> Shouldn't IPA-CP be able to do this already? It does appear to handle
> CONST_DECLs already...

Only if it finds them in the call statement itself; it relies on early
constant propagation to get the constants there.  But (AFAIK) nothing
propagates (even scalar) constants through non-gimple-registers, and n
is not a register because it has its address taken.


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-07-26 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #9 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-26 22:49:16 UTC ---
(In reply to comment #8)
> Now if we could somehow propagate 10 into the actual argument of the
> call statement, IPA-CP should pick it up and propagate it into the
> callee.  Another alternative is to construct an aggregate jump
> function for it when we have them.  I'll keep this testcase in mind
> when working on them.

Shouldn't IPA-CP be able to do this already? It does appear to handle
CONST_DECLs already...


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-07-20 Thread jamborm at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jamborm at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #8 from Martin Jambor <jamborm at gcc dot gnu.org> 2012-07-20 19:59:02 UTC ---
(In reply to comment #6)
> This has nothing to do with LTO - with a single compilation unit you can
> use -fwhole-program.  The issue is that Fortran passes parameters by reference
> and our interprocedural constant-propagation pass does not know how to deal
> with that.  The IPA SRA pass which is supposed to fix that decides that
> init cannot have its signature changed.  Martin, can you check why?
> I think we ought to optimize this with -O3 -fwhole-program -fno-inline.

IPA-SRA is not really an IPA pass and even with -fwhole-program it
cannot change signatures of functions which might be called from other
compilation units (without creating clones).

In the testcase, init_ is called by MAIN in the following way:

  integer(kind=4) n;

  <bb 2>:
  n = 10;
  init_ (x, n);

Now if we could somehow propagate 10 into the actual argument of the
call statement, IPA-CP should pick it up and propagate it into the
callee.  Another alternative is to construct an aggregate jump
function for it when we have them.  I'll keep this testcase in mind
when working on them.
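
To make the difference between the two alternatives concrete, here is a minimal toy model in Python. It is not GCC code: Store, Call, scalar_constants and aggregate_constants are names invented for this illustration, and the "&x" / "&n" spelling is an assumption that the Fortran arguments reach the call as addresses. It contrasts what a scalar-only analysis can read off the call statement with what an aggregate-style analysis could recover from the store that precedes the call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Store:
    """var = value, where value is an integer constant."""
    var: str
    value: int


@dataclass(frozen=True)
class Call:
    """callee(args...); each arg is an int literal or an address like "&n"."""
    callee: str
    args: Tuple[Union[int, str], ...]


Stmt = Union[Store, Call]


def scalar_constants(call: Call) -> List[Optional[int]]:
    """What a scalar-only analysis sees: constants written in the call."""
    return [a if isinstance(a, int) else None for a in call.args]


def aggregate_constants(block: List[Stmt],
                        call_index: int) -> List[Optional[int]]:
    """For each "&var" argument, the constant last stored to var before
    the call within the same block, or None if unknown."""
    known: Dict[str, int] = {}
    for stmt in block[:call_index]:
        if isinstance(stmt, Store):
            known[stmt.var] = stmt.value
        else:
            # An earlier call might write through any address it was
            # given, so conservatively forget everything.
            known.clear()
    call = block[call_index]
    assert isinstance(call, Call)
    return [known.get(a[1:]) if isinstance(a, str) and a.startswith("&")
            else None
            for a in call.args]


# Toy version of the MAIN body quoted above.
main_body: List[Stmt] = [Store("n", 10), Call("init_", ("&x", "&n"))]
```

Here scalar_constants(main_body[1]) gives [None, None], because no constant appears in the call itself, while aggregate_constants(main_body, 1) gives [None, 10]. A real analysis would of course need alias information to prove that nothing clobbers n between the store and the call. The toy only handles a single straight-line block.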


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-07-19 Thread izamyatin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

--- Comment #7 from Igor Zamyatin <izamyatin at gmail dot com> 2012-07-19 19:09:49 UTC ---
Any thoughts here?


[Bug tree-optimization/53787] Possible IPA-SRA / IPA-CP improvement

2012-06-28 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53787

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |jamborm at gcc dot gnu.org
          Component|lto                         |tree-optimization
            Summary|Possible lto improvement    |Possible IPA-SRA / IPA-CP
                   |                            |improvement

--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-28 10:08:13 UTC ---
This has nothing to do with LTO - with a single compilation unit you can
use -fwhole-program.  The issue is that Fortran passes parameters by reference
and our interprocedural constant-propagation pass does not know how to deal
with that.  The IPA SRA pass which is supposed to fix that decides that
init cannot have its signature changed.  Martin, can you check why?
I think we ought to optimize this with -O3 -fwhole-program -fno-inline.