On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor <mjam...@suse.cz> wrote: > Hi, > > the patch below is the second (and last) revived type-based > devirtualization patch that did not make it to 4.6. It deals with > virtual calls from the function in which the there is also the object > declaration: > > void foo() > { > A a; > > a.foo (); > } > > Normally, the front-end would do the devirtualization on its own, > however, early-inlining can create these situations too. Since there > is nothing interprocedural going on, the current inlining and IPA-CP > devirtualization bits are of no help. We do not do type-based > devirtualization in OBJ_TYPE_REF folding either, because the necessary > type-changing checks might make it quite expensive.
But in the above case - a is not address-taken - we do not need type-based devirtualization. So, can you explain why we do not forward the proper vtable from the vtable store that would lead to the known-type info? Thanks, Richard. > Thus, this patch > introduces a new pass to do that. > > The patch basically piggy-tails on the intraprocedural > devirtualization mechanism, trying to construct a known-type jump > function for all objects in OBJ_TYPE_REF calls and then immediately > using it like we do in IPA-CP. > > The original patch was doing this as a part of > pass_rebuild_cgraph_edges. Honza did not like this idea so I made it > a separate pass. First, I scheduled it after > pass_rebuild_cgraph_edges and was only traversing indirect edges, > avoiding a sweep over all of the IL. Unfortunately, this does not > work in one scenario. If the newly-known destination of a virtual > call is known not to throw, we may have to purge some EH CFG edges and > potentially basic blocks. If these basic blocks contain calls > (typically calls to object destructors), we may end up having stale > call edges in the call graph... and our current approach to that > problem is to call pass_rebuild_cgraph_edges. I think that I was not > running into this problem when the mechanism was a part of that pass > just because of pure luck. Anyway, this is why I eventually opted for > a sweep over the statements. > > My best guess is that it is probably worth it, but only because the > overhead should be still fairly low. The pass triggers quite a number > of times when building libstdc++ and it can speed up an artificial > testcase which I will attach from over 20 seconds to 7s on my older > desktop - it is very similar to the one I posted with the previous > patch but this time the object constructors must not get early inlined > but the function multiply_matrices has to. Currently I have problems > compiling Firefox even without LTO so I don't have any numbers from it > either. IIRC, Honza did not see this patch trigger there when he > tried the ancient version almost a year go. On the other hand, Maxim > claimed that the impact can be noticeable on some code base he is > concerned about. > > I have successfully bootstrapped and tested the patch on x86_64-linux. > What do you think, should we include this in trunk? > > Thanks, > > Martin > > > 2011-10-31 Martin Jambor <mjam...@suse.cz> > > * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... > * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported and > updated all callers. > (intraprocedural_devirtualization): New function. > (gate_intra_devirtualization): Likewise. > (pass_intra_devirt): New pass. > * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. > * passes.c (init_optimization_passes): Schedule pass_intra_devirt. > * tree-pass.h (pass_intra_devirt): Declare. > > * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. > * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. > > > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C > =================================================================== > --- /dev/null > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C > @@ -0,0 +1,62 @@ > +/* Verify that virtual calls are folded even when a typecast to an > + ancestor is involved along the way. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-devirt" } */ > + > +extern "C" void abort (void); > + > +class A > +{ > +public: > + int data; > + virtual int foo (int i); > +}; > + > + > +class B : public A > +{ > +public: > + __attribute__ ((noinline)) B(); > + virtual int foo (int i); > +}; > + > +int __attribute__ ((noinline)) A::foo (int i) > +{ > + return i + 1; > +} > + > +int __attribute__ ((noinline)) B::foo (int i) > +{ > + return i + 2; > +} > + > +int __attribute__ ((noinline,noclone)) get_input(void) > +{ > + return 1; > +} > + > +__attribute__ ((noinline)) B::B() > +{ > +} > + > +static inline int middleman_1 (class A *obj, int i) > +{ > + return obj->foo (i); > +} > + > +static inline int middleman_2 (class B *obj, int i) > +{ > + return middleman_1 (obj, i); > +} > + > +int main (int argc, char *argv[]) > +{ > + class B b; > + > + if (middleman_2 (&b, get_input ()) != 3) > + abort (); > + return 0; > +} > + > +/* { dg-final { scan-tree-dump "Immediately devirtualizing > call.*into.*B::foo" "devirt" } } */ > +/* { dg-final { cleanup-tree-dump "devirt" } } */ > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C > =================================================================== > --- /dev/null > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C > @@ -0,0 +1,95 @@ > +/* Verify that virtual calls are folded even when a typecast to an > + ancestor is involved along the way. */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-devirt-slim" } */ > + > +extern "C" void abort (void); > + > +class Distraction > +{ > +public: > + float f; > + double d; > + Distraction () > + { > + f = 8.3; > + d = 10.2; > + } > + virtual float bar (float z); > +}; > + > +class A > +{ > +public: > + int data; > + virtual int foo (int i); > +}; > + > +class B : public A > +{ > +public: > + int data_2; > + virtual int foo (int i); > + virtual int baz (int i); > +}; > + > + > +class C : public Distraction, public B > +{ > +public: > + __attribute__ ((noinline)) C(); > + virtual int foo (int i); > +}; > + > +float __attribute__ ((noinline)) Distraction::bar (float z) > +{ > + f += z; > + return f/2; > +} > + > +int __attribute__ ((noinline)) A::foo (int i) > +{ > + return i + 1; > +} > + > +int __attribute__ ((noinline)) B::foo (int i) > +{ > + return i + 2; > +} > + > +int __attribute__ ((noinline)) B::baz (int i) > +{ > + return i * 15; > +} > + > +int __attribute__ ((noinline)) C::foo (int i) > +{ > + return i + 3; > +} > + > +int __attribute__ ((noinline,noclone)) get_input(void) > +{ > + return 1; > +} > + > +static inline int middleman (class A *obj, int i) > +{ > + return obj->foo (i); > +} > + > +__attribute__ ((noinline)) C::C() > +{ > +} > + > +int main (int argc, char *argv[]) > +{ > + class C c; > + > + if (middleman (&c, get_input ()) != 4) > + abort (); > + > + return 0; > +} > + > +/* { dg-final { scan-tree-dump "Immediately devirtualizing > call.*into.*C::.*foo" "devirt" } } */ > +/* { dg-final { cleanup-tree-dump "devirt" } } */ > Index: src/gcc/ipa-cp.c > =================================================================== > --- src.orig/gcc/ipa-cp.c > +++ src/gcc/ipa-cp.c > @@ -674,20 +674,6 @@ ipa_get_jf_ancestor_result (struct ipa_j > return NULL_TREE; > } > > -/* Extract the acual BINFO being described by JFUNC which must be a known > type > - jump function. */ > - > -static tree > -ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc) > -{ > - tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type); > - if (!base_binfo) > - return NULL_TREE; > - return get_binfo_at_offset (base_binfo, > - jfunc->value.known_type.offset, > - jfunc->value.known_type.component_type); > -} > - > /* Determine whether JFUNC evaluates to a known value (that is either a > constant or a binfo) and if so, return it. Otherwise return NULL. INFO > describes the caller node so that pass-through jump functions can be > @@ -699,7 +685,7 @@ ipa_value_from_jfunc (struct ipa_node_pa > if (jfunc->type == IPA_JF_CONST) > return jfunc->value.constant; > else if (jfunc->type == IPA_JF_KNOWN_TYPE) > - return ipa_value_from_known_type_jfunc (jfunc); > + return ipa_binfo_from_known_type_jfunc (jfunc); > else if (jfunc->type == IPA_JF_PASS_THROUGH > || jfunc->type == IPA_JF_ANCESTOR) > { > @@ -1006,7 +992,7 @@ propagate_accross_jump_function (struct > > if (jfunc->type == IPA_JF_KNOWN_TYPE) > { > - val = ipa_value_from_known_type_jfunc (jfunc); > + val = ipa_binfo_from_known_type_jfunc (jfunc); > if (!val) > return set_lattice_contains_variable (dest_lat); > } > Index: src/gcc/ipa-prop.c > =================================================================== > --- src.orig/gcc/ipa-prop.c > +++ src/gcc/ipa-prop.c > @@ -3069,3 +3069,107 @@ ipa_update_after_lto_read (void) > if (node->analyzed) > ipa_initialize_node_params (node); > } > + > +/* Extract the acual BINFO being described by JFUNC which must be a known > type > + jump function. */ > + > +tree > +ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *jfunc) > +{ > + tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type); > + if (!base_binfo) > + return NULL_TREE; > + return get_binfo_at_offset (base_binfo, > + jfunc->value.known_type.offset, > + jfunc->value.known_type.component_type); > +} > + > +/* Intraprocedural (early) type-based devirtualization pass. Looks at all > call > + statements and attempts type-base devirtualization on those calling an > + OBJ_TYPE_REF. */ > + > +static unsigned int > +intraprocedural_devirtualization (void) > +{ > + basic_block bb; > + gimple_stmt_iterator gsi; > + bool cfg_changed = false; > + > + FOR_EACH_BB (bb) > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + if (is_gimple_call (gsi_stmt (gsi))) > + { > + tree binfo, token, fndecl; > + gimple call = gsi_stmt (gsi); > + tree target = gimple_call_fn (call); > + struct ipa_jump_func jfunc; > + > + if (TREE_CODE (target) != OBJ_TYPE_REF) > + continue; > + > + jfunc.type = IPA_JF_UNKNOWN; > + compute_known_type_jump_func (OBJ_TYPE_REF_OBJECT (target), &jfunc, > + call); > + if (jfunc.type != IPA_JF_KNOWN_TYPE) > + continue; > + binfo = ipa_binfo_from_known_type_jfunc (&jfunc); > + if (!binfo) > + continue; > + token = OBJ_TYPE_REF_TOKEN (target); > + fndecl = gimple_get_virt_method_for_binfo (tree_low_cst (token, 1), > + binfo); > + if (!fndecl) > + continue; > + > + if (dump_file) > + { > + fprintf (dump_file, "ipa-prop: Immediately devirtualizing call > "); > + print_gimple_expr (dump_file, call, 0, 0); > + } > + > + gimple_call_set_fndecl (call, fndecl); > + update_stmt (call); > + if (maybe_clean_eh_stmt (call) > + && gimple_purge_dead_eh_edges (bb)) > + cfg_changed = true; > + > + if (dump_file) > + { > + fprintf (dump_file, " into "); > + print_gimple_expr (dump_file, call, 0, 0); > + fprintf (dump_file, "\n"); > + } > + } > + > + > + if (cfg_changed) > + return TODO_cleanup_cfg; > + else > + return 0; > +} > + > +static bool > +gate_intra_devirtualization (void) > +{ > + return flag_devirtualize != 0; > +} > + > + > +struct gimple_opt_pass pass_intra_devirt = > + { > + { > + GIMPLE_PASS, > + "devirt", /* name */ > + gate_intra_devirtualization, /* gate */ > + intraprocedural_devirtualization, /* execute */ > + NULL, /* sub */ > + NULL, /* next */ > + 0, /* static_pass_number */ > + TV_CGRAPH, /* tv_id */ > + PROP_cfg, /* > properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + 0, /* todo_flags_finish */ > + } > + }; > Index: src/gcc/ipa-prop.h > =================================================================== > --- src.orig/gcc/ipa-prop.h > +++ src/gcc/ipa-prop.h > @@ -347,6 +347,7 @@ bool ipa_propagate_indirect_call_infos ( > > /* Indirect edge and binfo processing. */ > struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, > tree); > +tree ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *); > > /* Functions related to both. */ > void ipa_analyze_node (struct cgraph_node *); > Index: src/gcc/passes.c > =================================================================== > --- src.orig/gcc/passes.c > +++ src/gcc/passes.c > @@ -1225,6 +1225,7 @@ init_optimization_passes (void) > NEXT_PASS (pass_cleanup_eh); > NEXT_PASS (pass_profile); > NEXT_PASS (pass_local_pure_const); > + NEXT_PASS (pass_intra_devirt); > /* Split functions creates parts that are not run through > early optimizations again. It is thus good idea to do this > late. */ > Index: src/gcc/tree-pass.h > =================================================================== > --- src.orig/gcc/tree-pass.h > +++ src/gcc/tree-pass.h > @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_trace > extern struct gimple_opt_pass pass_warn_unused_result; > extern struct gimple_opt_pass pass_split_functions; > extern struct gimple_opt_pass pass_feedback_split_functions; > +extern struct gimple_opt_pass pass_intra_devirt; > > /* IPA Passes */ > extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; >