Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)
 
 We can dig it more to later understand why only 4.9 hits the problem.

This would be very interesting, because 4.9 ought to be better here
(removing more virtuals early) than previous compilers.
There was number of changes in 4.9 that may affect this - some fixes at
C++ side giving middle end more inline candidates and also this change
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html
So perhaps most of your virtual functions are !COMDAT!EXTERNAL, but that
does not seem to make much sense to me either :(
 
 My size results with -fno-devirtualize-speculatively is out. It
 shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
 O2 compile.

Hmm, this is interesting, too. 1.68% is definitly a lot more than I would
expect or have I seen on other testcases.  You can take a look at summaries in
-fdump-ipa-devirt pass.

In opts.c I have:
case OPT_fprofile_use:
  if (!opts_set-x_flag_branch_probabilities)
...
  /* Indirect call profiling should do all useful transformations
 speculative devirtualization does.  */
  if (!opts_set-x_flag_devirtualize_speculatively
   opts-x_flag_value_profile_transformations)
opts-x_flag_devirtualize_speculatively = false;

so perhaps this hunk is somehow skipped with LIPO?

Speculative devirtualization is somehwat less useful (may have more falce
positives) without LTO depending on how your headers are constructed.
It would be interesting to see if it does a lot of mistakes on your codebase.
(this can be easily done by forcing it to run with profile feedback, too and
it will tell you when its speculation differs from speculation already there).
 
 By the way, you mentioned 'hacking the
 ipa.c:walk_polymorphic_call_targets to not make the possible targets
 as
 reachable' -- is that something worth doing in trunk?   With that, we
 can probably just turn off speculative devirtualization.

Well, the check is there to enable inlining. Disabling it for
-fprofile-generate will result in lost profile samples for virtual functions.
Disabling it by default will prevent inlining of devirtualized calls making
devirtualization not really useful.
Perhaps with LIPO situation is bit different because you bring in the other
module just to inline the call as you describe.

One thing I can imagine doing is to make inliner consider the reachable
(in post-inlining sense, that is after removing extern inlines and virtual
functions) calls with priority and account only those to unit growth model.
This would make it more consistent over -fdevirtualize and more realistic
about resulting code size.

I sort of considered this option but did not have any good data suggesting
I should implement it.

In general it would be nice to understand this problem.  Also I plan to do
some retunning for 5.0 so it would be nice to know if you have other issues
with 4.9? (I did not closely followed Google branch changes, so if you can
point out those that are relevant for IPA tuning, I would be very interested
to see what problems you hit).

Honza
 
 David
 
 
 
 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) i 
can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable virtual 
functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
 Hi Honza,
 
 As David says, we will do some more experiments with the change you
 suggest and speculative devirtualization, but we needed to make this
 change in part to get an internal release out. One of the issues was a
 recent change to cp/decl2.c to make virtual function decls needed
 under flag_devirtualize.

I think that one is not in current 4.9 branch (only in mainline). 

Honza
 
 Thanks!
 Teresa
 
 On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 
  The virtual functions will be emitted in some modules, right? If they
  are hot, they will be included with the auxiliary modules. Note that
  LIPO indirect call profile will point to the comdat copy that is
  picked by the linker in the instrumentation build, so it guarantees to
  exist.
 
  If you have COMDAT virtual inline function that is used by module FOO and
  LIPO's indirect inlining will work out that it goes to comdat copy of 
  module BAR
  (that won the merging).  It will make C++ parser to also parse BAR to get
  the function, but callgarph builder will ignore it, too.
  (this is new logic in analyze_function that with -fno-devirtualize will just
  prevent all virtual functions not directly reachable to appear in symbol 
  table)
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)
 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) i 
can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable virtual 
functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
Honza
 
 
 
 -- 
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
On Sun, Oct 19, 2014 at 1:19 AM, Jan Hubicka hubi...@ucw.cz wrote:
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)

 We can dig it more to later understand why only 4.9 hits the problem.

 This would be very interesting, because 4.9 ought to be better here
 (removing more virtuals early) than previous compilers.
 There was number of changes in 4.9 that may affect this - some fixes at
 C++ side giving middle end more inline candidates and also this change
 https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html
 So perhaps most of your virtual functions are !COMDAT!EXTERNAL, but that
 does not seem to make much sense to me either :(

 My size results with -fno-devirtualize-speculatively is out. It
 shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
 O2 compile.

 Hmm, this is interesting, too. 1.68% is definitly a lot more than I would
 expect or have I seen on other testcases.  You can take a look at summaries in
 -fdump-ipa-devirt pass.

 In opts.c I have:
 case OPT_fprofile_use:
   if (!opts_set-x_flag_branch_probabilities)
 ...
   /* Indirect call profiling should do all useful transformations
  speculative devirtualization does.  */
   if (!opts_set-x_flag_devirtualize_speculatively
opts-x_flag_value_profile_transformations)
 opts-x_flag_devirtualize_speculatively = false;

 so perhaps this hunk is somehow skipped with LIPO?

The 1.68% size reduction I measured is with plain O2 compilation, not LIPO.


 Speculative devirtualization is somehwat less useful (may have more falce
 positives) without LTO depending on how your headers are constructed.
 It would be interesting to see if it does a lot of mistakes on your codebase.
 (this can be easily done by forcing it to run with profile feedback, too and
 it will tell you when its speculation differs from speculation already there).

 By the way, you mentioned 'hacking the
 ipa.c:walk_polymorphic_call_targets to not make the possible targets
 as
 reachable' -- is that something worth doing in trunk?   With that, we
 can probably just turn off speculative devirtualization.

 Well, the check is there to enable inlining. Disabling it for
 -fprofile-generate will result in lost profile samples for virtual functions.
 Disabling it by default will prevent inlining of devirtualized calls making
 devirtualization not really useful.
 Perhaps with LIPO situation is bit different because you bring in the other
 module just to inline the call as you describe.

What I meant is whether it is suitable for plain build ? I have not
looked at the details.


 One thing I can imagine doing is to make inliner consider the reachable
 (in post-inlining sense, that is after removing extern inlines and virtual
 functions) calls with priority and account only those to unit growth model.
 This would make it more consistent over -fdevirtualize and more realistic
 about resulting code size.

 I sort of considered this option but did not have any good data suggesting
 I should implement it.

 In general it would be nice to understand this problem.  Also I plan to do
 some retunning for 5.0 so it would be nice to know if you have other issues
 with 4.9? (I did not closely followed Google branch changes, so if you can
 point out those that are relevant for IPA tuning, I would be very interested
 to see what problems you hit).

Ok I will collect a list of inlining related changes and let you know latter.

thanks,

David


 Honza

 David



 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) 
i can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable 
virtual functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
   

Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
  In opts.c I have:
  case OPT_fprofile_use:
if (!opts_set-x_flag_branch_probabilities)
  ...
/* Indirect call profiling should do all useful transformations
   speculative devirtualization does.  */
if (!opts_set-x_flag_devirtualize_speculatively
 opts-x_flag_value_profile_transformations)
  opts-x_flag_devirtualize_speculatively = false;
 
  so perhaps this hunk is somehow skipped with LIPO?
 
 The 1.68% size reduction I measured is with plain O2 compilation, not LIPO.

OK, in that case I may look into trimming down the speculation with non-LTO 
compilation.
The code makes assumptions about completeness of type hiearchy that without LTO 
are
quite far reached. I tested it on firefox and there it seemed to do mostly
good job, so I decided to try to enable it for non-LTO, too.

It would be nice to understand what really happens in your case and if there
are lot of wrong speculations or if the speculated calls simply happens to not
be on hot paths in your benchmark.
 
 
  Speculative devirtualization is somehwat less useful (may have more falce
  positives) without LTO depending on how your headers are constructed.
  It would be interesting to see if it does a lot of mistakes on your 
  codebase.
  (this can be easily done by forcing it to run with profile feedback, too and
  it will tell you when its speculation differs from speculation already 
  there).
 
  By the way, you mentioned 'hacking the
  ipa.c:walk_polymorphic_call_targets to not make the possible targets
  as
  reachable' -- is that something worth doing in trunk?   With that, we
  can probably just turn off speculative devirtualization.
 
  Well, the check is there to enable inlining. Disabling it for
  -fprofile-generate will result in lost profile samples for virtual 
  functions.
  Disabling it by default will prevent inlining of devirtualized calls making
  devirtualization not really useful.
  Perhaps with LIPO situation is bit different because you bring in the other
  module just to inline the call as you describe.
 
 What I meant is whether it is suitable for plain build ? I have not
 looked at the details.

Yes, dropping the reachability walk should just work, only result in less
inlining.

Honza
 
 
  One thing I can imagine doing is to make inliner consider the reachable
  (in post-inlining sense, that is after removing extern inlines and virtual
  functions) calls with priority and account only those to unit growth model.
  This would make it more consistent over -fdevirtualize and more realistic
  about resulting code size.
 
  I sort of considered this option but did not have any good data suggesting
  I should implement it.
 
  In general it would be nice to understand this problem.  Also I plan to do
  some retunning for 5.0 so it would be nice to know if you have other issues
  with 4.9? (I did not closely followed Google branch changes, so if you can
  point out those that are relevant for IPA tuning, I would be very interested
  to see what problems you hit).
 
 Ok I will collect a list of inlining related changes and let you know latter.
 
 thanks,
 
 David
 
 
  Honza
 
  David
 
 
 
  
   Honza
  
   David
  
  
   
Honza
   
David
   
   
On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Disabling devirtualization reduces code size, both for 
 instrumentation (because
 many more virtual functions are kept longer and therefore 
 instrumented) and for
 normal optimization.

 OK, with profile instrumentation (that you seem to try to 
 minimize) i can see
 how you get noticeably more counters because virtual functions are 
 kept longer.
 (note that 4.9 is a lot more agressive on removing unreacable 
 virtual functions
 than earlier compilers).

 Instead of disabling -fdevirtualize completely (that will get you 
 more indirect
 calls and thus more topn profiling) you may consider just hacking
 ipa.c:walk_polymorphic_call_targets to not make the possible 
 targets as
 reachable. (see the conditional on before_inlining_p).

 Of course this will get you less devirtualization (but with LTO 
 the difference
 should not be big - perhaps I could make switch for that for 
 mainline) and less
 accurate profiles when you get speculative devirtualization via 
 topn.

 I would be very interested to see how much difference this makes.

 Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
It was in upstream 4.9 branch, but got reverted in r215061 to fix
PR/61214 and PR/62224.

David

On Sun, Oct 19, 2014 at 1:21 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi Honza,

 As David says, we will do some more experiments with the change you
 suggest and speculative devirtualization, but we needed to make this
 change in part to get an internal release out. One of the issues was a
 recent change to cp/decl2.c to make virtual function decls needed
 under flag_devirtualize.

 I think that one is not in current 4.9 branch (only in mainline).

 Honza

 Thanks!
 Teresa

 On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 
  The virtual functions will be emitted in some modules, right? If they
  are hot, they will be included with the auxiliary modules. Note that
  LIPO indirect call profile will point to the comdat copy that is
  picked by the linker in the instrumentation build, so it guarantees to
  exist.
 
  If you have COMDAT virtual inline function that is used by module FOO and
  LIPO's indirect inlining will work out that it goes to comdat copy of 
  module BAR
  (that won the merging).  It will make C++ parser to also parse BAR to get
  the function, but callgarph builder will ignore it, too.
  (this is new logic in analyze_function that with -fno-devirtualize will 
  just
  prevent all virtual functions not directly reachable to appear in symbol 
  table)
 
  I am surprised you hit the size limits with 4.9 only - for quite some time
  we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
  first
  that works harder to drop them early (because I hit the problem with LTO
  where they artifically bloat the size of LTO object files)
 
  Honza
 
  David
 
 
  
   Honza
  
   David
  
  
   On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
Disabling devirtualization reduces code size, both for 
instrumentation (because
many more virtual functions are kept longer and therefore 
instrumented) and for
normal optimization.
   
OK, with profile instrumentation (that you seem to try to minimize) 
i can see
how you get noticeably more counters because virtual functions are 
kept longer.
(note that 4.9 is a lot more agressive on removing unreacable 
virtual functions
than earlier compilers).
   
Instead of disabling -fdevirtualize completely (that will get you 
more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets 
as
reachable. (see the conditional on before_inlining_p).
   
Of course this will get you less devirtualization (but with LTO the 
difference
should not be big - perhaps I could make switch for that for 
mainline) and less
accurate profiles when you get speculative devirtualization via topn.
   
I would be very interested to see how much difference this makes.
   
Honza



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Jan Hubicka
 It was in upstream 4.9 branch, but got reverted in r215061 to fix
 PR/61214 and PR/62224.

Was your tests done with this change or without?

Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-19 Thread Xinliang David Li
No, Google branch is not yet synced to 4.9 head.

David

On Sun, Oct 19, 2014 at 1:11 PM, Jan Hubicka hubi...@ucw.cz wrote:
 It was in upstream 4.9 branch, but got reverted in r215061 to fix
 PR/61214 and PR/62224.

 Was your tests done with this change or without?

 Honza


[GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Teresa Johnson
Disabling devirtualization reduces code size, both for instrumentation (because
many more virtual functions are kept longer and therefore instrumented) and for
normal optimization.

Patch attached. Passes internal testing and regression tests. Ok for google/4_9?

Thanks,
Teresa

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Disabling devirtualization reduces code size, both for instrumentation (because
many more virtual functions are kept longer and therefore instrumented) and for
normal optimization.

Passes internal testing and regression tests. Ok for google/4_9?

2014-10-18  Teresa Johnson  tejohn...@google.com

gcc/:
Google ref b/17945455
* opts.c (finish_options): Disable -fdevirtualize

gcc/testsuite/:
Google ref b/17945455
* testsuite/g++.dg/ipa/devirt-10.C: Enable devirtualization for test.
* testsuite/g++.dg/ipa/devirt-11.C: Ditto.
* testsuite/g++.dg/ipa/devirt-12.C: Ditto.
* testsuite/g++.dg/ipa/devirt-13.C: Ditto.
* testsuite/g++.dg/ipa/devirt-14.C: Ditto.
* testsuite/g++.dg/ipa/devirt-15.C: Ditto.
* testsuite/g++.dg/ipa/devirt-16.C: Ditto.
* testsuite/g++.dg/ipa/devirt-17.C: Ditto.
* testsuite/g++.dg/ipa/devirt-18.C: Ditto.
* testsuite/g++.dg/ipa/devirt-19.C: Ditto.
* testsuite/g++.dg/ipa/devirt-1.C: Ditto.
* testsuite/g++.dg/ipa/devirt-20.C: Ditto.
* testsuite/g++.dg/ipa/devirt-21.C: Ditto.
* testsuite/g++.dg/ipa/devirt-22.C: Ditto.
* testsuite/g++.dg/ipa/devirt-23.C: Ditto.
* testsuite/g++.dg/ipa/devirt-24.C: Ditto.
* testsuite/g++.dg/ipa/devirt-25.C: Ditto.
* testsuite/g++.dg/ipa/devirt-26.C: Ditto.
* testsuite/g++.dg/ipa/devirt-27.C: Ditto.
* testsuite/g++.dg/ipa/devirt-28.C: Ditto.
* testsuite/g++.dg/ipa/devirt-29.C: Ditto.
* testsuite/g++.dg/ipa/devirt-2.C: Ditto.
* testsuite/g++.dg/ipa/devirt-30.C: Ditto.
* testsuite/g++.dg/ipa/devirt-31.C: Ditto.
* testsuite/g++.dg/ipa/devirt-39.C: Ditto.
* testsuite/g++.dg/ipa/devirt-3.C: Ditto.
* testsuite/g++.dg/ipa/devirt-4.C: Ditto.
* testsuite/g++.dg/ipa/devirt-5.C: Ditto.
* testsuite/g++.dg/ipa/devirt-6.C: Ditto.
* testsuite/g++.dg/ipa/devirt-7.C: Ditto.
* testsuite/g++.dg/ipa/devirt-9.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-1.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-2.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-3.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-4.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-5.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-6.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-7.C: Ditto.
* testsuite/g++.dg/ipa/devirt-c-8.C: Ditto.
* testsuite/g++.dg/ipa/devirt-d-1.C: Ditto.
* testsuite/g++.dg/ipa/devirt-g-1.C: Ditto.
* testsuite/g++.dg/ipa/imm-devirt-1.C: Ditto.
* testsuite/g++.dg/ipa/imm-devirt-2.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-1.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-2.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-3.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-4.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-5.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-7.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-8.C: Ditto.
* testsuite/g++.dg/ipa/ivinline-9.C: Ditto.
* testsuite/g++.dg/ipa/pr60600.C: Ditto.
* testsuite/g++.dg/ipa/pr60640-4.C: Ditto.
* testsuite/g++.dg/ipa/type-inheritance-1.C: Ditto.
* testsuite/g++.dg/opt/devirt1.C: Ditto.
* testsuite/g++.dg/opt/devirt2.C: Ditto.
* testsuite/g++.dg/opt/devirt3.C: Ditto.

Index: opts.c
===
--- opts.c  (revision 216286)
+++ opts.c  (working copy)
@@ -485,7 +485,6 @@ static const struct default_options default_option
 { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
-{ OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 },
Index: testsuite/g++.dg/ipa/devirt-10.C
===
--- testsuite/g++.dg/ipa/devirt-10.C(revision 216286)
+++ testsuite/g++.dg/ipa/devirt-10.C(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining } */
+/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining 
-fdevirtualize } */
 class wxPaintEvent {  };
 struct wxDCBase
 { 
Index: testsuite/g++.dg/ipa/devirt-11.C
===
--- testsuite/g++.dg/ipa/devirt-11.C

Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Jan Hubicka
 Disabling devirtualization reduces code size, both for instrumentation 
 (because
 many more virtual functions are kept longer and therefore instrumented) and 
 for
 normal optimization.

Can you refer bit more on the problem? Full devirtualization should not
increase code size, speculative devirtualization does somewhat, but by sub 1%.

Honza
 
 Patch attached. Passes internal testing and regression tests. Ok for 
 google/4_9?
 
 Thanks,
 Teresa
 
 -- 
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

 Disabling devirtualization reduces code size, both for instrumentation 
 (because
 many more virtual functions are kept longer and therefore instrumented) and 
 for
 normal optimization.
 
 Passes internal testing and regression tests. Ok for google/4_9?
 
 2014-10-18  Teresa Johnson  tejohn...@google.com
 
 gcc/:
   Google ref b/17945455
   * opts.c (finish_options): Disable -fdevirtualize
 
 gcc/testsuite/:
   Google ref b/17945455
   * testsuite/g++.dg/ipa/devirt-10.C: Enable devirtualization for test.
   * testsuite/g++.dg/ipa/devirt-11.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-12.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-13.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-14.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-15.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-16.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-17.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-18.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-19.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-1.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-20.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-21.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-22.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-23.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-24.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-25.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-26.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-27.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-28.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-29.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-2.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-30.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-31.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-39.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-3.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-4.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-5.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-6.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-7.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-9.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-1.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-2.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-3.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-4.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-5.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-6.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-7.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-c-8.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-d-1.C: Ditto.
   * testsuite/g++.dg/ipa/devirt-g-1.C: Ditto.
   * testsuite/g++.dg/ipa/imm-devirt-1.C: Ditto.
   * testsuite/g++.dg/ipa/imm-devirt-2.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-1.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-2.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-3.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-4.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-5.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-7.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-8.C: Ditto.
   * testsuite/g++.dg/ipa/ivinline-9.C: Ditto.
   * testsuite/g++.dg/ipa/pr60600.C: Ditto.
   * testsuite/g++.dg/ipa/pr60640-4.C: Ditto.
   * testsuite/g++.dg/ipa/type-inheritance-1.C: Ditto.
   * testsuite/g++.dg/opt/devirt1.C: Ditto.
   * testsuite/g++.dg/opt/devirt2.C: Ditto.
   * testsuite/g++.dg/opt/devirt3.C: Ditto.
 
 Index: opts.c
 ===
 --- opts.c(revision 216286)
 +++ opts.c(working copy)
 @@ -485,7 +485,6 @@ static const struct default_options default_option
  { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 },
 -{ OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 },
  { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 },
 Index: testsuite/g++.dg/ipa/devirt-10.C
 ===
 --- testsuite/g++.dg/ipa/devirt-10.C  (revision 216286)
 +++ testsuite/g++.dg/ipa/devirt-10.C  (working copy)
 @@ -1,5 +1,5 @@
  /* { dg-do compile } */
 -/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining } 
 */
 +/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining 
 -fdevirtualize } */
  class wxPaintEvent {  };
  struct wxDCBase
  { 
 

Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Jan Hubicka
 Disabling devirtualization reduces code size, both for instrumentation 
 (because
 many more virtual functions are kept longer and therefore instrumented) and 
 for
 normal optimization.

OK, with profile instrumentation (that you seem to try to minimize) i can see
how you get noticeably more counters because virtual functions are kept longer.
(note that 4.9 is a lot more agressive on removing unreacable virtual functions
than earlier compilers).

Instead of disabling -fdevirtualize completely (that will get you more indirect
calls and thus more topn profiling) you may consider just hacking
ipa.c:walk_polymorphic_call_targets to not make the possible targets as
reachable. (see the conditional on before_inlining_p).

Of course this will get you less devirtualization (but with LTO the difference
should not be big - perhaps I could make switch for that for mainline) and less
accurate profiles when you get speculative devirtualization via topn.

I would be very interested to see how much difference this makes.

Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Xinliang David Li
ok.

David

On Sat, Oct 18, 2014 at 9:25 AM, Teresa Johnson tejohn...@google.com wrote:
 Disabling devirtualization reduces code size, both for instrumentation 
 (because
 many more virtual functions are kept longer and therefore instrumented) and 
 for
 normal optimization.

 Patch attached. Passes internal testing and regression tests. Ok for 
 google/4_9?

 Thanks,
 Teresa

 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Xinliang David Li
Devirtualization needs more tuning to be usable internally. We have
seen 1.6% size increase on average, but unnoticable performance
improvement with this option on. I would rather use the size budget to
make inliner more aggressive :)

For instrumentation build, some of our builds will reach the build
machine limit with this option on.

In LIPO mode, the compiler uses indirect target  profile counter to
build fake edges in order to keep the targets alive -- this is
probably the reason we do not see an performance impact.

David


On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Disabling devirtualization reduces code size, both for instrumentation 
 (because
 many more virtual functions are kept longer and therefore instrumented) and 
 for
 normal optimization.

 OK, with profile instrumentation (that you seem to try to minimize) i can see
 how you get noticeably more counters because virtual functions are kept 
 longer.
 (note that 4.9 is a lot more agressive on removing unreacable virtual 
 functions
 than earlier compilers).

 Instead of disabling -fdevirtualize completely (that will get you more 
 indirect
 calls and thus more topn profiling) you may consider just hacking
 ipa.c:walk_polymorphic_call_targets to not make the possible targets as
 reachable. (see the conditional on before_inlining_p).

 Of course this will get you less devirtualization (but with LTO the difference
 should not be big - perhaps I could make switch for that for mainline) and 
 less
 accurate profiles when you get speculative devirtualization via topn.

 I would be very interested to see how much difference this makes.

 Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Jan Hubicka
 Devirtualization needs more tuning to be usable internally. We have
 seen 1.6% size increase on average, but unnoticable performance

Does that happen with -fno-devirtualize-speculatively?

 improvement with this option on. I would rather use the size budget to
 make inliner more aggressive :)

:) Well, that is what devirtualization is supposed to do. All it does is 
converting
indirect calls to direct calls that can increase size only if extra inlining 
happens.
Speculative devirtualization will increase code size, but 1% seems quite off - 
you
would need very many virtual calls in the code to get close to that. Also it 
turns
itself off with profile feedback. You can try -fno-devirtualize-speculatively

Thinking about it perhaps what really happens is the following:  You have a lot
of potentially reachable virtual functions in header and those are kept alive 
until
inlining because one of virtual calls in your unit may be devirtualized to them
(change to 4.8 is that we no longer keep all virtual functions, just those that 
may
be reached). This increase unit size estimates and thus gives inliner more 
budget
to inline until it hits inline-unit-growth limits.
 
 For instrumentation build, some of our builds will reach the build
 machine limit with this option on.

Can you try to just disable the reachability code in ipa.c (as I wrote earlier)
and see if it makes difference?
 
 In LIPO mode, the compiler uses indirect target  profile counter to
 build fake edges in order to keep the targets alive -- this is
 probably the reason we do not see an performance impact.

Does it?  With -fno-devirtualize you won't even get the virtual functions added
into the callgraph unless you disable the logic in cgraphunit.c

Honza
 
 David
 
 
 On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Disabling devirtualization reduces code size, both for instrumentation 
  (because
  many more virtual functions are kept longer and therefore instrumented) 
  and for
  normal optimization.
 
  OK, with profile instrumentation (that you seem to try to minimize) i can 
  see
  how you get noticeably more counters because virtual functions are kept 
  longer.
  (note that 4.9 is a lot more agressive on removing unreacable virtual 
  functions
  than earlier compilers).
 
  Instead of disabling -fdevirtualize completely (that will get you more 
  indirect
  calls and thus more topn profiling) you may consider just hacking
  ipa.c:walk_polymorphic_call_targets to not make the possible targets as
  reachable. (see the conditional on before_inlining_p).
 
  Of course this will get you less devirtualization (but with LTO the 
  difference
  should not be big - perhaps I could make switch for that for mainline) and 
  less
  accurate profiles when you get speculative devirtualization via topn.
 
  I would be very interested to see how much difference this makes.
 
  Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Xinliang David Li
On Sat, Oct 18, 2014 at 3:23 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Devirtualization needs more tuning to be usable internally. We have
 seen 1.6% size increase on average, but unnoticable performance

 Does that happen with -fno-devirtualize-speculatively?


Not sure -- I will do some experiment.

 improvement with this option on. I would rather use the size budget to
 make inliner more aggressive :)

 :) Well, that is what devirtualization is supposed to do. All it does is 
 converting
 indirect calls to direct calls that can increase size only if extra inlining 
 happens.
 Speculative devirtualization will increase code size, but 1% seems quite off 
 - you
 would need very many virtual calls in the code to get close to that. Also it 
 turns
 itself off with profile feedback. You can try -fno-devirtualize-speculatively

 Thinking about it perhaps what really happens is the following:  You have a 
 lot
 of potentially reachable virtual functions in header and those are kept alive 
 until
 inlining because one of virtual calls in your unit may be devirtualized to 
 them
 (change to 4.8 is that we no longer keep all virtual functions, just those 
 that may
 be reached). This increase unit size estimates and thus gives inliner more 
 budget
 to inline until it hits inline-unit-growth limits.

This is likely -- we will study this more in details later.


 For instrumentation build, some of our builds will reach the build
 machine limit with this option on.

 Can you try to just disable the reachability code in ipa.c (as I wrote 
 earlier)
 and see if it makes difference?

This is certainly better if also solves the problem, but will do that
later after our internal release is out. We hit the size and slow
instrumentation build issue pretty late in the cycle.


 In LIPO mode, the compiler uses indirect target  profile counter to
 build fake edges in order to keep the targets alive -- this is
 probably the reason we do not see an performance impact.

 Does it?  With -fno-devirtualize you won't even get the virtual functions 
 added
 into the callgraph unless you disable the logic in cgraphunit.c

The virtual functions will be emitted in some modules, right? If they
are hot, they will be included with the auxiliary modules. Note that
LIPO indirect call profile will point to the comdat copy that is
picked by the linker in the instrumentation build, so it guarantees to
exist.

David



 Honza

 David


 On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Disabling devirtualization reduces code size, both for instrumentation 
  (because
  many more virtual functions are kept longer and therefore instrumented) 
  and for
  normal optimization.
 
  OK, with profile instrumentation (that you seem to try to minimize) i can 
  see
  how you get noticeably more counters because virtual functions are kept 
  longer.
  (note that 4.9 is a lot more agressive on removing unreacable virtual 
  functions
  than earlier compilers).
 
  Instead of disabling -fdevirtualize completely (that will get you more 
  indirect
  calls and thus more topn profiling) you may consider just hacking
  ipa.c:walk_polymorphic_call_targets to not make the possible targets as
  reachable. (see the conditional on before_inlining_p).
 
  Of course this will get you less devirtualization (but with LTO the 
  difference
  should not be big - perhaps I could make switch for that for mainline) and 
  less
  accurate profiles when you get speculative devirtualization via topn.
 
  I would be very interested to see how much difference this makes.
 
  Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Jan Hubicka
 
 The virtual functions will be emitted in some modules, right? If they
 are hot, they will be included with the auxiliary modules. Note that
 LIPO indirect call profile will point to the comdat copy that is
 picked by the linker in the instrumentation build, so it guarantees to
 exist.

If you have COMDAT virtual inline function that is used by module FOO and
LIPO's indirect inlining will work out that it goes to comdat copy of module BAR
(that won the merging).  It will make C++ parser to also parse BAR to get
the function, but callgarph builder will ignore it, too.
(this is new logic in analyze_function that with -fno-devirtualize will just
prevent all virtual functions not directly reachable to appear in symbol table)

I am surprised you hit the size limits with 4.9 only - for quite some time
we keep all virtual functions in callgarph until inlining. In fact 4.9 is first
that works harder to drop them early (because I hit the problem with LTO
where they artifically bloat the size of LTO object files)

Honza
 
 David
 
 
 
  Honza
 
  David
 
 
  On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
   Disabling devirtualization reduces code size, both for instrumentation 
   (because
   many more virtual functions are kept longer and therefore instrumented) 
   and for
   normal optimization.
  
   OK, with profile instrumentation (that you seem to try to minimize) i 
   can see
   how you get noticeably more counters because virtual functions are kept 
   longer.
   (note that 4.9 is a lot more agressive on removing unreacable virtual 
   functions
   than earlier compilers).
  
   Instead of disabling -fdevirtualize completely (that will get you more 
   indirect
   calls and thus more topn profiling) you may consider just hacking
   ipa.c:walk_polymorphic_call_targets to not make the possible targets as
   reachable. (see the conditional on before_inlining_p).
  
   Of course this will get you less devirtualization (but with LTO the 
   difference
   should not be big - perhaps I could make switch for that for mainline) 
   and less
   accurate profiles when you get speculative devirtualization via topn.
  
   I would be very interested to see how much difference this makes.
  
   Honza


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Teresa Johnson
Hi Honza,

As David says, we will do some more experiments with the change you
suggest and speculative devirtualization, but we needed to make this
change in part to get an internal release out. One of the issues was a
recent change to cp/decl2.c to make virtual function decls needed
under flag_devirtualize.

Thanks!
Teresa

On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

 The virtual functions will be emitted in some modules, right? If they
 are hot, they will be included with the auxiliary modules. Note that
 LIPO indirect call profile will point to the comdat copy that is
 picked by the linker in the instrumentation build, so it guarantees to
 exist.

 If you have COMDAT virtual inline function that is used by module FOO and
 LIPO's indirect inlining will work out that it goes to comdat copy of module 
 BAR
 (that won the merging).  It will make C++ parser to also parse BAR to get
 the function, but callgarph builder will ignore it, too.
 (this is new logic in analyze_function that with -fno-devirtualize will just
 prevent all virtual functions not directly reachable to appear in symbol 
 table)

 I am surprised you hit the size limits with 4.9 only - for quite some time
 we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
 first
 that works harder to drop them early (because I hit the problem with LTO
 where they artifically bloat the size of LTO object files)

 Honza

 David


 
  Honza
 
  David
 
 
  On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
   Disabling devirtualization reduces code size, both for instrumentation 
   (because
   many more virtual functions are kept longer and therefore 
   instrumented) and for
   normal optimization.
  
   OK, with profile instrumentation (that you seem to try to minimize) i 
   can see
   how you get noticeably more counters because virtual functions are kept 
   longer.
   (note that 4.9 is a lot more agressive on removing unreacable virtual 
   functions
   than earlier compilers).
  
   Instead of disabling -fdevirtualize completely (that will get you more 
   indirect
   calls and thus more topn profiling) you may consider just hacking
   ipa.c:walk_polymorphic_call_targets to not make the possible targets as
   reachable. (see the conditional on before_inlining_p).
  
   Of course this will get you less devirtualization (but with LTO the 
   difference
   should not be big - perhaps I could make switch for that for mainline) 
   and less
   accurate profiles when you get speculative devirtualization via topn.
  
   I would be very interested to see how much difference this makes.
  
   Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Disable -fdevirtualize by default

2014-10-18 Thread Xinliang David Li
On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

 The virtual functions will be emitted in some modules, right? If they
 are hot, they will be included with the auxiliary modules. Note that
 LIPO indirect call profile will point to the comdat copy that is
 picked by the linker in the instrumentation build, so it guarantees to
 exist.

 If you have COMDAT virtual inline function that is used by module FOO and
 LIPO's indirect inlining will work out that it goes to comdat copy of module 
 BAR
 (that won the merging).  It will make C++ parser to also parse BAR to get
 the function, but callgarph builder will ignore it, too.
 (this is new logic in analyze_function that with -fno-devirtualize will just
 prevent all virtual functions not directly reachable to appear in symbol 
 table)

For LIPO, modules are parsed independently -- their initial callgraphs
are also mostly isolated (except for builtin nodes) before LIPO
linking happens. LIPO will guarantee that an aux module is processed
exactly the same as (before tree-profile pass) in the the
instrumentation build.


 I am surprised you hit the size limits with 4.9 only - for quite some time
 we keep all virtual functions in callgarph until inlining. In fact 4.9 is 
 first
 that works harder to drop them early (because I hit the problem with LTO
 where they artifically bloat the size of LTO object files)

We can dig it more to later understand why only 4.9 hits the problem.

My size results with -fno-devirtualize-speculatively is out. It
shrinks size by 1.68% -- slightly more than -fdevirtualize can do in
O2 compile.

By the way, you mentioned 'hacking the
ipa.c:walk_polymorphic_call_targets to not make the possible targets
as
reachable' -- is that something worth doing in trunk?   With that, we
can probably just turn off speculative devirtualization.

David




 Honza

 David


 
  Honza
 
  David
 
 
  On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote:
   Disabling devirtualization reduces code size, both for instrumentation 
   (because
   many more virtual functions are kept longer and therefore 
   instrumented) and for
   normal optimization.
  
   OK, with profile instrumentation (that you seem to try to minimize) i 
   can see
   how you get noticeably more counters because virtual functions are kept 
   longer.
   (note that 4.9 is a lot more agressive on removing unreacable virtual 
   functions
   than earlier compilers).
  
   Instead of disabling -fdevirtualize completely (that will get you more 
   indirect
   calls and thus more topn profiling) you may consider just hacking
   ipa.c:walk_polymorphic_call_targets to not make the possible targets as
   reachable. (see the conditional on before_inlining_p).
  
   Of course this will get you less devirtualization (but with LTO the 
   difference
   should not be big - perhaps I could make switch for that for mainline) 
   and less
   accurate profiles when you get speculative devirtualization via topn.
  
   I would be very interested to see how much difference this makes.
  
   Honza