Re: [GOOGLE] Disable -fdevirtualize by default
I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) We can dig it more to later understand why only 4.9 hits the problem. This would be very interesting, because 4.9 ought to be better here (removing more virtuals early) than previous compilers. There was number of changes in 4.9 that may affect this - some fixes at C++ side giving middle end more inline candidates and also this change https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html So perhaps most of your virtual functions are !COMDAT!EXTERNAL, but that does not seem to make much sense to me either :( My size results with -fno-devirtualize-speculatively is out. It shrinks size by 1.68% -- slightly more than -fdevirtualize can do in O2 compile. Hmm, this is interesting, too. 1.68% is definitly a lot more than I would expect or have I seen on other testcases. You can take a look at summaries in -fdump-ipa-devirt pass. In opts.c I have: case OPT_fprofile_use: if (!opts_set-x_flag_branch_probabilities) ... /* Indirect call profiling should do all useful transformations speculative devirtualization does. */ if (!opts_set-x_flag_devirtualize_speculatively opts-x_flag_value_profile_transformations) opts-x_flag_devirtualize_speculatively = false; so perhaps this hunk is somehow skipped with LIPO? Speculative devirtualization is somehwat less useful (may have more falce positives) without LTO depending on how your headers are constructed. It would be interesting to see if it does a lot of mistakes on your codebase. (this can be easily done by forcing it to run with profile feedback, too and it will tell you when its speculation differs from speculation already there). By the way, you mentioned 'hacking the ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable' -- is that something worth doing in trunk? With that, we can probably just turn off speculative devirtualization. Well, the check is there to enable inlining. Disabling it for -fprofile-generate will result in lost profile samples for virtual functions. Disabling it by default will prevent inlining of devirtualized calls making devirtualization not really useful. Perhaps with LIPO situation is bit different because you bring in the other module just to inline the call as you describe. One thing I can imagine doing is to make inliner consider the reachable (in post-inlining sense, that is after removing extern inlines and virtual functions) calls with priority and account only those to unit growth model. This would make it more consistent over -fdevirtualize and more realistic about resulting code size. I sort of considered this option but did not have any good data suggesting I should implement it. In general it would be nice to understand this problem. Also I plan to do some retunning for 5.0 so it would be nice to know if you have other issues with 4.9? (I did not closely followed Google branch changes, so if you can point out those that are relevant for IPA tuning, I would be very interested to see what problems you hit). Honza David Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
Hi Honza, As David says, we will do some more experiments with the change you suggest and speculative devirtualization, but we needed to make this change in part to get an internal release out. One of the issues was a recent change to cp/decl2.c to make virtual function decls needed under flag_devirtualize. I think that one is not in current 4.9 branch (only in mainline). Honza Thanks! Teresa On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote: The virtual functions will be emitted in some modules, right? If they are hot, they will be included with the auxiliary modules. Note that LIPO indirect call profile will point to the comdat copy that is picked by the linker in the instrumentation build, so it guarantees to exist. If you have COMDAT virtual inline function that is used by module FOO and LIPO's indirect inlining will work out that it goes to comdat copy of module BAR (that won the merging). It will make C++ parser to also parse BAR to get the function, but callgarph builder will ignore it, too. (this is new logic in analyze_function that with -fno-devirtualize will just prevent all virtual functions not directly reachable to appear in symbol table) I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
On Sun, Oct 19, 2014 at 1:19 AM, Jan Hubicka hubi...@ucw.cz wrote: I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) We can dig it more to later understand why only 4.9 hits the problem. This would be very interesting, because 4.9 ought to be better here (removing more virtuals early) than previous compilers. There was number of changes in 4.9 that may affect this - some fixes at C++ side giving middle end more inline candidates and also this change https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html So perhaps most of your virtual functions are !COMDAT!EXTERNAL, but that does not seem to make much sense to me either :( My size results with -fno-devirtualize-speculatively is out. It shrinks size by 1.68% -- slightly more than -fdevirtualize can do in O2 compile. Hmm, this is interesting, too. 1.68% is definitly a lot more than I would expect or have I seen on other testcases. You can take a look at summaries in -fdump-ipa-devirt pass. In opts.c I have: case OPT_fprofile_use: if (!opts_set-x_flag_branch_probabilities) ... /* Indirect call profiling should do all useful transformations speculative devirtualization does. */ if (!opts_set-x_flag_devirtualize_speculatively opts-x_flag_value_profile_transformations) opts-x_flag_devirtualize_speculatively = false; so perhaps this hunk is somehow skipped with LIPO? The 1.68% size reduction I measured is with plain O2 compilation, not LIPO. Speculative devirtualization is somehwat less useful (may have more falce positives) without LTO depending on how your headers are constructed. It would be interesting to see if it does a lot of mistakes on your codebase. (this can be easily done by forcing it to run with profile feedback, too and it will tell you when its speculation differs from speculation already there). By the way, you mentioned 'hacking the ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable' -- is that something worth doing in trunk? With that, we can probably just turn off speculative devirtualization. Well, the check is there to enable inlining. Disabling it for -fprofile-generate will result in lost profile samples for virtual functions. Disabling it by default will prevent inlining of devirtualized calls making devirtualization not really useful. Perhaps with LIPO situation is bit different because you bring in the other module just to inline the call as you describe. What I meant is whether it is suitable for plain build ? I have not looked at the details. One thing I can imagine doing is to make inliner consider the reachable (in post-inlining sense, that is after removing extern inlines and virtual functions) calls with priority and account only those to unit growth model. This would make it more consistent over -fdevirtualize and more realistic about resulting code size. I sort of considered this option but did not have any good data suggesting I should implement it. In general it would be nice to understand this problem. Also I plan to do some retunning for 5.0 so it would be nice to know if you have other issues with 4.9? (I did not closely followed Google branch changes, so if you can point out those that are relevant for IPA tuning, I would be very interested to see what problems you hit). Ok I will collect a list of inlining related changes and let you know latter. thanks, David Honza David Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes.
Re: [GOOGLE] Disable -fdevirtualize by default
In opts.c I have: case OPT_fprofile_use: if (!opts_set-x_flag_branch_probabilities) ... /* Indirect call profiling should do all useful transformations speculative devirtualization does. */ if (!opts_set-x_flag_devirtualize_speculatively opts-x_flag_value_profile_transformations) opts-x_flag_devirtualize_speculatively = false; so perhaps this hunk is somehow skipped with LIPO? The 1.68% size reduction I measured is with plain O2 compilation, not LIPO. OK, in that case I may look into trimming down the speculation with non-LTO compilation. The code makes assumptions about completeness of type hiearchy that without LTO are quite far reached. I tested it on firefox and there it seemed to do mostly good job, so I decided to try to enable it for non-LTO, too. It would be nice to understand what really happens in your case and if there are lot of wrong speculations or if the speculated calls simply happens to not be on hot paths in your benchmark. Speculative devirtualization is somehwat less useful (may have more falce positives) without LTO depending on how your headers are constructed. It would be interesting to see if it does a lot of mistakes on your codebase. (this can be easily done by forcing it to run with profile feedback, too and it will tell you when its speculation differs from speculation already there). By the way, you mentioned 'hacking the ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable' -- is that something worth doing in trunk? With that, we can probably just turn off speculative devirtualization. Well, the check is there to enable inlining. Disabling it for -fprofile-generate will result in lost profile samples for virtual functions. Disabling it by default will prevent inlining of devirtualized calls making devirtualization not really useful. Perhaps with LIPO situation is bit different because you bring in the other module just to inline the call as you describe. What I meant is whether it is suitable for plain build ? I have not looked at the details. Yes, dropping the reachability walk should just work, only result in less inlining. Honza One thing I can imagine doing is to make inliner consider the reachable (in post-inlining sense, that is after removing extern inlines and virtual functions) calls with priority and account only those to unit growth model. This would make it more consistent over -fdevirtualize and more realistic about resulting code size. I sort of considered this option but did not have any good data suggesting I should implement it. In general it would be nice to understand this problem. Also I plan to do some retunning for 5.0 so it would be nice to know if you have other issues with 4.9? (I did not closely followed Google branch changes, so if you can point out those that are relevant for IPA tuning, I would be very interested to see what problems you hit). Ok I will collect a list of inlining related changes and let you know latter. thanks, David Honza David Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
It was in upstream 4.9 branch, but got reverted in r215061 to fix PR/61214 and PR/62224. David On Sun, Oct 19, 2014 at 1:21 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi Honza, As David says, we will do some more experiments with the change you suggest and speculative devirtualization, but we needed to make this change in part to get an internal release out. One of the issues was a recent change to cp/decl2.c to make virtual function decls needed under flag_devirtualize. I think that one is not in current 4.9 branch (only in mainline). Honza Thanks! Teresa On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote: The virtual functions will be emitted in some modules, right? If they are hot, they will be included with the auxiliary modules. Note that LIPO indirect call profile will point to the comdat copy that is picked by the linker in the instrumentation build, so it guarantees to exist. If you have COMDAT virtual inline function that is used by module FOO and LIPO's indirect inlining will work out that it goes to comdat copy of module BAR (that won the merging). It will make C++ parser to also parse BAR to get the function, but callgarph builder will ignore it, too. (this is new logic in analyze_function that with -fno-devirtualize will just prevent all virtual functions not directly reachable to appear in symbol table) I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
It was in upstream 4.9 branch, but got reverted in r215061 to fix PR/61214 and PR/62224. Was your tests done with this change or without? Honza
Re: [GOOGLE] Disable -fdevirtualize by default
No, Google branch is not yet synced to 4.9 head. David On Sun, Oct 19, 2014 at 1:11 PM, Jan Hubicka hubi...@ucw.cz wrote: It was in upstream 4.9 branch, but got reverted in r215061 to fix PR/61214 and PR/62224. Was your tests done with this change or without? Honza
[GOOGLE] Disable -fdevirtualize by default
Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Patch attached. Passes internal testing and regression tests. Ok for google/4_9? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Passes internal testing and regression tests. Ok for google/4_9? 2014-10-18 Teresa Johnson tejohn...@google.com gcc/: Google ref b/17945455 * opts.c (finish_options): Disable -fdevirtualize gcc/testsuite/: Google ref b/17945455 * testsuite/g++.dg/ipa/devirt-10.C: Enable devirtualization for test. * testsuite/g++.dg/ipa/devirt-11.C: Ditto. * testsuite/g++.dg/ipa/devirt-12.C: Ditto. * testsuite/g++.dg/ipa/devirt-13.C: Ditto. * testsuite/g++.dg/ipa/devirt-14.C: Ditto. * testsuite/g++.dg/ipa/devirt-15.C: Ditto. * testsuite/g++.dg/ipa/devirt-16.C: Ditto. * testsuite/g++.dg/ipa/devirt-17.C: Ditto. * testsuite/g++.dg/ipa/devirt-18.C: Ditto. * testsuite/g++.dg/ipa/devirt-19.C: Ditto. * testsuite/g++.dg/ipa/devirt-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-20.C: Ditto. * testsuite/g++.dg/ipa/devirt-21.C: Ditto. * testsuite/g++.dg/ipa/devirt-22.C: Ditto. * testsuite/g++.dg/ipa/devirt-23.C: Ditto. * testsuite/g++.dg/ipa/devirt-24.C: Ditto. * testsuite/g++.dg/ipa/devirt-25.C: Ditto. * testsuite/g++.dg/ipa/devirt-26.C: Ditto. * testsuite/g++.dg/ipa/devirt-27.C: Ditto. * testsuite/g++.dg/ipa/devirt-28.C: Ditto. * testsuite/g++.dg/ipa/devirt-29.C: Ditto. * testsuite/g++.dg/ipa/devirt-2.C: Ditto. * testsuite/g++.dg/ipa/devirt-30.C: Ditto. * testsuite/g++.dg/ipa/devirt-31.C: Ditto. * testsuite/g++.dg/ipa/devirt-39.C: Ditto. * testsuite/g++.dg/ipa/devirt-3.C: Ditto. * testsuite/g++.dg/ipa/devirt-4.C: Ditto. * testsuite/g++.dg/ipa/devirt-5.C: Ditto. * testsuite/g++.dg/ipa/devirt-6.C: Ditto. * testsuite/g++.dg/ipa/devirt-7.C: Ditto. * testsuite/g++.dg/ipa/devirt-9.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-2.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-3.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-4.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-5.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-6.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-7.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-8.C: Ditto. * testsuite/g++.dg/ipa/devirt-d-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-g-1.C: Ditto. * testsuite/g++.dg/ipa/imm-devirt-1.C: Ditto. * testsuite/g++.dg/ipa/imm-devirt-2.C: Ditto. * testsuite/g++.dg/ipa/ivinline-1.C: Ditto. * testsuite/g++.dg/ipa/ivinline-2.C: Ditto. * testsuite/g++.dg/ipa/ivinline-3.C: Ditto. * testsuite/g++.dg/ipa/ivinline-4.C: Ditto. * testsuite/g++.dg/ipa/ivinline-5.C: Ditto. * testsuite/g++.dg/ipa/ivinline-7.C: Ditto. * testsuite/g++.dg/ipa/ivinline-8.C: Ditto. * testsuite/g++.dg/ipa/ivinline-9.C: Ditto. * testsuite/g++.dg/ipa/pr60600.C: Ditto. * testsuite/g++.dg/ipa/pr60640-4.C: Ditto. * testsuite/g++.dg/ipa/type-inheritance-1.C: Ditto. * testsuite/g++.dg/opt/devirt1.C: Ditto. * testsuite/g++.dg/opt/devirt2.C: Ditto. * testsuite/g++.dg/opt/devirt3.C: Ditto. Index: opts.c === --- opts.c (revision 216286) +++ opts.c (working copy) @@ -485,7 +485,6 @@ static const struct default_options default_option { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 }, -{ OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 }, Index: testsuite/g++.dg/ipa/devirt-10.C === --- testsuite/g++.dg/ipa/devirt-10.C(revision 216286) +++ testsuite/g++.dg/ipa/devirt-10.C(working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining } */ +/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining -fdevirtualize } */ class wxPaintEvent { }; struct wxDCBase { Index: testsuite/g++.dg/ipa/devirt-11.C === --- testsuite/g++.dg/ipa/devirt-11.C
Re: [GOOGLE] Disable -fdevirtualize by default
Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Can you refer bit more on the problem? Full devirtualization should not increase code size, speculative devirtualization does somewhat, but by sub 1%. Honza Patch attached. Passes internal testing and regression tests. Ok for google/4_9? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Passes internal testing and regression tests. Ok for google/4_9? 2014-10-18 Teresa Johnson tejohn...@google.com gcc/: Google ref b/17945455 * opts.c (finish_options): Disable -fdevirtualize gcc/testsuite/: Google ref b/17945455 * testsuite/g++.dg/ipa/devirt-10.C: Enable devirtualization for test. * testsuite/g++.dg/ipa/devirt-11.C: Ditto. * testsuite/g++.dg/ipa/devirt-12.C: Ditto. * testsuite/g++.dg/ipa/devirt-13.C: Ditto. * testsuite/g++.dg/ipa/devirt-14.C: Ditto. * testsuite/g++.dg/ipa/devirt-15.C: Ditto. * testsuite/g++.dg/ipa/devirt-16.C: Ditto. * testsuite/g++.dg/ipa/devirt-17.C: Ditto. * testsuite/g++.dg/ipa/devirt-18.C: Ditto. * testsuite/g++.dg/ipa/devirt-19.C: Ditto. * testsuite/g++.dg/ipa/devirt-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-20.C: Ditto. * testsuite/g++.dg/ipa/devirt-21.C: Ditto. * testsuite/g++.dg/ipa/devirt-22.C: Ditto. * testsuite/g++.dg/ipa/devirt-23.C: Ditto. * testsuite/g++.dg/ipa/devirt-24.C: Ditto. * testsuite/g++.dg/ipa/devirt-25.C: Ditto. * testsuite/g++.dg/ipa/devirt-26.C: Ditto. * testsuite/g++.dg/ipa/devirt-27.C: Ditto. * testsuite/g++.dg/ipa/devirt-28.C: Ditto. * testsuite/g++.dg/ipa/devirt-29.C: Ditto. * testsuite/g++.dg/ipa/devirt-2.C: Ditto. * testsuite/g++.dg/ipa/devirt-30.C: Ditto. * testsuite/g++.dg/ipa/devirt-31.C: Ditto. * testsuite/g++.dg/ipa/devirt-39.C: Ditto. * testsuite/g++.dg/ipa/devirt-3.C: Ditto. * testsuite/g++.dg/ipa/devirt-4.C: Ditto. * testsuite/g++.dg/ipa/devirt-5.C: Ditto. * testsuite/g++.dg/ipa/devirt-6.C: Ditto. * testsuite/g++.dg/ipa/devirt-7.C: Ditto. * testsuite/g++.dg/ipa/devirt-9.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-2.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-3.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-4.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-5.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-6.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-7.C: Ditto. * testsuite/g++.dg/ipa/devirt-c-8.C: Ditto. * testsuite/g++.dg/ipa/devirt-d-1.C: Ditto. * testsuite/g++.dg/ipa/devirt-g-1.C: Ditto. * testsuite/g++.dg/ipa/imm-devirt-1.C: Ditto. * testsuite/g++.dg/ipa/imm-devirt-2.C: Ditto. * testsuite/g++.dg/ipa/ivinline-1.C: Ditto. * testsuite/g++.dg/ipa/ivinline-2.C: Ditto. * testsuite/g++.dg/ipa/ivinline-3.C: Ditto. * testsuite/g++.dg/ipa/ivinline-4.C: Ditto. * testsuite/g++.dg/ipa/ivinline-5.C: Ditto. * testsuite/g++.dg/ipa/ivinline-7.C: Ditto. * testsuite/g++.dg/ipa/ivinline-8.C: Ditto. * testsuite/g++.dg/ipa/ivinline-9.C: Ditto. * testsuite/g++.dg/ipa/pr60600.C: Ditto. * testsuite/g++.dg/ipa/pr60640-4.C: Ditto. * testsuite/g++.dg/ipa/type-inheritance-1.C: Ditto. * testsuite/g++.dg/opt/devirt1.C: Ditto. * testsuite/g++.dg/opt/devirt2.C: Ditto. * testsuite/g++.dg/opt/devirt3.C: Ditto. Index: opts.c === --- opts.c(revision 216286) +++ opts.c(working copy) @@ -485,7 +485,6 @@ static const struct default_options default_option { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_switch_conversion, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_cp, NULL, 1 }, -{ OPT_LEVELS_2_PLUS, OPT_fdevirtualize, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fdevirtualize_speculatively, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_fipa_sra, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_falign_loops, NULL, 1 }, Index: testsuite/g++.dg/ipa/devirt-10.C === --- testsuite/g++.dg/ipa/devirt-10.C (revision 216286) +++ testsuite/g++.dg/ipa/devirt-10.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining } */ +/* { dg-options -O3 -fdump-ipa-inline -fdump-ipa-cp -fno-early-inlining -fdevirtualize } */ class wxPaintEvent { }; struct wxDCBase {
Re: [GOOGLE] Disable -fdevirtualize by default
Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
ok. David On Sat, Oct 18, 2014 at 9:25 AM, Teresa Johnson tejohn...@google.com wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. Patch attached. Passes internal testing and regression tests. Ok for google/4_9? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
Devirtualization needs more tuning to be usable internally. We have seen 1.6% size increase on average, but unnoticable performance improvement with this option on. I would rather use the size budget to make inliner more aggressive :) For instrumentation build, some of our builds will reach the build machine limit with this option on. In LIPO mode, the compiler uses indirect target profile counter to build fake edges in order to keep the targets alive -- this is probably the reason we do not see an performance impact. David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
Devirtualization needs more tuning to be usable internally. We have seen 1.6% size increase on average, but unnoticable performance Does that happen with -fno-devirtualize-speculatively? improvement with this option on. I would rather use the size budget to make inliner more aggressive :) :) Well, that is what devirtualization is supposed to do. All it does is converting indirect calls to direct calls that can increase size only if extra inlining happens. Speculative devirtualization will increase code size, but 1% seems quite off - you would need very many virtual calls in the code to get close to that. Also it turns itself off with profile feedback. You can try -fno-devirtualize-speculatively Thinking about it perhaps what really happens is the following: You have a lot of potentially reachable virtual functions in header and those are kept alive until inlining because one of virtual calls in your unit may be devirtualized to them (change to 4.8 is that we no longer keep all virtual functions, just those that may be reached). This increase unit size estimates and thus gives inliner more budget to inline until it hits inline-unit-growth limits. For instrumentation build, some of our builds will reach the build machine limit with this option on. Can you try to just disable the reachability code in ipa.c (as I wrote earlier) and see if it makes difference? In LIPO mode, the compiler uses indirect target profile counter to build fake edges in order to keep the targets alive -- this is probably the reason we do not see an performance impact. Does it? With -fno-devirtualize you won't even get the virtual functions added into the callgraph unless you disable the logic in cgraphunit.c Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
On Sat, Oct 18, 2014 at 3:23 PM, Jan Hubicka hubi...@ucw.cz wrote: Devirtualization needs more tuning to be usable internally. We have seen 1.6% size increase on average, but unnoticable performance Does that happen with -fno-devirtualize-speculatively? Not sure -- I will do some experiment. improvement with this option on. I would rather use the size budget to make inliner more aggressive :) :) Well, that is what devirtualization is supposed to do. All it does is converting indirect calls to direct calls that can increase size only if extra inlining happens. Speculative devirtualization will increase code size, but 1% seems quite off - you would need very many virtual calls in the code to get close to that. Also it turns itself off with profile feedback. You can try -fno-devirtualize-speculatively Thinking about it perhaps what really happens is the following: You have a lot of potentially reachable virtual functions in header and those are kept alive until inlining because one of virtual calls in your unit may be devirtualized to them (change to 4.8 is that we no longer keep all virtual functions, just those that may be reached). This increase unit size estimates and thus gives inliner more budget to inline until it hits inline-unit-growth limits. This is likely -- we will study this more in details later. For instrumentation build, some of our builds will reach the build machine limit with this option on. Can you try to just disable the reachability code in ipa.c (as I wrote earlier) and see if it makes difference? This is certainly better if also solves the problem, but will do that later after our internal release is out. We hit the size and slow instrumentation build issue pretty late in the cycle. In LIPO mode, the compiler uses indirect target profile counter to build fake edges in order to keep the targets alive -- this is probably the reason we do not see an performance impact. Does it? With -fno-devirtualize you won't even get the virtual functions added into the callgraph unless you disable the logic in cgraphunit.c The virtual functions will be emitted in some modules, right? If they are hot, they will be included with the auxiliary modules. Note that LIPO indirect call profile will point to the comdat copy that is picked by the linker in the instrumentation build, so it guarantees to exist. David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
The virtual functions will be emitted in some modules, right? If they are hot, they will be included with the auxiliary modules. Note that LIPO indirect call profile will point to the comdat copy that is picked by the linker in the instrumentation build, so it guarantees to exist. If you have COMDAT virtual inline function that is used by module FOO and LIPO's indirect inlining will work out that it goes to comdat copy of module BAR (that won the merging). It will make C++ parser to also parse BAR to get the function, but callgarph builder will ignore it, too. (this is new logic in analyze_function that with -fno-devirtualize will just prevent all virtual functions not directly reachable to appear in symbol table) I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza
Re: [GOOGLE] Disable -fdevirtualize by default
Hi Honza, As David says, we will do some more experiments with the change you suggest and speculative devirtualization, but we needed to make this change in part to get an internal release out. One of the issues was a recent change to cp/decl2.c to make virtual function decls needed under flag_devirtualize. Thanks! Teresa On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote: The virtual functions will be emitted in some modules, right? If they are hot, they will be included with the auxiliary modules. Note that LIPO indirect call profile will point to the comdat copy that is picked by the linker in the instrumentation build, so it guarantees to exist. If you have COMDAT virtual inline function that is used by module FOO and LIPO's indirect inlining will work out that it goes to comdat copy of module BAR (that won the merging). It will make C++ parser to also parse BAR to get the function, but callgarph builder will ignore it, too. (this is new logic in analyze_function that with -fno-devirtualize will just prevent all virtual functions not directly reachable to appear in symbol table) I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka hubi...@ucw.cz wrote: The virtual functions will be emitted in some modules, right? If they are hot, they will be included with the auxiliary modules. Note that LIPO indirect call profile will point to the comdat copy that is picked by the linker in the instrumentation build, so it guarantees to exist. If you have COMDAT virtual inline function that is used by module FOO and LIPO's indirect inlining will work out that it goes to comdat copy of module BAR (that won the merging). It will make C++ parser to also parse BAR to get the function, but callgarph builder will ignore it, too. (this is new logic in analyze_function that with -fno-devirtualize will just prevent all virtual functions not directly reachable to appear in symbol table) For LIPO, modules are parsed independently -- their initial callgraphs are also mostly isolated (except for builtin nodes) before LIPO linking happens. LIPO will guarantee that an aux module is processed exactly the same as (before tree-profile pass) in the the instrumentation build. I am surprised you hit the size limits with 4.9 only - for quite some time we keep all virtual functions in callgarph until inlining. In fact 4.9 is first that works harder to drop them early (because I hit the problem with LTO where they artifically bloat the size of LTO object files) We can dig it more to later understand why only 4.9 hits the problem. My size results with -fno-devirtualize-speculatively is out. It shrinks size by 1.68% -- slightly more than -fdevirtualize can do in O2 compile. By the way, you mentioned 'hacking the ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable' -- is that something worth doing in trunk? With that, we can probably just turn off speculative devirtualization. David Honza David Honza David On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka hubi...@ucw.cz wrote: Disabling devirtualization reduces code size, both for instrumentation (because many more virtual functions are kept longer and therefore instrumented) and for normal optimization. OK, with profile instrumentation (that you seem to try to minimize) i can see how you get noticeably more counters because virtual functions are kept longer. (note that 4.9 is a lot more agressive on removing unreacable virtual functions than earlier compilers). Instead of disabling -fdevirtualize completely (that will get you more indirect calls and thus more topn profiling) you may consider just hacking ipa.c:walk_polymorphic_call_targets to not make the possible targets as reachable. (see the conditional on before_inlining_p). Of course this will get you less devirtualization (but with LTO the difference should not be big - perhaps I could make switch for that for mainline) and less accurate profiles when you get speculative devirtualization via topn. I would be very interested to see how much difference this makes. Honza