On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka <hubi...@ucw.cz> wrote: > Martin, > can you, please, generate the updated systemtap with > -freorder-blocks-and-partition enabled? > > I am in favour of enabling this - it is usefull pass and it is pointless ot > have passes that are not enabled by default. > Is there reason why this would not work on other ELF target? Is it working > with Darwin and Windows?
I don't know how to test these (I don't see any machines listed in the gcc compile farm of those types). For Windows, I assume you mean MinGW, which should be enabled as it is under i386. Should I disable it there and for Darwin? > >> This patch enables -freorder-blocks-and-partition by default for x86 >> at -O2 and up. It is showing some modest gains in cpu2006 performance >> with profile feedback and -O2 on an Intel Westmere system. Specifically, >> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk >> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. > > This actually sounds very good ;) > > Lets see how the systemtap graphs goes. If we will end up with problem > of too many accesses to cold section, I would suggest making cold section > subdivided into .unlikely and .unlikely.part (we could have better name) > with the second consisting only of unlikely parts of hot&normal functions. > > This should reduce the problems we are seeing with mistakely identifying > code to be cold because of roundoff errors (and it probably makes sense > in general, too). > We will however need to update gold and ld for that. Note that I don't think this would help much unless the linker is changed to move the cold split section close to the hot section. There is probably some fine-tuning we could do eventually in the linker under -ffunction-sections without putting the split portions in a separate section. I.e. clump the split parts together within unlikely. But hopefully this can all be done later on as follow-on work to boost the performance further. >> >> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal >> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were >> configured with --enable-languages=all,obj-c++ and tested for both >> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}". >> >> It would be good to enable this for additional targets as a follow on, >> but it needs more testing for both correctness and performance on those >> other targets (i.e for correctness because I see a number of places >> in other config/*/*.c files that do some special handling under this >> option for different targets or simply disable it, so I am not sure >> how well-tested it is under different architectural constraints). >> >> Ok for trunk? >> >> Thanks, >> Teresa >> >> 2013-11-19 Teresa Johnson <tejohn...@google.com> >> >> * common/config/i386/i386-common.c: Enable >> -freorder-blocks-and-partition at -O2 and up for x86. >> * opts.c (finish_options): Only warn if -freorder-blocks-and- >> partition was set on command line. > > You probably mis doc/invoke.texi update. > Thank you for working on this! Yes, thanks. Here is the patch with the invoke.texi update. Teresa 2013-11-19 Teresa Johnson <tejohn...@google.com> * common/config/i386/i386-common.c: Enable -freorder-blocks-and-partition at -O2 and up for x86. * doc/invoke.texi: Update -freorder-blocks-and-partition default. * opts.c (finish_options): Only warn if -freorder-blocks-and- partition was set on command line. Index: common/config/i386/i386-common.c =================================================================== --- common/config/i386/i386-common.c (revision 205001) +++ common/config/i386/i386-common.c (working copy) @@ -789,6 +789,8 @@ static const struct default_options ix86_option_op { /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, + /* Enable function splitting at -O2 and higher. */ + { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 }, /* Turn off -fschedule-insns by default. It tends to make the problem with not enough registers even worse. */ { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 }, Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 205001) +++ doc/invoke.texi (working copy) @@ -8172,6 +8172,8 @@ exception handling, for linkonce sections, for fun section attribute and on any architecture that does not support named sections. +Enabled for x86 at levels @option{-O2}, @option{-O3}. + @item -freorder-functions @opindex freorder-functions Reorder functions in the object file in order to Index: opts.c =================================================================== --- opts.c (revision 205001) +++ opts.c (working copy) @@ -737,9 +737,10 @@ finish_options (struct gcc_options *opts, struct g && opts->x_flag_reorder_blocks_and_partition && (ui_except == UI_SJLJ || ui_except >= UI_TARGET)) { - inform (loc, - "-freorder-blocks-and-partition does not work " - "with exceptions on this architecture"); + if (opts_set->x_flag_reorder_blocks_and_partition) + inform (loc, + "-freorder-blocks-and-partition does not work " + "with exceptions on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } @@ -752,9 +753,10 @@ finish_options (struct gcc_options *opts, struct g && opts->x_flag_reorder_blocks_and_partition && (ui_except == UI_SJLJ || ui_except >= UI_TARGET)) { - inform (loc, - "-freorder-blocks-and-partition does not support " - "unwind info on this architecture"); + if (opts_set->x_flag_reorder_blocks_and_partition) + inform (loc, + "-freorder-blocks-and-partition does not support " + "unwind info on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } @@ -769,9 +771,10 @@ finish_options (struct gcc_options *opts, struct g && targetm_common.unwind_tables_default && (ui_except == UI_SJLJ || ui_except >= UI_TARGET)))) { - inform (loc, - "-freorder-blocks-and-partition does not work " - "on this architecture"); + if (opts_set->x_flag_reorder_blocks_and_partition) + inform (loc, + "-freorder-blocks-and-partition does not work " + "on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413