> Dear Teresa and Jan, > I tried to test Teresa's patch, but I've encountered two bugs > during usage of -fprofile-generate/use (one in SPEC CPU 2006 and > Inkscape).
Thanks, this is non-LTO run. Is there a chance to get -flto version, too? So we see how things combine with -freorder-function > > This will be probably for Jan: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59266 > > second one: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59265 > > There are numbers I recorded for GIMP with and without block reordering. > > GIMP (-freorder-blocks-and-partition) > pages read (no readahead): 597 pages (4K) > > GIMP (-no-freorder-blocks-and-partition) > pages read (no readahead): 596 pages (4K) The graphs themselves seems bit odd however, why do we have so many accesses to cold section with -fno-reorder-blocks-and-partition again? Honza > > Martin > > On 19 November 2013 23:18, Teresa Johnson <tejohn...@google.com> wrote: > > On Tue, Nov 19, 2013 at 9:40 AM, Jeff Law <l...@redhat.com> wrote: > >> On 11/19/13 10:24, Teresa Johnson wrote: > >>> > >>> On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka <hubi...@ucw.cz> wrote: > >>>> > >>>> Martin, > >>>> can you, please, generate the updated systemtap with > >>>> -freorder-blocks-and-partition enabled? > >>>> > >>>> I am in favour of enabling this - it is usefull pass and it is pointless > >>>> ot > >>>> have passes that are not enabled by default. > >>>> Is there reason why this would not work on other ELF target? Is it > >>>> working > >>>> with Darwin and Windows? > >>> > >>> > >>> I don't know how to test these (I don't see any machines listed in the > >>> gcc compile farm of those types). For Windows, I assume you mean > >>> MinGW, which should be enabled as it is under i386. Should I disable > >>> it there and for Darwin? > >>> > >>>> > >>>>> This patch enables -freorder-blocks-and-partition by default for x86 > >>>>> at -O2 and up. It is showing some modest gains in cpu2006 performance > >>>>> with profile feedback and -O2 on an Intel Westmere system. Specifically, > >>>>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk > >>>>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions. > >>>> > >>>> > >>>> This actually sounds very good ;) > >>>> > >>>> Lets see how the systemtap graphs goes. If we will end up with problem > >>>> of too many accesses to cold section, I would suggest making cold section > >>>> subdivided into .unlikely and .unlikely.part (we could have better name) > >>>> with the second consisting only of unlikely parts of hot&normal > >>>> functions. > >>>> > >>>> This should reduce the problems we are seeing with mistakely identifying > >>>> code to be cold because of roundoff errors (and it probably makes sense > >>>> in general, too). > >>>> We will however need to update gold and ld for that. > >>> > >>> > >>> Note that I don't think this would help much unless the linker is > >>> changed to move the cold split section close to the hot section. There > >>> is probably some fine-tuning we could do eventually in the linker > >>> under -ffunction-sections without putting the split portions in a > >>> separate section. I.e. clump the split parts together within unlikely. > >>> But hopefully this can all be done later on as follow-on work to boost > >>> the performance further. > >>> > >>>>> > >>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal > >>>>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were > >>>>> configured with --enable-languages=all,obj-c++ and tested for both > >>>>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}". > >>>>> > >>>>> It would be good to enable this for additional targets as a follow on, > >>>>> but it needs more testing for both correctness and performance on those > >>>>> other targets (i.e for correctness because I see a number of places > >>>>> in other config/*/*.c files that do some special handling under this > >>>>> option for different targets or simply disable it, so I am not sure > >>>>> how well-tested it is under different architectural constraints). > >>>>> > >>>>> Ok for trunk? > >>>>> > >>>>> Thanks, > >>>>> Teresa > >>>>> > >>>>> 2013-11-19 Teresa Johnson <tejohn...@google.com> > >>>>> > >>>>> * common/config/i386/i386-common.c: Enable > >>>>> -freorder-blocks-and-partition at -O2 and up for x86. > >>>>> * opts.c (finish_options): Only warn if -freorder-blocks-and- > >>>>> partition was set on command line. > >>>> > >>>> > >>>> You probably mis doc/invoke.texi update. > >>>> Thank you for working on this! > >>> > >>> > >>> Yes, thanks. Here is the patch with the invoke.texi update. > >>> > >>> Teresa > >>> > >>> > >>> 2013-11-19 Teresa Johnson <tejohn...@google.com> > >>> > >>> * common/config/i386/i386-common.c: Enable > >>> -freorder-blocks-and-partition at -O2 and up for x86. > >>> * doc/invoke.texi: Update -freorder-blocks-and-partition default. > >>> * opts.c (finish_options): Only warn if -freorder-blocks-and- > >>> partition was set on command line. > >> > >> This is fine. Glad to see we're getting some good benefits from this code. > > > > Ok, thanks. It is in as r205058. > > > >> > >> FWIW, I would expect the PA to show benefits from this work -- HP showed > >> good results for block positioning and procedure splitting eons ago. A > >> secondary effect you would expect to see would be an increase in the number > >> of long branch stubs (static count), but a decrease in the number of those > >> stubs that actually execute. Measuring those effects would obviously be > >> out-of-scope. > >> > >> I totally understand that the PA is irrelevant these days, but seeing good > >> results on another architecture would go a long way to answering the > >> question "should this be enabled by default across the board?" > > > > Yep, we saw benefits for IPF on hp-ux as well. It would be great to > > eventually enable this across the board and only selectively disable. > > I know there were people interested in trying this with ARM, that > > would be a good relevant arch to try this with now to see if it can be > > enabled more widely. > > > > Teresa > > > >> > >> jeff > >> > >> > > > > > > > > -- > > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413