Re: [PATCH i386] Enable -freorder-blocks-and-partition

Jan Hubicka Thu, 28 Nov 2013 06:07:54 -0800

> Dear Teresa and Jan,
>    I tried to test Teresa's patch, but I've encountered two bugs
> during usage of -fprofile-generate/use (one in SPEC CPU 2006 and
> Inkscape).


Thanks, this is non-LTO run. Is there a chance to get -flto version, too?
So we see how things combine with -freorder-function
> 
> This will be probably for Jan:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59266
> 
> second one:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59265
> 
> There are numbers I recorded for GIMP with and without block reordering.
> 
> GIMP (-freorder-blocks-and-partition)
> pages read (no readahead): 597 pages (4K)
> 
> GIMP (-no-freorder-blocks-and-partition)
> pages read (no readahead): 596 pages (4K)

The graphs themselves seems bit odd however, why do we have so many accesses
to cold section with -fno-reorder-blocks-and-partition again?

Honza
> 
> Martin
> 
> On 19 November 2013 23:18, Teresa Johnson <tejohn...@google.com> wrote:
> > On Tue, Nov 19, 2013 at 9:40 AM, Jeff Law <l...@redhat.com> wrote:
> >> On 11/19/13 10:24, Teresa Johnson wrote:
> >>>
> >>> On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >>>>
> >>>> Martin,
> >>>> can you, please, generate the updated systemtap with
> >>>> -freorder-blocks-and-partition enabled?
> >>>>
> >>>> I am in favour of enabling this - it is usefull pass and it is pointless
> >>>> ot
> >>>> have passes that are not enabled by default.
> >>>> Is there reason why this would not work on other ELF target? Is it
> >>>> working
> >>>> with Darwin and Windows?
> >>>
> >>>
> >>> I don't know how to test these (I don't see any machines listed in the
> >>> gcc compile farm of those types). For Windows, I assume you mean
> >>> MinGW, which should be enabled as it is under i386. Should I disable
> >>> it there and for Darwin?
> >>>
> >>>>
> >>>>> This patch enables -freorder-blocks-and-partition by default for x86
> >>>>> at -O2 and up. It is showing some modest gains in cpu2006 performance
> >>>>> with profile feedback and -O2 on an Intel Westmere system. Specifically,
> >>>>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
> >>>>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
> >>>>
> >>>>
> >>>> This actually sounds very good ;)
> >>>>
> >>>> Lets see how the systemtap graphs goes.  If we will end up with problem
> >>>> of too many accesses to cold section, I would suggest making cold section
> >>>> subdivided into .unlikely and .unlikely.part (we could have better name)
> >>>> with the second consisting only of unlikely parts of hot&normal
> >>>> functions.
> >>>>
> >>>> This should reduce the problems we are seeing with mistakely identifying
> >>>> code to be cold because of roundoff errors (and it probably makes sense
> >>>> in general, too).
> >>>> We will however need to update gold and ld for that.
> >>>
> >>>
> >>> Note that I don't think this would help much unless the linker is
> >>> changed to move the cold split section close to the hot section. There
> >>> is probably some fine-tuning we could do eventually in the linker
> >>> under -ffunction-sections without putting the split portions in a
> >>> separate section. I.e. clump the split parts together within unlikely.
> >>> But hopefully this can all be done later on as follow-on work to boost
> >>> the performance further.
> >>>
> >>>>>
> >>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
> >>>>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
> >>>>> configured with --enable-languages=all,obj-c++ and tested for both
> >>>>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".
> >>>>>
> >>>>> It would be good to enable this for additional targets as a follow on,
> >>>>> but it needs more testing for both correctness and performance on those
> >>>>> other targets (i.e for correctness because I see a number of places
> >>>>> in other config/*/*.c files that do some special handling under this
> >>>>> option for different targets or simply disable it, so I am not sure
> >>>>> how well-tested it is under different architectural constraints).
> >>>>>
> >>>>> Ok for trunk?
> >>>>>
> >>>>> Thanks,
> >>>>> Teresa
> >>>>>
> >>>>> 2013-11-19  Teresa Johnson  <tejohn...@google.com>
> >>>>>
> >>>>>          * common/config/i386/i386-common.c: Enable
> >>>>>          -freorder-blocks-and-partition at -O2 and up for x86.
> >>>>>          * opts.c (finish_options): Only warn if -freorder-blocks-and-
> >>>>>          partition was set on command line.
> >>>>
> >>>>
> >>>> You probably mis doc/invoke.texi update.
> >>>> Thank you for working on this!
> >>>
> >>>
> >>> Yes, thanks. Here is the patch with the invoke.texi update.
> >>>
> >>> Teresa
> >>>
> >>>
> >>> 2013-11-19  Teresa Johnson  <tejohn...@google.com>
> >>>
> >>>          * common/config/i386/i386-common.c: Enable
> >>>          -freorder-blocks-and-partition at -O2 and up for x86.
> >>>          * doc/invoke.texi: Update -freorder-blocks-and-partition default.
> >>>          * opts.c (finish_options): Only warn if -freorder-blocks-and-
> >>>          partition was set on command line.
> >>
> >> This is fine.  Glad to see we're getting some good benefits from this code.
> >
> > Ok, thanks. It is in as r205058.
> >
> >>
> >> FWIW, I would expect the PA to show benefits from this work -- HP showed
> >> good results for block positioning and procedure splitting eons ago.  A
> >> secondary effect you would expect to see would be an increase in the number
> >> of long branch stubs (static count), but a decrease in the number of those
> >> stubs that actually execute.  Measuring those effects would obviously be
> >> out-of-scope.
> >>
> >> I totally understand that the PA is irrelevant these days, but seeing good
> >> results on another architecture would go a long way to answering the
> >> question "should this be enabled by default across the board?"
> >
> > Yep, we saw benefits for IPF on hp-ux as well. It would be great to
> > eventually enable this across the board and only selectively disable.
> > I know there were people interested in trying this with ARM, that
> > would be a good relevant arch to try this with now to see if it can be
> > enabled more widely.
> >
> > Teresa
> >
> >>
> >> jeff
> >>
> >>
> >
> >
> >
> > --
> > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH i386] Enable -freorder-blocks-and-partition

Reply via email to