On Tue, Nov 19, 2013 at 9:40 AM, Jeff Law <l...@redhat.com> wrote:
> On 11/19/13 10:24, Teresa Johnson wrote:
>>
>> On Tue, Nov 19, 2013 at 7:44 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>
>>> Martin,
>>> can you, please, generate the updated systemtap with
>>> -freorder-blocks-and-partition enabled?
>>>
>>> I am in favour of enabling this - it is usefull pass and it is pointless
>>> ot
>>> have passes that are not enabled by default.
>>> Is there reason why this would not work on other ELF target? Is it
>>> working
>>> with Darwin and Windows?
>>
>>
>> I don't know how to test these (I don't see any machines listed in the
>> gcc compile farm of those types). For Windows, I assume you mean
>> MinGW, which should be enabled as it is under i386. Should I disable
>> it there and for Darwin?
>>
>>>
>>>> This patch enables -freorder-blocks-and-partition by default for x86
>>>> at -O2 and up. It is showing some modest gains in cpu2006 performance
>>>> with profile feedback and -O2 on an Intel Westmere system. Specifically,
>>>> I am seeing consistent improvements in 401.bzip2 (1.5-3%), 483.xalancbmk
>>>> (1.5-3%), and 453.povray (2.5-3%), and no apparent regressions.
>>>
>>>
>>> This actually sounds very good ;)
>>>
>>> Lets see how the systemtap graphs goes.  If we will end up with problem
>>> of too many accesses to cold section, I would suggest making cold section
>>> subdivided into .unlikely and .unlikely.part (we could have better name)
>>> with the second consisting only of unlikely parts of hot&normal
>>> functions.
>>>
>>> This should reduce the problems we are seeing with mistakely identifying
>>> code to be cold because of roundoff errors (and it probably makes sense
>>> in general, too).
>>> We will however need to update gold and ld for that.
>>
>>
>> Note that I don't think this would help much unless the linker is
>> changed to move the cold split section close to the hot section. There
>> is probably some fine-tuning we could do eventually in the linker
>> under -ffunction-sections without putting the split portions in a
>> separate section. I.e. clump the split parts together within unlikely.
>> But hopefully this can all be done later on as follow-on work to boost
>> the performance further.
>>
>>>>
>>>> Bootstrapped and tested on x86-64-unknown-linux-gnu with a normal
>>>> bootstrap, a profiledbootstrap and an LTO profiledbootstrap. All were
>>>> configured with --enable-languages=all,obj-c++ and tested for both
>>>> 32 and 64-bit with RUNTESTFLAGS="--target_board=unix\{-m32,-m64\}".
>>>>
>>>> It would be good to enable this for additional targets as a follow on,
>>>> but it needs more testing for both correctness and performance on those
>>>> other targets (i.e for correctness because I see a number of places
>>>> in other config/*/*.c files that do some special handling under this
>>>> option for different targets or simply disable it, so I am not sure
>>>> how well-tested it is under different architectural constraints).
>>>>
>>>> Ok for trunk?
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> 2013-11-19  Teresa Johnson  <tejohn...@google.com>
>>>>
>>>>          * common/config/i386/i386-common.c: Enable
>>>>          -freorder-blocks-and-partition at -O2 and up for x86.
>>>>          * opts.c (finish_options): Only warn if -freorder-blocks-and-
>>>>          partition was set on command line.
>>>
>>>
>>> You probably mis doc/invoke.texi update.
>>> Thank you for working on this!
>>
>>
>> Yes, thanks. Here is the patch with the invoke.texi update.
>>
>> Teresa
>>
>>
>> 2013-11-19  Teresa Johnson  <tejohn...@google.com>
>>
>>          * common/config/i386/i386-common.c: Enable
>>          -freorder-blocks-and-partition at -O2 and up for x86.
>>          * doc/invoke.texi: Update -freorder-blocks-and-partition default.
>>          * opts.c (finish_options): Only warn if -freorder-blocks-and-
>>          partition was set on command line.
>
> This is fine.  Glad to see we're getting some good benefits from this code.

Ok, thanks. It is in as r205058.

>
> FWIW, I would expect the PA to show benefits from this work -- HP showed
> good results for block positioning and procedure splitting eons ago.  A
> secondary effect you would expect to see would be an increase in the number
> of long branch stubs (static count), but a decrease in the number of those
> stubs that actually execute.  Measuring those effects would obviously be
> out-of-scope.
>
> I totally understand that the PA is irrelevant these days, but seeing good
> results on another architecture would go a long way to answering the
> question "should this be enabled by default across the board?"

Yep, we saw benefits for IPF on hp-ux as well. It would be great to
eventually enable this across the board and only selectively disable.
I know there were people interested in trying this with ARM, that
would be a good relevant arch to try this with now to see if it can be
enabled more widely.

Teresa

>
> jeff
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Reply via email to