Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Jiufu Guo via Gcc-patches Thu, 21 May 2020 19:59:53 -0700

Jiufu Guo <guoji...@linux.ibm.com> writes:

> Jan Hubicka <hubi...@ucw.cz> writes:
>
>>> Segher Boessenkool <seg...@kernel.crashing.org> writes:
>>> 
>>> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
>>> >> I think this is the wrong way to approach this.  You're doing too many
>>> >> things at once.  Try to fix the powerpc regression with the extra
>>> >> flag_rtl_unroll_loops, that could be backported.  Then you can
>>> 
>>> Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
>>> cunroll?
>>> >> independently see whether enabling more unrolling at -O2 makes
>>> >> sense.  Because currently we _do_ unroll at -O2 when it does
>>> >> not increase size.  It's just that your patches make this as aggressive
>>> >> as -O3.
>>> 
>>> I'm also thinking about enabling more cunroll at -O2, even with some size
>>> increase.  Enabling cunroll fully would make it like -O3.  As discussed in
>>> some PRs (e.g. PR88760), unrolling small/simple loops may favor some
>>> platforms (but not all platforms, like x86_64?).  This would require a
>>> target-specific hook.  Or we could do some generic setting: accept
>>> unrolling/peeling a limited number of times if the loop body is simple
>>> and small, together with a target-specific hook.
>>
>> We now have --params that can be tuned differently for -O2 and -O3 so
One thing about tuning --param based on optimization level: sometimes
different functions may have different optimization levels.  --params
may not be settable per function; if so, a --param may not work as
expected for some functions.  Not sure if this is an issue you would be
concerned about.
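
To illustrate the concern, here is a minimal sketch, assuming GCC's
`optimize` function attribute is what gives functions in one file
different optimization levels.  The names OPT_LEVEL, sum_o2 and sum_o3
are made up for this example.  Both functions contain the same small
loop with a constant trip count, the kind of loop cunroll looks at;
whether a --param given on the command line follows the per-function
level is exactly the open question.

```c
#include <stddef.h>

/* Hypothetical helper macro: apply GCC's per-function optimize
   attribute where available, and expand to nothing elsewhere.  */
#if defined(__GNUC__) && !defined(__clang__)
# define OPT_LEVEL(level) __attribute__((optimize(level)))
#else
# define OPT_LEVEL(level)
#endif

#define N 8

/* Small, simple loop with a constant trip count, requested at -O2.  */
OPT_LEVEL("O2")
int sum_o2(const int *a)
{
  int s = 0;
  for (size_t i = 0; i < N; i++)
    s += a[i];
  return s;
}

/* The same loop, requested at -O3 within the same translation unit.  */
OPT_LEVEL("O3")
int sum_o3(const int *a)
{
  int s = 0;
  for (size_t i = 0; i < N; i++)
    s += a[i];
  return s;
}
```

Both functions compute the same result; only the generated code may
differ.  Comparing the assembly for the two (e.g. with -S) would show
whether the unrolling decision follows the per-function level.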


Thanks!
Jiufu

>> looking into cunroll was one of my todos for GCC 10 -O2 retuning, but I did
>> not get any very conclusive benchmark results outside SPEC.
>> I planned to return to it next stage1, so it may be a good time.
>> Do you have any benchmarks on ppc?
>
> 541.leela_r, 548.exchange2_r and 557.xz_r from SPEC2017 are visibly
> affected by cunroll.  They can be used to tune cunroll, I think.
>
>> Of course there is no need to keep the same defaults for all targets, but in
>> general having target-specific defaults increases the number of knobs we
>> need to check and keep up to date.
>
> Thanks,
> Jiufu
>
>>
>> Honza
>
>>> 
>>> Any comments? Thanks!
>>> Jiufu
>>> 
>>> >
>>> > Just do a separate flag (and option) for cunroll, instead?
>>> >
>>> > The RTL unroller is *the* unroller, and has been since forever.
>>> >
>>> >
>>> > Segher
