Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread Richard Biener via Gcc-patches
On Mon, May 25, 2020 at 1:10 PM guojiufu  wrote:
>
> Currently option -funroll-loops controls both GIMPLE unroler and
> RTL unroller. It is not able to control GIMPLE cunroller and
> RTL unroller independently.  This patch introducing different flags
> to control them seperately, and this also provide more freedom to
> tune one of them without affecting another.
>
> This patch introduces two undocumented flags: -fcomplete-unroll-loops
> for GIMPLE cunroll, and -frtl-unroll-loops for RTL unroller.  And
> these two options are enabled by original -funroll-loops.

I don't like the new -fcomplete-unroll-loops name.  It does not
match the implementation either (the gate functions of cunroll
and cunrolli only check optimize against 2/3), given it just
controls whether unrolling can grow code size.

> -  SET_OPTION_IF_UNSET (opts, opts_set, flag_unroll_loops, value);
> +  SET_OPTION_IF_UNSET (opts, opts_set, flag_complete_unroll_loops, value);

This is also not a no-op change.

Since a new flag is not needed to fix the regression please avoid
adding -fcomplete-unroll-loops.

For -frtl-unroll-loops you should be able to use

EnabledBy(funroll-loops)

and thus you can simplify option handling accordingly.

Thanks,
Richard.

> Bootstrap and regtest pass on powerpc64le, is this ok for trunk?
>
> Jiufu
>
> ChangeLog:
> 2020-05-25  Jiufu Guo   
>
> * common.opt: Add -frtl-unroll-loops and -fcomplete-unroll-loops.
> * opts.c (enable_fdo_optimizations): Replace flag_unroll_loops
> with flag_complete_unroll_loops.
> * toplev.c (process_options): set flag_rtl_unroll_loops and
> flag_complete_unroll_loops if not explicitly set by user.
> * tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute): Replace
> flag_unroll_loops with flag_complete_unroll_loops.
> * loop-init.c (pass_loop2::gate): Replace flag_unroll_loops with
> flag_rtl_unroll_loops.
> (pass_rtl_unroll_loops::gate): Replace flag_unroll_loops with
> flag_rtl_unroll_loops.
> ---
>  gcc/common.opt  | 8 
>  gcc/loop-init.c | 6 +++---
>  gcc/opts.c  | 2 +-
>  gcc/toplev.c| 6 ++
>  gcc/tree-ssa-loop-ivcanon.c | 2 +-
>  5 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..3b5ab52bb9d 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,14 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +frtl-unroll-loops
> +Common Undocumented Var(flag_rtl_unroll_loops) Init(2) Optimization
> +; Perform rtl loop unrolling when iteration count is known.
> +
> +fcomplete-unroll-loops
> +Common Undocumented Var(flag_complete_unroll_loops) Init(2) Optimization
> +; Perform GIMPLE loop complete unrolling.
> +
>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index 401e5282907..e955068f36c 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -360,7 +360,7 @@ pass_loop2::gate (function *fun)
>if (optimize > 0
>&& (flag_move_loop_invariants
>   || flag_unswitch_loops
> - || flag_unroll_loops
> + || flag_rtl_unroll_loops
>   || (flag_branch_on_count_reg && targetm.have_doloop_end ())
>   || cfun->has_unroll))
>  return true;
> @@ -560,7 +560,7 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>  {
> -  return (flag_unroll_loops || flag_unroll_all_loops || 
> cfun->has_unroll);
> +  return (flag_rtl_unroll_loops || flag_unroll_all_loops || 
> cfun->has_unroll);
>  }
>
>virtual unsigned int execute (function *);
> @@ -576,7 +576,7 @@ pass_rtl_unroll_loops::execute (function *fun)
>if (dump_file)
> df_dump (dump_file);
>
> -  if (flag_unroll_loops)
> +  if (flag_rtl_unroll_loops)
> flags |= UAP_UNROLL;
>if (flag_unroll_all_loops)
> flags |= UAP_UNROLL_ALL;
> diff --git a/gcc/opts.c b/gcc/opts.c
> index ec3ca0720f9..64c35d8d7fc 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1702,7 +1702,7 @@ enable_fdo_optimizations (struct gcc_options *opts,
>  {
>SET_OPTION_IF_UNSET (opts, opts_set, flag_branch_probabilities, value);
>SET_OPTION_IF_UNSET (opts, opts_set, flag_profile_values, value);
> -  SET_OPTION_IF_UNSET (opts, opts_set, flag_unroll_loops, value);
> +  SET_OPTION_IF_UNSET (opts, opts_set, flag_complete_unroll_loops, value);
>SET_OPTION_IF_UNSET (opts, opts_set, flag_peel_loops, value);
>SET_OPTION_IF_UNSET (opts, opts_set, flag_tracer, value);
>SET_OPTION_IF_UNSET (opts, opts_set, flag_value_profile_transformations,
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..c2b94d33464 100644
> --- a/gcc/toplev.

Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread Segher Boessenkool
On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> On Mon, May 25, 2020 at 1:10 PM guojiufu  wrote:
> Since a new flag is not needed to fix the regression please avoid
> adding -fcomplete-unroll-loops.
> 
> For -frtl-unroll-loops you should be able to use

Erm.  That *is* a new command-line option (the internal flags I do not
care about so much: new implementation details are *good*).  And a new
name that is a mistake in my opinion, for many reasons (users do not
know and should not have to care about "rtl"; the name is not
descriptive; it is useless churn, it is not the same name as we have
had for decades now; it is adding a new option for a future where we
will do most unrolling at gimple level, a future we do not know will
ever exist, and we do not know what that will look like anyway; it is
an extra level of indirection (in the name)).

We should not have an -frtl-unroll-loops if we do not have a
-ftree-unroll-loops (or whatever).

Unrolling early is not a good idea in general (the problems with the
very trivial complete unrolling case just underline that).  But we
*should* know which code we expect to unroll later, for better costing.
Adding names like "rtl-unroll-loops" only stands in the way of getting
a better design here.


Segher


Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread Richard Biener via Gcc-patches
On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool 
 wrote:
>On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
>> On Mon, May 25, 2020 at 1:10 PM guojiufu 
>wrote:
>> Since a new flag is not needed to fix the regression please avoid
>> adding -fcomplete-unroll-loops.
>> 
>> For -frtl-unroll-loops you should be able to use
>
>Erm.  That *is* a new command-line option (the internal flags I do not
>care about so much: new implementation details are *good*).  And a new
>name that is a mistake in my opinion, for many reasons (users do not
>know and should not have to care about "rtl"; the name is not
>descriptive; it is useless churn, it is not the same name as we have
>had for decades now; it is adding a new option for a future where we
>will do most unrolling at gimple level, a future we do not know will
>ever exist, and we do not know what that will look like anyway; it is
>an extra level of indirection (in the name)).
>
>We should not have an -frtl-unroll-loops if we do not have a
>-ftree-unroll-loops (or whatever).
>
>Unrolling early is not a good idea in general (the problems with the
>very trivial complete unrolling case just underline that).  But we
>*should* know which code we expect to unroll later, for better costing.
>Adding names like "rtl-unroll-loops" only stands in the way of getting
>a better design here.

You folks made ppc specific hacks instead of a better design. Those now stand 
in the way as well. But sure, simply do not expose the flag to the users, use
Var(flag_rtl_unroll_loops). My other points still stand. 

Feel free to ignore the regression part on the branch and come up with a great 
design. But don't expect to backport that then. 

Richard. 

>
>Segher



Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread David Edelsohn via Gcc-patches
On Mon, May 25, 2020 at 1:58 PM Richard Biener
 wrote:
>
> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool 
>  wrote:
> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> >> On Mon, May 25, 2020 at 1:10 PM guojiufu 
> >wrote:
> >> Since a new flag is not needed to fix the regression please avoid
> >> adding -fcomplete-unroll-loops.
> >>
> >> For -frtl-unroll-loops you should be able to use
> >
> >Erm.  That *is* a new command-line option (the internal flags I do not
> >care about so much: new implementation details are *good*).  And a new
> >name that is a mistake in my opinion, for many reasons (users do not
> >know and should not have to care about "rtl"; the name is not
> >descriptive; it is useless churn, it is not the same name as we have
> >had for decades now; it is adding a new option for a future where we
> >will do most unrolling at gimple level, a future we do not know will
> >ever exist, and we do not know what that will look like anyway; it is
> >an extra level of indirection (in the name)).
> >
> >We should not have an -frtl-unroll-loops if we do not have a
> >-ftree-unroll-loops (or whatever).
> >
> >Unrolling early is not a good idea in general (the problems with the
> >very trivial complete unrolling case just underline that).  But we
> >*should* know which code we expect to unroll later, for better costing.
> >Adding names like "rtl-unroll-loops" only stands in the way of getting
> >a better design here.
>
> You folks made ppc specific hacks instead of a better design. Those now stand 
> in the way as well. But sure, simply do not expose the flag to the users, use
> Var(flag_rtl_unroll_loops). My other points still stand.
>
> Feel free to ignore the regression part on the branch and come up with a 
> great design. But don't expect to backport that then.

I completely agree.

This path is digging a deeper and deeper hole.

- David


Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread Segher Boessenkool
On Mon, May 25, 2020 at 07:58:09PM +0200, Richard Biener wrote:
> You folks made ppc specific hacks instead of a better design. Those now stand 
> in the way as well. But sure, simply do not expose the flag to the users, use
> Var(flag_rtl_unroll_loops). My other points still stand. 
> 
> Feel free to ignore the regression part on the branch and come up with a 
> great design. But don't expect to backport that then. 

I just do not think fixing the regression should make things worse for
a long time, if that can be avoided.


Segher


Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-25 Thread Jiufu Guo via Gcc-patches
David Edelsohn  writes:

> On Mon, May 25, 2020 at 1:58 PM Richard Biener
>  wrote:
>>
>> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool 
>>  wrote:
>> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
>> >> On Mon, May 25, 2020 at 1:10 PM guojiufu 
>> >wrote:
>> >> Since a new flag is not needed to fix the regression please avoid
>> >> adding -fcomplete-unroll-loops.
>> >>
>> >> For -frtl-unroll-loops you should be able to use
>> >
>> >Erm.  That *is* a new command-line option (the internal flags I do not
>> >care about so much: new implementation details are *good*).  And a new
>> >name that is a mistake in my opinion, for many reasons (users do not
>> >know and should not have to care about "rtl"; the name is not
>> >descriptive; it is useless churn, it is not the same name as we have
>> >had for decades now; it is adding a new option for a future where we
>> >will do most unrolling at gimple level, a future we do not know will
>> >ever exist, and we do not know what that will look like anyway; it is
>> >an extra level of indirection (in the name)).
>> >
>> >We should not have an -frtl-unroll-loops if we do not have a
>> >-ftree-unroll-loops (or whatever).
>> >
>> >Unrolling early is not a good idea in general (the problems with the
>> >very trivial complete unrolling case just underline that).  But we
>> >*should* know which code we expect to unroll later, for better costing.
>> >Adding names like "rtl-unroll-loops" only stands in the way of getting
>> >a better design here.
>>
>> You folks made ppc specific hacks instead of a better design. Those
>> now stand in the way as well. But sure, simply do not expose the
>> flag to the users, use
>> Var(flag_rtl_unroll_loops). My other points still stand.
>>
>> Feel free to ignore the regression part on the branch and come up
>> with a great design. But don't expect to backport that then.
>
> I completely agree.

Thanks a lot for all your comments, suggestions, and tips in the
discussion.  Thank Richar, Segher, David, Hanza, and all!

I may have an explanation about the intention of this work.

We know that loop unrolling is a complex and tickly thing.  It could
help some kinds of code in a great manner.  Sometimes there are side
effects.  For different types of loop and different platforms, it may
result in different effects.
It would makes sense to tune the loop unrolling accordingly.  And so, to
help and tune loop unrolling is what we want to do.

Currently, we have loop unroller at GIMPLE part (cunroll/cunrolli) and
RTL part.  There are some options (like -funroll-loops) and --params to
control unrollers.

Through target hook, it would be helpful for different platforms to tune
unroller: checking the type of loops, check optimization level.
Existing hooks may help with something, like turn --params.

Adding separate flags(or options) may be helpful to control different
behaviors independently.  This is one reason for the patch which
introduces internal undocumented options.

One previous patch, r10-4525, is tunning for ppc at -O2. Which
implements an existing hook for rs6000 to check simple loops for RTL
unroller. For cunroll, it just enables it even increasing size at -O2
directly, without check the type of the loops.  And then the
side/negative effects of cunroll are also visible at -O2 besides
positive effects.  In PR95018, the side effect is shown on complex loop
(early exit, and more peeling).
One idea is for cunroll to tune it to avoid side effects. And if the
heuristic is suitable, it would be helpful for other usage, like -O3 and
-funroll-loops.

Thanks for any comments!

Jiufu

>
> This path is digging a deeper and deeper hole.
>
> - David


Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Richard Biener via Gcc-patches
On Tue, May 26, 2020 at 6:58 AM Jiufu Guo  wrote:
>
> David Edelsohn  writes:
>
> > On Mon, May 25, 2020 at 1:58 PM Richard Biener
> >  wrote:
> >>
> >> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool 
> >>  wrote:
> >> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> >> >> On Mon, May 25, 2020 at 1:10 PM guojiufu 
> >> >wrote:
> >> >> Since a new flag is not needed to fix the regression please avoid
> >> >> adding -fcomplete-unroll-loops.
> >> >>
> >> >> For -frtl-unroll-loops you should be able to use
> >> >
> >> >Erm.  That *is* a new command-line option (the internal flags I do not
> >> >care about so much: new implementation details are *good*).  And a new
> >> >name that is a mistake in my opinion, for many reasons (users do not
> >> >know and should not have to care about "rtl"; the name is not
> >> >descriptive; it is useless churn, it is not the same name as we have
> >> >had for decades now; it is adding a new option for a future where we
> >> >will do most unrolling at gimple level, a future we do not know will
> >> >ever exist, and we do not know what that will look like anyway; it is
> >> >an extra level of indirection (in the name)).
> >> >
> >> >We should not have an -frtl-unroll-loops if we do not have a
> >> >-ftree-unroll-loops (or whatever).
> >> >
> >> >Unrolling early is not a good idea in general (the problems with the
> >> >very trivial complete unrolling case just underline that).  But we
> >> >*should* know which code we expect to unroll later, for better costing.
> >> >Adding names like "rtl-unroll-loops" only stands in the way of getting
> >> >a better design here.
> >>
> >> You folks made ppc specific hacks instead of a better design. Those
> >> now stand in the way as well. But sure, simply do not expose the
> >> flag to the users, use
> >> Var(flag_rtl_unroll_loops). My other points still stand.
> >>
> >> Feel free to ignore the regression part on the branch and come up
> >> with a great design. But don't expect to backport that then.
> >
> > I completely agree.
>
> Thanks a lot for all your comments, suggestions, and tips in the
> discussion.  Thank Richar, Segher, David, Hanza, and all!
>
> I may have an explanation about the intention of this work.
>
> We know that loop unrolling is a complex and tickly thing.  It could
> help some kinds of code in a great manner.  Sometimes there are side
> effects.  For different types of loop and different platforms, it may
> result in different effects.
> It would makes sense to tune the loop unrolling accordingly.  And so, to
> help and tune loop unrolling is what we want to do.
>
> Currently, we have loop unroller at GIMPLE part (cunroll/cunrolli) and
> RTL part.  There are some options (like -funroll-loops) and --params to
> control unrollers.
>
> Through target hook, it would be helpful for different platforms to tune
> unroller: checking the type of loops, check optimization level.
> Existing hooks may help with something, like turn --params.
>
> Adding separate flags(or options) may be helpful to control different
> behaviors independently.  This is one reason for the patch which
> introduces internal undocumented options.
>
> One previous patch, r10-4525, is tunning for ppc at -O2. Which
> implements an existing hook for rs6000 to check simple loops for RTL
> unroller. For cunroll, it just enables it even increasing size at -O2
> directly, without check the type of the loops.  And then the
> side/negative effects of cunroll are also visible at -O2 besides
> positive effects.  In PR95018, the side effect is shown on complex loop
> (early exit, and more peeling).
> One idea is for cunroll to tune it to avoid side effects. And if the
> heuristic is suitable, it would be helpful for other usage, like -O3 and
> -funroll-loops.
>
> Thanks for any comments!

For GIMPLE level transforms I don't think targets have more knowledge
than the middle-end.  In fact GIMPLE complete unrolling is about
secondary effects, removing redundancies and abstraction.  So IMHO
the correct approach is to look at individual cases and try to improve
the generic code rather than try to get better benchmark results
on a per-target manner by magical parameter tuning.

For what the RTL unroller does it indeed depends very heavily on
the target whether sth is beneficial or not.

So I'd like to see specific cases where you think cunroll should
do "better" on powerpc only but not elsewhere.

Richard.

> Jiufu
>
> >
> > This path is digging a deeper and deeper hole.
> >
> > - David


Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Segher Boessenkool
Hi!

On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote:
> For GIMPLE level transforms I don't think targets have more knowledge
> than the middle-end.

Yes, certainly.

> In fact GIMPLE complete unrolling is about
> secondary effects, removing redundancies and abstraction.  So IMHO
> the correct approach is to look at individual cases and try to improve
> the generic code

Yep.

> rather than try to get better benchmark results
> on a per-target manner by magical parameter tuning.

I'm no fan of that for target-specific code either.  It's fine to be led
by benchmarks, but usually a better justification is needed.

> For what the RTL unroller does it indeed depends very heavily on
> the target whether sth is beneficial or not.

Yes :-(  And this means it will need to remain late in the pass
pipeline,  or at least the decision needs to use target information
(just like what ivopts does).

> So I'd like to see specific cases where you think cunroll should
> do "better" on powerpc only but not elsewhere.

It is probably not a good idea in general to unroll 14 times, yes :-)


Segher


Re: [PATCH 1/2] Seperate -funroll-loops for GIMPLE unroller and RTL unroller

2020-05-28 Thread Jiufu Guo via Gcc-patches
Segher Boessenkool  writes:

> Hi!
>
> On Thu, May 28, 2020 at 04:22:16PM +0200, Richard Biener wrote:
>> For GIMPLE level transforms I don't think targets have more knowledge
>> than the middle-end.
>
> Yes, certainly.
>
>> In fact GIMPLE complete unrolling is about
>> secondary effects, removing redundancies and abstraction.  So IMHO
>> the correct approach is to look at individual cases and try to improve
>> the generic code
>
> Yep.
>
>> rather than try to get better benchmark results
>> on a per-target manner by magical parameter tuning.
>
> I'm no fan of that for target-specific code either.  It's fine to be led
> by benchmarks, but usually a better justification is needed.

Thanks all,
Agree, we'd better tune it in generic code.

Jiufu

>
>> For what the RTL unroller does it indeed depends very heavily on
>> the target whether sth is beneficial or not.
>
> Yes :-(  And this means it will need to remain late in the pass
> pipeline,  or at least the decision needs to use target information
> (just like what ivopts does).
>
>> So I'd like to see specific cases where you think cunroll should
>> do "better" on powerpc only but not elsewhere.
>
> It is probably not a good idea in general to unroll 14 times, yes :-)
>
>
> Segher