Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-22 Thread Richard Biener via Gcc-patches
On Tue, 21 Sep 2021, Andre Vieira (lists) wrote:

> Hi Richi,
> 
> Thanks for the review, see below some questions.
> 
> On 21/09/2021 13:30, Richard Biener wrote:
> > On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:
> >
> >> Hi all,
> >>
> >> This patch adds the ability to define a target hook to unroll the main
> >> vectorized loop. It also introduces --param's vect-unroll and
> >> vect-unroll-reductions to control this through a command-line. I found this
> >> useful to experiment and believe can help when tuning, so I decided to
> >> leave
> >> it in.
> >> We only unroll the main loop and have disabled unrolling epilogues for now.
> >> We
> >> also do not support unrolling of any loop that has a negative step and we
> >> do
> >> not support unrolling a loop with any reduction other than a
> >> TREE_CODE_REDUCTION.
> >>
> >> Bootstrapped and regression tested on aarch64-linux-gnu as part of the
> >> series.
> > I wonder why we want to change the vector modes used for the epilogue,
> > we're either making it more likely to need to fall through to the
> > scalar epilogue or require another vectorized epilogue.
> I don't quite understand what you mean by change the vector modes for the
> epilogue. I don't think we do.
> If you are referring to:
>       /* If we are unrolling, try all VECTOR_MODES for the epilogue.  */
>       if (loop_vinfo->par_unrolling_factor > 1)
>         {
>       next_vector_mode = vector_modes[0];
>       mode_i = 1;
> 
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location,
>                  "* Re-trying analysis with vector mode"
>                  " %s for epilogue with partial vectors.\n",
>                  GET_MODE_NAME (next_vector_mode));
>       continue;
>         }
> 
> That just forces trying the vector modes we've tried before. Though I might
> need to revisit this now I think about it. I'm afraid it might be possible for
> this to generate an epilogue with a vf that is not lower than that of the main
> loop, but I'd need to think about this again.
> 
> Either way I don't think this changes the vector modes used for the epilogue.
> But maybe I'm just missing your point here.

Yes, I was refering to the above which suggests that when we vectorize
the main loop with V4SF but unroll then we try vectorizing the
epilogue with V4SF as well (but not unrolled).  I think that's
premature (not sure if you try V8SF if the main loop was V4SF but
unrolled 4 times).

> > That said, for simplicity I'd only change the VF of the main loop.
> >
> > There I wonder why you need to change vect_update_vf_for_slp or
> > vect_determine_partial_vectors_and_peeling and why it's not enough
> > to adjust the VF in a single place, I'd do that here:
> >
> >/* This is the point where we can re-start analysis with SLP forced off.
> > */
> > start_over:
> >
> >/* Now the vectorization factor is final.  */
> >poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> >gcc_assert (known_ne (vectorization_factor, 0U));
> >
> > >  call vect_update_vf_for_unroll ()
> I can move it there, it would indeed remove the need for the change to
> vect_update_vf_for_slp, the change to
> vect_determine_partial_vectors_and_peeling would still be required I think. It
> is meant to disable using partial vectors in an unrolled loop.

Why would we disable the use of partial vectors in an unrolled loop?

> > note there's also loop->unroll (from #pragma GCC unroll) which we
> > could include in what you look at in vect_unroll_value.
> >
> > I don't like add_stmt_cost_for_unroll - how should a target go
> > and decide based on what it is fed?  You could as well feed it
> > the scalar body or the vinfo so it can get a shot at all
> > the vectorizers meta data - but feeding it individual stmt_infos
> > does not add any meaningful abstraction and thus what's the
> > point?
> I am still working on tuning our backend hook, but the way it works is it
> estimates how many load, store and general ops are required for the vectorized
> loop based on these.
> > I _think_ what would make some sense is when we actually cost
> > the vector body (with the not unrolled VF) ask the target
> > "well, how about unrolling this?" because there it has the
> > chance to look at the actual vector stmts produced (in "cost form").
> > And if the target answers "yeah - go ahead and try x4" we signal
> > that to the iteration and have "mode N with x4 unroll" validated and
> > costed.
> >
> > So instead of a new target hook amend the finish_cost hook to
> > produce a suggested unroll value and cost both the unrolled and
> > not unrolled body.
> >
> > Sorry for steering in a different direction ;)
> The reason we decided to do this early and not after cost is because
> 'vect_prune_runtime_alias_test_list' and 'vect_enhance_data_refs_alignment'
> require the VF and if you suddenly raise that the alias analysis could become
> invalid.

Sure but I'm suggesting you keep the 

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-21 Thread Andre Vieira (lists) via Gcc-patches

Hi Richi,

Thanks for the review, see below some questions.

On 21/09/2021 13:30, Richard Biener wrote:

On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:


Hi all,

This patch adds the ability to define a target hook to unroll the main
vectorized loop. It also introduces --param's vect-unroll and
vect-unroll-reductions to control this through a command-line. I found this
useful to experiment and believe can help when tuning, so I decided to leave
it in.
We only unroll the main loop and have disabled unrolling epilogues for now. We
also do not support unrolling of any loop that has a negative step and we do
not support unrolling a loop with any reduction other than a
TREE_CODE_REDUCTION.

Bootstrapped and regression tested on aarch64-linux-gnu as part of the series.

I wonder why we want to change the vector modes used for the epilogue,
we're either making it more likely to need to fall through to the
scalar epilogue or require another vectorized epilogue.
I don't quite understand what you mean by change the vector modes for 
the epilogue. I don't think we do.

If you are referring to:
      /* If we are unrolling, try all VECTOR_MODES for the epilogue.  */
      if (loop_vinfo->par_unrolling_factor > 1)
        {
      next_vector_mode = vector_modes[0];
      mode_i = 1;

      if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
                 "* Re-trying analysis with vector mode"
                 " %s for epilogue with partial vectors.\n",
                 GET_MODE_NAME (next_vector_mode));
      continue;
        }

That just forces trying the vector modes we've tried before. Though I 
might need to revisit this now I think about it. I'm afraid it might be 
possible for this to generate an epilogue with a vf that is not lower 
than that of the main loop, but I'd need to think about this again.


Either way I don't think this changes the vector modes used for the 
epilogue. But maybe I'm just missing your point here.

That said, for simplicity I'd only change the VF of the main loop.

There I wonder why you need to change vect_update_vf_for_slp or
vect_determine_partial_vectors_and_peeling and why it's not enough
to adjust the VF in a single place, I'd do that here:

   /* This is the point where we can re-start analysis with SLP forced off.
*/
start_over:

   /* Now the vectorization factor is final.  */
   poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   gcc_assert (known_ne (vectorization_factor, 0U));

>  call vect_update_vf_for_unroll ()
I can move it there, it would indeed remove the need for the change to 
vect_update_vf_for_slp, the change to 
vect_determine_partial_vectors_and_peeling would still be required I 
think. It is meant to disable using partial vectors in an unrolled loop.

note there's also loop->unroll (from #pragma GCC unroll) which we
could include in what you look at in vect_unroll_value.

I don't like add_stmt_cost_for_unroll - how should a target go
and decide based on what it is fed?  You could as well feed it
the scalar body or the vinfo so it can get a shot at all
the vectorizers meta data - but feeding it individual stmt_infos
does not add any meaningful abstraction and thus what's the
point?
I am still working on tuning our backend hook, but the way it works is 
it estimates how many load, store and general ops are required for the 
vectorized loop based on these.

I _think_ what would make some sense is when we actually cost
the vector body (with the not unrolled VF) ask the target
"well, how about unrolling this?" because there it has the
chance to look at the actual vector stmts produced (in "cost form").
And if the target answers "yeah - go ahead and try x4" we signal
that to the iteration and have "mode N with x4 unroll" validated and
costed.

So instead of a new target hook amend the finish_cost hook to
produce a suggested unroll value and cost both the unrolled and
not unrolled body.

Sorry for steering in a different direction ;)
The reason we decided to do this early and not after cost is because 
'vect_prune_runtime_alias_test_list' and 
'vect_enhance_data_refs_alignment' require the VF and if you suddenly 
raise that the alias analysis could become invalid.


An initial implementation did do it later for that very reason that we 
could reuse the cost calculations and AArch64 already computed these 
'ops' after Richard Sandiford's patches.

But yeah ... the above kinda led me to rewrite it this way.



Thanks,
Richard.




gcc/ChangeLog:

     * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
     * doc/tm.texi.in: Add entries for target hooks above.
     * params.opt: Add vect-unroll and vect-unroll-reductions
parameters.
     * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
     * targhooks.c (default_add_stmt_cost_for_unroll): New.
     

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-21 Thread Richard Biener via Gcc-patches
On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:

> Hi all,
> 
> This patch adds the ability to define a target hook to unroll the main
> vectorized loop. It also introduces --param's vect-unroll and
> vect-unroll-reductions to control this through a command-line. I found this
> useful to experiment and believe can help when tuning, so I decided to leave
> it in.
> We only unroll the main loop and have disabled unrolling epilogues for now. We
> also do not support unrolling of any loop that has a negative step and we do
> not support unrolling a loop with any reduction other than a
> TREE_CODE_REDUCTION.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu as part of the series.

I wonder why we want to change the vector modes used for the epilogue,
we're either making it more likely to need to fall through to the
scalar epilogue or require another vectorized epilogue.

That said, for simplicity I'd only change the VF of the main loop.

There I wonder why you need to change vect_update_vf_for_slp or
vect_determine_partial_vectors_and_peeling and why it's not enough
to adjust the VF in a single place, I'd do that here:

  /* This is the point where we can re-start analysis with SLP forced off.  
*/
start_over:

  /* Now the vectorization factor is final.  */
  poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
  gcc_assert (known_ne (vectorization_factor, 0U));

>  call vect_update_vf_for_unroll ()

note there's also loop->unroll (from #pragma GCC unroll) which we
could include in what you look at in vect_unroll_value.

I don't like add_stmt_cost_for_unroll - how should a target go
and decide based on what it is fed?  You could as well feed it
the scalar body or the vinfo so it can get a shot at all
the vectorizers meta data - but feeding it individual stmt_infos
does not add any meaningful abstraction and thus what's the
point?

I _think_ what would make some sense is when we actually cost
the vector body (with the not unrolled VF) ask the target
"well, how about unrolling this?" because there it has the
chance to look at the actual vector stmts produced (in "cost form").
And if the target answers "yeah - go ahead and try x4" we signal
that to the iteration and have "mode N with x4 unroll" validated and
costed.

So instead of a new target hook amend the finish_cost hook to
produce a suggested unroll value and cost both the unrolled and
not unrolled body.

Sorry for steering in a different direction ;)

Thanks,
Richard.



> gcc/ChangeLog:
> 
>     * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
>     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
>     * doc/tm.texi.in: Add entries for target hooks above.
>     * params.opt: Add vect-unroll and vect-unroll-reductions 
> parameters.
>     * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
>     and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
>     * targhooks.c (default_add_stmt_cost_for_unroll): New.
>     (default_unroll_factor): Likewise.
>     * targhooks.h (default_add_stmt_cost_for_unroll): Likewise.
>     (default_unroll_factor): Likewise.
>     * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>     par_unrolling_factor.
>     (vect_update_vf_for_slp): Use unrolling factor to update 
> vectorization
>     factor.
>     (vect_determine_partial_vectors_and_peeling): Account for 
> unrolling.
>     (vect_determine_unroll_factor): Determine how much to unroll
> vectorized
>     main loop.
>     (vect_analyze_loop_2): Call vect_determine_unroll_factor.
>     (vect_analyze_loop): Allow for epilogue vectorization when 
> unrolling
>     and rewalk vector_mode array for the epilogues.
>     (vectorizable_reduction): Disable single_defuse_cycle when 
> unrolling.
>     * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor
>     as a member of loop_vec_info.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)