On Tue, 7 Dec 2021, Andre Vieira (lists) wrote:

> Hi,
> 
> Rebased on top of the epilogue mode patch.
> 
> OK for trunk?

@@ -7242,7 +7315,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
    participating.  */
   if (ncopies > 1
       && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
-      && reduc_chain_length == 1)
+      && reduc_chain_length == 1
+      && loop_vinfo->suggested_unroll_factor == 1)
     single_defuse_cycle = true;
 
   if (single_defuse_cycle || lane_reduc_code_p)

It seems to me that 'ncopies' should include 
loop_vinfo->suggested_unroll_factor already so the check shouldn't be
necessary.

Otherwise looks OK to me.

Thanks,
Richard.

> 
> gcc/ChangeLog:
> 
>         * tree-vect-loop.c (vect_estimate_min_profitable_iters): Pass new
> argument
>         suggested_unroll_factor.
>         (vect_analyze_loop_costing): Likewise.
>         (_loop_vec_info::_loop_vec_info): Initialize new member 
> suggested_unroll_factor.
>         (vect_determine_partial_vectors_and_peeling): Make epilogue of
> unrolled
>         main loop use partial vectors.
>         (vect_analyze_loop_2): Pass and use new argument 
> suggested_unroll_factor.
>         (vect_analyze_loop_1): Likewise.
>         (vect_analyze_loop): Change to intialize local 
> suggested_unroll_factor and use it.
>         (vectorizable_reduction): Don't use single_defuse_cycle when
> unrolling.
>         * tree-vectorizer.h (_loop_vec_info::_loop_vec_info): Add new member
> suggested_unroll_factor.
>         (vector_costs::vector_costs): Add new member
> m_suggested_unroll_factor.
>         (vector_costs::suggested_unroll_factor): New getter function.
>         (finish_cost): Set return argument suggested_unroll_factor.
> 
> 
> 
> Regards,
> Andre
> 
> On 07/12/2021 11:27, Andre Vieira (lists) via Gcc-patches wrote:
> > Hi,
> >
> > I've split this particular part off, since it's not only relevant to
> > unrolling. The new test shows how this is useful for existing
> > (non-unrolling) cases. I also had to fix the costing function, the main_vf /
> > epilogue_vf calculations for old and new didn't take into consideration that
> > the main_vf could be lower, nor did it take into consideration that they
> > were not necessarily always a multiple of each other.  So using CEIL here is
> > the correct approach.
> >
> > Bootstrapped and regression tested on aarch64-none-linux-gnu.
> >
> > OK for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for
> > epilogue costing.
> >         (vect_analyze_loop): Re-analyze all modes for epilogues.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/aarch64/masked_epilogue.c: New test.
> >
> > On 30/11/2021 13:56, Richard Biener wrote:
> >> On Tue, 30 Nov 2021, Andre Vieira (lists) wrote:
> >>
> >>> On 25/11/2021 12:46, Richard Biener wrote:
> >>>> Oops, my fault, yes, it does.  I would suggest to refactor things so
> >>>> that the mode_i = first_loop_i case is there only once.  I also wonder
> >>>> if all the argument about starting at 0 doesn't apply to the
> >>>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well?  So
> >>>> what's the reason to differ here?  So in the end I'd just change
> >>>> the existing
> >>>>
> >>>>     if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >>>>       {
> >>>>
> >>>> to
> >>>>
> >>>>     if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
> >>>>         || first_loop_vinfo->suggested_unroll_factor > 1)
> >>>>       {
> >>>>
> >>>> and maybe revisit this when we have an actual testcase showing that
> >>>> doing sth else has a positive effect?
> >>>>
> >>>> Thanks,
> >>>> Richard.
> >>> So I had a quick chat with Richard Sandiford and he is suggesting
> >>> resetting
> >>> mode_i to 0 for all cases.
> >>>
> >>> He pointed out that for some tunings the SVE mode might come after the
> >>> NEON
> >>> mode, which means that even for not-unrolled loop_vinfos we could end up
> >>> with
> >>> a suboptimal choice of mode for the epilogue. I.e. it could be that we
> >>> pick
> >>> V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd
> >>> not
> >>> try VNx16QI for the epilogue.
> >>>
> >>> This would simplify the mode selecting cases, by just simply restarting at
> >>> mode_i in all epilogue cases. Is that something you'd be OK?
> >> Works for me with an updated comment.  Even better with showing a
> >> testcase exercising such tuning.
> >>
> >> Richard.
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

Reply via email to