On Tue, 7 Dec 2021, Andre Vieira (lists) wrote:
> Hi,
>
> Rebased on top of the epilogue mode patch.
>
> OK for trunk?
@@ -7242,7 +7315,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
participating. */
if (ncopies > 1
&& (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
- && reduc_chain_length == 1)
+ && reduc_chain_length == 1
+ && loop_vinfo->suggested_unroll_factor == 1)
single_defuse_cycle = true;
if (single_defuse_cycle || lane_reduc_code_p)
It seems to me that 'ncopies' should include
loop_vinfo->suggested_unroll_factor already so the check shouldn't be
necessary.
Otherwise looks OK to me.
Thanks,
Richard.
>
> gcc/ChangeLog:
>
> * tree-vect-loop.c (vect_estimate_min_profitable_iters): Pass new
> argument
> suggested_unroll_factor.
> (vect_analyze_loop_costing): Likewise.
> (_loop_vec_info::_loop_vec_info): Initialize new member
> suggested_unroll_factor.
> (vect_determine_partial_vectors_and_peeling): Make epilogue of
> unrolled
> main loop use partial vectors.
> (vect_analyze_loop_2): Pass and use new argument
> suggested_unroll_factor.
> (vect_analyze_loop_1): Likewise.
> (vect_analyze_loop): Change to intialize local
> suggested_unroll_factor and use it.
> (vectorizable_reduction): Don't use single_defuse_cycle when
> unrolling.
> * tree-vectorizer.h (_loop_vec_info::_loop_vec_info): Add new member
> suggested_unroll_factor.
> (vector_costs::vector_costs): Add new member
> m_suggested_unroll_factor.
> (vector_costs::suggested_unroll_factor): New getter function.
> (finish_cost): Set return argument suggested_unroll_factor.
>
>
>
> Regards,
> Andre
>
> On 07/12/2021 11:27, Andre Vieira (lists) via Gcc-patches wrote:
> > Hi,
> >
> > I've split this particular part off, since it's not only relevant to
> > unrolling. The new test shows how this is useful for existing
> > (non-unrolling) cases. I also had to fix the costing function, the main_vf /
> > epilogue_vf calculations for old and new didn't take into consideration that
> > the main_vf could be lower, nor did it take into consideration that they
> > were not necessarily always a multiple of each other. So using CEIL here is
> > the correct approach.
> >
> > Bootstrapped and regression tested on aarch64-none-linux-gnu.
> >
> > OK for trunk?
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for
> > epilogue costing.
> > (vect_analyze_loop): Re-analyze all modes for epilogues.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/masked_epilogue.c: New test.
> >
> > On 30/11/2021 13:56, Richard Biener wrote:
> >> On Tue, 30 Nov 2021, Andre Vieira (lists) wrote:
> >>
> >>> On 25/11/2021 12:46, Richard Biener wrote:
> >>>> Oops, my fault, yes, it does. I would suggest to refactor things so
> >>>> that the mode_i = first_loop_i case is there only once. I also wonder
> >>>> if all the argument about starting at 0 doesn't apply to the
> >>>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well? So
> >>>> what's the reason to differ here? So in the end I'd just change
> >>>> the existing
> >>>>
> >>>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >>>> {
> >>>>
> >>>> to
> >>>>
> >>>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
> >>>> || first_loop_vinfo->suggested_unroll_factor > 1)
> >>>> {
> >>>>
> >>>> and maybe revisit this when we have an actual testcase showing that
> >>>> doing sth else has a positive effect?
> >>>>
> >>>> Thanks,
> >>>> Richard.
> >>> So I had a quick chat with Richard Sandiford and he is suggesting
> >>> resetting
> >>> mode_i to 0 for all cases.
> >>>
> >>> He pointed out that for some tunings the SVE mode might come after the
> >>> NEON
> >>> mode, which means that even for not-unrolled loop_vinfos we could end up
> >>> with
> >>> a suboptimal choice of mode for the epilogue. I.e. it could be that we
> >>> pick
> >>> V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd
> >>> not
> >>> try VNx16QI for the epilogue.
> >>>
> >>> This would simplify the mode selecting cases, by just simply restarting at
> >>> mode_i in all epilogue cases. Is that something you'd be OK?
> >> Works for me with an updated comment. Even better with showing a
> >> testcase exercising such tuning.
> >>
> >> Richard.
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)