On Tue, 7 Dec 2021, Andre Vieira (lists) wrote: > Hi, > > Rebased on top of the epilogue mode patch. > > OK for trunk?
@@ -7242,7 +7315,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo, participating. */ if (ncopies > 1 && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live) - && reduc_chain_length == 1) + && reduc_chain_length == 1 + && loop_vinfo->suggested_unroll_factor == 1) single_defuse_cycle = true; if (single_defuse_cycle || lane_reduc_code_p) It seems to me that 'ncopies' should include loop_vinfo->suggested_unroll_factor already so the check shouldn't be necessary. Otherwise looks OK to me. Thanks, Richard. > > gcc/ChangeLog: > > * tree-vect-loop.c (vect_estimate_min_profitable_iters): Pass new > argument > suggested_unroll_factor. > (vect_analyze_loop_costing): Likewise. > (_loop_vec_info::_loop_vec_info): Initialize new member > suggested_unroll_factor. > (vect_determine_partial_vectors_and_peeling): Make epilogue of > unrolled > main loop use partial vectors. > (vect_analyze_loop_2): Pass and use new argument > suggested_unroll_factor. > (vect_analyze_loop_1): Likewise. > (vect_analyze_loop): Change to intialize local > suggested_unroll_factor and use it. > (vectorizable_reduction): Don't use single_defuse_cycle when > unrolling. > * tree-vectorizer.h (_loop_vec_info::_loop_vec_info): Add new member > suggested_unroll_factor. > (vector_costs::vector_costs): Add new member > m_suggested_unroll_factor. > (vector_costs::suggested_unroll_factor): New getter function. > (finish_cost): Set return argument suggested_unroll_factor. > > > > Regards, > Andre > > On 07/12/2021 11:27, Andre Vieira (lists) via Gcc-patches wrote: > > Hi, > > > > I've split this particular part off, since it's not only relevant to > > unrolling. The new test shows how this is useful for existing > > (non-unrolling) cases. I also had to fix the costing function, the main_vf / > > epilogue_vf calculations for old and new didn't take into consideration that > > the main_vf could be lower, nor did it take into consideration that they > > were not necessarily always a multiple of each other. So using CEIL here is > > the correct approach. > > > > Bootstrapped and regression tested on aarch64-none-linux-gnu. > > > > OK for trunk? > > > > gcc/ChangeLog: > > > > * tree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for > > epilogue costing. > > (vect_analyze_loop): Re-analyze all modes for epilogues. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/masked_epilogue.c: New test. > > > > On 30/11/2021 13:56, Richard Biener wrote: > >> On Tue, 30 Nov 2021, Andre Vieira (lists) wrote: > >> > >>> On 25/11/2021 12:46, Richard Biener wrote: > >>>> Oops, my fault, yes, it does. I would suggest to refactor things so > >>>> that the mode_i = first_loop_i case is there only once. I also wonder > >>>> if all the argument about starting at 0 doesn't apply to the > >>>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well? So > >>>> what's the reason to differ here? So in the end I'd just change > >>>> the existing > >>>> > >>>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) > >>>> { > >>>> > >>>> to > >>>> > >>>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo) > >>>> || first_loop_vinfo->suggested_unroll_factor > 1) > >>>> { > >>>> > >>>> and maybe revisit this when we have an actual testcase showing that > >>>> doing sth else has a positive effect? > >>>> > >>>> Thanks, > >>>> Richard. > >>> So I had a quick chat with Richard Sandiford and he is suggesting > >>> resetting > >>> mode_i to 0 for all cases. > >>> > >>> He pointed out that for some tunings the SVE mode might come after the > >>> NEON > >>> mode, which means that even for not-unrolled loop_vinfos we could end up > >>> with > >>> a suboptimal choice of mode for the epilogue. I.e. it could be that we > >>> pick > >>> V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd > >>> not > >>> try VNx16QI for the epilogue. > >>> > >>> This would simplify the mode selecting cases, by just simply restarting at > >>> mode_i in all epilogue cases. Is that something you'd be OK? > >> Works for me with an updated comment. Even better with showing a > >> testcase exercising such tuning. > >> > >> Richard. > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)