Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
Thank you Kewen!!

I will apply this now.

BR,
Andre

On 25/05/2021 09:42, Kewen.Lin wrote:
> on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi Andre,
>>
>> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>>> Hi,
>>>
>>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
>>> uses an unpredicated (all-true predicate for SVE) main loop and a predicated
>>> tail loop. The way this was implemented seems to mean it re-uses the same
>>> vector-mode for both loops, which means the tail loop isn't an actual loop
>>> but only executes one iteration.
>>>
>>> This patch uses the knowledge of the conditions to enter an epilogue loop to
>>> help come up with a potentially more restrictive upper bound.
>>>
>>> Regression tested on aarch64-linux-gnu and also ran the testsuite using
>>> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
>>> failures.
>>>
>>> Would be good to have this tested for PPC too as I believe they are the main
>>> users of the --param vect-partial-vector-usage=1 option. Can someone help me
>>> test (and maybe even benchmark?) this on a PPC target?
>>>
>>
>> Thanks for doing this! I can test it on Power10, which enables this parameter
>> by default, and also evaluate its impact on SPEC2017 Ofast/unroll.
>>
>
> Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
> SPEC2017 run didn't show any remarkable improvement/degradation.
>
> BR,
> Kewen
Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Andre,
>
> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>> Hi,
>>
>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
>> uses an unpredicated (all-true predicate for SVE) main loop and a predicated
>> tail loop. The way this was implemented seems to mean it re-uses the same
>> vector-mode for both loops, which means the tail loop isn't an actual loop
>> but only executes one iteration.
>>
>> This patch uses the knowledge of the conditions to enter an epilogue loop to
>> help come up with a potentially more restrictive upper bound.
>>
>> Regression tested on aarch64-linux-gnu and also ran the testsuite using
>> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
>> failures.
>>
>> Would be good to have this tested for PPC too as I believe they are the main
>> users of the --param vect-partial-vector-usage=1 option. Can someone help me
>> test (and maybe even benchmark?) this on a PPC target?
>>
>
> Thanks for doing this! I can test it on Power10, which enables this parameter
> by default, and also evaluate its impact on SPEC2017 Ofast/unroll.
>

Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
SPEC2017 run didn't show any remarkable improvement/degradation.

BR,
Kewen
Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
"Andre Vieira (lists)" writes:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
> uses an unpredicated (all-true predicate for SVE) main loop and a
> predicated tail loop. The way this was implemented seems to mean it
> re-uses the same vector-mode for both loops, which means the tail loop
> isn't an actual loop but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue
> loop to help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
> failures.
>
> Would be good to have this tested for PPC too as I believe they are the
> main users of the --param vect-partial-vector-usage=1 option. Can
> someone help me test (and maybe even benchmark?) this on a PPC target?
>
> Kind regards,
> Andre

LGTM.  OK if no objections and if the Power testing comes back clean.

Thanks,
Richard

> gcc/ChangeLog:
>
>     * tree-vect-loop.c (vect_transform_loop): Use main loop's various
>     thresholds to narrow the upper bound on epilogue iterations.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> new file mode 100644
> index ..a03229eb55585f637ebd5288fb4c00f8f921d44c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
> +
> +void
> +foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    c[i] = a[i] + b[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* In these calculations the "- 1" converts loop iteration counts
>       back to latch counts.  */
>    if (loop->any_upper_bound)
> -    loop->nb_iterations_upper_bound
> -      = (final_iter_may_be_partial
> -         ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> -                          lowest_vf) - 1
> -         : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> -                           lowest_vf) - 1);
> +    {
> +      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      loop->nb_iterations_upper_bound
> +        = (final_iter_may_be_partial
> +           ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> +                            lowest_vf) - 1
> +           : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> +                             lowest_vf) - 1);
> +      if (main_vinfo)
> +        {
> +          unsigned int bound;
> +          poly_uint64 main_iters
> +            = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
> +                           LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
> +          main_iters
> +            = upper_bound (main_iters,
> +                           LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
> +          if (can_div_away_from_zero_p (main_iters,
> +                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +                                        &bound))
> +            loop->nb_iterations_upper_bound
> +              = wi::umin ((widest_int) (bound - 1),
> +                          loop->nb_iterations_upper_bound);
> +        }
> +    }
>    if (loop->any_likely_upper_bound)
>      loop->nb_iterations_likely_upper_bound
>        = (final_iter_may_be_partial
Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
Hi Andre,

on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses
> an unpredicated (all-true predicate for SVE) main loop and a predicated tail
> loop. The way this was implemented seems to mean it re-uses the same
> vector-mode for both loops, which means the tail loop isn't an actual loop
> but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue loop to
> help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
> failures.
>
> Would be good to have this tested for PPC too as I believe they are the main
> users of the --param vect-partial-vector-usage=1 option. Can someone help me
> test (and maybe even benchmark?) this on a PPC target?
>

Thanks for doing this! I can test it on Power10, which enables this parameter
by default, and also evaluate its impact on SPEC2017 Ofast/unroll.

Do you have any preference for the baseline commit? I'll use r12-0 if it's
fine.

BR,
Kewen
[PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue
Hi,

When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses
an unpredicated (all-true predicate for SVE) main loop and a predicated tail
loop. The way this was implemented seems to mean it re-uses the same
vector-mode for both loops, which means the tail loop isn't an actual loop
but only executes one iteration.

This patch uses the knowledge of the conditions to enter an epilogue loop to
help come up with a potentially more restrictive upper bound.

Regression tested on aarch64-linux-gnu and also ran the testsuite using
'--param vect-partial-vector-usage=1' detecting no ICEs and no execution
failures.

Would be good to have this tested for PPC too as I believe they are the main
users of the --param vect-partial-vector-usage=1 option. Can someone help me
test (and maybe even benchmark?) this on a PPC target?

Kind regards,
Andre

gcc/ChangeLog:

    * tree-vect-loop.c (vect_transform_loop): Use main loop's various
    thresholds to narrow the upper bound on epilogue iterations.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
new file mode 100644
index ..a03229eb55585f637ebd5288fb4c00f8f921d44c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
+
+void
+foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    c[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* In these calculations the "- 1" converts loop iteration counts
      back to latch counts.  */
   if (loop->any_upper_bound)
-    loop->nb_iterations_upper_bound
-      = (final_iter_may_be_partial
-         ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
-                          lowest_vf) - 1
-         : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
-                           lowest_vf) - 1);
+    {
+      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      loop->nb_iterations_upper_bound
+        = (final_iter_may_be_partial
+           ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
+                            lowest_vf) - 1
+           : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
+                             lowest_vf) - 1);
+      if (main_vinfo)
+        {
+          unsigned int bound;
+          poly_uint64 main_iters
+            = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
+                           LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
+          main_iters
+            = upper_bound (main_iters,
+                           LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
+          if (can_div_away_from_zero_p (main_iters,
+                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+                                        &bound))
+            loop->nb_iterations_upper_bound
+              = wi::umin ((widest_int) (bound - 1),
+                          loop->nb_iterations_upper_bound);
+        }
+    }
   if (loop->any_likely_upper_bound)
     loop->nb_iterations_likely_upper_bound
       = (final_iter_may_be_partial