
Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches


Thank you Kewen!!

I will apply this now.

BR,
Andre

On 25/05/2021 09:42, Kewen.Lin wrote:
> on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi Andre,
>>
>> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>>> Hi,
>>>
>>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
>>> uses an unpredicated (all-true predicate for SVE) main loop and a predicated
>>> tail loop. The way this was implemented seems to mean it re-uses the same
>>> vector-mode for both loops, which means the tail loop isn't an actual loop
>>> but only executes one iteration.
>>>
>>> This patch uses the knowledge of the conditions to enter an epilogue loop to
>>> help come up with a potentially more restrictive upper bound.
>>>
>>> Regression tested on aarch64-linux-gnu and also ran the testsuite using
>>> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution
>>> failures.
>>>
>>> Would be good to have this tested for PPC too as I believe they are the main
>>> users of the --param vect-partial-vector-usage=1 option. Can someone help me
>>> test (and maybe even benchmark?) this on a PPC target?
>>>
>>
>> Thanks for doing this!  I can test it on Power10 which enables this parameter
>> by default, also evaluate its impact on SPEC2017 Ofast/unroll.
>>
>
> Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
> SPEC2017 run didn't show any remarkable improvement/degradation.
>
> BR,
> Kewen

Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-05-25 Thread Kewen.Lin via Gcc-patches

on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Andre,
> 
> on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
>> Hi,
>>
>> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer 
>> uses an unpredicated (all-true predicate for SVE) main loop and a predicated 
>> tail loop. The way this was implemented seems to mean it re-uses the same 
>> vector-mode for both loops, which means the tail loop isn't an actual loop 
>> but only executes one iteration.
>>
>> This patch uses the knowledge of the conditions to enter an epilogue loop to 
>> help come up with a potentially more restrictive upper bound.
>>
>> Regression tested on aarch64-linux-gnu and also ran the testsuite using 
>> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
>> failures.
>>
>> Would be good to have this tested for PPC too as I believe they are the main 
>> users of the --param vect-partial-vector-usage=1 option. Can someone help me 
>> test (and maybe even benchmark?) this on a PPC target?
>>
> 
> 
> Thanks for doing this!  I can test it on Power10 which enables this parameter
> by default, also evaluate its impact on SPEC2017 Ofast/unroll.
> 

Bootstrapped/regtested on powerpc64le-linux-gnu Power10.
SPEC2017 run didn't show any remarkable improvement/degradation.

BR,
Kewen

Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-05-24 Thread Richard Sandiford via Gcc-patches

"Andre Vieira (lists)"  writes:
> Hi,
>
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer 
> uses an unpredicated (all-true predicate for SVE) main loop and a 
> predicated tail loop. The way this was implemented seems to mean it 
> re-uses the same vector-mode for both loops, which means the tail loop 
> isn't an actual loop but only executes one iteration.
>
> This patch uses the knowledge of the conditions to enter an epilogue 
> loop to help come up with a potentially more restrictive upper bound.
>
> Regression tested on aarch64-linux-gnu and also ran the testsuite using 
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
> failures.
>
> Would be good to have this tested for PPC too as I believe they are the 
> main users of the --param vect-partial-vector-usage=1 option. Can 
> someone help me test (and maybe even benchmark?) this on a PPC target?
>
> Kind regards,
> Andre

LGTM.  OK if no objections and if the Power testing comes back clean.

Thanks,
Richard

> gcc/ChangeLog:
>
>      * tree-vect-loop.c (vect_transform_loop): Use main loop's various
>      thresholds to narrow the upper bound on epilogue iterations.
>
> gcc/testsuite/ChangeLog:
>
>      * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> new file mode 100644
> index ..a03229eb55585f637ebd5288fb4c00f8f921d44c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
> +
> +void
> +foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +    c[i] = a[i] + b[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* In these calculations the "- 1" converts loop iteration counts
>       back to latch counts.  */
>    if (loop->any_upper_bound)
> -    loop->nb_iterations_upper_bound
> -      = (final_iter_may_be_partial
> -         ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> -                          lowest_vf) - 1
> -         : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> -                           lowest_vf) - 1);
> +    {
> +      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
> +      loop->nb_iterations_upper_bound
> +        = (final_iter_may_be_partial
> +           ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
> +                            lowest_vf) - 1
> +           : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
> +                             lowest_vf) - 1);
> +      if (main_vinfo)
> +        {
> +          unsigned int bound;
> +          poly_uint64 main_iters
> +            = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
> +                           LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
> +          main_iters
> +            = upper_bound (main_iters,
> +                           LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
> +          if (can_div_away_from_zero_p (main_iters,
> +                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +                                        &bound))
> +            loop->nb_iterations_upper_bound
> +              = wi::umin ((widest_int) (bound - 1),
> +                          loop->nb_iterations_upper_bound);
> +        }
> +    }
>    if (loop->any_likely_upper_bound)
>      loop->nb_iterations_likely_upper_bound
>        = (final_iter_may_be_partial

Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-05-24 Thread Kewen.Lin via Gcc-patches

Hi Andre,

on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
> Hi,
> 
> When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses 
> an unpredicated (all-true predicate for SVE) main loop and a predicated tail 
> loop. The way this was implemented seems to mean it re-uses the same 
> vector-mode for both loops, which means the tail loop isn't an actual loop 
> but only executes one iteration.
> 
> This patch uses the knowledge of the conditions to enter an epilogue loop to 
> help come up with a potentially more restrictive upper bound.
> 
> Regression tested on aarch64-linux-gnu and also ran the testsuite using 
> '--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
> failures.
> 
> Would be good to have this tested for PPC too as I believe they are the main 
> users of the --param vect-partial-vector-usage=1 option. Can someone help me 
> test (and maybe even benchmark?) this on a PPC target?
> 


Thanks for doing this!  I can test it on Power10 which enables this parameter
by default, also evaluate its impact on SPEC2017 Ofast/unroll.

Do you have any preference for the baseline commit?  I'll use r12-0 if it's 
fine.

BR,
Kewen

[PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-05-23 Thread Andre Vieira (lists) via Gcc-patches


Hi,

When vectorizing with --param vect-partial-vector-usage=1 the vectorizer 
uses an unpredicated (all-true predicate for SVE) main loop and a 
predicated tail loop. The way this was implemented seems to mean it 
re-uses the same vector-mode for both loops, which means the tail loop 
isn't an actual loop but only executes one iteration.
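
To illustrate the loop structure described above, here is a minimal scalar sketch in C. It is not GCC output: the fixed lane count `VF` and the function name `add_shorts_partial` are assumptions made for illustration, and the inner `lane` loops only model what a single vector instruction would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of 16-bit lanes per vector.  Real SVE vector
   lengths are only known at run time.  */
#define VF 8

/* Scalar model of c[i] = a[i] + b[i] with an unpredicated main loop and
   a predicated tail.  Returns how many times the tail body ran.  */
size_t
add_shorts_partial (const short *a, const short *b, short *c, size_t n)
{
  size_t i = 0;
  size_t tail_iters = 0;

  /* Main loop: every lane is active, so it only handles full vectors.  */
  for (; n - i >= VF; i += VF)
    for (size_t lane = 0; lane < VF; ++lane)
      c[i + lane] = (short) (a[i + lane] + b[i + lane]);

  /* Tail "loop": lanes at or beyond n are masked off.  */
  for (; i < n; i += VF)
    {
      for (size_t lane = 0; lane < VF; ++lane)
        if (i + lane < n)
          c[i + lane] = (short) (a[i + lane] + b[i + lane]);
      ++tail_iters;
    }

  /* Fewer than VF elements remain after the main loop, so the tail
     never needs a second trip.  */
  assert (tail_iters <= 1);
  return tail_iters;
}
```

With `n = 11` and `VF = 8`, the main loop covers elements 0 to 7 and the tail covers the remaining three in a single pass, which is why the tail does not need a back edge when both loops share a vector mode.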


This patch uses the knowledge of the conditions to enter an epilogue 
loop to help come up with a potentially more restrictive upper bound.


Regression tested on aarch64-linux-gnu and also ran the testsuite using 
'--param vect-partial-vector-usage=1' detecting no ICEs and no execution 
failures.


Would be good to have this tested for PPC too as I believe they are the 
main users of the --param vect-partial-vector-usage=1 option. Can 
someone help me test (and maybe even benchmark?) this on a PPC target?


Kind regards,
Andre

gcc/ChangeLog:

    * tree-vect-loop.c (vect_transform_loop): Use main loop's various
    thresholds to narrow the upper bound on epilogue iterations.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
new file mode 100644
index ..a03229eb55585f637ebd5288fb4c00f8f921d44c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
+
+void
+foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    c[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* In these calculations the "- 1" converts loop iteration counts
      back to latch counts.  */
   if (loop->any_upper_bound)
-    loop->nb_iterations_upper_bound
-      = (final_iter_may_be_partial
-         ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
-                          lowest_vf) - 1
-         : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
-                           lowest_vf) - 1);
+    {
+      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      loop->nb_iterations_upper_bound
+        = (final_iter_may_be_partial
+           ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
+                            lowest_vf) - 1
+           : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
+                             lowest_vf) - 1);
+      if (main_vinfo)
+        {
+          unsigned int bound;
+          poly_uint64 main_iters
+            = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
+                           LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
+          main_iters
+            = upper_bound (main_iters,
+                           LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
+          if (can_div_away_from_zero_p (main_iters,
+                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+                                        &bound))
+            loop->nb_iterations_upper_bound
+              = wi::umin ((widest_int) (bound - 1),
+                          loop->nb_iterations_upper_bound);
+        }
+    }
   if (loop->any_likely_upper_bound)
     loop->nb_iterations_likely_upper_bound
       = (final_iter_may_be_partial
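
The arithmetic in the hunk above can be sketched with plain integers. This is a simplified model, not the GCC code: it assumes all quantities are compile-time constants (GCC uses `poly_uint64`, for which `can_div_away_from_zero_p` may fail), and the names `epilogue_latch_bound`, `max_ul` and `div_away_from_zero` are made up for illustration.

```c
#include <assert.h>

/* Larger of two values, standing in for upper_bound on constants.  */
static unsigned long
max_ul (unsigned long x, unsigned long y)
{
  return x > y ? x : y;
}

/* Division rounding away from zero (ceiling for unsigned operands).
   Assumes y != 0.  */
static unsigned long
div_away_from_zero (unsigned long x, unsigned long y)
{
  return (x + y - 1) / y;
}

/* Model of the narrowed latch-count bound for an epilogue loop.
   Assumes main_vf >= 1 and epilogue_vf >= 1, so the quotient is at
   least 1 and "bound - 1" cannot wrap.  */
unsigned long
epilogue_latch_bound (unsigned long main_vf,
                      unsigned long cost_threshold,
                      unsigned long versioning_threshold,
                      unsigned long epilogue_vf,
                      unsigned long existing_bound)
{
  unsigned long main_iters = max_ul (main_vf, cost_threshold);
  main_iters = max_ul (main_iters, versioning_threshold);

  /* Number of epilogue iterations needed to cover main_iters elements;
     the "- 1" converts an iteration count to a latch count.  */
  unsigned long bound = div_away_from_zero (main_iters, epilogue_vf);
  unsigned long narrowed = bound - 1;

  return narrowed < existing_bound ? narrowed : existing_bound;
}
```

For example, with a main-loop vectorization factor of 8, both thresholds at 0 and an epilogue vectorization factor of 8, the quotient is 1 and the latch bound becomes 0, that is, the epilogue body executes at most once. This appears to be the case the new SVE test checks by expecting a single `whilelo`.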
