Hi,

on 2024/4/22 17:28, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566525.html
> 
> 
> This test expects vectorization at power8+ because strict alignment is
> not required for vectors.  For power7, vectorization is not to take
> place because it's not deemed profitable: 12 iterations would be
> required to make it so.
> 
> But for power6 and below, the test's 10 iterations are enough to make
> vectorization profitable, but the test doesn't expect this.  Assuming
> the decision is indeed appropriate, I'm adjusting the expectations.

For the record, the cost difference between power6 and power7 comes from
the cost of vec_perm:

* p6 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 1 in body

vs.

* p7 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 3 in body

This in turn changes the minimum number of iterations required for
profitability.
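
To make the difference concrete, the dump lines above can be summed per
section.  This is only a sketch that adds up the listed costs; it is not
GCC's actual profitability formula, and the struct and function names
are made up for illustration.

```c
#include <stddef.h>

/* One line of the cost dump: "<count> times <kind> costs <cost> in <where>".
   Hypothetical type, used only for this illustration.  */
struct dump_cost
{
  int cost;     /* the "costs N" value from the dump */
  int in_body;  /* 1 for "in body", 0 for "in prologue" */
};

/* Costs copied from the dump lines quoted above.  */
static const struct dump_cost p6_costs[] = { {2, 0}, {1, 0}, {2, 1}, {1, 1} };
static const struct dump_cost p7_costs[] = { {2, 0}, {1, 0}, {2, 1}, {3, 1} };

/* Sum the costs of one section (body or prologue).  */
static int
sum_costs (const struct dump_cost *c, size_t n, int in_body)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    if (c[i].in_body == in_body)
      total += c[i].cost;
  return total;
}
```

With these numbers the prologue sums to 3 on both, while the body sums
to 3 on p6 and 5 on p7; the whole gap is the vec_perm line.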

> 
> 
> for  gcc/testsuite/ChangeLog
> 
>       * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
>       expectations for cpus below power7.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..0dab2c08acdb4 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -46,9 +46,10 @@ int main (void)
>    return 0;
>  }
>  
> -/* Peeling to align the store is used. Overhead of peeling is too high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target 
> { vector_alignment_reachable && {! vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" 
> { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
> +/* Peeling to align the store is used. Overhead of peeling is too high
> +   for power7, but acceptable for earlier architectures.  */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target 
> { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } } 
> } */
> +/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" 
> { target { has_arch_pwr7 && { vector_alignment_reachable && {! 
> vect_hw_misalign} } } } } } */
>  
>  /* Versioning to align the store is used. Overhead of versioning is not too 
> high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { vect_no_align || {! vector_alignment_reachable} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } } 
> } } } */

For the !has_arch_pwr7 case, it still adopts peeling, but as the comment
(one line above) shows, the original intention of this test case is to
expect peeling not to be profitable, so it's not expected to be handled
here.  Can we just tweak the loop bound instead, such as:

-#define N 14
+#define N 13
 #define OFF 4 

?  It can make this loop unprofitable to vectorize with peeling for
!vect_no_align (both pwr7 and pwr6) and keep the expectations consistent.
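
A quick sketch of the effect on the trip count, assuming the loop in the
test runs from OFF up to (but not including) N.  That loop shape is an
assumption consistent with the 10 iterations mentioned in the patch
description, and trip_count is a hypothetical helper, not something in
the test:

```c
/* Hypothetical helper: number of iterations of
     for (i = off; i < n; i++)
   which is assumed to be the shape of the loop in the test.  */
static int
trip_count (int n, int off)
{
  return n > off ? n - off : 0;
}
```

Under that assumption, trip_count (14, 4) is 10 and trip_count (13, 4)
is 9, i.e. one iteration fewer than the 10 that were enough on power6.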

BR,
Kewen

> 
> 
