Hi,

on 2024/4/22 17:28, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566525.html
>
>
> This test expects vectorization at power8+ because strict alignment is
> not required for vectors.  For power7, vectorization is not to take
> place because it's not deemed profitable: 12 iterations would be
> required to make it so.
>
> But for power6 and below, the test's 10 iterations are enough to make
> vectorization profitable, but the test doesn't expect this.  Assuming
> the decision is indeed appropriate, I'm adjusting the expectations.

For the record, the cost difference between power6 and power7 comes
from the cost of vec_perm:

  * p6 *

  ic[i_23] 2 times vector_stmt costs 2 in prologue
  ic[i_23] 1 times vector_stmt costs 1 in prologue
  ic[i_23] 1 times vector_load costs 2 in body
  ic[i_23] 1 times vec_perm costs 1 in body

vs.

  * p7 *

  ic[i_23] 2 times vector_stmt costs 2 in prologue
  ic[i_23] 1 times vector_stmt costs 1 in prologue
  ic[i_23] 1 times vector_load costs 2 in body
  ic[i_23] 1 times vec_perm costs 3 in body

which in turn causes the difference in the minimum number of
iterations needed for profitability.

>
>
> for gcc/testsuite/ChangeLog
>
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
> 	expectations for cpus below power7.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..0dab2c08acdb4 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -46,9 +46,10 @@ int main (void)
>    return 0;
>  }
>
> -/* Peeling to align the store is used. Overhead of peeling is too high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target
> { vector_alignment_reachable && {! vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect"
> { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
> +/* Peeling to align the store is used. Overhead of peeling is too high
> +   for power7, but acceptable for earlier architectures.  */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target
> { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } }
> } */
> +/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect"
> { target { has_arch_pwr7 && { vector_alignment_reachable && {!
> vect_hw_misalign} } } } } } */
>
>  /* Versioning to align the store is used. Overhead of versioning is not too
> high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target
> { vect_no_align || {! vector_alignment_reachable} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target
> { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } }
> } } } */

For the !has_arch_pwr7 case, the loop is still vectorized with peeling.
But as the comment (one line above) shows, the original intention of
this test case is for peeling to be not profitable, so I don't think
this case should be handled here.  Can we just tweak the loop bound
instead, such as:

-#define N 14
+#define N 13
 #define OFF 4

?  That would make this loop not profitable to vectorize with peeling
for !vect_no_align (on both pwr7 and pwr6) and keep the expectations
consistent.

BR,
Kewen

>
>