Re: [PATCH] Improve vectorizer peeling for alignment costmodel
On Tue, 9 May 2017, Richard Biener wrote: > On Fri, 5 May 2017, Christophe Lyon wrote: > > > Hi Richard, > > > > > > On 3 May 2017 at 10:19, Richard Bienerwrote: > > > > > > The following extends the very simplistic cost modeling I added somewhen > > > late in the release process to, for all unknown misaligned refs, also > > > apply this model for loops containing stores. > > > > > > The model basically says it's useless to peel for alignment if there's > > > only a single DR that is affected or if, in case we'll end up using > > > hw-supported misaligned loads, the cost of misaligned loads is the same > > > as of aligned ones. Previously we'd usually align one of the stores > > > with the theory that this improves (precious) store-bandwith. > > > > > > Note this is only a so slightly conservative (aka less peeling). We'll > > > still apply peeling for alignment if you make the testcase use += > > > because then we'll align both the load and the store from v1. > > > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > > > Richard. > > > > > > 2017-05-03 Richard Biener > > > > > > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): > > > When all DRs have unknown misaligned do not always peel > > > when there is a store but apply the same costing model as if > > > there were only loads. > > > > > > * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New > > > testcase. > > > > > > > This patch (r247544) caused regressions on aarch64 and arm: > > - PASS now FAIL [PASS => FAIL]: > > > > Executed from: gcc.dg/vect/vect.exp > > gcc.dg/vect/vect-44.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "Alignment of access forced using peeling" 1 > > gcc.dg/vect/vect-44.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "Vectorizing an unaligned access" 2 > > gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of > > access forced using peeling" 1 > > gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an > > unaligned access" 2 > > gcc.dg/vect/vect-50.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "Alignment of access forced using peeling" 1 > > gcc.dg/vect/vect-50.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "Vectorizing an unaligned access" 2 > > gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of > > access forced using peeling" 1 > > gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an > > unaligned access" 2 > > Ok, so the reason is that we no longer peel for alignment for > > for (i = 0; i < N; i++) > { > pa[i] = pb[i] * pc[i]; > } > > which is probably good. This is because the generic aarch64 cost model > (and probaby also arm) has > > 1, /* vec_align_load_cost */ > 1, /* vec_unalign_load_cost */ > 1, /* vec_unalign_store_cost */ > 1, /* vec_store_cost */ > > so there's no benefit in aligning. x86 generic tuning has > > 1,/* vec_align_load_cost. */ > 2,/* vec_unalign_load_cost. */ > 1,/* vec_store_cost. */ > > and vec_unalign_store_cost sharing with vec_unalign_load_cost. > That makes us still apply peeling. > > Fixing this with vect_ testsuite conditions is going to be tricky > so the easiest is to simply disable peeling here. > > Tested on aarch64 and x86_64, committed. > > Richard. > > 2017-05-09 Richard Biener > > * gcc.dg/vect/vect-44.c: Add --param vect-max-peeling-for-alignment=0 > and adjust. > * gcc.dg/vect/vect-50.c: Likewise. > > Index: gcc/testsuite/gcc.dg/vect/vect-44.c > === > --- gcc/testsuite/gcc.dg/vect/vect-44.c (revision 247782) > +++ gcc/testsuite/gcc.dg/vect/vect-44.c (working copy) > @@ -1,6 +1,7 @@ > +/* { dg-do compile } */ Without these changes. Those were for aarch64 cross testing. Richard. > /* { dg-require-effective-target vect_float } */ > +/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */ > > -#include > #include "tree-vect.h" > > #define N 256 > @@ -65,7 +66,7 @@ int main (void) > two loads to be aligned). */ > > /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 > "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ > -/* { dg-final { scan-tree-dump-times "Alignment of access forced using > peeling" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! > vector_alignment_reachable} } } } } */ > +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 > "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ > +/* { dg-final { scan-tree-dump-times "Alignment of access
Re: [PATCH] Improve vectorizer peeling for alignment costmodel
On Fri, 5 May 2017, Christophe Lyon wrote: > Hi Richard, > > > On 3 May 2017 at 10:19, Richard Bienerwrote: > > > > The following extends the very simplistic cost modeling I added somewhen > > late in the release process to, for all unknown misaligned refs, also > > apply this model for loops containing stores. > > > > The model basically says it's useless to peel for alignment if there's > > only a single DR that is affected or if, in case we'll end up using > > hw-supported misaligned loads, the cost of misaligned loads is the same > > as of aligned ones. Previously we'd usually align one of the stores > > with the theory that this improves (precious) store-bandwith. > > > > Note this is only a so slightly conservative (aka less peeling). We'll > > still apply peeling for alignment if you make the testcase use += > > because then we'll align both the load and the store from v1. > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > Richard. > > > > 2017-05-03 Richard Biener > > > > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): > > When all DRs have unknown misaligned do not always peel > > when there is a store but apply the same costing model as if > > there were only loads. > > > > * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase. > > > > This patch (r247544) caused regressions on aarch64 and arm: > - PASS now FAIL [PASS => FAIL]: > > Executed from: gcc.dg/vect/vect.exp > gcc.dg/vect/vect-44.c -flto -ffat-lto-objects > scan-tree-dump-times vect "Alignment of access forced using peeling" 1 > gcc.dg/vect/vect-44.c -flto -ffat-lto-objects > scan-tree-dump-times vect "Vectorizing an unaligned access" 2 > gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of > access forced using peeling" 1 > gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an > unaligned access" 2 > gcc.dg/vect/vect-50.c -flto -ffat-lto-objects > scan-tree-dump-times vect "Alignment of access forced using peeling" 1 > gcc.dg/vect/vect-50.c -flto -ffat-lto-objects > scan-tree-dump-times vect "Vectorizing an unaligned access" 2 > gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of > access forced using peeling" 1 > gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an > unaligned access" 2 Ok, so the reason is that we no longer peel for alignment for for (i = 0; i < N; i++) { pa[i] = pb[i] * pc[i]; } which is probably good. This is because the generic aarch64 cost model (and probaby also arm) has 1, /* vec_align_load_cost */ 1, /* vec_unalign_load_cost */ 1, /* vec_unalign_store_cost */ 1, /* vec_store_cost */ so there's no benefit in aligning. x86 generic tuning has 1,/* vec_align_load_cost. */ 2,/* vec_unalign_load_cost. */ 1,/* vec_store_cost. */ and vec_unalign_store_cost sharing with vec_unalign_load_cost. That makes us still apply peeling. Fixing this with vect_ testsuite conditions is going to be tricky so the easiest is to simply disable peeling here. Tested on aarch64 and x86_64, committed. Richard. 2017-05-09 Richard Biener * gcc.dg/vect/vect-44.c: Add --param vect-max-peeling-for-alignment=0 and adjust. * gcc.dg/vect/vect-50.c: Likewise. Index: gcc/testsuite/gcc.dg/vect/vect-44.c === --- gcc/testsuite/gcc.dg/vect/vect-44.c (revision 247782) +++ gcc/testsuite/gcc.dg/vect/vect-44.c (working copy) @@ -1,6 +1,7 @@ +/* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ +/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */ -#include #include "tree-vect.h" #define N 256 @@ -65,7 +66,7 @@ int main (void) two loads to be aligned). */ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ -/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! vector_alignment_reachable} } } } } */ +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 0 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! vector_alignment_reachable} } } } } */ /* { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 3 "vect" { target { vect_no_align && { ! vect_hw_misalign } } } } } */ /* { dg-final { scan-tree-dump-times "Alignment of access forced using
Re: [PATCH] Improve vectorizer peeling for alignment costmodel
Hi Richard, On 3 May 2017 at 10:19, Richard Bienerwrote: > > The following extends the very simplistic cost modeling I added somewhen > late in the release process to, for all unknown misaligned refs, also > apply this model for loops containing stores. > > The model basically says it's useless to peel for alignment if there's > only a single DR that is affected or if, in case we'll end up using > hw-supported misaligned loads, the cost of misaligned loads is the same > as of aligned ones. Previously we'd usually align one of the stores > with the theory that this improves (precious) store-bandwith. > > Note this is only a so slightly conservative (aka less peeling). We'll > still apply peeling for alignment if you make the testcase use += > because then we'll align both the load and the store from v1. > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > Richard. > > 2017-05-03 Richard Biener > > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): > When all DRs have unknown misaligned do not always peel > when there is a store but apply the same costing model as if > there were only loads. > > * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase. > This patch (r247544) caused regressions on aarch64 and arm: - PASS now FAIL [PASS => FAIL]: Executed from: gcc.dg/vect/vect.exp gcc.dg/vect/vect-44.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1 gcc.dg/vect/vect-44.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 2 gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1 gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an unaligned access" 2 gcc.dg/vect/vect-50.c -flto -ffat-lto-objects scan-tree-dump-times vect "Alignment of access forced using peeling" 1 gcc.dg/vect/vect-50.c -flto -ffat-lto-objects scan-tree-dump-times vect "Vectorizing an unaligned access" 2 gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1 gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an unaligned access" 2 Thanks, Christophe > Index: gcc/tree-vect-data-refs.c > === > --- gcc/tree-vect-data-refs.c (revision 247498) > +++ gcc/tree-vect-data-refs.c (working copy) > @@ -1715,18 +1741,18 @@ vect_enhance_data_refs_alignment (loop_v > dr0 = first_store; > } > > - /* In case there are only loads with different unknown misalignments, > use > - peeling only if it may help to align other accesses in the loop or > + /* Use peeling only if it may help to align other accesses in the loop > or > if it may help improving load bandwith when we'd end up using > unaligned loads. */ >tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0))); > - if (!first_store > - && !STMT_VINFO_SAME_ALIGN_REFS ( > - vinfo_for_stmt (DR_STMT (dr0))).length () > + if (STMT_VINFO_SAME_ALIGN_REFS > + (vinfo_for_stmt (DR_STMT (dr0))).length () == 0 > && (vect_supportable_dr_alignment (dr0, false) > != dr_unaligned_supported > - || (builtin_vectorization_cost (vector_load, dr0_vt, 0) > - == builtin_vectorization_cost (unaligned_load, dr0_vt, > -1 > + || (DR_IS_READ (dr0) > + && (builtin_vectorization_cost (vector_load, dr0_vt, 0) > + == builtin_vectorization_cost (unaligned_load, > +dr0_vt, -1) > do_peeling = false; > } > > > Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c > === > --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c > (nonexistent) > +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c > (working copy) > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > + > +void func(double * __restrict__ v1, double * v2, unsigned n) > +{ > + for (unsigned i = 0; i < n; ++i) > +v1[i] = v2[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "Alignment of access forced using > peeling" "vect" } } */