Re: [PATCH] Improve vectorizer peeling for alignment costmodel

2017-05-09 Thread Richard Biener
On Tue, 9 May 2017, Richard Biener wrote:

> On Fri, 5 May 2017, Christophe Lyon wrote:
> 
> > Hi Richard,
> > 
> > 
> > On 3 May 2017 at 10:19, Richard Biener  wrote:
> > >
> > > The following extends the very simplistic cost modeling I added sometime
> > > late in the release process so that, for refs with unknown misalignment,
> > > the model also applies to loops containing stores.
> > >
> > > The model basically says it's useless to peel for alignment if there's
> > > only a single DR that is affected, or if, in case we'll end up using
> > > hw-supported misaligned loads, the cost of misaligned loads is the same
> > > as that of aligned ones.  Previously we'd usually align one of the stores
> > > on the theory that this improves (precious) store-bandwidth.
> > >
> > > Note this is only ever so slightly more conservative (aka less peeling).
> > > We'll still apply peeling for alignment if you make the testcase use +=,
> > > because then we'll align both the load and the store from v1.
> > >
> > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > >
> > > Richard.
> > >
> > > 2017-05-03  Richard Biener  
> > >
> > > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> > > When all DRs have unknown misalignment do not always peel
> > > when there is a store but apply the same costing model as if
> > > there were only loads.
> > >
> > > * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New 
> > > testcase.
> > >
> > 
> > This patch (r247544) caused regressions on aarch64 and arm:
> >   - PASS now FAIL [PASS => FAIL]:
> > 
> >   Executed from: gcc.dg/vect/vect.exp
> > gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 2
> > gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of
> > access forced using peeling" 1
> > gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an
> > unaligned access" 2
> > gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 2
> > gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of
> > access forced using peeling" 1
> > gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an
> > unaligned access" 2
> 
> Ok, so the reason is that we no longer peel for alignment for
> 
>   for (i = 0; i < N; i++)
> {
>   pa[i] = pb[i] * pc[i];
> }
> 
> which is probably good.  This is because the generic aarch64 cost model
> (and probably also arm) has
> 
>   1, /* vec_align_load_cost  */
>   1, /* vec_unalign_load_cost  */
>   1, /* vec_unalign_store_cost  */
>   1, /* vec_store_cost  */
> 
> so there's no benefit in aligning.  x86 generic tuning has
> 
>   1,/* vec_align_load_cost.  */
>   2,/* vec_unalign_load_cost.  */
>   1,/* vec_store_cost.  */
> 
> and vec_unalign_store_cost sharing vec_unalign_load_cost's value.
> That makes us still apply peeling.
> 
> Fixing this with vect_ testsuite conditions is going to be tricky,
> so the easiest fix is to simply disable peeling here.
> 
> Tested on aarch64 and x86_64, committed.
> 
> Richard.
> 
> 2017-05-09  Richard Biener  
> 
>   * gcc.dg/vect/vect-44.c: Add --param vect-max-peeling-for-alignment=0
>   and adjust.
>   * gcc.dg/vect/vect-50.c: Likewise.
> 
> Index: gcc/testsuite/gcc.dg/vect/vect-44.c
> ===
> --- gcc/testsuite/gcc.dg/vect/vect-44.c   (revision 247782)
> +++ gcc/testsuite/gcc.dg/vect/vect-44.c   (working copy)
> @@ -1,6 +1,7 @@
> +/* { dg-do compile } */

Committed without these changes; those were for aarch64 cross testing.

Richard.

>  /* { dg-require-effective-target vect_float } */
> +/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
>  
> -#include <stdarg.h>
>  #include "tree-vect.h"
>  
>  #define N 256
> @@ -65,7 +66,7 @@ int main (void)
> two loads to be aligned).  */
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 
> "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> -/* { dg-final { scan-tree-dump-times "Alignment of access forced using 
> peeling" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! 
> vector_alignment_reachable} } } } } */
> +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 
> "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
> +/* { dg-final { scan-tree-dump-times "Alignment of access forced using 
> peeling" 0 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! 
> vector_alignment_reachable} } } } } */

Re: [PATCH] Improve vectorizer peeling for alignment costmodel

2017-05-09 Thread Richard Biener
On Fri, 5 May 2017, Christophe Lyon wrote:

> Hi Richard,
> 
> 
> On 3 May 2017 at 10:19, Richard Biener  wrote:
> >
> > The following extends the very simplistic cost modeling I added sometime
> > late in the release process so that, for refs with unknown misalignment,
> > the model also applies to loops containing stores.
> >
> > The model basically says it's useless to peel for alignment if there's
> > only a single DR that is affected, or if, in case we'll end up using
> > hw-supported misaligned loads, the cost of misaligned loads is the same
> > as that of aligned ones.  Previously we'd usually align one of the stores
> > on the theory that this improves (precious) store-bandwidth.
> >
> > Note this is only ever so slightly more conservative (aka less peeling).
> > We'll still apply peeling for alignment if you make the testcase use +=,
> > because then we'll align both the load and the store from v1.
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2017-05-03  Richard Biener  
> >
> > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> > When all DRs have unknown misalignment do not always peel
> > when there is a store but apply the same costing model as if
> > there were only loads.
> >
> > * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase.
> >
> 
> This patch (r247544) caused regressions on aarch64 and arm:
>   - PASS now FAIL [PASS => FAIL]:
> 
>   Executed from: gcc.dg/vect/vect.exp
> gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Vectorizing an unaligned access" 2
> gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of
> access forced using peeling" 1
> gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an
> unaligned access" 2
> gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Vectorizing an unaligned access" 2
> gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of
> access forced using peeling" 1
> gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an
> unaligned access" 2

Ok, so the reason is that we no longer peel for alignment for

  for (i = 0; i < N; i++)
{
  pa[i] = pb[i] * pc[i];
}

which is probably good.  This is because the generic aarch64 cost model
(and probably also arm) has

  1, /* vec_align_load_cost  */
  1, /* vec_unalign_load_cost  */
  1, /* vec_unalign_store_cost  */
  1, /* vec_store_cost  */

so there's no benefit in aligning.  x86 generic tuning has

  1,/* vec_align_load_cost.  */
  2,/* vec_unalign_load_cost.  */
  1,/* vec_store_cost.  */

and vec_unalign_store_cost sharing vec_unalign_load_cost's value.
That makes us still apply peeling.
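The effect of those per-target cost entries can be sketched as a tiny standalone check (a hypothetical helper, not the GCC API; the real decision lives in vect_enhance_data_refs_alignment):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: peeling for alignment only pays off when an
   unaligned access is priced higher than an aligned one.  With the
   generic aarch64 entries (all 1) there is nothing to gain; with x86
   generic tuning (unaligned load cost 2 vs. aligned cost 1) there is.  */
static bool
peeling_may_help (int vec_align_load_cost, int vec_unalign_load_cost)
{
  return vec_unalign_load_cost > vec_align_load_cost;
}
```

With the tables quoted above, this comes out false for aarch64 generic tuning and true for x86 generic tuning, which matches the behaviour seen in the testsuite.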

Fixing this with vect_ testsuite conditions is going to be tricky,
so the easiest fix is to simply disable peeling here.

Tested on aarch64 and x86_64, committed.

Richard.

2017-05-09  Richard Biener  

* gcc.dg/vect/vect-44.c: Add --param vect-max-peeling-for-alignment=0
and adjust.
* gcc.dg/vect/vect-50.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-44.c
===
--- gcc/testsuite/gcc.dg/vect/vect-44.c (revision 247782)
+++ gcc/testsuite/gcc.dg/vect/vect-44.c (working copy)
@@ -1,6 +1,7 @@
+/* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "--param vect-max-peeling-for-alignment=0" } */
 
-#include <stdarg.h>
 #include "tree-vect.h"
 
 #define N 256
@@ -65,7 +66,7 @@ int main (void)
two loads to be aligned).  */
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 
"vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using 
peeling" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! 
vector_alignment_reachable} } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 3 
"vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using 
peeling" 0 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || {! 
vector_alignment_reachable} } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using 
versioning." 3 "vect" { target { vect_no_align && { ! vect_hw_misalign } } } } 
} */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using 

Re: [PATCH] Improve vectorizer peeling for alignment costmodel

2017-05-05 Thread Christophe Lyon
Hi Richard,


On 3 May 2017 at 10:19, Richard Biener  wrote:
>
> The following extends the very simplistic cost modeling I added sometime
> late in the release process so that, for refs with unknown misalignment,
> the model also applies to loops containing stores.
>
> The model basically says it's useless to peel for alignment if there's
> only a single DR that is affected, or if, in case we'll end up using
> hw-supported misaligned loads, the cost of misaligned loads is the same
> as that of aligned ones.  Previously we'd usually align one of the stores
> on the theory that this improves (precious) store-bandwidth.
>
> Note this is only ever so slightly more conservative (aka less peeling).
> We'll still apply peeling for alignment if you make the testcase use +=,
> because then we'll align both the load and the store from v1.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2017-05-03  Richard Biener  
>
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> When all DRs have unknown misalignment do not always peel
> when there is a store but apply the same costing model as if
> there were only loads.
>
> * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase.
>

This patch (r247544) caused regressions on aarch64 and arm:
  - PASS now FAIL [PASS => FAIL]:

  Executed from: gcc.dg/vect/vect.exp
gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Alignment of access forced using peeling" 1
gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Vectorizing an unaligned access" 2
gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of
access forced using peeling" 1
gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an
unaligned access" 2
gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Alignment of access forced using peeling" 1
gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Vectorizing an unaligned access" 2
gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of
access forced using peeling" 1
gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an
unaligned access" 2

Thanks,

Christophe

> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   (revision 247498)
> +++ gcc/tree-vect-data-refs.c   (working copy)
> @@ -1715,18 +1741,18 @@ vect_enhance_data_refs_alignment (loop_v
>  dr0 = first_store;
>  }
>
> -  /* In case there are only loads with different unknown misalignments, 
> use
> - peeling only if it may help to align other accesses in the loop or
> +  /* Use peeling only if it may help to align other accesses in the loop 
> or
>  if it may help improving load bandwith when we'd end up using
>  unaligned loads.  */
>tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
> -  if (!first_store
> - && !STMT_VINFO_SAME_ALIGN_REFS (
> - vinfo_for_stmt (DR_STMT (dr0))).length ()
> +  if (STMT_VINFO_SAME_ALIGN_REFS
> +   (vinfo_for_stmt (DR_STMT (dr0))).length () == 0
>   && (vect_supportable_dr_alignment (dr0, false)
>   != dr_unaligned_supported
> - || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
> - == builtin_vectorization_cost (unaligned_load, dr0_vt, 
> -1
> + || (DR_IS_READ (dr0)
> + && (builtin_vectorization_cost (vector_load, dr0_vt, 0)
> + == builtin_vectorization_cost (unaligned_load,
> +dr0_vt, -1)
>  do_peeling = false;
>  }
>
>
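The condition in the hunk above can be restated as a small sketch (hypothetical parameter names standing in for STMT_VINFO_SAME_ALIGN_REFS, vect_supportable_dr_alignment and the builtin_vectorization_cost hooks):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the patched decision: give up on peeling when the chosen
   DR has no other refs known to share its alignment, and either
   misaligned accesses are not plain hw-supported or, for a load, the
   cost model prices aligned and unaligned loads identically.  */
static bool
keep_peeling (unsigned n_same_align_refs, bool hw_misalign_supported,
              bool dr_is_read, int aligned_load_cost,
              int unaligned_load_cost)
{
  if (n_same_align_refs == 0
      && (!hw_misalign_supported
          || (dr_is_read && aligned_load_cost == unaligned_load_cost)))
    return false;  /* corresponds to do_peeling = false  */
  return true;
}
```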
> Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c
> ===
> --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c
> (nonexistent)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c
> (working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +
> +void func(double * __restrict__ v1, double * v2, unsigned n)
> +{
> +  for (unsigned i = 0; i < n; ++i)
> +v1[i] = v2[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using 
> peeling" "vect" } } */
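For contrast with the testcase above, the += variant mentioned in the patch description would still be peeled: v1 is then both read and written, so aligning it helps two accesses at once. A sketch (not part of the committed patch):

```c
#include <assert.h>

/* Hypothetical += variant of the committed testcase: v1[i] is now a
   load as well as a store, so peeling aligns two accesses and the
   cost model still considers it worthwhile.  */
void func_accum (double * __restrict__ v1, double * v2, unsigned n)
{
  for (unsigned i = 0; i < n; ++i)
    v1[i] += v2[i];
}
```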