Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-03 Thread Richard Biener
On Fri, 3 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener  writes:
> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >> >
> >> >> >> Richard Biener  writes:
> >> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >> >> >
> >> >> >> >> Sorry for the late reply, but:
> >> >> >> >> 
> >> >> >> >> Richard Biener  writes:
> >> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >> >> >> >
> >> >> >> >> >> 
> >> >> >> >> >> Currently we force peeling for gaps whenever element
> >> >> >> >> >> overrun can occur
> >> >> >> >> >> but for aligned accesses we know that the loads won't trap
> >> >> >> >> >> and thus
> >> >> >> >> >> we can avoid this.
> >> >> >> >> >> 
> >> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu
> >> >> >> >> >> (I expect
> >> >> >> >> >> some testsuite fallout here so didn't bother to invent a
> >> >> >> >> >> new testcase).
> >> >> >> >> >> 
> >> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in 
> >> >> >> >> >> general
> >> >> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >> >> >> >
> >> >> >> >> > This is what I applied.
> >> >> >> >> >
> >> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >> >> >> >
> >> >> >> >> > Richard.
> >> >> >> >> [...]
> >> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> >> >> >> > index 15aec21..c29e73d 100644
> >> >> >> >> > --- a/gcc/tree-vect-stmts.c
> >> >> >> >> > +++ b/gcc/tree-vect-stmts.c
> >> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, 
> >> >> >> >> > tree vectype, bool slp,
> >> >> >> >> >/* If there is a gap at the end of the group then these 
> >> >> >> >> > optimizations
> >> >> >> >> >  would access excess elements in the last iteration.  */
> >> >> >> >> >bool would_overrun_p = (gap != 0);
> >> >> >> >> > +  /* If the access is aligned an overrun is fine.  */
> >> >> >> >> > +  if (would_overrun_p
> >> >> >> >> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> >> >> >> > +   would_overrun_p = false;
> >> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >> >> >> >   && (can_overrun_p || !would_overrun_p)
> >> >> >> >> >   && compare_step_with_zero (stmt) > 0)
> >> >> >> >> 
> >> >> >> >> ...is this right for all cases?  I think it only looks for
> >> >> >> >> single-vector
> >> >> >> >> alignment, but the gap can in principle be vector-sized or larger,
> >> >> >> >> at least for load-lanes.
> >> >> >> >>
> >> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> >> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the 
> >> >> >> >> group
> >> >> >> >> spans two vectors and we have no guarantee that the second one
> >> >> >> >> is mapped.
> >> >> >> >
> >> >> >> > The check assumes that if aligned_access_p () returns true then the
> >> >> >> > whole access is aligned in a way that it can't cross page 
> >> >> >> > boundaries.
> >> >> >> > That's of course not the case if alignment is 16 bytes but the 
> >> >> >> > access
> >> >> >> > will be a multiple of that.
> >> >> >> >  
> >> >> >> >> I haven't been able to come up with a testcase though.  We seem 
> >> >> >> >> to be
> >> >> >> >> overly conservative when computing alignments.
> >> >> >> >
> >> >> >> > Not sure if we can run into this with load-lanes given that bumps 
> >> >> >> > the
> >> >> >> > vectorization factor.  Also does load-lane work with gaps?
> >> >> >> >
> >> >> >> > I think that gap can never be larger than nunits-1 so it is by 
> >> >> >> > definition
> >> >> >> > in the last "vector" independent of the VF.
> >> >> >> >
> >> >> >> > Classical gap case is
> >> >> >> >
> >> >> >> > for (i=0; i<n; ++i)
> >> >> >> >  {
> >> >> >> >y[3*i + 0] = x[4*i + 0];
> >> >> >> >y[3*i + 1] = x[4*i + 1];
> >> >> >> >y[3*i + 2] = x[4*i + 2];
> >> >> >> >  }
> >> >> >> >
> >> >> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> >> >> >> > the y's different streams and you should get the perfect case for
> >> >> >> > load-lane:
> >> >> >> >
> >> >> >> > for (i=0; i<n; ++i)
> >> >> >> >  {
> >> >> >> >y[i] = x[4*i + 0];
> >> >> >> >z[i] = x[4*i + 1];
> >> >> >> >w[i] = x[4*i + 2];
> >> >> >> >  } 
> >> >> >> >
> >> >> >> > previously we'd peel at least 4 iterations into the epilogue for
> >> >> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> >> >> >> > ok.
> >> >> >> 
> >> >> >> The case I was thinking of was like the second, but with the
> >> >> >> element type being DI or DF and with the + 2 statement removed.
> >> >> >> E.g.:
> >> >> >> 
> >> >> >> double 

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-03 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >
>> >> >> Richard Biener  writes:
>> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >> >
>> >> >> >> Sorry for the late reply, but:
>> >> >> >> 
>> >> >> >> Richard Biener  writes:
>> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >> >> >
>> >> >> >> >> 
>> >> >> >> >> Currently we force peeling for gaps whenever element
>> >> >> >> >> overrun can occur
>> >> >> >> >> but for aligned accesses we know that the loads won't trap
>> >> >> >> >> and thus
>> >> >> >> >> we can avoid this.
>> >> >> >> >> 
>> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu
>> >> >> >> >> (I expect
>> >> >> >> >> some testsuite fallout here so didn't bother to invent a
>> >> >> >> >> new testcase).
>> >> >> >> >> 
>> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >> >> >
>> >> >> >> > This is what I applied.
>> >> >> >> >
>> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >> >> >
>> >> >> >> > Richard.
>> >> >> >> [...]
>> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> >> >> > index 15aec21..c29e73d 100644
>> >> >> >> > --- a/gcc/tree-vect-stmts.c
>> >> >> >> > +++ b/gcc/tree-vect-stmts.c
>> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, 
>> >> >> >> > tree vectype, bool slp,
>> >> >> >> >/* If there is a gap at the end of the group then these 
>> >> >> >> > optimizations
>> >> >> >> >would access excess elements in the last iteration.  */
>> >> >> >> >bool would_overrun_p = (gap != 0);
>> >> >> >> > +  /* If the access is aligned an overrun is fine.  */
>> >> >> >> > +  if (would_overrun_p
>> >> >> >> > +   && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> >> >> > + would_overrun_p = false;
>> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >> >> > && (can_overrun_p || !would_overrun_p)
>> >> >> >> > && compare_step_with_zero (stmt) > 0)
>> >> >> >> 
>> >> >> >> ...is this right for all cases?  I think it only looks for
>> >> >> >> single-vector
>> >> >> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> >> >> at least for load-lanes.
>> >> >> >>
>> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the 
>> >> >> >> group
>> >> >> >> spans two vectors and we have no guarantee that the second one
>> >> >> >> is mapped.
>> >> >> >
>> >> >> > The check assumes that if aligned_access_p () returns true then the
>> >> >> > whole access is aligned in a way that it can't cross page boundaries.
>> >> >> > That's of course not the case if alignment is 16 bytes but the access
>> >> >> > will be a multiple of that.
>> >> >> >  
>> >> >> >> I haven't been able to come up with a testcase though.  We seem to 
>> >> >> >> be
>> >> >> >> overly conservative when computing alignments.
>> >> >> >
>> >> >> > Not sure if we can run into this with load-lanes given that bumps the
>> >> >> > vectorization factor.  Also does load-lane work with gaps?
>> >> >> >
>> >> >> > I think that gap can never be larger than nunits-1 so it is by 
>> >> >> > definition
>> >> >> > in the last "vector" independent of the VF.
>> >> >> >
>> >> >> > Classical gap case is
>> >> >> >
>> >> >> > for (i=0; i<n; ++i)
>> >> >> >  {
>> >> >> >y[3*i + 0] = x[4*i + 0];
>> >> >> >y[3*i + 1] = x[4*i + 1];
>> >> >> >y[3*i + 2] = x[4*i + 2];
>> >> >> >  }
>> >> >> >
>> >> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> >> >> > the y's different streams and you should get the perfect case for
>> >> >> > load-lane:
>> >> >> >
>> >> >> > for (i=0; i<n; ++i)
>> >> >> >  {
>> >> >> >y[i] = x[4*i + 0];
>> >> >> >z[i] = x[4*i + 1];
>> >> >> >w[i] = x[4*i + 2];
>> >> >> >  } 
>> >> >> >
>> >> >> > previously we'd peel at least 4 iterations into the epilogue for
>> >> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> >> >> > ok.
>> >> >> 
>> >> >> The case I was thinking of was like the second, but with the
>> >> >> element type being DI or DF and with the + 2 statement removed.
>> >> >> E.g.:
>> >> >> 
>> >> >> double __attribute__((noinline))
>> >> >> foo (double *a)
>> >> >> {
>> >> >>   double res = 0.0;
>> >> >>   for (int n = 0; n < 256; n += 4)
>> >> >> res += a[n] + a[n + 1];
>> >> >>   return res;
>> >> >> }
>> >> >> 
>> >> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> >> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> >> 

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-02 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener  writes:
> >> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >> >
> >> >> >> Sorry for the late reply, but:
> >> >> >> 
> >> >> >> Richard Biener  writes:
> >> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >> >> >
> >> >> >> >> 
> >> >> >> >> Currently we force peeling for gaps whenever element overrun can 
> >> >> >> >> occur
> >> >> >> >> but for aligned accesses we know that the loads won't trap and 
> >> >> >> >> thus
> >> >> >> >> we can avoid this.
> >> >> >> >> 
> >> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I 
> >> >> >> >> expect
> >> >> >> >> some testsuite fallout here so didn't bother to invent a new 
> >> >> >> >> testcase).
> >> >> >> >> 
> >> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
> >> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >> >> >
> >> >> >> > This is what I applied.
> >> >> >> >
> >> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >> >> >
> >> >> >> > Richard.
> >> >> >> [...]
> >> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> >> >> > index 15aec21..c29e73d 100644
> >> >> >> > --- a/gcc/tree-vect-stmts.c
> >> >> >> > +++ b/gcc/tree-vect-stmts.c
> >> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, 
> >> >> >> > tree vectype, bool slp,
> >> >> >> >/* If there is a gap at the end of the group then these 
> >> >> >> > optimizations
> >> >> >> > would access excess elements in the last iteration.  */
> >> >> >> >bool would_overrun_p = (gap != 0);
> >> >> >> > +  /* If the access is aligned an overrun is fine.  */
> >> >> >> > +  if (would_overrun_p
> >> >> >> > +&& aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> >> >> > +  would_overrun_p = false;
> >> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >> >> >  && (can_overrun_p || !would_overrun_p)
> >> >> >> >  && compare_step_with_zero (stmt) > 0)
> >> >> >> 
> >> >> >> ...is this right for all cases?  I think it only looks for 
> >> >> >> single-vector
> >> >> >> alignment, but the gap can in principle be vector-sized or larger,
> >> >> >> at least for load-lanes.
> >> >> >>
> >> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> >> >> >> spans two vectors and we have no guarantee that the second one
> >> >> >> is mapped.
> >> >> >
> >> >> > The check assumes that if aligned_access_p () returns true then the
> >> >> > whole access is aligned in a way that it can't cross page boundaries.
> >> >> > That's of course not the case if alignment is 16 bytes but the access
> >> >> > will be a multiple of that.
> >> >> >  
> >> >> >> I haven't been able to come up with a testcase though.  We seem to be
> >> >> >> overly conservative when computing alignments.
> >> >> >
> >> >> > Not sure if we can run into this with load-lanes given that bumps the
> >> >> > vectorization factor.  Also does load-lane work with gaps?
> >> >> >
> >> >> > I think that gap can never be larger than nunits-1 so it is by 
> >> >> > definition
> >> >> > in the last "vector" independent of the VF.
> >> >> >
> >> >> > Classical gap case is
> >> >> >
> >> >> > for (i=0; i<n; ++i)
> >> >> >  {
> >> >> >y[3*i + 0] = x[4*i + 0];
> >> >> >y[3*i + 1] = x[4*i + 1];
> >> >> >y[3*i + 2] = x[4*i + 2];
> >> >> >  }
> >> >> >
> >> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> >> >> > the y's different streams and you should get the perfect case for
> >> >> > load-lane:
> >> >> >
> >> >> > for (i=0; i<n; ++i)
> >> >> >  {
> >> >> >y[i] = x[4*i + 0];
> >> >> >z[i] = x[4*i + 1];
> >> >> >w[i] = x[4*i + 2];
> >> >> >  } 
> >> >> >
> >> >> > previously we'd peel at least 4 iterations into the epilogue for
> >> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> >> >> > ok.
> >> >> 
> >> >> The case I was thinking of was like the second, but with the
> >> >> element type being DI or DF and with the + 2 statement removed.
> >> >> E.g.:
> >> >> 
> >> >> double __attribute__((noinline))
> >> >> foo (double *a)
> >> >> {
> >> >>   double res = 0.0;
> >> >>   for (int n = 0; n < 256; n += 4)
> >> >> res += a[n] + a[n + 1];
> >> >>   return res;
> >> >> }
> >> >> 
> >> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
> >> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
> >> >> and a[n + 3].
> >> >
> >> > Yes, indeed.  It's safe when peeling for gaps would remove
> >> > N < alignof (ref) / sizeof (ref) scalar iterations.
> >> >
> >> > Peeling for gaps 

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >> >
>> >> >> Sorry for the late reply, but:
>> >> >> 
>> >> >> Richard Biener  writes:
>> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >> >
>> >> >> >> 
>> >> >> >> Currently we force peeling for gaps whenever element overrun can 
>> >> >> >> occur
>> >> >> >> but for aligned accesses we know that the loads won't trap and thus
>> >> >> >> we can avoid this.
>> >> >> >> 
>> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> >> >> some testsuite fallout here so didn't bother to invent a new 
>> >> >> >> testcase).
>> >> >> >> 
>> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >> >
>> >> >> > This is what I applied.
>> >> >> >
>> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >> >
>> >> >> > Richard.
>> >> >> [...]
>> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> >> > index 15aec21..c29e73d 100644
>> >> >> > --- a/gcc/tree-vect-stmts.c
>> >> >> > +++ b/gcc/tree-vect-stmts.c
>> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
>> >> >> > vectype, bool slp,
>> >> >> >/* If there is a gap at the end of the group then these 
>> >> >> > optimizations
>> >> >> >   would access excess elements in the last iteration.  */
>> >> >> >bool would_overrun_p = (gap != 0);
>> >> >> > +  /* If the access is aligned an overrun is fine.  */
>> >> >> > +  if (would_overrun_p
>> >> >> > +  && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> >> > +would_overrun_p = false;
>> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >> >&& (can_overrun_p || !would_overrun_p)
>> >> >> >&& compare_step_with_zero (stmt) > 0)
>> >> >> 
>> >> >> ...is this right for all cases?  I think it only looks for 
>> >> >> single-vector
>> >> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> >> at least for load-lanes.
>> >> >>
>> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> >> >> spans two vectors and we have no guarantee that the second one
>> >> >> is mapped.
>> >> >
>> >> > The check assumes that if aligned_access_p () returns true then the
>> >> > whole access is aligned in a way that it can't cross page boundaries.
>> >> > That's of course not the case if alignment is 16 bytes but the access
>> >> > will be a multiple of that.
>> >> >  
>> >> >> I haven't been able to come up with a testcase though.  We seem to be
>> >> >> overly conservative when computing alignments.
>> >> >
>> >> > Not sure if we can run into this with load-lanes given that bumps the
>> >> > vectorization factor.  Also does load-lane work with gaps?
>> >> >
>> >> > I think that gap can never be larger than nunits-1 so it is by 
>> >> > definition
>> >> > in the last "vector" independent of the VF.
>> >> >
>> >> > Classical gap case is
>> >> >
>> >> > for (i=0; i<n; ++i)
>> >> >  {
>> >> >y[3*i + 0] = x[4*i + 0];
>> >> >y[3*i + 1] = x[4*i + 1];
>> >> >y[3*i + 2] = x[4*i + 2];
>> >> >  }
>> >> >
>> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> >> > the y's different streams and you should get the perfect case for
>> >> > load-lane:
>> >> >
>> >> > for (i=0; i<n; ++i)
>> >> >  {
>> >> >y[i] = x[4*i + 0];
>> >> >z[i] = x[4*i + 1];
>> >> >w[i] = x[4*i + 2];
>> >> >  } 
>> >> >
>> >> > previously we'd peel at least 4 iterations into the epilogue for
>> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> >> > ok.
>> >> 
>> >> The case I was thinking of was like the second, but with the
>> >> element type being DI or DF and with the + 2 statement removed.
>> >> E.g.:
>> >> 
>> >> double __attribute__((noinline))
>> >> foo (double *a)
>> >> {
>> >>   double res = 0.0;
>> >>   for (int n = 0; n < 256; n += 4)
>> >> res += a[n] + a[n + 1];
>> >>   return res;
>> >> }
>> >> 
>> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> >> and a[n + 3].
>> >
>> > Yes, indeed.  It's safe when peeling for gaps would remove
>> > N < alignof (ref) / sizeof (ref) scalar iterations.
>> >
>> > Peeling for gaps simply subtracts one from the niter of the vectorized 
>> > loop.
>> 
>> I think subtracting one is enough in all cases.  It's only the final
>> iteration of the scalar loop that can't access a[n + 2] and a[n + 3].
>> 
>> (Of course, subtracting one happens before peeling for niters, so it
>> only makes a 

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >> >
> >> >> Sorry for the late reply, but:
> >> >> 
> >> >> Richard Biener  writes:
> >> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >> >
> >> >> >> 
> >> >> >> Currently we force peeling for gaps whenever element overrun can 
> >> >> >> occur
> >> >> >> but for aligned accesses we know that the loads won't trap and thus
> >> >> >> we can avoid this.
> >> >> >> 
> >> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> >> >> some testsuite fallout here so didn't bother to invent a new 
> >> >> >> testcase).
> >> >> >> 
> >> >> >> Just in case somebody thinks the overrun is a bad idea in general
> >> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >> >
> >> >> > This is what I applied.
> >> >> >
> >> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >> >
> >> >> > Richard.
> >> >> [...]
> >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> >> > index 15aec21..c29e73d 100644
> >> >> > --- a/gcc/tree-vect-stmts.c
> >> >> > +++ b/gcc/tree-vect-stmts.c
> >> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
> >> >> > vectype, bool slp,
> >> >> >/* If there is a gap at the end of the group then these 
> >> >> > optimizations
> >> >> >would access excess elements in the last iteration.  */
> >> >> >bool would_overrun_p = (gap != 0);
> >> >> > +  /* If the access is aligned an overrun is fine.  */
> >> >> > +  if (would_overrun_p
> >> >> > +   && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> >> > + would_overrun_p = false;
> >> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >> > && (can_overrun_p || !would_overrun_p)
> >> >> > && compare_step_with_zero (stmt) > 0)
> >> >> 
> >> >> ...is this right for all cases?  I think it only looks for single-vector
> >> >> alignment, but the gap can in principle be vector-sized or larger,
> >> >> at least for load-lanes.
> >> >>
> >> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> >> >> spans two vectors and we have no guarantee that the second one
> >> >> is mapped.
> >> >
> >> > The check assumes that if aligned_access_p () returns true then the
> >> > whole access is aligned in a way that it can't cross page boundaries.
> >> > That's of course not the case if alignment is 16 bytes but the access
> >> > will be a multiple of that.
> >> >  
> >> >> I haven't been able to come up with a testcase though.  We seem to be
> >> >> overly conservative when computing alignments.
> >> >
> >> > Not sure if we can run into this with load-lanes given that bumps the
> >> > vectorization factor.  Also does load-lane work with gaps?
> >> >
> >> > I think that gap can never be larger than nunits-1 so it is by definition
> >> > in the last "vector" independent of the VF.
> >> >
> >> > Classical gap case is
> >> >
> >> > for (i=0; i<n; ++i)
> >> >  {
> >> >y[3*i + 0] = x[4*i + 0];
> >> >y[3*i + 1] = x[4*i + 1];
> >> >y[3*i + 2] = x[4*i + 2];
> >> >  }
> >> >
> >> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> >> > the y's different streams and you should get the perfect case for
> >> > load-lane:
> >> >
> >> > for (i=0; i<n; ++i)
> >> >  {
> >> >y[i] = x[4*i + 0];
> >> >z[i] = x[4*i + 1];
> >> >w[i] = x[4*i + 2];
> >> >  } 
> >> >
> >> > previously we'd peel at least 4 iterations into the epilogue for
> >> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> >> > ok.
> >> 
> >> The case I was thinking of was like the second, but with the
> >> element type being DI or DF and with the + 2 statement removed.
> >> E.g.:
> >> 
> >> double __attribute__((noinline))
> >> foo (double *a)
> >> {
> >>   double res = 0.0;
> >>   for (int n = 0; n < 256; n += 4)
> >> res += a[n] + a[n + 1];
> >>   return res;
> >> }
> >> 
> >> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
> >> to V2DF isn't enough to guarantee that we can access a[n + 2]
> >> and a[n + 3].
> >
> > Yes, indeed.  It's safe when peeling for gaps would remove
> > N < alignof (ref) / sizeof (ref) scalar iterations.
> >
> > Peeling for gaps simply subtracts one from the niter of the vectorized 
> > loop.
> 
> I think subtracting one is enough in all cases.  It's only the final
> iteration of the scalar loop that can't access a[n + 2] and a[n + 3].
> 
> (Of course, subtracting one happens before peeling for niters, so it
> only makes a difference if the original niters was a multiple of the VF,
> in which case we peel a full vector's worth of iterations instead of
> peeling none.)

I think one could extend the 

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
>> >
>> >> Sorry for the late reply, but:
>> >> 
>> >> Richard Biener  writes:
>> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >> >
>> >> >> 
>> >> >> Currently we force peeling for gaps whenever element overrun can occur
>> >> >> but for aligned accesses we know that the loads won't trap and thus
>> >> >> we can avoid this.
>> >> >> 
>> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> >> some testsuite fallout here so didn't bother to invent a new testcase).
>> >> >> 
>> >> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> >> (even when not trapping).  Like for ASAN or valgrind.
>> >> >
>> >> > This is what I applied.
>> >> >
>> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >
>> >> > Richard.
>> >> [...]
>> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> >> > index 15aec21..c29e73d 100644
>> >> > --- a/gcc/tree-vect-stmts.c
>> >> > +++ b/gcc/tree-vect-stmts.c
>> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
>> >> > vectype, bool slp,
>> >> >/* If there is a gap at the end of the group then these 
>> >> > optimizations
>> >> >  would access excess elements in the last iteration.  */
>> >> >bool would_overrun_p = (gap != 0);
>> >> > +  /* If the access is aligned an overrun is fine.  */
>> >> > +  if (would_overrun_p
>> >> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> >> > +   would_overrun_p = false;
>> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >> >   && (can_overrun_p || !would_overrun_p)
>> >> >   && compare_step_with_zero (stmt) > 0)
>> >> 
>> >> ...is this right for all cases?  I think it only looks for single-vector
>> >> alignment, but the gap can in principle be vector-sized or larger,
>> >> at least for load-lanes.
>> >>
>> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> >> spans two vectors and we have no guarantee that the second one
>> >> is mapped.
>> >
>> > The check assumes that if aligned_access_p () returns true then the
>> > whole access is aligned in a way that it can't cross page boundaries.
>> > That's of course not the case if alignment is 16 bytes but the access
>> > will be a multiple of that.
>> >  
>> >> I haven't been able to come up with a testcase though.  We seem to be
>> >> overly conservative when computing alignments.
>> >
>> > Not sure if we can run into this with load-lanes given that bumps the
>> > vectorization factor.  Also does load-lane work with gaps?
>> >
>> > I think that gap can never be larger than nunits-1 so it is by definition
>> > in the last "vector" independent of the VF.
>> >
>> > Classical gap case is
>> >
>> > for (i=0; i<n; ++i)
>> >  {
>> >y[3*i + 0] = x[4*i + 0];
>> >y[3*i + 1] = x[4*i + 1];
>> >y[3*i + 2] = x[4*i + 2];
>> >  }
>> >
>> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
>> > the y's different streams and you should get the perfect case for
>> > load-lane:
>> >
>> > for (i=0; i<n; ++i)
>> >  {
>> >y[i] = x[4*i + 0];
>> >z[i] = x[4*i + 1];
>> >w[i] = x[4*i + 2];
>> >  } 
>> >
>> > previously we'd peel at least 4 iterations into the epilogue for
>> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
>> > ok.
>> 
>> The case I was thinking of was like the second, but with the
>> element type being DI or DF and with the + 2 statement removed.
>> E.g.:
>> 
>> double __attribute__((noinline))
>> foo (double *a)
>> {
>>   double res = 0.0;
>>   for (int n = 0; n < 256; n += 4)
>> res += a[n] + a[n + 1];
>>   return res;
>> }
>> 
>> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
>> to V2DF isn't enough to guarantee that we can access a[n + 2]
>> and a[n + 3].
>
> Yes, indeed.  It's safe when peeling for gaps would remove
> N < alignof (ref) / sizeof (ref) scalar iterations.
>
> Peeling for gaps simply subtracts one from the niter of the vectorized 
> loop.

I think subtracting one is enough in all cases.  It's only the final
iteration of the scalar loop that can't access a[n + 2] and a[n + 3].

(Of course, subtracting one happens before peeling for niters, so it
only makes a difference if the original niters was a multiple of the VF,
in which case we peel a full vector's worth of iterations instead of
peeling none.)

Thanks,
Richard


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 1 Mar 2017, Richard Sandiford wrote:
> >
> >> Sorry for the late reply, but:
> >> 
> >> Richard Biener  writes:
> >> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >> >
> >> >> 
> >> >> Currently we force peeling for gaps whenever element overrun can occur
> >> >> but for aligned accesses we know that the loads won't trap and thus
> >> >> we can avoid this.
> >> >> 
> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> >> some testsuite fallout here so didn't bother to invent a new testcase).
> >> >> 
> >> >> Just in case somebody thinks the overrun is a bad idea in general
> >> >> (even when not trapping).  Like for ASAN or valgrind.
> >> >
> >> > This is what I applied.
> >> >
> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> >
> >> > Richard.
> >> [...]
> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> > index 15aec21..c29e73d 100644
> >> > --- a/gcc/tree-vect-stmts.c
> >> > +++ b/gcc/tree-vect-stmts.c
> >> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree 
> >> > vectype, bool slp,
> >> >/* If there is a gap at the end of the group then these 
> >> > optimizations
> >> >   would access excess elements in the last iteration.  */
> >> >bool would_overrun_p = (gap != 0);
> >> > +  /* If the access is aligned an overrun is fine.  */
> >> > +  if (would_overrun_p
> >> > +  && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> >> > +would_overrun_p = false;
> >> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >> >&& (can_overrun_p || !would_overrun_p)
> >> >&& compare_step_with_zero (stmt) > 0)
> >> 
> >> ...is this right for all cases?  I think it only looks for single-vector
> >> alignment, but the gap can in principle be vector-sized or larger,
> >> at least for load-lanes.
> >>
> >> E.g. say we have a 128-bit vector of doubles in a group of size 4
> >> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> >> spans two vectors and we have no guarantee that the second one
> >> is mapped.
> >
> > The check assumes that if aligned_access_p () returns true then the
> > whole access is aligned in a way that it can't cross page boundaries.
> > That's of course not the case if alignment is 16 bytes but the access
> > will be a multiple of that.
> >  
> >> I haven't been able to come up with a testcase though.  We seem to be
> >> overly conservative when computing alignments.
> >
> > Not sure if we can run into this with load-lanes given that bumps the
> > vectorization factor.  Also does load-lane work with gaps?
> >
> > I think that gap can never be larger than nunits-1 so it is by definition
> > in the last "vector" independent of the VF.
> >
> > Classical gap case is
> >
> > for (i=0; i<n; ++i)
> >  {
> >y[3*i + 0] = x[4*i + 0];
> >y[3*i + 1] = x[4*i + 1];
> >y[3*i + 2] = x[4*i + 2];
> >  }
> >
> > where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> > the y's different streams and you should get the perfect case for
> > load-lane:
> >
> > for (i=0; i<n; ++i)
> >  {
> >y[i] = x[4*i + 0];
> >z[i] = x[4*i + 1];
> >w[i] = x[4*i + 2];
> >  } 
> >
> > previously we'd peel at least 4 iterations into the epilogue for
> > the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> > ok.
> 
> The case I was thinking of was like the second, but with the
> element type being DI or DF and with the + 2 statement removed.
> E.g.:
> 
> double __attribute__((noinline))
> foo (double *a)
> {
>   double res = 0.0;
>   for (int n = 0; n < 256; n += 4)
> res += a[n] + a[n + 1];
>   return res;
> }
> 
> (with -ffast-math).  We do use LD4 for this, and having "a" aligned
> to V2DF isn't enough to guarantee that we can access a[n + 2]
> and a[n + 3].

Yes, indeed.  It's safe when peeling for gaps would remove
N < alignof (ref) / sizeof (ref) scalar iterations.

Peeling for gaps simply subtracts one from the niter of the vectorized 
loop.
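Concretely, the safety condition can be sketched as a small predicate (the function name and the driver below are mine, not from the mail):

```c
#include <assert.h>
#include <stddef.h>

/* Overrunning into the gap is harmless when the N peeled scalar
   iterations fit inside one aligned block, i.e. N < align / elt_size,
   since an aligned block cannot cross a page boundary.  */
static int
gap_overrun_is_safe (size_t align, size_t elt_size, size_t n_peeled)
{
  return n_peeled < align / elt_size;
}
```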

One should be able to construct a testcase w/o load-lanes by ensuring
a high enough VF.
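For instance, such a no-load-lanes testcase might look like the following sketch (the function name, types, and bounds are my own guesses, not an established reproducer):

```c
#include <assert.h>

/* char elements give a high vectorization factor, so the gap read of
   x[4*i + 3] in the last vector iteration can land well beyond the last
   element the scalar code uses.  Whether GCC vectorizes this exact loop
   that way is not established here.  */
void
gap_copy (short *restrict y, const char *restrict x, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[3 * i + 0] = x[4 * i + 0];
      y[3 * i + 1] = x[4 * i + 1];
      y[3 * i + 2] = x[4 * i + 2];
    }
}
```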

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 1 Mar 2017, Richard Sandiford wrote:
>
>> Sorry for the late reply, but:
>> 
>> Richard Biener  writes:
>> > On Mon, 7 Nov 2016, Richard Biener wrote:
>> >
>> >> 
>> >> Currently we force peeling for gaps whenever element overrun can occur
>> >> but for aligned accesses we know that the loads won't trap and thus
>> >> we can avoid this.
>> >> 
>> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> >> some testsuite fallout here so didn't bother to invent a new testcase).
>> >> 
>> >> Just in case somebody thinks the overrun is a bad idea in general
>> >> (even when not trapping).  Like for ASAN or valgrind.
>> >
>> > This is what I applied.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> [...]
>> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > index 15aec21..c29e73d 100644
>> > --- a/gcc/tree-vect-stmts.c
>> > +++ b/gcc/tree-vect-stmts.c
>> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
>> >    /* If there is a gap at the end of the group then these optimizations
>> >       would access excess elements in the last iteration.  */
>> >bool would_overrun_p = (gap != 0);
>> > +  /* If the access is aligned an overrun is fine.  */
>> > +  if (would_overrun_p
>> > +&& aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
>> > +  would_overrun_p = false;
>> >if (!STMT_VINFO_STRIDED_P (stmt_info)
>> >  && (can_overrun_p || !would_overrun_p)
>> >  && compare_step_with_zero (stmt) > 0)
>> 
>> ...is this right for all cases?  I think it only looks for single-vector
>> alignment, but the gap can in principle be vector-sized or larger,
>> at least for load-lanes.
>>
>> E.g. say we have a 128-bit vector of doubles in a group of size 4
>> and a gap of 2 or 3.  Even if the access itself is aligned, the group
>> spans two vectors and we have no guarantee that the second one
>> is mapped.
>
> The check assumes that if aligned_access_p () returns true then the
> whole access is aligned in a way that it can't cross page boundaries.
> That's of course not the case if the alignment is 16 bytes but the
> access size is a multiple of that.
>  
>> I haven't been able to come up with a testcase though.  We seem to be
>> overly conservative when computing alignments.
>
> Not sure if we can run into this with load-lanes given that it bumps the
> vectorization factor.  Also does load-lane work with gaps?
>
> I think that gap can never be larger than nunits-1 so it is by definition
> in the last "vector" independent of the VF.
>
> Classical gap case is
>
> for (i=0; i<n; ++i)
>  {
>y[3*i + 0] = x[4*i + 0];
>y[3*i + 1] = x[4*i + 1];
>y[3*i + 2] = x[4*i + 2];
>  }
>
> where x has a gap of 1.  You'll get VF of 12 for the above.  Make
> the y's different streams and you should get the perfect case for
> load-lane:
>
> for (i=0; i<n; ++i)
>  {
>y[i] = x[4*i + 0];
>z[i] = x[4*i + 1];
>w[i] = x[4*i + 2];
>  } 
>
> previously we'd peel at least 4 iterations into the epilogue for
> the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
> ok.

The case I was thinking of was like the second, but with the
element type being DI or DF and with the + 2 statement removed.
E.g.:

double __attribute__((noinline))
foo (double *a)
{
  double res = 0.0;
  for (int n = 0; n < 256; n += 4)
res += a[n] + a[n + 1];
  return res;
}

(with -ffast-math).  We do use LD4 for this, and having "a" aligned
to V2DF isn't enough to guarantee that we can access a[n + 2]
and a[n + 3].

Thanks,
Richard


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Biener
On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Sorry for the late reply, but:
> 
> Richard Biener  writes:
> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >
> >> 
> >> Currently we force peeling for gaps whenever element overrun can occur
> >> but for aligned accesses we know that the loads won't trap and thus
> >> we can avoid this.
> >> 
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> some testsuite fallout here so didn't bother to invent a new testcase).
> >> 
> >> Just in case somebody thinks the overrun is a bad idea in general
> >> (even when not trapping).  Like for ASAN or valgrind.
> >
> > This is what I applied.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Richard.
> [...]
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 15aec21..c29e73d 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
> >/* If there is a gap at the end of the group then these optimizations
> >  would access excess elements in the last iteration.  */
> >bool would_overrun_p = (gap != 0);
> > +  /* If the access is aligned an overrun is fine.  */
> > +  if (would_overrun_p
> > + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> > +   would_overrun_p = false;
> >if (!STMT_VINFO_STRIDED_P (stmt_info)
> >   && (can_overrun_p || !would_overrun_p)
> >   && compare_step_with_zero (stmt) > 0)
> 
> ...is this right for all cases?  I think it only looks for single-vector
> alignment, but the gap can in principle be vector-sized or larger,
> at least for load-lanes.
>
> E.g. say we have a 128-bit vector of doubles in a group of size 4
> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> spans two vectors and we have no guarantee that the second one
> is mapped.

The check assumes that if aligned_access_p () returns true then the
whole access is aligned in a way that it can't cross page boundaries.
That's of course not the case if the alignment is 16 bytes but the
access size is a multiple of that.
 
> I haven't been able to come up with a testcase though.  We seem to be
> overly conservative when computing alignments.

Not sure if we can run into this with load-lanes given that it bumps the
vectorization factor.  Also does load-lane work with gaps?

I think that gap can never be larger than nunits-1 so it is by definition
in the last "vector" independent of the VF.

Classical gap case is

for (i=0; i<n; ++i)

Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2017-03-01 Thread Richard Sandiford
Sorry for the late reply, but:

Richard Biener  writes:
> On Mon, 7 Nov 2016, Richard Biener wrote:
>
>> 
>> Currently we force peeling for gaps whenever element overrun can occur
>> but for aligned accesses we know that the loads won't trap and thus
>> we can avoid this.
>> 
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
>> some testsuite fallout here so didn't bother to invent a new testcase).
>> 
>> Just in case somebody thinks the overrun is a bad idea in general
>> (even when not trapping).  Like for ASAN or valgrind.
>
> This is what I applied.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard.
[...]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 15aec21..c29e73d 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
>/* If there is a gap at the end of the group then these optimizations
>would access excess elements in the last iteration.  */
>bool would_overrun_p = (gap != 0);
> +  /* If the access is aligned an overrun is fine.  */
> +  if (would_overrun_p
> +   && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> + would_overrun_p = false;
>if (!STMT_VINFO_STRIDED_P (stmt_info)
> && (can_overrun_p || !would_overrun_p)
> && compare_step_with_zero (stmt) > 0)

...is this right for all cases?  I think it only looks for single-vector
alignment, but the gap can in principle be vector-sized or larger,
at least for load-lanes.

E.g. say we have a 128-bit vector of doubles in a group of size 4
and a gap of 2 or 3.  Even if the access itself is aligned, the group
spans two vectors and we have no guarantee that the second one
is mapped.

I haven't been able to come up with a testcase though.  We seem to be
overly conservative when computing alignments.

Thanks,
Richard


Re: [PATCH] Avoid peeling for gaps if accesses are aligned

2016-11-08 Thread Richard Biener
On Mon, 7 Nov 2016, Richard Biener wrote:

> 
> Currently we force peeling for gaps whenever element overrun can occur
> but for aligned accesses we know that the loads won't trap and thus
> we can avoid this.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> some testsuite fallout here so didn't bother to invent a new testcase).
> 
> Just in case somebody thinks the overrun is a bad idea in general
> (even when not trapping).  Like for ASAN or valgrind.

This is what I applied.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2016-11-08  Richard Biener  

* tree-vect-stmts.c (get_group_load_store_type): If the
access is aligned do not trigger peeling for gaps.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Do not
force alignment of vars with DECL_USER_ALIGN.

* gcc.dg/vect/vect-nb-iter-ub-2.c: Adjust.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
index bc07b4b..4e13702 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
@@ -3,7 +3,7 @@
 #include "tree-vect.h"
 
 int ii[32];
-char cc[66] =
+char cc[66] __attribute__((aligned(1))) =
   { 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
 20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index b03cb1e..f014d68 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -831,6 +831,19 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
  return true;
}
 
+  if (DECL_USER_ALIGN (base))
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "not forcing alignment of user-aligned "
+  "variable: ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, base);
+ dump_printf (MSG_NOTE, "\n");
+   }
+ return true;
+   }
+
   /* Force the alignment of the decl.
 NOTE: This is the only change to the code we make during
 the analysis phase, before deciding to vectorize the loop.  */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 15aec21..c29e73d 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1770,6 +1770,11 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
   " non-consecutive accesses\n");
  return false;
}
+ /* If the access is aligned an overrun is fine.  */
+ if (overrun_p
+ && aligned_access_p
+  (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
+   overrun_p = false;
  if (overrun_p && !can_overrun_p)
{
  if (dump_enabled_p ())
@@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
   /* If there is a gap at the end of the group then these optimizations
 would access excess elements in the last iteration.  */
   bool would_overrun_p = (gap != 0);
+  /* If the access is aligned an overrun is fine.  */
+  if (would_overrun_p
+ && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
+   would_overrun_p = false;
   if (!STMT_VINFO_STRIDED_P (stmt_info)
  && (can_overrun_p || !would_overrun_p)
  && compare_step_with_zero (stmt) > 0)


[PATCH] Avoid peeling for gaps if accesses are aligned

2016-11-07 Thread Richard Biener

Currently we force peeling for gaps whenever element overrun can occur
but for aligned accesses we know that the loads won't trap and thus
we can avoid this.

Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
some testsuite fallout here so didn't bother to invent a new testcase).

Just in case somebody thinks the overrun is a bad idea in general
(even when not trapping).  Like for ASAN or valgrind.

Richard.

2016-11-07  Richard Biener  

* tree-vect-stmts.c (get_group_load_store_type): If the
access is aligned do not trigger peeling for gaps.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 241893)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -1770,6 +1771,11 @@ get_group_load_store_type (gimple *stmt,
   " non-consecutive accesses\n");
  return false;
}
+ /* If the access is aligned an overrun is fine.  */
+ if (overrun_p
+ && aligned_access_p
+  (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
+   overrun_p = false;
  if (overrun_p && !can_overrun_p)
{
  if (dump_enabled_p ())
@@ -1789,6 +1795,10 @@ get_group_load_store_type (gimple *stmt,
   /* If there is a gap at the end of the group then these optimizations
 would access excess elements in the last iteration.  */
   bool would_overrun_p = (gap != 0);
+  /* If the access is aligned an overrun is fine.  */
+  if (would_overrun_p
+ && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
+   would_overrun_p = false;
   if (!STMT_VINFO_STRIDED_P (stmt_info)
  && (can_overrun_p || !would_overrun_p)
  && compare_step_with_zero (stmt) > 0)
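
As an illustration of the access pattern the patch targets (a sketch of my own, not part of the patch or its testsuite):

```c
#include <assert.h>

#define N 256

/* x is 16-byte aligned, so a V4SI load of x[4*i .. 4*i+3] sits inside
   one aligned block and cannot trap, even though x[4*i + 3] is a gap
   element the scalar code never reads.  With the patch, the vectorizer
   need not peel epilogue iterations for that gap.  */
static int x[4 * N] __attribute__ ((aligned (16)));
static int y[3 * N];

void
pack (void)
{
  for (int i = 0; i < N; ++i)
    {
      y[3 * i + 0] = x[4 * i + 0];
      y[3 * i + 1] = x[4 * i + 1];
      y[3 * i + 2] = x[4 * i + 2];
    }
}
```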