Re: Vectorize stores with unknown stride

2015-05-07 Thread Alan Lawrence
NP, and sorry for the spurious comments, hadn't spotted you were using nunits. I 
like the testcase, thanks :).


A.

Michael Matz wrote:

On Thu, 7 May 2015, Alan Lawrence wrote:


Also update comment? (5 identical cases)

Also update comment?


Obviously a good idea, thanks :)  (s/loads/accesses/ for the commit)


@@ -5013,7 +5025,7 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator
*gsi, gimple *vec_stmt,
   tree dataref_ptr = NULL_TREE;
   tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);

Unrelated? (though I don't object to it!)


No, the patch introduces for a first time in this function a loop over 
unsigned i with nunits as bound, so the compare would warn if I weren't to 
change either the type of i (with a cascade of more such changes) or 
nuinits (with only this one place).  The whole file could use an overhaul 
of signedness (after all nunits in all functions will always be 
non-negative), but as a separate patch.



+  if (STMT_VINFO_STRIDED_P (stmt_info))
+;
+  else

This is not a long chain of ifs, so is there a reason not to have
if (!STMT_VINFO_STRIDED_P (stmt_info))
?


Err, right.  Probably I had some code in there initially; as now it looks 
a bit mannered :)



-#define STMT_VINFO_STRIDE_LOAD_P(S)   (S)->stride_load_p
+#define STMT_VINFO_STRIDED_P(S)   (S)->strided_p

Spacing (?)


No, it's using correct tabs, indentation by the diff '+' prefix deceives 
us.  (Your mailer must be playing games, I've sent it out with TABs 
intact).



+  float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
+  float dest[8];
+  check_vect ();
+  sumit (dest, src, src, 1, 8);
+  for (i = 0; i < 8; i++)
+if (2*src[i] != dest[i])
+  abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

While I appreciate the vectorizer doesn't know the invariant stride is going
to be '1' when the loop executes...IMVHO I'd feel more reassured if we passed
in stride>1 at runtime ;). In fact I'd feel more reassured still, if we did
the thing with a few different values of stride...


Sensible.  I'll use the test case below (hey!  even stride zero! :) )

Regstrapping again with the changes as per above.  Committing tomorrow.  
Many thanks for the review.



Ciao,
Michael.
/* { dg-require-effective-target vect_float } */

#include 
#include "tree-vect.h"

void __attribute__((noinline))
sumit (float * __restrict dest,
   float * __restrict src, float * __restrict src2,
   int stride, int n)
{
  int i;
  for (i = 0; i < n; i++)
dest[i*stride] = src[i] + src2[i];
}

int main()
{
  int i, stride;
  float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
  float dest[64];
  check_vect ();
  for (stride = 0; stride < 8; stride++)
{
  sumit (dest, src, src, stride, 8);
  if (!stride && dest[0] != 16)
abort();
  else if (stride)
for (i = 0; i < 8; i++)
  if (2*src[i] != dest[i*stride])
abort ();
}
  return 0;
}

/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */





Re: Vectorize stores with unknown stride

2015-05-07 Thread Michael Matz
On Thu, 7 May 2015, Alan Lawrence wrote:

> Also update comment? (5 identical cases)
> 
> Also update comment?

Obviously a good idea, thanks :)  (s/loads/accesses/ for the commit)

> > @@ -5013,7 +5025,7 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator
> > *gsi, gimple *vec_stmt,
> >tree dataref_ptr = NULL_TREE;
> >tree dataref_offset = NULL_TREE;
> >gimple ptr_incr = NULL;
> > -  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
> > +  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
> 
> Unrelated? (though I don't object to it!)

No, the patch introduces for a first time in this function a loop over 
unsigned i with nunits as bound, so the compare would warn if I weren't to 
change either the type of i (with a cascade of more such changes) or 
nuinits (with only this one place).  The whole file could use an overhaul 
of signedness (after all nunits in all functions will always be 
non-negative), but as a separate patch.

> > +  if (STMT_VINFO_STRIDED_P (stmt_info))
> > +;
> > +  else
> 
> This is not a long chain of ifs, so is there a reason not to have
> if (!STMT_VINFO_STRIDED_P (stmt_info))
> ?

Err, right.  Probably I had some code in there initially; as now it looks 
a bit mannered :)

> > -#define STMT_VINFO_STRIDE_LOAD_P(S)   (S)->stride_load_p
> > +#define STMT_VINFO_STRIDED_P(S)   (S)->strided_p
> 
> Spacing (?)

No, it's using correct tabs, indentation by the diff '+' prefix deceives 
us.  (Your mailer must be playing games, I've sent it out with TABs 
intact).

> > +  float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
> > +  float dest[8];
> > +  check_vect ();
> > +  sumit (dest, src, src, 1, 8);
> > +  for (i = 0; i < 8; i++)
> > +if (2*src[i] != dest[i])
> > +  abort ();
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > +/* { dg-final { cleanup-tree-dump "vect" } } */
> 
> While I appreciate the vectorizer doesn't know the invariant stride is going
> to be '1' when the loop executes...IMVHO I'd feel more reassured if we passed
> in stride>1 at runtime ;). In fact I'd feel more reassured still, if we did
> the thing with a few different values of stride...

Sensible.  I'll use the test case below (hey!  even stride zero! :) )

Regstrapping again with the changes as per above.  Committing tomorrow.  
Many thanks for the review.


Ciao,
Michael.
/* { dg-require-effective-target vect_float } */

#include 
#include "tree-vect.h"

void __attribute__((noinline))
sumit (float * __restrict dest,
   float * __restrict src, float * __restrict src2,
   int stride, int n)
{
  int i;
  for (i = 0; i < n; i++)
dest[i*stride] = src[i] + src2[i];
}

int main()
{
  int i, stride;
  float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
  float dest[64];
  check_vect ();
  for (stride = 0; stride < 8; stride++)
{
  sumit (dest, src, src, stride, 8);
  if (!stride && dest[0] != 16)
abort();
  else if (stride)
for (i = 0; i < 8; i++)
  if (2*src[i] != dest[i*stride])
abort ();
}
  return 0;
}

/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */


Re: Vectorize stores with unknown stride

2015-05-07 Thread Alan Lawrence

(Below are all minor/style points only, no reason for patch not to go in.)

Michael Matz wrote:

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 96afc7a..6d8f17e 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -665,7 +665,7 @@ vect_compute_data_ref_alignment (struct data_reference *dr)

   /* Strided loads perform only component accesses, misalignment information
  is irrelevant for them.  */
-  if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
+  if (STMT_VINFO_STRIDED_P (stmt_info))
 return true;

>misalign = DR_INIT (dr);

Also update comment? (5 identical cases)


@@ -2368,7 +2368,7 @@ vect_analyze_data_ref_access (struct data_reference *dr)

   /* Assume this is a DR handled by non-constant strided load case.  */
   if (TREE_CODE (step) != INTEGER_CST)
-return STMT_VINFO_STRIDE_LOAD_P (stmt_info);
+return STMT_VINFO_STRIDED_P (stmt_info);


Also update comment?


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2ce6d4d..d268eb0 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5013,7 +5025,7 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator 
*gsi, gimple *vec_stmt,
   tree dataref_ptr = NULL_TREE;
   tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);


Unrelated? (though I don't object to it!)


@@ -5100,38 +5112,42 @@ vectorizable_store (gimple stmt, gimple_stmt_iterator 
*gsi, gimple *vec_stmt,
   if (!STMT_VINFO_DATA_REF (stmt_info))
 return false;

-  negative =
-tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
- ? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
- size_zero_node) < 0;
-  if (negative && ncopies > 1)
-{
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"multiple types with negative step.\n");
-  return false;
-}
-
-  if (negative)
+  if (STMT_VINFO_STRIDED_P (stmt_info))
+;
+  else


This is not a long chain of ifs, so is there a reason not to have
if (!STMT_VINFO_STRIDED_P (stmt_info))
?


diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 0796cc1..d231626 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -661,7 +663,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_VECTORIZABLE(S) (S)->vectorizable
 #define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
 #define STMT_VINFO_GATHER_P(S)(S)->gather_p
-#define STMT_VINFO_STRIDE_LOAD_P(S)   (S)->stride_load_p
+#define STMT_VINFO_STRIDED_P(S)   (S)->strided_p


Spacing (?)


 #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p

 #define STMT_VINFO_DR_BASE_ADDRESS(S)  (S)->dr_base_address
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c 
b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
new file mode 100644
index 000..32bcff9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
@@ -0,0 +1,30 @@
+/* { dg-require-effective-target vect_float } */
+
+#include 
+#include "tree-vect.h"
+
+void __attribute__((noinline))
+sumit (float * __restrict dest,
+   float * __restrict src, float * __restrict src2,
+   int stride, int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+dest[i*stride] = src[i] + src2[i];
+}
+
+int main()
+{
+  int i;
+  float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
+  float dest[8];
+  check_vect ();
+  sumit (dest, src, src, 1, 8);
+  for (i = 0; i < 8; i++)
+if (2*src[i] != dest[i])
+  abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */


While I appreciate the vectorizer doesn't know the invariant stride is going to 
be '1' when the loop executes...IMVHO I'd feel more reassured if we passed in 
stride>1 at runtime ;). In fact I'd feel more reassured still, if we did the 
thing with a few different values of stride...


Cheers, Alan



Re: Vectorize stores with unknown stride

2015-05-07 Thread Richard Biener
On Wed, May 6, 2015 at 5:37 PM, Michael Matz  wrote:
> Hi,
>
> I'm sitting on this since quite some time already and always missed stage
> 1.  This implements support for vectorizing strided stores with unknown
> but loop invariant stride, like:
>
> sumit (float * __restrict dest,
>float * __restrict src, float * __restrict src2,
>int stride, int n)
> {
>   int i;
>   for (i = 0; i < n; i++)
> dest[i*stride] = src[i] + src2[i];
> }
>
> I use the same scheme like for strided loads, i.e. expanding such store
> with N separate scalar stores (so alignment could also be ignored for such
> ones).
>
> This doesn't yet fix PR65962, because that one uses a _constant_ step.
> That makes us try a grouped access, which isn't supported for stores when
> there's a gap (which there is).  Unfortunately vect_analyze_group_access
> actively changes some vect info even before detecting that it's not a
> grouped access, so there must be some rollback implemented; I decided to
> defer this to a follow up.
>
> Regstrapped on x86-64-linux, no regressions (I had to adjust two
> fortran testcases where one more loop is vectorized now).  Okay for trunk?

Ok.

Thanks,
Richard.

>
> Ciao,
> Michael.
> * tree-vectorizer.h (struct _stmt_vec_info): Rename stride_load_p
> to strided_p.
> (STMT_VINFO_STRIDE_LOAD_P): Rename to ...
> (STMT_VINFO_STRIDED_P): ... this.
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Adjust.
> (vect_verify_datarefs_alignment): Likewise.
> (vect_enhance_data_refs_alignment): Likewise.
> (vect_analyze_data_ref_access): Likewise.
> (vect_analyze_data_refs): Accept strided stores.
> * tree-vect-stmts.c (vect_model_store_cost): Count strided stores.
> (vect_model_load_cost): Adjust for macro rename.
> (vectorizable_mask_load_store): Likewise.
> (vectorizable_load): Likewise.
> (vectorizable_store): Open code strided stores.
>
> testsuite/
> * gcc.dg/vect/vect-strided-store.c: New test.
> * gfortran.dg/vect/fast-math-pr37021.f90: Adjust.
> * gfortran.dg/vect/fast-math-rnflow-trs2a2.f90: Adjust.
>
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 96afc7a..6d8f17e 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -665,7 +665,7 @@ vect_compute_data_ref_alignment (struct data_reference 
> *dr)
>
>/* Strided loads perform only component accesses, misalignment information
>   is irrelevant for them.  */
> -  if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> +  if (STMT_VINFO_STRIDED_P (stmt_info))
>  return true;
>
>misalign = DR_INIT (dr);
> @@ -942,7 +942,7 @@ vect_verify_datarefs_alignment (loop_vec_info loop_vinfo, 
> bb_vec_info bb_vinfo)
>
>/* Strided loads perform only component accesses, alignment is
>  irrelevant for them.  */
> -  if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> +  if (STMT_VINFO_STRIDED_P (stmt_info))
> continue;
>
>supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
> @@ -1409,7 +1409,7 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>
>/* Strided loads perform only component accesses, alignment is
>  irrelevant for them.  */
> -  if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> +  if (STMT_VINFO_STRIDED_P (stmt_info))
> continue;
>
>supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
> @@ -1701,7 +1701,7 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>
>   /* Strided loads perform only component accesses, alignment is
>  irrelevant for them.  */
> - if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> + if (STMT_VINFO_STRIDED_P (stmt_info))
> continue;
>
>   save_misalignment = DR_MISALIGNMENT (dr);
> @@ -1821,7 +1821,7 @@ vect_enhance_data_refs_alignment (loop_vec_info 
> loop_vinfo)
>
>   /* Strided loads perform only component accesses, alignment is
>  irrelevant for them.  */
> - if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> + if (STMT_VINFO_STRIDED_P (stmt_info))
> continue;
>
>   supportable_dr_alignment = vect_supportable_dr_alignment (dr, 
> false);
> @@ -2368,7 +2368,7 @@ vect_analyze_data_ref_access (struct data_reference *dr)
>
>/* Assume this is a DR handled by non-constant strided load case.  */
>if (TREE_CODE (step) != INTEGER_CST)
> -return STMT_VINFO_STRIDE_LOAD_P (stmt_info);
> +return STMT_VINFO_STRIDED_P (stmt_info);
>
>/* Not consecutive access - check if it's a part of interleaving group.  */
>return vect_analyze_group_access (dr);
> @@ -3764,8 +3764,7 @@ again:
>else if (loop_vinfo
>&& TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
> {
> - if (nested_in_vect_loop_p (loop, stmt)
> - || !DR_IS_READ (dr))
> + if (nested_in_vect