On 6 May 2016 at 17:20, Richard Biener <rguent...@suse.de> wrote:
> On Wed, 4 May 2016, Prathamesh Kulkarni wrote:
>
>> On 23 February 2016 at 21:49, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>> > On 23 February 2016 at 17:31, Richard Biener <rguent...@suse.de> wrote:
>> >> On Tue, 23 Feb 2016, Prathamesh Kulkarni wrote:
>> >>
>> >>> On 22 February 2016 at 17:36, Richard Biener <rguent...@suse.de> wrote:
>> >>> > On Mon, 22 Feb 2016, Prathamesh Kulkarni wrote:
>> >>> >
>> >>> >> Hi Richard,
>> >>> >> As discussed in private mail, this version of the patch attempts to
>> >>> >> increase the alignment
>> >>> >> of a global struct decl if it contains an array field(s) whose
>> >>> >> offset is a multiple of the alignment of the vector type corresponding to
>> >>> >> its scalar type, and recursively checks for nested structs.
>> >>> >> eg:
>> >>> >> static struct
>> >>> >> {
>> >>> >>   int a, b, c, d;
>> >>> >>   int k[4];
>> >>> >>   float f[10];
>> >>> >> };
>> >>> >> k is a candidate array since its offset is 16, a multiple of the
>> >>> >> 8-byte alignment of "vector(4) int".
>> >>> >> Similarly for f.
>> >>> >>
>> >>> >> I haven't been able to create a test-case where there are
>> >>> >> multiple candidate arrays and the vector alignments of the arrays are
>> >>> >> different.
>> >>> >> I suppose in this case we would have to increase the alignment
>> >>> >> of the struct to the max alignment ?
>> >>> >> eg:
>> >>> >> static struct
>> >>> >> {
>> >>> >>   <fields>
>> >>> >>   T1 k[S1]
>> >>> >>   <fields>
>> >>> >>   T2 f[S2]
>> >>> >>   <fields>
>> >>> >> };
>> >>> >>
>> >>> >> if V1 is the vector type corresponding to T1, and V2 the vector
>> >>> >> type corresponding to T2, with
>> >>> >> offset (k) % align(V1) == 0 and offset (f) % align(V2) == 0
>> >>> >> and align (V1) > align(V2), then we will increase the alignment of
>> >>> >> the struct to align(V1).
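For illustration, the offset/alignment check described above can be sketched in plain C++ with `offsetof` (a sketch only, not the GCC internals: `S`, `max_candidate_align`, and the 8-byte vector alignments are assumptions for this example; the real alignments come from `TYPE_ALIGN (get_vectype_for_scalar_type (...))` and are target-dependent):

```cpp
#include <cstddef>

// Example struct from the mail: with 4-byte int/float, k is at
// offset 16 and f is at offset 32.
struct S {
  int a, b, c, d;
  int k[4];
  float f[10];
};

// Hypothetical vector alignments for the element types (assumed 8
// bytes here, as on arm for "vector(4) int" in the example).
constexpr std::size_t VEC_ALIGN_INT = 8;
constexpr std::size_t VEC_ALIGN_FLOAT = 8;

// Return the maximum vector alignment over the array fields, counting
// a field only when its offset is a multiple of that field's vector
// alignment; 0 if no field qualifies.
std::size_t max_candidate_align ()
{
  std::size_t max_align = 0;
  if (offsetof (S, k) % VEC_ALIGN_INT == 0 && VEC_ALIGN_INT > max_align)
    max_align = VEC_ALIGN_INT;
  if (offsetof (S, f) % VEC_ALIGN_FLOAT == 0 && VEC_ALIGN_FLOAT > max_align)
    max_align = VEC_ALIGN_FLOAT;
  return max_align;
}
```

With both candidate fields at offsets divisible by 8, the struct's alignment would be raised to 8; if the two vector alignments differed, the larger qualifying one would win, as described above.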
>> >>> >>
>> >>> >> Testing showed a FAIL for g++.dg/torture/pr31863.C due to a program
>> >>> >> timeout.
>> >>> >> Initially it appeared to me that it went into an infinite loop. However,
>> >>> >> on second thought I think it's probably not an infinite loop, but rather
>> >>> >> taking an (extraordinarily) large amount of time
>> >>> >> to compile the test-case with the patch.
>> >>> >> The test-case builds quickly with only 2 instantiations of ClassSpec
>> >>> >> (ClassSpec<Key, A001, 1>,
>> >>> >>  ClassSpec<Key, A002, 2>).
>> >>> >> Building with 22 instantiations (up to ClassSpec<Key, A0023, 22>)
>> >>> >> takes up
>> >>> >> to ~1m to compile.
>> >>> >> With:
>> >>> >> 23 instantiations: ~2m
>> >>> >> 24 instantiations: ~5m
>> >>> >> For 30 instantiations I terminated cc1plus after 13m (with SIGKILL).
>> >>> >>
>> >>> >> I guess it shouldn't go into an infinite loop because:
>> >>> >> a) structs cannot have circular references.
>> >>> >> b) it works for a lower number of instantiations.
>> >>> >> However, I have no sound evidence that it cannot be an infinite loop.
>> >>> >> I don't understand why a decl node is getting visited more than once
>> >>> >> for that test-case.
>> >>> >>
>> >>> >> Using a hash_map to store alignments of decls, so that each decl node
>> >>> >> gets visited
>> >>> >> only once, prevents the issue.
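For what it's worth, the effect of that cache can be illustrated with a generic memoized DAG walk (a sketch with a made-up `Node` type and `std::unordered_map` standing in for GCC's `hash_map`, not the actual pass code): when children are shared between parents, the structure is a DAG rather than a tree, and without memoization every shared node is revisited once per path, which can blow up combinatorially with nesting depth.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Made-up node type standing in for a struct decl / field in the walk;
// children may be shared between parents, forming a DAG.
struct Node {
  unsigned own_align;           // alignment this node contributes itself
  std::vector<Node*> fields;    // children; possibly shared
};

// Compute the max alignment over the sub-DAG, memoizing per node so
// each node is processed only once, no matter how many paths reach it.
unsigned max_align (Node *n, std::unordered_map<Node*, unsigned> &cache)
{
  auto it = cache.find (n);
  if (it != cache.end ())
    return it->second;          // shared node: reuse the cached result

  unsigned best = n->own_align;
  for (Node *f : n->fields)
    best = std::max (best, max_align (f, cache));

  cache.emplace (n, best);
  return best;
}
```

The cache bounds the work to one visit per distinct node, which is why the hash_map made the pathological test-case compile quickly again.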
>> >>> >
>> >>> > Maybe aliases.  Try not walking vnode->alias == true vars.
>> >>> Hi,
>> >>> I have a hypothesis why decl node gets visited multiple times.
>> >>>
>> >>> Consider the test-case:
>> >>> template <typename T, unsigned N>
>> >>> struct X
>> >>> {
>> >>>   T a;
>> >>>   virtual int foo() { return N; }
>> >>> };
>> >>>
>> >>> typedef struct X<int, 1> x_1;
>> >>> typedef struct X<int ,2> x_2;
>> >>>
>> >>> static x_1 obj1 __attribute__((used));
>> >>> static x_2 obj2 __attribute__((used));
>> >>>
>> >>> Two additional structs are created by the C++ FE; c++filt shows:
>> >>> _ZTI1XIiLj1EE -> typeinfo for X<int, 1u>
>> >>> _ZTI1XIiLj2EE -> typeinfo for X<int, 2u>
>> >>>
>> >>> Both of these structs have only one field D.2991 and it appears it's
>> >>> *shared* between them:
>> >>>  struct  D.2991;
>> >>>     const void * D.2980;
>> >>>     const char * D.2981;
>> >>>
>> >>> Hence the decl node D.2991 and its fields (D.2980, D.2981) get visited
>> >>> twice:
>> >>> once when walking _ZTI1XIiLj1EE and a 2nd time when walking _ZTI1XIiLj2EE.
>> >>>
>> >>> Dump of walking over the global structs for above test-case:
>> >>> http://pastebin.com/R5SABW0c
>> >>>
>> >>> So it appears to me that a DAG (interior node == struct decl, leaf ==
>> >>> non-struct field,
>> >>> edge from node1 -> node2 if node2 is a field of node1) is getting
>> >>> created when the struct decl
>> >>> is a type-info object.
>> >>>
>> >>> I am not really clear on how we should proceed:
>> >>> a) Keep using hash_map to store alignments so that every decl gets
>> >>> visited only once.
>> >>> b) Skip walking artificial record decls.
>> >>> I am not sure if skipping all artificial struct decls would be a good
>> >>> idea, but I don't
>> >>> think it's possible to identify whether a struct-decl is a typeinfo
>> >>> struct in the middle-end ?
>> >>
>> >> You shouldn't end up walking those when walking the type of
>> >> global decls.  That is, don't walk typeinfo decls - yes, practically
>> >> that means just not walking DECL_ARTIFICIAL things.
>> > Hi,
>> > I have done the changes in the patch (attached) and cross-tested
>> > on arm*-*-* and aarch64*-*-* without failures.
>> > Is it OK for stage-1 ?
>> Hi,
>> Is the attached patch OK for trunk ?
>> Bootstrapped and tested on aarch64-linux-gnu, ppc64le-linux-gnu.
>> Cross-tested on arm*-*-* and aarch64*-*-*.
>
> You can't simply use
>
> +      offset = int_byte_position (field);
>
> as it can be placed at a variable offset, which will make
> int_byte_position ICE.  Note it also returns a truncated byte position
> (bit position stripped), which may be undesirable here.  I think you
> bit_position and if and only if DECL_FIELD_OFFSET and
> DECL_FIELD_BIT_OFFSET are INTEGER_CST.
Oops, I didn't realize offsets could be variable.
Will that be the case only for a VLA member inside a struct ?
>
> Your observation about the expensiveness of the walk still stands I guess
> and eventually you should at least cache the
> get_vec_alignment_for_record_decl cases.  Please make those workers
> _type rather than _decl helpers.
Done
>
> You seem to simply get at the maximum vectorized field/array element
> alignment possible for all arrays - you could restrict that to
> arrays with at least vector size (as followup).
Um sorry, I didn't understand this part.
>
> +  /* Skip artificial decls like typeinfo decls or if
> +     record is packed.  */
> +  if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type))
> +    return 0;
>
> I think we should honor DECL_USER_ALIGN as well and not mess with those
> decls.
Done
>
> Given the patch now does quite some extra work it might make sense
> to split the symtab part out of the vect_can_force_dr_alignment_p
> predicate and call that early.
In the patch I call symtab_node::can_increase_alignment_p early.
I tried moving that to its callers - vect_compute_data_ref_alignment and
increase_alignment::execute - however that failed some tests in vect,
and hence I didn't add the following hunk to the patch. Did I miss some check ?

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7652e21..2c1acee 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -795,7 +795,10 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
   && TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR)
  base = TREE_OPERAND (TREE_OPERAND (base, 0), 0);

-      if (!vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype)))
+      if (!(TREE_CODE (base) == VAR_DECL
+            && decl_in_symtab_p (base)
+            && symtab_node::get (base)->can_increase_alignment_p ()
+            && vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype))))
  {
   if (dump_enabled_p ())
     {

Thanks,
Prathamesh
>
> Richard.
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c 
b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c
new file mode 100644
index 0000000..7010a52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target section_anchors } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 32
+
+/* Increase alignment of struct if an array's offset is a multiple of the alignment of
+   the vector type corresponding to its scalar type.
+   For the below test-case:
+   offsetof(e) == 8 bytes. 
+   i) For arm: let x = alignment of vector type corresponding to int,
+   x == 8 bytes.
+   Since offsetof(e) % x == 0, set DECL_ALIGN(a, b, c) to x.
+   ii) For aarch64, ppc: x == 16 bytes.
+   Since offsetof(e) % x != 0, don't increase alignment of a, b, c.
+*/
+
+static struct A {
+  int p1, p2;
+  int e[N];
+} a, b, c;
+
+int foo(void)
+{
+  for (int i = 0; i < N; i++)
+    a.e[i] = b.e[i] + c.e[i];
+
+   return a.e[0];
+}
+
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */
+/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 3 "increase_alignment" { target arm*-*-* } } } */
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2b25b45..c94bbcb 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -794,38 +794,129 @@ make_pass_slp_vectorize (gcc::context *ctxt)
      This should involve global alignment analysis and in the future also
      array padding.  */
 
+static unsigned get_vec_alignment_for_type (tree);
+static hash_map<tree, unsigned> *type_align_map;
+
+/* Return alignment of the vector type corresponding to the array's element
+   type, or 0 if no vector type exists.  */
+static unsigned
+get_vec_alignment_for_array_type (tree type) 
+{
+  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+
+  tree vectype = get_vectype_for_scalar_type (strip_array_types (type));
+  return (vectype) ? TYPE_ALIGN (vectype) : 0;
+}
+
+/* Return alignment of the field having maximum alignment of the vector type
+   corresponding to its scalar type.  For now, we only consider fields whose
+   offset is a multiple of their vector alignment.
+   Return 0 if no suitable field is found.  */
+static unsigned
+get_vec_alignment_for_record_type (tree type) 
+{
+  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+  unsigned max_align = 0, alignment;
+  HOST_WIDE_INT offset;
+  tree offset_tree;
+
+  if (TYPE_PACKED (type))
+    return 0;
+
+  for (tree field = first_field (type);
+       field != NULL_TREE;
+       field = DECL_CHAIN (field))
+    {
+      /* Skip if not FIELD_DECL or has variable offset.  */
+      if (TREE_CODE (field) != FIELD_DECL
+         || TREE_CODE (DECL_FIELD_OFFSET (field)) != INTEGER_CST
+         || TREE_CODE (DECL_FIELD_BIT_OFFSET (field)) != INTEGER_CST
+         || DECL_USER_ALIGN (field)
+         || DECL_ARTIFICIAL (field))
+       continue;
+
+      offset_tree = bit_position (field);
+      /* FIXME: is this check necessary since we check for variable offset above ?  */
+      if (TREE_CODE (offset_tree) != INTEGER_CST)
+       continue;
+
+      offset = tree_to_uhwi (offset_tree); 
+      alignment = get_vec_alignment_for_type (TREE_TYPE (field));
+
+      if (alignment
+         && (offset % alignment == 0)
+         && (alignment > max_align))
+       max_align = alignment;
+    }
+
+  return max_align;
+}
+
+/* Return alignment of the vector type corresponding to the type's scalar
+   type, or 0 if it doesn't exist or if the vector alignment is less than
+   the type's alignment.  */
+static unsigned
+get_vec_alignment_for_type (tree type)
+{
+  if (type == NULL_TREE)
+    return 0;
+
+  gcc_assert (TYPE_P (type));
+
+  unsigned *slot = type_align_map->get (type);
+  if (slot)
+    return *slot;
+
+  unsigned alignment = 0;
+  switch (TREE_CODE (type))
+    {
+      case ARRAY_TYPE:
+       alignment = get_vec_alignment_for_array_type (type);
+       break;
+      case RECORD_TYPE:
+       alignment = get_vec_alignment_for_record_type (type);
+       break;
+      default:
+       alignment = 0;
+       break;
+    }
+
+  unsigned ret = (alignment > TYPE_ALIGN (type)) ? alignment : 0;
+  type_align_map->put (type, ret);
+  return ret;
+}
+
+/* Entry point to increase_alignment pass.  */
 static unsigned int
 increase_alignment (void)
 {
   varpool_node *vnode;
 
   vect_location = UNKNOWN_LOCATION;
+  type_align_map = new hash_map<tree, unsigned>;
 
   /* Increase the alignment of all global arrays for vectorization.  */
   FOR_EACH_DEFINED_VARIABLE (vnode)
     {
-      tree vectype, decl = vnode->decl;
-      tree t;
+      tree decl = vnode->decl;
       unsigned int alignment;
 
-      t = TREE_TYPE (decl);
-      if (TREE_CODE (t) != ARRAY_TYPE)
-        continue;
-      vectype = get_vectype_for_scalar_type (strip_array_types (t));
-      if (!vectype)
-        continue;
-      alignment = TYPE_ALIGN (vectype);
-      if (DECL_ALIGN (decl) >= alignment)
-        continue;
-
-      if (vect_can_force_dr_alignment_p (decl, alignment))
+      if ((decl_in_symtab_p (decl)
+         && !symtab_node::get (decl)->can_increase_alignment_p ())
+         || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl))
+       continue;
+
+      alignment = get_vec_alignment_for_type (TREE_TYPE (decl));
+      if (alignment && vect_can_force_dr_alignment_p (decl, alignment))
         {
-         vnode->increase_alignment (TYPE_ALIGN (vectype));
+         vnode->increase_alignment (alignment);
           dump_printf (MSG_NOTE, "Increasing alignment of decl: ");
           dump_generic_expr (MSG_NOTE, TDF_SLIM, decl);
           dump_printf (MSG_NOTE, "\n");
         }
     }
+
+  delete type_align_map;
   return 0;
 }
 

Attachment: ChangeLog