On 6 May 2016 at 17:20, Richard Biener <rguent...@suse.de> wrote: > On Wed, 4 May 2016, Prathamesh Kulkarni wrote: > >> On 23 February 2016 at 21:49, Prathamesh Kulkarni >> <prathamesh.kulka...@linaro.org> wrote: >> > On 23 February 2016 at 17:31, Richard Biener <rguent...@suse.de> wrote: >> >> On Tue, 23 Feb 2016, Prathamesh Kulkarni wrote: >> >> >> >>> On 22 February 2016 at 17:36, Richard Biener <rguent...@suse.de> wrote: >> >>> > On Mon, 22 Feb 2016, Prathamesh Kulkarni wrote: >> >>> > >> >>> >> Hi Richard, >> >>> >> As discussed in private mail, this version of patch attempts to >> >>> >> increase alignment >> >>> >> of global struct decl if it contains an an array field(s) and array's >> >>> >> offset is a multiple of the alignment of vector type corresponding to >> >>> >> it's scalar type and recursively checks for nested structs. >> >>> >> eg: >> >>> >> static struct >> >>> >> { >> >>> >> int a, b, c, d; >> >>> >> int k[4]; >> >>> >> float f[10]; >> >>> >> }; >> >>> >> k is a candidate array since it's offset is 16 and alignment of >> >>> >> "vector (4) int" is 8. >> >>> >> Similarly for f. >> >>> >> >> >>> >> I haven't been able to create a test-case where there are >> >>> >> multiple candidate arrays and vector alignment of arrays are >> >>> >> different. >> >>> >> I suppose in this case we will have to increase alignment >> >>> >> of the struct by the max alignment ? >> >>> >> eg: >> >>> >> static struct >> >>> >> { >> >>> >> <fields> >> >>> >> T1 k[S1] >> >>> >> <fields> >> >>> >> T2 f[S2] >> >>> >> <fields> >> >>> >> }; >> >>> >> >> >>> >> if V1 is vector type corresponding to T1, and V2 corresponding vector >> >>> >> type to T2, >> >>> >> offset (k) % align(V1) == 0 and offset (f) % align(V2) == 0 >> >>> >> and align (V1) > align(V2) then we will increase alignment of struct >> >>> >> by align(V1). >> >>> >> >> >>> >> Testing showed FAIL for g++.dg/torture/pr31863.C due to program >> >>> >> timeout. >> >>> >> Initially it appeared to me, it went in infinite loop. However >> >>> >> on second thoughts I think it's probably not an infinite loop, rather >> >>> >> taking (extraordinarily) large amount of time >> >>> >> to compile the test-case with the patch. >> >>> >> The test-case builds quickly for only 2 instantiations of ClassSpec >> >>> >> (ClassSpec<Key, A001, 1>, >> >>> >> ClassSpec<Key, A002, 2>) >> >>> >> Building with 22 instantiations (upto ClassSpec<Key, A0023, 22>) >> >>> >> takes up >> >>> >> to ~1m to compile. >> >>> >> with: >> >>> >> 23 instantiations: ~2m >> >>> >> 24 instantiations: ~5m >> >>> >> For 30 instantiations I terminated cc1plus after 13m (by SIGKILL). >> >>> >> >> >>> >> I guess it shouldn't go in an infinite loop because: >> >>> >> a) structs cannot have circular references. >> >>> >> b) works for lower number of instantiations >> >>> >> However I have no sound evidence that it cannot be in infinite loop. >> >>> >> I don't understand why a decl node is getting visited more than once >> >>> >> for that test-case. >> >>> >> >> >>> >> Using a hash_map to store alignments of decl's so that decl node gets >> >>> >> visited >> >>> >> only once prevents the issue. >> >>> > >> >>> > Maybe aliases. Try not walking vnode->alias == true vars. >> >>> Hi, >> >>> I have a hypothesis why decl node gets visited multiple times. >> >>> >> >>> Consider the test-case: >> >>> template <typename T, unsigned N> >> >>> struct X >> >>> { >> >>> T a; >> >>> virtual int foo() { return N; } >> >>> }; >> >>> >> >>> typedef struct X<int, 1> x_1; >> >>> typedef struct X<int ,2> x_2; >> >>> >> >>> static x_1 obj1 __attribute__((used)); >> >>> static x_2 obj2 __attribute__((used)); >> >>> >> >>> Two additional structs are created by C++FE, c++filt shows: >> >>> _ZTI1XIiLj1EE -> typeinfo for X<int, 1u> >> >>> _ZTI1XIiLj2EE -> typeinfo for X<int, 2u> >> >>> >> >>> Both of these structs have only one field D.2991 and it appears it's >> >>> *shared* between them: >> >>> struct D.2991; >> >>> const void * D.2980; >> >>> const char * D.2981; >> >>> >> >>> Hence the decl node D.2991 and it's fields (D.2890, D.2981) get visited >> >>> twice: >> >>> once when walking _ZTI1XIiLj1EE and 2nd time when walking _ZTI1XIiLj2EE >> >>> >> >>> Dump of walking over the global structs for above test-case: >> >>> http://pastebin.com/R5SABW0c >> >>> >> >>> So it appears to me to me a DAG (interior node == struct decl, leaf == >> >>> non-struct field, >> >>> edge from node1 -> node2 if node2 is field of node1) is getting >> >>> created when struct decl >> >>> is a type-info object. >> >>> >> >>> I am not really clear on how we should proceed: >> >>> a) Keep using hash_map to store alignments so that every decl gets >> >>> visited only once. >> >>> b) Skip walking artificial record decls. >> >>> I am not sure if skipping all artificial struct decls would be a good >> >>> idea, but I don't >> >>> think it's possible to identify if a struct-decl is typeinfo struct at >> >>> middle-end ? >> >> >> >> You shouldn't end up walking those when walking the type of >> >> global decls. That is, don't walk typeinfo decls - yes, practically >> >> that means just not walking DECL_ARTIFICIAL things. >> > Hi, >> > I have done the changes in the patch (attached) and cross-tested >> > on arm*-*-* and aarch64*-*-* without failures. >> > Is it OK for stage-1 ? >> Hi, >> Is the attached patch OK for trunk ? >> Bootstrapped and tested on aarch64-linux-gnu, ppc64le-linux-gnu. >> Cross-tested on arm*-*-* and aarch64*-*-*. > > You can't simply use > > + offset = int_byte_position (field); > > as it can be placed at variable offset which will make int_byte_position > ICE. Note it also returns a truncated byte position (bit position > stripped) which may be undesirable here. I think you want to use > bit_position and if and only if DECL_FIELD_OFFSET and > DECL_FIELD_BIT_OFFSET are INTEGER_CST. oops, I didn't realize offsets could be variable. Will that be the case only for VLA member inside struct ? > > Your observation about the expensiveness of the walk still stands I guess > and eventually you should at least cache the > get_vec_alignment_for_record_decl cases. Please make those workers > _type rather than _decl helpers. Done > > You seem to simply get at the maximum vectorized field/array element > alignment possible for all arrays - you could restrict that to > arrays with at least vector size (as followup). Um sorry, I didn't understand this part. > > + /* Skip artificial decls like typeinfo decls or if > + record is packed. */ > + if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type)) > + return 0; > > I think we should honor DECL_USER_ALIGN as well and not mess with those > decls. Done > > Given the patch now does quite some extra work it might make sense > to split the symtab part out of the vect_can_force_dr_alignment_p > predicate and call that early. In the patch I call symtab_node::can_increase_alignment_p early. I tried moving that to it's callers - vect_compute_data_ref_alignment and increase_alignment::execute, however that failed some tests in vect, and hence I didn't add the following hunk in the patch. Did I miss some check ?
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 7652e21..2c1acee 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -795,7 +795,10 @@ vect_compute_data_ref_alignment (struct data_reference *dr) && TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR) base = TREE_OPERAND (TREE_OPERAND (base, 0), 0); - if (!vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype))) + if (!(TREE_CODE (base) == VAR_DECL + && decl_in_symtab_p (base) + && symtab_node::get (base)->can_increase_alignment_p () + && vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype)))) { if (dump_enabled_p ()) { Thanks, Prathamesh > > Richard.
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c new file mode 100644 index 0000000..7010a52 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-70.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target section_anchors } */ +/* { dg-require-effective-target vect_int } */ + +#define N 32 + +/* Increase alignment of struct if an array's offset is multiple of alignment of + vector type corresponding to it's scalar type. + For the below test-case: + offsetof(e) == 8 bytes. + i) For arm: let x = alignment of vector type corresponding to int, + x == 8 bytes. + Since offsetof(e) % x == 0, set DECL_ALIGN(a, b, c) to x. + ii) For aarch64, ppc: x == 16 bytes. + Since offsetof(e) % x != 0, don't increase alignment of a, b, c. +*/ + +static struct A { + int p1, p2; + int e[N]; +} a, b, c; + +int foo(void) +{ + for (int i = 0; i < N; i++) + a.e[i] = b.e[i] + c.e[i]; + + return a.e[0]; +} + +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */ +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */ +/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 3 "increase_alignment" { target arm*-*-* } } } */ diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 2b25b45..c94bbcb 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -794,38 +794,129 @@ make_pass_slp_vectorize (gcc::context *ctxt) This should involve global alignment analysis and in the future also array padding. */ +static unsigned get_vec_alignment_for_type (tree); +static hash_map<tree, unsigned> *type_align_map; + +/* Return alignment of array's vector type corresponding to scalar type. + 0 if no vector type exists. */ +static unsigned +get_vec_alignment_for_array_type (tree type) +{ + gcc_assert (TREE_CODE (type) == ARRAY_TYPE); + + tree vectype = get_vectype_for_scalar_type (strip_array_types (type)); + return (vectype) ? TYPE_ALIGN (vectype) : 0; +} + +/* Return alignment of field having maximum alignment of vector type + corresponding to it's scalar type. For now, we only consider fields whose + offset is a multiple of it's vector alignment. + 0 if no suitable field is found. */ +static unsigned +get_vec_alignment_for_record_type (tree type) +{ + gcc_assert (TREE_CODE (type) == RECORD_TYPE); + unsigned max_align = 0, alignment; + HOST_WIDE_INT offset; + tree offset_tree; + + if (TYPE_PACKED (type)) + return 0; + + for (tree field = first_field (type); + field != NULL_TREE; + field = DECL_CHAIN (field)) + { + /* Skip if not FIELD_DECL or has variable offset. */ + if (TREE_CODE (field) != FIELD_DECL + || TREE_CODE (DECL_FIELD_OFFSET (field)) != INTEGER_CST + || TREE_CODE (DECL_FIELD_BIT_OFFSET (field)) != INTEGER_CST + || DECL_USER_ALIGN (field) + || DECL_ARTIFICIAL (field)) + continue; + + offset_tree = bit_position (field); + /* FIXME: is this check necessary since we check for variable offset above ? */ + if (TREE_CODE (offset_tree) != INTEGER_CST) + continue; + + offset = tree_to_uhwi (offset_tree); + alignment = get_vec_alignment_for_type (TREE_TYPE (field)); + + if (alignment + && (offset % alignment == 0) + && (alignment > max_align)) + max_align = alignment; + } + + return max_align; +} + +/* Return alignment of vector type corresponding to decl's scalar type + or 0 if it doesn't exist or the vector alignment is lesser than + decl's alignment. */ +static unsigned +get_vec_alignment_for_type (tree type) +{ + if (type == NULL_TREE) + return 0; + + gcc_assert (TYPE_P (type)); + + unsigned *slot = type_align_map->get (type); + if (slot) + return *slot; + + static unsigned alignment = 0; + switch (TREE_CODE (type)) + { + case ARRAY_TYPE: + alignment = get_vec_alignment_for_array_type (type); + break; + case RECORD_TYPE: + alignment = get_vec_alignment_for_record_type (type); + break; + default: + alignment = 0; + break; + } + + unsigned ret = (alignment > TYPE_ALIGN (type)) ? alignment : 0; + type_align_map->put (type, ret); + return ret; +} + +/* Entry point to increase_alignment pass. */ static unsigned int increase_alignment (void) { varpool_node *vnode; vect_location = UNKNOWN_LOCATION; + type_align_map = new hash_map<tree, unsigned>; /* Increase the alignment of all global arrays for vectorization. */ FOR_EACH_DEFINED_VARIABLE (vnode) { - tree vectype, decl = vnode->decl; - tree t; + tree decl = vnode->decl; unsigned int alignment; - t = TREE_TYPE (decl); - if (TREE_CODE (t) != ARRAY_TYPE) - continue; - vectype = get_vectype_for_scalar_type (strip_array_types (t)); - if (!vectype) - continue; - alignment = TYPE_ALIGN (vectype); - if (DECL_ALIGN (decl) >= alignment) - continue; - - if (vect_can_force_dr_alignment_p (decl, alignment)) + if ((decl_in_symtab_p (decl) + && !symtab_node::get (decl)->can_increase_alignment_p ()) + || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl)) + continue; + + alignment = get_vec_alignment_for_type (TREE_TYPE (decl)); + if (alignment && vect_can_force_dr_alignment_p (decl, alignment)) { - vnode->increase_alignment (TYPE_ALIGN (vectype)); + vnode->increase_alignment (alignment); dump_printf (MSG_NOTE, "Increasing alignment of decl: "); dump_generic_expr (MSG_NOTE, TDF_SLIM, decl); dump_printf (MSG_NOTE, "\n"); } } + + delete type_align_map; return 0; }
ChangeLog
Description: Binary data