On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > This patch tries to calculate conservatively-correct distance > vectors for two references whose base addresses are not the same. > It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence > isn't guaranteed to occur. > > The motivating example is: > > struct s { int x[8]; }; > void > f (struct s *a, struct s *b) > { > for (int i = 0; i < 8; ++i) > a->x[i] += b->x[i]; > } > > in which the "a" and "b" accesses are either independent or have a > dependence distance of 0 (assuming -fstrict-aliasing). Neither case > prevents vectorisation, so we can vectorise without an alias check. > > I'd originally wanted to do the same thing for arrays as well, e.g.: > > void > f (int a[][8], struct b[][8]) > { > for (int i = 0; i < 8; ++i) > a[0][i] += b[0][i]; > } > > I think this is valid because C11 6.7.6.2/6 says: > > For two array types to be compatible, both shall have compatible > element types, and if both size specifiers are present, and are > integer constant expressions, then both size specifiers shall have > the same constant value. > > So if we access an array through an int (*)[8], it must have type X[8] > or X[], where X is compatible with int. It doesn't seem possible in > either case for "a[0]" and "b[0]" to overlap when "a != b". > > However, Richard B said that (at least in gimple) we support arbitrary > overlap of arrays and allow arrays to be accessed with different > dimensionality. There are examples of this in PR50067. I've therefore > only handled references that end in a structure field access. > > There are two ways of handling these dependences in the vectoriser: > use them to limit VF, or check at runtime as before. I've gone for > the approach of checking at runtime if we can, to avoid limiting VF > unnecessarily. We still fall back to a VF cap when runtime checks > aren't allowed. > > The patch tests whether we queued an alias check with a dependence > distance of X and then picked a VF <= X, in which case it's safe to > drop the alias check. Since vect_prune_runtime_alias_check_list can > be called twice with different VF for the same loop, it's no longer > safe to clear may_alias_ddrs on exit. Instead we should use > comp_alias_ddrs to check whether versioning is necessary. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Hi Richard, It's nice to explore more alias opportunity, below are some simple comments embedded. > > Thanks, > Richard > > > gcc/ > 2017-05-03 Richard Sandiford <richard.sandif...@linaro.org> > > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try > to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back > to using the recorded distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. > * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 > +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 > @@ -191,6 +191,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; Is it better to follow existing code, i.e, name this as access_fn_a/access_fn_b. Thus we don't need to use const value 0/1 in various places, which is a little bit confusing. > + > /* A description of the iterations for which the elements are > accessed twice. */ > conflict_function *conflicting_iterations_in_a; > @@ -209,6 +212,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -264,6 +268,33 @@ struct data_dependence_relation > /* Set to true when the dependence relation is on the same data > access. */ > bool self_reference_p; > + > + /* True if the dependence described is conservatively correct rather > + than exact, and if it is still possible for the accesses to be > + conditionally independent. For example, the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a->f[i] += b->f[i]; > + > + conservatively have a distance vector of (0), for the case in which > + a == b, but the accesses are independent if a != b. Similarly, > + the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a[0].f[i] += b[i].f[i]; > + > + conservatively have a distance vector of (0), but they are indepenent > + when a != b + i. In contrast, the references in: > + > + struct s *a; > + for (int i = 0; i < n; ++i) > + a->f[i] += a->f[i]; > + > + have the same distance vector of (0), but the accesses can never be > + independent. */ > + bool could_be_independent_p; > }; > > typedef struct data_dependence_relation *ddr_p; > @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ > #define DDR_DIST_VECT(DDR, I) \ > DDR_DIST_VECTS (DDR)[I] > #define DDR_REVERSED_P(DDR) (DDR)->reversed_p > +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > bool dr_analyze_innermost (struct data_reference *, struct loop *); > @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data > return false; > > return true; > -} > - > -/* Return true when the DDR contains two data references that have the > - same access functions. */ > - > -static inline bool > -same_access_functions (const struct data_dependence_relation *ddr) > -{ > - unsigned i; > - > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), > - DR_ACCESS_FN (DDR_B (ddr), i))) > - return false; > - > - return true; > } > > /* Returns true when all the dependences are computable. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 > +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 > @@ -123,8 +123,7 @@ Software Foundation; either version 3, o > } dependence_stats; > > static bool subscript_dependence_tester_1 (struct data_dependence_relation *, > - struct data_reference *, > - struct data_reference *, > + unsigned int, unsigned int, > struct loop *); As mentioned, how about passing access_fn directly, rather than less meaningful 0/1 values? > /* Returns true iff A divides B. */ > > @@ -144,6 +143,21 @@ int_divides_p (int a, int b) > return ((b % a) == 0); > } > > +/* Return true if reference REF contains a union access. */ > + > +static bool > +ref_contains_union_access_p (tree ref) > +{ > + while (handled_component_p (ref)) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) > + return true; > + } > + return false; > +} > + > > > /* Dump into FILE all the data references from DATAREFS. */ > @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out > unsigned int i; > struct loop *loopi; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + subscript *sub; > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > fprintf (outf, " access_fn_A: "); > - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); > fprintf (outf, " access_fn_B: "); > - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); > - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); > + dump_subscript (outf, sub); > } > > fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); > @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str > struct data_dependence_relation *res; > unsigned int i; > > - res = XNEW (struct data_dependence_relation); > + res = XCNEW (struct data_dependence_relation); > DDR_A (res) = a; > DDR_B (res) = b; > DDR_LOOP_NEST (res).create (0); > - DDR_REVERSED_P (res) = false; > DDR_SUBSCRIPTS (res).create (0); > DDR_DIR_VECTS (res).create (0); > DDR_DIST_VECTS (res).create (0); > @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str > return res; > } > > - /* The case where the references are exactly the same. */ > - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) > + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); > + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); > + if (num_dimensions_a == 0 || num_dimensions_b == 0) > { > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], > - DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > - { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > - } > - DDR_AFFINE_P (res) = true; > - DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > - DDR_LOOP_NEST (res) = loop_nest; > - DDR_INNER_LOOP (res) = 0; > - DDR_SELF_REFERENCE (res) = true; > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > - { > - struct subscript *subscript; > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + /* For unconstrained bases, the outer (highest-index) subscript > + describes a variation in the base of the original DR_REF rather > + than a component access. We have no type that accurately describes > + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* > + applying the outer subscript) so limit the search to the last real > + component access. > + > + E.g. for: > > - subscript = XNEW (struct subscript); > - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > - SUB_DISTANCE (subscript) = chrec_dont_know; > - DDR_SUBSCRIPTS (res).safe_push (subscript); > + void > + f (int a[][8], int b[][8]) > + { > + for (int i = 0; i < 8; ++i) > + a[i * 2][0] = b[i][0]; > } > - return res; > + > + the a and b accesses have a single ARRAY_REF component reference [0] > + but have two subscripts. */ > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions_a -= 1; > + if (DR_UNCONSTRAINED_BASE (b)) > + num_dimensions_b -= 1; > + > + /* Now look for two sequences of component references that have the same > + type in both A and B. The first sequence includes an arbitrary mixture > + of array and structure references while the second always ends on a > + structure reference. > + > + The former (arbitrary) sequence uses access functions: > + > + [START_A, START_A + NUM_DIMENSIONS) of A > + [START_B, START_B + NUM_DIMENSIONS) of B > + > + The latter sequence uses access functions: > + > + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A > + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B > + > + STRUCT_REF_A and STRUCT_REF_B are the outer references for the IIUC, A and B always share the same latter sequence, and the common latter sequence ends at a structure reference providing alias information. Is it possible to record the the former arbitrary references instead of simple flag DDR_COULD_BE_INDEPENDENT_P. With this information, alias check can be simplified by stripping away address computation for the shared common sub-sequence. I doubt vect_create_cond_for_alias_checks could detect this kind CSE for now. Ah, I see you changed alias check code generation in order to handle this.
> + latter sequence. */ > + unsigned int start_a = 0; > + unsigned int start_b = 0; > + unsigned int num_dimensions = 0; > + unsigned int struct_start_a = 0; > + unsigned int struct_start_b = 0; > + unsigned int struct_num_dimensions = 0; > + unsigned int index_a = 0; > + unsigned int index_b = 0; > + tree next_ref_a = DR_REF (a); > + tree next_ref_b = DR_REF (b); > + tree struct_ref_a = NULL_TREE; > + tree struct_ref_b = NULL_TREE; > + while (index_a < num_dimensions_a && index_b < num_dimensions_b) > + { > + gcc_checking_assert (handled_component_p (next_ref_a)); > + gcc_checking_assert (handled_component_p (next_ref_b)); > + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); > + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); > + tree type_a = TREE_TYPE (outer_ref_a); > + tree type_b = TREE_TYPE (outer_ref_b); > + if (types_compatible_p (type_a, type_b)) > + { > + /* This pair of accesses belong to a suitable sequence. */ > + if (start_a + num_dimensions != index_a > + || start_b + num_dimensions != index_b) > + { > + /* Start a new sequence here. */ > + start_a = index_a; > + start_b = index_b; > + num_dimensions = 0; > + } > + num_dimensions += 1; > + if (TREE_CODE (type_a) == RECORD_TYPE) > + { > + struct_start_a = start_a; > + struct_start_b = start_b; > + struct_num_dimensions = num_dimensions; > + struct_ref_a = outer_ref_a; > + struct_ref_b = outer_ref_b; > + } > + next_ref_a = outer_ref_a; > + next_ref_b = outer_ref_b; > + index_a += 1; > + index_b += 1; > + continue; > + } > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); > + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); > + if (size_a <= size_b) > + { > + index_a += 1; > + next_ref_a = outer_ref_a; > + } > + if (size_b <= size_a) > + { > + index_b += 1; > + next_ref_b = outer_ref_b; > + } > } > > - /* If the references do not access the same object, we do not know > - whether they alias or not. We do not care about TBAA or alignment > - info so we can use OEP_ADDRESS_OF to avoid false negatives. > - But the accesses have to use compatible types as otherwise the > - built indices would not match. */ > - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), > OEP_ADDRESS_OF) > - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), > - TREE_TYPE (DR_BASE_OBJECT (b)))) > + /* See whether the sequence ends at the base and whether the two bases > + are equal. We do not care about TBAA or alignment info so we can use > + OEP_ADDRESS_OF to avoid false negatives. */ > + tree base_a = DR_BASE_OBJECT (a); > + tree base_b = DR_BASE_OBJECT (b); > + bool same_base_p = (start_a + num_dimensions == num_dimensions_a > + && start_b + num_dimensions == num_dimensions_b > + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE > (b) > + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > + && types_compatible_p (TREE_TYPE (base_a), > + TREE_TYPE (base_b)) > + && (!loop_nest.exists () > + || (object_address_invariant_in_loop_p > + (loop_nest[0], base_a)))); Major change is in function initialize_data_dependence_relation in order to detect partial alias opportunity. The original equality check on DR_BASE_OBJECT is bypassed now. IMHO better to introduce a new parameter to compute_data_reference_for_loop etc., indicating whether we want to handle partial alias opportunity or not. After all, such computation is unnecessary for predcom/prefetch/parloop. It's only a waste of time computing it. > + > + /* If the bases are the same, we can include the base variation too. > + E.g. the b accesses in: > + > + for (int i = 0; i < n; ++i) > + b[i + 4][0] = b[i][0]; > + > + have a definite dependence distance of 4, while for: > + > + for (int i = 0; i < n; ++i) > + a[i + 4][0] = b[i][0]; > + > + the dependence distance depends on the gap between a and b. > + > + If the bases are different then we can only rely on the sequence > + rooted at a structure access, since arrays are allowed to overlap > + arbitrarily and change shape arbitrarily. E.g. we treat this as > + valid code: > + > + int a[256]; > + ... > + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; > + > + where two lvalues with the same int[4][3] type overlap, and where > + both lvalues are distinct from the object's declared type. */ > + if (same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions += 1; > + } > + else > + { > + start_a = struct_start_a; > + start_b = struct_start_b; > + num_dimensions = struct_num_dimensions; > } > > - /* If the base of the object is not invariant in the loop nest, we cannot > - analyze it. TODO -- in fact, it would suffice to record that there may > - be arbitrary dependences in the loops where the base object varies. */ > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT > (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + /* Punt if we didn't find a suitable sequence. */ > + if (num_dimensions == 0) > { > DDR_ARE_DEPENDENT (res) = chrec_dont_know; > return res; > } > > - /* If the number of dimensions of the access to not agree we can have > - a pointer access to a component of the array element type and an > - array access while the base-objects are still the same. Punt. */ > - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) > + if (!same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + /* Partial overlap is possible for different bases when strict aliasing > + is not in effect. It's also possible if either base involves a union > + access; e.g. for: > + > + struct s1 { int a[2]; }; > + struct s2 { struct s1 b; int c; }; > + struct s3 { int d; struct s1 e; }; > + union u { struct s2 f; struct s3 g; } *p, *q; > + > + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at > + "p->g.e" (base "p->g") and might partially overlap the s1 at > + "q->g.e" (base "q->g"). */ > + if (!flag_strict_aliasing > + || ref_contains_union_access_p (struct_ref_a) > + || ref_contains_union_access_p (struct_ref_b)) > + { > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + DDR_COULD_BE_INDEPENDENT_P (res) = true; > } > > DDR_AFFINE_P (res) = true; > DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > + DDR_SUBSCRIPTS (res).create (num_dimensions); > DDR_LOOP_NEST (res) = loop_nest; > DDR_INNER_LOOP (res) = 0; > DDR_SELF_REFERENCE (res) = false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = 0; i < num_dimensions; ++i) > { > struct subscript *subscript; > > subscript = XNEW (struct subscript); > + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); > + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); > SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende > } > > /* Return false when fail to represent the data dependence as a > - distance vector. INIT_B is set to true when a component has been > + distance vector. A_INDEX is the index of the first reference > + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the > + second reference. INIT_B is set to true when a component has been > added to the distance vector DIST_V. INDEX_CARRY is then set to > the index in DIST_V that carries the dependence. */ > > static bool > build_classic_dist_vector_1 (struct data_dependence_relation *ddr, > - struct data_reference *ddr_a, > - struct data_reference *ddr_b, > + unsigned int a_index, unsigned int b_index, > lambda_vector dist_v, bool *init_b, > int *index_carry) > { > @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data > return false; > } > > - access_fn_a = DR_ACCESS_FN (ddr_a, i); > - access_fn_b = DR_ACCESS_FN (ddr_b, i); > + access_fn_a = SUB_ACCESS_FN (subscript, a_index); > + access_fn_b = SUB_ACCESS_FN (subscript, b_index); > > if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC > && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) > @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data > constant_access_functions (const struct data_dependence_relation *ddr) > { > unsigned i; > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) > - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) > + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) > return false; > > return true; > @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de > lambda_vector dist_v; > unsigned i; > int index_carry = DDR_NB_LOOPS (ddr); > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); > + tree access_fun = SUB_ACCESS_FN (sub, 0); > > if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) > { > @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de > return; > } > > - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); > > if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) > add_multivariate_self_dist (ddr, access_fun); > @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d > } > } > > +/* Return true when the DDR contains two data references that have the > + same access functions. */ > + > +static inline bool > +same_access_functions (const struct data_dependence_relation *ddr) > +{ > + unsigned i; > + subscript *sub; > + > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), > + SUB_ACCESS_FN (sub, 1))) > + return false; > + > + return true; > +} > + > /* Compute the classic per loop distance vector. DDR is the data > dependence relation to build a vector from. Return false when fail > to represent the data dependence as a distance vector. */ > @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d > } > > dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), > - dist_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, > &index_carry)) > return false; > > /* Save the distance vector if we initialized one. */ > @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d > if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) > { > lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - save_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, > + &index_carry)) > return false; > save_dist_v (ddr, save_v); > DDR_REVERSED_P (ddr) = true; > @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d > { > lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS > (ddr)); > > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), > - DDR_A (ddr), loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - opposite_v, &init_b, > + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, > &init_b, > &index_carry)) > return false; > > @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de > } > } > > -/* Helper function. Returns true when there is a dependence between > - data references DRA and DRB. */ > +/* Helper function. Returns true when there is a dependence between the > + data references. A_INDEX is the index of the first reference (0 for > + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ > > static bool > subscript_dependence_tester_1 (struct data_dependence_relation *ddr, > - struct data_reference *dra, > - struct data_reference *drb, > + unsigned int a_index, unsigned int b_index, > struct loop *loop_nest) > { > unsigned int i; > @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da > { > conflict_function *overlaps_a, *overlaps_b; > > - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), > - DR_ACCESS_FN (drb, i), > + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), > + SUB_ACCESS_FN (subscript, b_index), > &overlaps_a, &overlaps_b, > &last_conflicts, loop_nest); > > @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da > subscript_dependence_tester (struct data_dependence_relation *ddr, > struct loop *loop_nest) > { > - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), > loop_nest)) > + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) > dependence_stats.num_dependence_dependent++; > > compute_subscript_distance (ddr); > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 > @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * > refb = (struct mem_ref *) DDR_B (dep)->aux; > > if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know > + || DDR_COULD_BE_INDEPENDENT_P (dep) > || DDR_NUM_DIST_VECTS (dep) == 0) > { > /* If the dependence cannot be analyzed, assume that there might be As said, we could avoid computing such information in the first place. I can see predcom ingores it by explicitly checking DR_BASE_OBJECT, what about tree-parloops.c? > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100 > +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100 > @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ > ((L)->may_misalign_stmts.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ > - ((L)->may_alias_ddrs.length () > 0) > + ((L)->comp_alias_ddrs.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ > (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) > #define LOOP_REQUIRES_VERSIONING(L) \ > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100 > +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100 > @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct > } > > loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); > + > + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) > + /* For dependence distances of 2 or more, we have the option of > + limiting VF or checking for an alias at runtime. Prefer to check > + at runtime if we can, to avoid limiting the VF unnecessarily when > + the bases are in fact independent. > + > + Note that the alias checks will be removed if the VF ends up > + being small enough. */ > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + int dist = dist_v[loop_depth]; > + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) > + { > + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) > + return false; > + break; > + } > + } In theory, we could end up with DDR pushed more than once for alias test? Flag it in FOR loop and only mark it later? Thanks, bin > + > FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > { > int dist = dist_v[loop_depth]; > @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * > return false; > } > > +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH > + in DDR is >= VF. */ > + > +static bool > +dependence_distance_ge_vf (data_dependence_relation *ddr, > + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) > +{ > + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE > + || DDR_NUM_DIST_VECTS (ddr) == 0) > + return false; > + > + /* If the dependence is exact, we should have limited the VF instead. */ > + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); > + > + unsigned int i; > + lambda_vector dist_v; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + HOST_WIDE_INT dist = dist_v[loop_depth]; > + if (dist != 0 > + && !(dist > 0 && DDR_REVERSED_P (ddr)) > + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) > + return false; > + } > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "dependence distance between "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); > + dump_printf (MSG_NOTE, " is >= VF\n"); > + } > + > + return true; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop > > comp_alias_ddrs.create (may_alias_ddrs.length ()); > > + unsigned int loop_depth > + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, > + LOOP_VINFO_LOOP_NEST (loop_vinfo)); > + > /* First, we collect all data ref pairs for aliasing checks. */ > FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop > tree segment_length_a, segment_length_b; > gimple *stmt_a, *stmt_b; > > + /* Ignore the alias if the VF we chose ended up being no greater > + than the dependence distance. */ > + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) > + continue; > + > dr_a = DDR_A (ddr); > stmt_a = DR_STMT (DDR_A (ddr)); > dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop > return false; > } > > - /* All alias checks have been resolved at compilation time. */ > - if (!comp_alias_ddrs.length ()) > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return true; > } > > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 > 08:48:48.736031902 +0100 > @@ -0,0 +1,104 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N + 1]; }; > +struct t { struct s x[N + 1]; }; > +struct u { int x[N + 1]; int y; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i]; > +} > + > +void > +f2 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i]; > +} > + > +void > +f3 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i]; > +} > + > +void > +f4 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i]; > +} > + > +void > +f5 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i + 1]; > +} > + > +void > +f6 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i + 1]; > +} > + > +void > +f7 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i + 1]; > +} > + > +void > +f8 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i + 1]; > +} > + > +void > +f9 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[1].x[i]; > +} > + > +void > +f10 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i].x[i]; > +} > + > +void > +f11 (struct u *a, struct u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i] + b[i].y; > +} > + > +void > +f12 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f13 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 > 08:48:48.736031902 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +#define N 16 > + > +struct s1 { int a[N]; }; > +struct s2 { struct s1 b; int c; }; > +struct s3 { int d; struct s1 e; }; > +union u { struct s2 f; struct s3 g; }; > + > +/* We allow a and b to overlap arbitrarily. */ > + > +void > +f1 (int a[][N], int b[][N]) > +{ > + for (int i = 0; i < N; ++i) > + a[0][i] += b[0][i]; > +} > + > +void > +f2 (union u *a, union u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->f.b.a[i] += b->g.e.a[i]; > +} > + > +void > +f3 (struct s1 *a, struct s1 *b) > +{ > + for (int i = 0; i < N - 1; ++i) > + a->a[i + 1] += b->a[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 > 08:48:48.736031902 +0100 > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N]; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } > } */ > +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 > to 0" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */