Richard Biener <richard.guent...@gmail.com> writes: > On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> Richard Biener <richard.guent...@gmail.com> writes: >>> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford >>> <richard.sandif...@linaro.org> wrote: >>>> Richard Biener <richard.guent...@gmail.com> writes: >>>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >>>>> <richard.sandif...@linaro.org> wrote: >>>>>> Index: gcc/tree-data-ref.c >>>>>> =================================================================== >>>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>>>>> return alignment; >>>>>> } >>>>>> >>>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>>>>> + is based on. Return this object X on success and store the alignment >>>>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>>>>> + >>>>>> +static tree >>>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>>>>> +{ >>>>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE >>>>>> (base))) >>>>>> + return NULL_TREE; >>>>>> + >>>>>> + gimple *def = SSA_NAME_DEF_STMT (base); >>>>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >>>>> >>>>> I think it is an error to use the def stmt of base here. Instead you >>>>> need to pass down the point/loop of the use. For the same reason you >>>>> also want to instantiate parameters after analyzing the evolution. >>>>> >>>>> In the end, if the BB to be vectorized is contained in a loop nest >>>>> we want to >>>>> instantiate up to the point where (eventually) a DECL we can re-align >>>>> appears, >>>>> but using the containing loop for now looks OK. >>>> >>>> Why's that necessary/better though? We're not interested in the >>>> evolution of the value at the point it*s used by the potential vector >>>> code, but in how we get from the ultimate base (if there is one) to the >>>> definition of the SSA name. >>> >>> Hmm, ok. >>> >>>> I don't think instantiating the SCEV would give any new information, >>>> but it could lose some. E.g. if we have: >>>> >>>> x = &foo; >>>> do >>>> x += 8; >>>> while (*y++ < 10) >>>> ...potential vector use of *x... >>>> >>>> we wouldn't be able to express the address of x after the do-while >>>> loop, because there's nothing that counts the number of iterations. >>>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for >>>> some (unknown) N. >>> >>> Not sure that it works that way. I'm not 100% sure what kind of >>> parameters are left symbolic, and if analysis loop and instantiation >>> "loop" are the same I guess it doesn't make any difference... >>> >>>> (OK, so that doesn't actually work unless we take the effort >>>> to look through loop-closed SSA form, but still...) >>>> >>>>>> + /* Peel chrecs and record the minimum alignment preserved by >>>>>> + all steps. */ >>>>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>>>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>>>>> + { >>>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >>>>>> (base)); >>>>>> + alignment = MIN (alignment, step_alignment); >>>>>> + base = CHREC_LEFT (base); >>>>>> + } >>>>>> + >>>>>> + /* Punt if the expression is too complicated to handle. */ >>>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>>>>> (base))) >>>>>> + return NULL_TREE; >>>>>> + >>>>>> + /* Analyze the base to which the steps we peeled were applied. */ >>>>>> + poly_int64 bitsize, bitpos, bytepos; >>>>>> + machine_mode mode; >>>>>> + int unsignedp, reversep, volatilep; >>>>>> + tree offset; >>>>>> + base = get_inner_reference (build_fold_indirect_ref (base), >>>>> >>>>> ick :/ >>>>> >>>>> what happens if you simply use get_pointer_alignment here and >>>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >>>>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >>>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or >>>>> &MEM[a_1 + 4]. >>>> >>>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost >>>> already, which should have stripped off both the constant and variable >>>> parts of the offset. So I think SSA names are the only interesting >>>> input. The problem is that once we follow the definitions of an SSA >>>> name through CHREC_LEFTs, we get a general address again. >>>> >>>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want >>>> because they take the alignment of the existing object into account, >>>> whereas here we explicitly want to ignore that and only calculate the >>>> alignment of the offset. >>> >>> Ah, indeed - I missed that fact. >>> >>>> I guess we could pass a flag down to ignore the current alignment though? >>> >>> But we need the base anyway... so, given we can only ever re-align decls >>> and never plain pointers instead of using build_fold_indirect_ref do >>> >>> if (TREE_CODE (base) != ADDR_EXPR) >>> return NULL_TREE; >>> >>> else use TREE_OPERAND (base, 0)? >> >> The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR, >> which is what we get for things like the do-while above (e.g. >> { &a + 64, +, 64 }_n if x points to a double *.) >> >> I guess everything we care about would be handled by fold_indirect_ref_1 >> though, so would that be OK instead? > > Sounds like a compromise ;) > > Patch is ok with that change.
Thanks, here's what I installed. Tested as before. Richard 2018-03-24 Richard Sandiford <richard.sandif...@linaro.org> gcc/ PR tree-optimization/84005 * tree-data-ref.h (get_base_for_alignment): Declare. * tree-data-ref.c (get_base_for_alignment_1): New function. (get_base_for_alignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use get_base_for_alignment to find a suitable base object, instead of always using drb->base_address. gcc/testsuite/ PR tree-optimization/84005 * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about failing to force the alignment. Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2018-03-18 10:24:43.602669700 +0000 +++ gcc/tree-data-ref.h 2018-03-24 10:47:51.281625630 +0000 @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec extern tree find_data_references_in_bb (struct loop *, basic_block, vec<data_reference_p> *); extern unsigned int dr_alignment (innermost_loop_behavior *); +extern tree get_base_for_alignment (tree, unsigned int *); /* Return the alignment in bytes that DR is guaranteed to have at all times. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2018-03-21 20:51:32.502280975 +0000 +++ gcc/tree-data-ref.c 2018-03-24 10:47:51.281625630 +0000 @@ -5202,6 +5202,81 @@ dr_alignment (innermost_loop_behavior *d return alignment; } +/* If BASE is a pointer-typed SSA name, try to find the object that it + is based on. Return this object X on success and store the alignment + in bytes of BASE - &X in *ALIGNMENT_OUT. */ + +static tree +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) +{ + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) + return NULL_TREE; + + gimple *def = SSA_NAME_DEF_STMT (base); + base = analyze_scalar_evolution (loop_containing_stmt (def), base); + + /* Peel chrecs and record the minimum alignment preserved by + all steps. */ + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; + while (TREE_CODE (base) == POLYNOMIAL_CHREC) + { + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); + alignment = MIN (alignment, step_alignment); + base = CHREC_LEFT (base); + } + + /* Punt if the expression is too complicated to handle. */ + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base))) + return NULL_TREE; + + /* The only useful cases are those for which a dereference folds to something + other than an INDIRECT_REF. */ + tree ref_type = TREE_TYPE (TREE_TYPE (base)); + tree ref = fold_indirect_ref_1 (UNKNOWN_LOCATION, ref_type, base); + if (!ref) + return NULL_TREE; + + /* Analyze the base to which the steps we peeled were applied. */ + poly_int64 bitsize, bitpos, bytepos; + machine_mode mode; + int unsignedp, reversep, volatilep; + tree offset; + base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &reversep, &volatilep); + if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos)) + return NULL_TREE; + + /* Restrict the alignment to that guaranteed by the offsets. */ + unsigned int bytepos_alignment = known_alignment (bytepos); + if (bytepos_alignment != 0) + alignment = MIN (alignment, bytepos_alignment); + if (offset) + { + unsigned int offset_alignment = highest_pow2_factor (offset); + alignment = MIN (alignment, offset_alignment); + } + + *alignment_out = alignment; + return base; +} + +/* Return the object whose alignment would need to be changed in order + to increase the alignment of ADDR. Store the maximum achievable + alignment in *MAX_ALIGNMENT. */ + +tree +get_base_for_alignment (tree addr, unsigned int *max_alignment) +{ + tree base = get_base_for_alignment_1 (addr, max_alignment); + if (base) + return base; + + if (TREE_CODE (addr) == ADDR_EXPR) + addr = TREE_OPERAND (addr, 0); + *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; + return addr; +} + /* Recursive helper function. */ static bool Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2018-03-18 10:24:43.602669700 +0000 +++ gcc/tree-vect-data-refs.c 2018-03-24 10:47:51.283625566 +0000 @@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct if (base_alignment < vector_alignment) { - tree base = drb->base_address; - if (TREE_CODE (base) == ADDR_EXPR) - base = TREE_OPERAND (base, 0); - if (!vect_can_force_dr_alignment_p (base, - vector_alignment * BITS_PER_UNIT)) + unsigned int max_alignment; + tree base = get_base_for_alignment (drb->base_address, &max_alignment); + if (max_alignment < vector_alignment + || !vect_can_force_dr_alignment_p (base, + vector_alignment * BITS_PER_UNIT)) { if (dump_enabled_p ()) { Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2018-03-18 10:24:43.602669700 +0000 +++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2018-03-24 10:47:51.280625662 +0000 @@ -54,5 +54,5 @@ int main (void) return 0; } +/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */ /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */ -