On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> Richard Biener <richard.guent...@gmail.com> writes: >>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >>>> <richard.sandif...@linaro.org> wrote: >>>>> Index: gcc/tree-data-ref.c >>>>> =================================================================== >>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>>>> return alignment; >>>>> } >>>>> >>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>>>> + is based on. Return this object X on success and store the alignment >>>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>>>> + >>>>> +static tree >>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>>>> +{ >>>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >>>>> + return NULL_TREE; >>>>> + >>>>> + gimple *def = SSA_NAME_DEF_STMT (base); >>>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >>>> >>>> I think it is an error to use the def stmt of base here. Instead you >>>> need to pass down the point/loop of the use. For the same reason you >>>> also want to instantiate parameters after analyzing the evolution. >>>> >>>> In the end, if the BB to be vectorized is contained in a loop nest we want >>>> to >>>> instantiate up to the point where (eventually) a DECL we can re-align >>>> appears, >>>> but using the containing loop for now looks OK. >>> >>> Why's that necessary/better though? We're not interested in the >>> evolution of the value at the point it*s used by the potential vector >>> code, but in how we get from the ultimate base (if there is one) to the >>> definition of the SSA name. >> >> Hmm, ok. >> >>> I don't think instantiating the SCEV would give any new information, >>> but it could lose some. E.g. if we have: >>> >>> x = &foo; >>> do >>> x += 8; >>> while (*y++ < 10) >>> ...potential vector use of *x... >>> >>> we wouldn't be able to express the address of x after the do-while >>> loop, because there's nothing that counts the number of iterations. >>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for >>> some (unknown) N. >> >> Not sure that it works that way. I'm not 100% sure what kind of >> parameters are left symbolic, and if analysis loop and instantiation >> "loop" are the same I guess it doesn't make any difference... >> >>> (OK, so that doesn't actually work unless we take the effort >>> to look through loop-closed SSA form, but still...) >>> >>>>> + /* Peel chrecs and record the minimum alignment preserved by >>>>> + all steps. */ >>>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>>>> + { >>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >>>>> (base)); >>>>> + alignment = MIN (alignment, step_alignment); >>>>> + base = CHREC_LEFT (base); >>>>> + } >>>>> + >>>>> + /* Punt if the expression is too complicated to handle. */ >>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>>>> (base))) >>>>> + return NULL_TREE; >>>>> + >>>>> + /* Analyze the base to which the steps we peeled were applied. */ >>>>> + poly_int64 bitsize, bitpos, bytepos; >>>>> + machine_mode mode; >>>>> + int unsignedp, reversep, volatilep; >>>>> + tree offset; >>>>> + base = get_inner_reference (build_fold_indirect_ref (base), >>>> >>>> ick :/ >>>> >>>> what happens if you simply use get_pointer_alignment here and >>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >>>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + >>>> 4]. >>> >>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost >>> already, which should have stripped off both the constant and variable >>> parts of the offset. So I think SSA names are the only interesting >>> input. The problem is that once we follow the definitions of an SSA >>> name through CHREC_LEFTs, we get a general address again. >>> >>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want >>> because they take the alignment of the existing object into account, >>> whereas here we explicitly want to ignore that and only calculate the >>> alignment of the offset. >> >> Ah, indeed - I missed that fact. >> >>> I guess we could pass a flag down to ignore the current alignment though? >> >> But we need the base anyway... so, given we can only ever re-align decls >> and never plain pointers instead of using build_fold_indirect_ref do >> >> if (TREE_CODE (base) != ADDR_EXPR) >> return NULL_TREE; >> >> else use TREE_OPERAND (base, 0)? > > The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR, > which is what we get for things like the do-while above (e.g. > { &a + 64, +, 64 }_n if x points to a double *.) > > I guess everything we care about would be handled by fold_indirect_ref_1 > though, so would that be OK instead?
Sounds like a compromise ;) Patch is ok with that change. Thanks, Richard. > Thanks, > Richard