Richard Biener <richard.guent...@gmail.com> writes: > On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> Richard Biener <richard.guent...@gmail.com> writes: >>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >>> <richard.sandif...@linaro.org> wrote: >>>> Index: gcc/tree-data-ref.c >>>> =================================================================== >>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>>> return alignment; >>>> } >>>> >>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>>> + is based on. Return this object X on success and store the alignment >>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>>> + >>>> +static tree >>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>>> +{ >>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >>>> + return NULL_TREE; >>>> + >>>> + gimple *def = SSA_NAME_DEF_STMT (base); >>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >>> >>> I think it is an error to use the def stmt of base here. Instead you >>> need to pass down the point/loop of the use. For the same reason you >>> also want to instantiate parameters after analyzing the evolution. >>> >>> In the end, if the BB to be vectorized is contained in a loop nest we want >>> to >>> instantiate up to the point where (eventually) a DECL we can re-align >>> appears, >>> but using the containing loop for now looks OK. >> >> Why's that necessary/better though? We're not interested in the >> evolution of the value at the point it*s used by the potential vector >> code, but in how we get from the ultimate base (if there is one) to the >> definition of the SSA name. > > Hmm, ok. > >> I don't think instantiating the SCEV would give any new information, >> but it could lose some. E.g. if we have: >> >> x = &foo; >> do >> x += 8; >> while (*y++ < 10) >> ...potential vector use of *x... >> >> we wouldn't be able to express the address of x after the do-while >> loop, because there's nothing that counts the number of iterations. >> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for >> some (unknown) N. > > Not sure that it works that way. I'm not 100% sure what kind of > parameters are left symbolic, and if analysis loop and instantiation > "loop" are the same I guess it doesn't make any difference... > >> (OK, so that doesn't actually work unless we take the effort >> to look through loop-closed SSA form, but still...) >> >>>> + /* Peel chrecs and record the minimum alignment preserved by >>>> + all steps. */ >>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>>> + { >>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >>>> (base)); >>>> + alignment = MIN (alignment, step_alignment); >>>> + base = CHREC_LEFT (base); >>>> + } >>>> + >>>> + /* Punt if the expression is too complicated to handle. */ >>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>>> (base))) >>>> + return NULL_TREE; >>>> + >>>> + /* Analyze the base to which the steps we peeled were applied. */ >>>> + poly_int64 bitsize, bitpos, bytepos; >>>> + machine_mode mode; >>>> + int unsignedp, reversep, volatilep; >>>> + tree offset; >>>> + base = get_inner_reference (build_fold_indirect_ref (base), >>> >>> ick :/ >>> >>> what happens if you simply use get_pointer_alignment here and >>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + >>> 4]. >> >> Yeah, but those have (hopefully) been handled by dr_analyze_innermost >> already, which should have stripped off both the constant and variable >> parts of the offset. So I think SSA names are the only interesting >> input. The problem is that once we follow the definitions of an SSA >> name through CHREC_LEFTs, we get a general address again. >> >> get_pointer_alignment and get_pointer_alignment_1 don't do what we want >> because they take the alignment of the existing object into account, >> whereas here we explicitly want to ignore that and only calculate the >> alignment of the offset. > > Ah, indeed - I missed that fact. > >> I guess we could pass a flag down to ignore the current alignment though? > > But we need the base anyway... so, given we can only ever re-align decls > and never plain pointers instead of using build_fold_indirect_ref do > > if (TREE_CODE (base) != ADDR_EXPR) > return NULL_TREE; > > else use TREE_OPERAND (base, 0)?
The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR, which is what we get for things like the do-while above (e.g. { &a + 64, +, 64 }_n if x points to a double *.) I guess everything we care about would be handled by fold_indirect_ref_1 though, so would that be OK instead? Thanks, Richard