"Bin.Cheng" <amker.ch...@gmail.com> writes: > On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> "Bin.Cheng" <amker.ch...@gmail.com> writes: >>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford >>> <richard.sandif...@linaro.org> wrote: >>>> The first loop in the testcase regressed after my recent changes to >>>> dr_analyze_innermost. Previously we would treat "i" as an iv even >>>> for bb analysis and end up with: >>>> >>>> DR_BASE_ADDRESS: p or q >>>> DR_OFFSET: 0 >>>> DR_INIT: 0 or 4 >>>> DR_STEP: 16 >>>> >>>> We now always keep the step as 0 instead, so for an int "i" we'd have: >>>> >>>> DR_BASE_ADDRESS: p or q >>>> DR_OFFSET: (intptr_t) i >>>> DR_INIT: 0 or 4 >>>> DR_STEP: 0 >>>> >>>> This is also what we'd like to have for the unsigned "i", but the >>>> problem is that strip_constant_offset thinks that the "i + 1" in >>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1". >>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA >>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no >>>> longer seem to be related to the [i] ones. >>> >>> Didn't read the change in detail, so sorry if I mis-understood the issue. >>> I made changes in scev to better fold type conversion by various sources >>> of information, for example, vrp, niters, undefined overflow behavior etc. >>> In theory these information should be available for other optimizers without >>> querying scev. For the mentioned test, vrp should compute accurate range >>> information for "i" so that we can fold (intptr_t) (i + 1) it without >>> worrying >>> overflow. Note we don't do it in generic folding because (intptr_t) (i) + 1 >>> could be more expensive (especially in case of (T)(i + j)), or because the >>> CST part is in bigger precision after conversion. >>> But such folding is wanted in several places, e.g, IVOPTs. To provide such >>> an interface, we changed tree-affine and made it do aggressive fold. I am >>> curious if it's possible to use aff_tree to implement strip_constant_offset >>> here since aggressive folding is wanted. After all, using additional chrec >>> looks like a little heavy wrto the simple test. >> >> Yeah, using aff_tree does work here when the bounds are constant. >> It doesn't look like it works for things like: >> >> double p[1000]; >> double q[1000]; >> >> void >> f4 (unsigned int n) >> { >> for (unsigned int i = 0; i < n; i += 4) >> { >> double a = q[i] + p[i]; >> double b = q[i + 1] + p[i + 1]; >> q[i] = a; >> q[i + 1] = b; >> } >> } >> >> though, where the bounds on the global arrays guarantee that [i + 1] can't >> overflow, even though "n" is unconstrained. The patch as posted handles >> this case too. > BTW is this a missed optimization in value range analysis? The range > information for i should flow in a way like: array boundary -> niters > -> scev/vrp. > I think that's what niters/scev do in analysis.
Yeah, maybe :-) It looks like the problem is that when SLP runs, the previous VRP pass came before loop header copying, so the (single) header has to cope with n == 0 case. Thus we get: Visiting statement: i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 0] to [0, 0] EQUIVALENCES: { i_6 } (1 elements) Intersecting [0, 0] EQUIVALENCES: { i_6 } (1 elements) and [0, 1000] to [0, 0] EQUIVALENCES: { i_6 } (1 elements) Found new range for i_15: [0, 0] Visiting statement: _3 = i_15 + 1; Match-and-simplified i_15 + 1 to 1 Intersecting [1, 1] and [0, +INF] to [1, 1] Found new range for _3: [1, 1] (where _3 is the index we care about), followed by: Visiting statement: i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>; Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 4] to [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) Intersecting [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) and [0, 1000] to [0, n_9(D) + 4294967295] EQUIVALENCES: { i_6 } (1 elements) Found new range for i_15: [0, n_9(D) + 4294967295] Visiting statement: _3 = i_15 + 1; Intersecting [0, +INF] and [0, +INF] to [0, +INF] Found new range for _3: [0, +INF] I guess in this case it would be better to intersect the i_15 ranges to [0, 1000] rather than [0, n_9(D) + 4294967295]. It does work if another VRP pass runs after CH. But even then, is it a good idea to rely on the range info being kept up-to-date all the way through to SLP? A lot happens inbetween. FWIW, the old simple_iv check that I removed for bb data-ref analysis relies on SCEV analysis too, so I don't think this is more expensive than what we had before. Thanks, Richard