"Bin.Cheng" <amker.ch...@gmail.com> writes:
> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> "Bin.Cheng" <amker.ch...@gmail.com> writes:
>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> The first loop in the testcase regressed after my recent changes to
>>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>>> for bb analysis and end up with:
>>>>
>>>>    DR_BASE_ADDRESS: p or q
>>>>    DR_OFFSET: 0
>>>>    DR_INIT: 0 or 4
>>>>    DR_STEP: 16
>>>>
>>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>>
>>>>    DR_BASE_ADDRESS: p or q
>>>>    DR_OFFSET: (intptr_t) i
>>>>    DR_INIT: 0 or 4
>>>>    DR_STEP: 0
>>>>
>>>> This is also what we'd like to have for the unsigned "i", but the
>>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>>> longer seem to be related to the [i] ones.
>>>
>>> Didn't read the change in detail, so sorry if I misunderstood the issue.
>>> I made changes in scev to better fold type conversions using various
>>> sources of information, for example, vrp, niters, undefined overflow
>>> behavior etc.  In theory this information should be available to other
>>> optimizers without querying scev.  For the mentioned test, vrp should
>>> compute an accurate range for "i" so that we can fold (intptr_t) (i + 1)
>>> without worrying about overflow.  Note we don't do it in generic folding
>>> because (intptr_t) (i) + 1 could be more expensive (especially in the
>>> case of (T)(i + j)), or because the CST part is in bigger precision
>>> after conversion.
>>> But such folding is wanted in several places, e.g., IVOPTs.  To provide
>>> such an interface, we changed tree-affine and made it fold aggressively.
>>> I am curious whether it's possible to use aff_tree to implement
>>> strip_constant_offset here, since aggressive folding is wanted.  After
>>> all, using an additional chrec looks a little heavyweight wrt the
>>> simple test.
>>
>> Yeah, using aff_tree does work here when the bounds are constant.
>> It doesn't look like it works for things like:
>>
>>     double p[1000];
>>     double q[1000];
>>
>>     void
>>     f4 (unsigned int n)
>>     {
>>       for (unsigned int i = 0; i < n; i += 4)
>>         {
>>           double a = q[i] + p[i];
>>           double b = q[i + 1] + p[i + 1];
>>           q[i] = a;
>>           q[i + 1] = b;
>>         }
>>     }
>>
>> though, where the bounds on the global arrays guarantee that [i + 1] can't
>> overflow, even though "n" is unconstrained.  The patch as posted handles
>> this case too.
> BTW, is this a missed optimization in value range analysis?  The range
> information for i should flow in a way like: array bound -> niters
> -> scev/vrp.
> I think that's what niters/scev do in the analysis.

Yeah, maybe :-)  It looks like the problem is that when SLP runs,
the previous VRP pass came before loop header copying, so the (single)
header has to cope with the n == 0 case.  Thus we get:

  Visiting statement:
  i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>;
  Intersecting
    [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
    [0, 0]
  to
    [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  Intersecting
    [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  and
    [0, 1000]
  to
    [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  Found new range for i_15: [0, 0]

  Visiting statement:
  _3 = i_15 + 1;
  Match-and-simplified i_15 + 1 to 1
  Intersecting
    [1, 1]
  and
    [0, +INF]
  to
    [1, 1]
  Found new range for _3: [1, 1]

(where _3 is the index we care about), followed by:

  Visiting statement:
  i_15 = ASSERT_EXPR <i_6, i_6 < n_9(D)>;
  Intersecting
    [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
    [0, 4]
  to
    [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  Intersecting
    [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
    [0, 1000]
  to
    [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  Found new range for i_15: [0, n_9(D) + 4294967295]

  Visiting statement:
  _3 = i_15 + 1;
  Intersecting
    [0, +INF]
  and
    [0, +INF]
  to
    [0, +INF]
  Found new range for _3: [0, +INF]

I guess in this case it would be better to intersect the i_15 ranges
to [0, 1000] rather than [0, n_9(D) + 4294967295].

It does work if another VRP pass runs after CH.  But even then,
is it a good idea to rely on the range info being kept up-to-date
all the way through to SLP?  A lot happens in between.

FWIW, the old simple_iv check that I removed for bb data-ref analysis
relies on SCEV analysis too, so I don't think this is more expensive
than what we had before.

Thanks,
Richard