Hi Bill,
On 11/13/13 18:04, Bill Schmidt wrote:
Hi Yufeng,
On Tue, 2013-11-12 at 22:34 +0000, Yufeng Zhang wrote:
Hi Bill,
Many thanks for the review.
I find your suggestion on using the next_interp field quite
enlightening. I prepared a patch which adds changes without modifying
the framework. With the patch, the slsr pass now tries to create a
second candidate for each memory accessing gimple statement, and chain
it to the first one via the next_interp field.
There are two implications in this approach though:
1) For each memory accessing gimple statement, there can be two
candidates, and these two candidates can be part of different dependency
graphs respectively (based on different base expr). Only one of the
dependency graph should be traversed to do replace_refs. Most of the
changes in the patch is to handle this implication.
I am aware that you suggest to follow the next-interp chain only when
the searching fails for the first interpretation. However, that doesn't
work very well, as it can result in worse code-gen. Taking a varied
form of the added test slsr-41.c for example:
i1: a2 [i] [j] = 1;
i2: a2 [i] [j+1] = 2;
i3: a2 [i+20] [j] = i;
With the 2nd interpretation created conditionally, the following two
dependency chains will be established:
i1 --> i2 (base expr is an SSA_NAME defined as (a2 + i * 200))
i1 --> i3 (base expr is a tree expression of (a2 + i * 200))
So it seems to me that really what needs to happen is to unify those two
base_exprs. We don't currently have logic in this pass to look up an
SSA name based on {base, index, stride, cand_type}, but that could be
done with a hash table. For now to save processing time it would make
sense to only do that for MEM candidates, though the cand_type should be
included in the hash to allow this to be used for other candidate types
if necessary. Of course, the SSA name definition must dominate the
candidate to be eligible as a basis, and that should be checked, but
this should generally be the case.
I'm not quite sure if the SSA_NAME look-up works; maybe I haven't fully
understood what you suggest.
For i1 --> i3, the base_expr is the tree expression (a2 + i * 200),
which is the result of a sequence of operations (conversion to affine,
immediate offset removal and conversion to tree), with another SSA_NAME
as the input. In other words, there are two SSA_NAMEs involved in the
example:
_s1: (a2 + i * 200).
_s2: (a2 + (i * 200 + 4000))
their strides and indexes are different.
I guess what you suggest is that given the tree expression (a2 + i *
200), look up an SSA_NAME and return _s1. If that is the case, the
challenge will be how to analyze the tree expression and get the
information on its {base, index, stride, cand_type}. While it would be
too specific and narrative to check for a POINTER_PLUS_EXPR expression,
the existing framework (e.g. create_add_ssa_cand) seems to assume that
the analyzed tree represent a genuine gimple statement.
Moreover, there may not be an SSA_NAME exists, for example in the
following case:
i1: a2 [i+1] [j] = 1;
i2: a2 [i+1] [j+1] = 2;
i3: a2 [i+20] [j] = i;
you wouldn't be able to find an SSA_NAME for (a2 + i * 200).
[snip]
A couple of quick comments on the next_interp patch:
* You don't need num_of_dependents (). You should be able to add a
forward declaration for count_candidates () and use it.
Missed count_candidates (); thanks!
* Your new test case is missing a final newline, so your patch doesn't
apply cleanly.
I'll fix it.
Please look into unifying the base expressions, as I believe you should
not need the preferred_ref_cand logic if you do that.
I would also like to live without preferred_ref_cand if feasible . :)
I still prefer the approach of using next_interp for its generality and
expandibility.
Sure; this approach indeed fit the framework better.
Regards,
Yufeng