On 11/10/2017 01:00 AM, Richard Biener wrote:
On Thu, 9 Nov 2017, Jeff Law wrote:
On 11/02/2017 05:48 AM, Richard Biener wrote:
There were elaborate transforms of ptr + CST to ptr->a.b.c[3] in the
past. We have ripped out _most_ of them because of bad interaction
with dependence analysis and _b_o_s warnings.
But for example PRE might still end up propagating
_1 = &ptr->a.b.c;
_2 = (*_1)[i_3];
as
_2 = ptr->a.b.c[i_3];
But it's not so much GCC building up GIMPLE expressions that would
cause false positives but "valid" C code and "invalid" C code
represented exactly the same in GCC. Let's say
I think this is a key point.
It's the usual issue with an optimizing compiler vs. a static analyzer.
We try to get rid of the little semantic details of the input languages
that in the end do not matter for code-generation but that makes
using those semantic details hard (sometimes the little details
are useful, like signed overflow being undefined).
For GIMPLE it's also often the case that we didn't really thoroughly
specify the semantics of the IL - like is an aggregate copy a
block copy (that's how we expand it to RTL) or a memberwise copy?
SRA treats it like the latter in some cases but memcpy folding
turns memcpy into aggregate assignments ... (now think about padding).
It's not that GCC doesn't have its set of existing issues with
respect to interpreting GIMPLE semantic as it seems fit in one way
here and in another way there. I'm just always nervous when adding
new "interpretations" where I know the non-existing formal definition
of GIMPLE leaves things unspecified.
For example we _do_ use array bounds and array accesses (but _not_
and for now _nowhere_ if they appear in address computations!)
to derive niter information. At the same time, because of this
exploitation, we try very very hard to never (ok, PRE above as a
couter-example) create an actual array access when dereferencing
a pointer that is constructed by taking the address of an array-ref.
That's why Martin added the warning to forwprop because that pass,
when forwarding such addresses, gets rid of the array-ref.
Or, if that's not it, what exactly is your concern with this
enhancement? If it's that it's implemented in forwprop, what
would be a better place, e.g., earlier in the optimization
phase? If it's something something else, I'd appreciate it
if you could explain what.
For one implementing this in forwprop looks like a move in the
wrong direction. I'd like to have separate warning passes or
at most amend warnings from optimization passes, not add new ones.
I tend to agree. That's one of the reasons why I pushed Aldy away from
doing this kind of stuff within VRP.
What I envision is a pass which does a dominator walk through the
blocks. It gathers context sensitive range information as it does the walk.
As we encounter array references, we try to check them against the
current range information. We could also try to warn about certain
pointer computations, though we have to be more careful with those.
Though I certainly still worry that the false positive cases which led
Aldy, Andrew and myself to look at path sensitive ranges arent' resolved
and will limit the utility of doing more array range checking.
I fear while this might be a little bit cleaner you'd still have to
do this very very early in the optimization pipeline (see all the
hard time we had with __builtin_object_size) and thus you won't catch
very many cases unless you start doing an IPA pass and handle propagating
through memory. Which is when you arrived at a full-blown static
analyzer.
I have a different concern with the general idea of moving these
kinds of warnings into passes of their own. It would unavoidably
result in duplicating some code from the optimization passes (at
a minimum, the GIMPLE traversal, but likely more than that). It
would also require pretty tight coupling between the warning pass
and the optimization passes as the latter can defeat the work of
the former (as happens in this case). But most significantly,
the separation isn't feasible in cases where the optimization
pass computes and maintains non-trivial internal state (like,
say, tree-object-size or tree-ssa-strlen).
I think of these kinds of "basic" warnings as error checking and
handling. It makes no more sense to me to have a separate pass
check for input errors than it would to have one to check the
validity of other kinds of input within GCC (such as trees of
the wrong kind being passed from one function to another due to
a GCC programmer's error). They go hand in hand.
To be sure, there are more sophisticated kinds of checkers that
depend on an in-depth analysis of the program that's outside of
what most existing optimization passes do. The sprintf warning
comes to mind as an example. But even those can be (and the
sprintf pass has been) used to do both: detect and diagnose
errors _and_ optimize code. And even these passes (or especially
they) do or would benefit from data computed by other optimization
passes to deliver better results (whether it be avoiding false
positives or improving the code they generate).
That said, it would make perfect sense to me to have the warning
code live in separate files from the optimization code, and keep
the separation of concerns and responsibilities crisp and well
defined by their APIs.
Martin