On Mon, 13 Nov 2017, Martin Sebor wrote:

> On 11/10/2017 01:00 AM, Richard Biener wrote:
> > On Thu, 9 Nov 2017, Jeff Law wrote:
> > 
> > > On 11/02/2017 05:48 AM, Richard Biener wrote:
> > > 
> > > > 
> > > > There were elaborate transforms of ptr + CST to ptr->a.b.c[3] in the
> > > > past.  We have ripped out _most_ of them because of bad interaction
> > > > with dependence analysis and _b_o_s (__builtin_object_size) warnings.
> > > > 
> > > > But for example PRE might still end up propagating
> > > > 
> > > >  _1 = &ptr->a.b.c;
> > > >  _2 = (*_1)[i_3];
> > > > 
> > > > as
> > > > 
> > > >  _2 = ptr->a.b.c[i_3];
> > > > 
> > > > But it's not so much GCC building up GIMPLE expressions that would
> > > > cause false positives, but rather "valid" C code and "invalid" C code
> > > > being represented exactly the same in GCC.  Let's say
> > > I think this is a key point.
> > 
> > It's the usual issue with an optimizing compiler vs. a static analyzer.
> > We try to get rid of the little semantic details of the input languages
> > that in the end do not matter for code-generation but that makes
> > using those semantic details hard (sometimes the little details
> > are useful, like signed overflow being undefined).
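> > 
> > As a tiny, made-up illustration of such a useful detail: with signed
> > overflow undefined, a test like the one below may simply be folded to
> > a constant.  That's great for code generation, but it also means the
> > IL drifts away from what the user literally wrote.
> > 
> >   int f (int i)
> >   {
> >     /* i + 1 cannot wrap for any valid input, so the comparison
> >        may be folded to 1.  */
> >     return i + 1 > i;
> >   }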
> > 
> > For GIMPLE it's also often the case that we didn't really thoroughly
> > specify the semantics of the IL - like is an aggregate copy a
> > block copy (that's how we expand it to RTL) or a memberwise copy?
> > SRA treats it like the latter in some cases but memcpy folding
> > turns memcpy into aggregate assignments ... (now think about padding).
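> > 
> > Sketched with made-up names, the padding question is roughly this:
> > 
> >   struct S { char c; /* 3 bytes of padding on most targets */ int i; };
> > 
> >   void copy (struct S *d, const struct S *s)
> >   {
> >     /* memcpy folding may turn this call into the aggregate
> >        assignment *d = *s.  Expanded as a block copy that assignment
> >        also copies the padding bytes; a memberwise copy (as SRA may
> >        assume) leaves them unspecified.  */
> >     __builtin_memcpy (d, s, sizeof *d);
> >   }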
> > 
> > It's not that GCC doesn't have its set of existing issues with
> > respect to interpreting GIMPLE semantics as it sees fit in one way
> > here and in another way there.  I'm just always nervous when adding
> > new "interpretations" where I know the non-existing formal definition
> > of GIMPLE leaves things unspecified.
> > 
> > For example we _do_ use array bounds and array accesses (but _not_,
> > and for now _nowhere_, if they appear in address computations!)
> > to derive niter information.  At the same time, because of this
> > exploitation, we try very, very hard to never (ok, PRE above as a
> > counter-example) create an actual array access when dereferencing
> > a pointer that is constructed by taking the address of an array-ref.
> > That's why Martin added the warning to forwprop because that pass,
> > when forwarding such addresses, gets rid of the array-ref.
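> > 
> > A made-up example of why that care is needed (it also ties back to the
> > point above about "valid" and "invalid" code becoming hard to tell
> > apart):
> > 
> >   struct S { int n; int c[4]; };
> > 
> >   int sum_array (struct S *p, int n)
> >   {
> >     int sum = 0;
> >     /* p->c[i] is a real array reference, so niter analysis may use
> >        the bound of c to conclude the loop runs at most 4 times.  */
> >     for (int i = 0; i < n; ++i)
> >       sum += p->c[i];
> >     return sum;
> >   }
> > 
> >   int sum_pointer (struct S *p, int n)
> >   {
> >     int *q = &p->c[0];
> >     int sum = 0;
> >     /* These are plain pointer dereferences; code in the wild relies
> >        on such pointers running past c (the old struct-hack idiom),
> >        so forwarding &p->c[0] into q[i] must not reintroduce the
> >        array-ref and its bound.  */
> >     for (int i = 0; i < n; ++i)
> >       sum += q[i];
> >     return sum;
> >   }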
> > 
> > > > > Or, if that's not it, what exactly is your concern with this
> > > > > enhancement?  If it's that it's implemented in forwprop, what
> > > > > would be a better place, e.g., earlier in the optimization
> > > > > phase?  If it's something else, I'd appreciate it
> > > > > if you could explain what.
> > > > 
> > > > For one, implementing this in forwprop looks like a move in the
> > > > wrong direction.  I'd like to have separate warning passes or
> > > > at most amend warnings from optimization passes, not add new ones.
> > > I tend to agree.  That's one of the reasons why I pushed Aldy away from
> > > doing this kind of stuff within VRP.
> > > 
> > > What I envision is a pass that does a dominator walk through the
> > > blocks.  It gathers context-sensitive range information as it does the
> > > walk.
> > > 
> > > As we encounter array references, we try to check them against the
> > > current range information.  We could also try to warn about certain
> > > pointer computations, though we have to be more careful with those.
> > > 
> > > Though I certainly still worry that the false-positive cases which led
> > > Aldy, Andrew and me to look at path-sensitive ranges aren't resolved
> > > and will limit the utility of doing more array range checking.
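> > > 
> > > To make the idea concrete, here is a rough, self-contained toy of the
> > > shape such a walk could take -- all data structures are invented for
> > > the sketch and are nothing like real GIMPLE or VRP state:
> > > 
> > >   #include <limits.h>
> > >   #include <stdio.h>
> > >   #include <string.h>
> > > 
> > >   #define MAX_VARS 8
> > > 
> > >   struct range { long min, max; };
> > > 
> > >   /* An array access "a[var]" against an array of known length.  */
> > >   struct access { int var; long array_len; int lineno; };
> > > 
> > >   struct block
> > >   {
> > >     int cond_var;           /* "cond_var < cond_bound" holds on entry,
> > >                                or -1; only this condition form is
> > >                                modeled here.  */
> > >     long cond_bound;
> > >     struct access accesses[4];
> > >     int n_accesses;
> > >     struct block *kids[4];  /* children in the dominator tree */
> > >     int n_kids;
> > >   };
> > > 
> > >   /* Context-sensitive ranges gathered during the walk.  */
> > >   static struct range ranges[MAX_VARS];
> > > 
> > >   /* Called on entry, before walking the root block.  */
> > >   static void
> > >   init_ranges (void)
> > >   {
> > >     for (int i = 0; i < MAX_VARS; i++)
> > >       {
> > >         ranges[i].min = LONG_MIN;
> > >         ranges[i].max = LONG_MAX;
> > >       }
> > >   }
> > > 
> > >   static void
> > >   walk (struct block *bb)
> > >   {
> > >     struct range saved[MAX_VARS];
> > >     memcpy (saved, ranges, sizeof ranges);
> > > 
> > >     /* Refine the ranges from the condition dominating this block.  */
> > >     if (bb->cond_var >= 0
> > >         && bb->cond_bound - 1 < ranges[bb->cond_var].max)
> > >       ranges[bb->cond_var].max = bb->cond_bound - 1;
> > > 
> > >     /* Check array references against the current range information.  */
> > >     for (int i = 0; i < bb->n_accesses; i++)
> > >       {
> > >         struct access *a = &bb->accesses[i];
> > >         if (ranges[a->var].min < 0
> > >             || ranges[a->var].max >= a->array_len)
> > >           printf ("line %d: index may be out of bounds\n", a->lineno);
> > >       }
> > > 
> > >     /* Recurse into dominated blocks, then drop the information
> > >        gathered for this subtree.  */
> > >     for (int i = 0; i < bb->n_kids; i++)
> > >       walk (bb->kids[i]);
> > >     memcpy (ranges, saved, sizeof ranges);
> > >   }
> > > 
> > > The save/restore around the recursion is what keeps the ranges
> > > sensitive to the current dominator path.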
> > 
> > I fear that while this might be a little bit cleaner, you'd still have
> > to do this very, very early in the optimization pipeline (see all the
> > hard time we had with __builtin_object_size), and thus you won't catch
> > very many cases unless you start doing an IPA pass and handle propagation
> > through memory.  Which is when you've arrived at a full-blown static
> > analyzer.
> 
> I have a different concern with the general idea of moving these
> kinds of warnings into passes of their own.  It would unavoidably
> result in duplicating some code from the optimization passes (at
> a minimum, the GIMPLE traversal, but likely more than that).  It
> would also require pretty tight coupling between the warning pass
> and the optimization passes as the latter can defeat the work of
> the former (as happens in this case).  But most significantly,
> the separation isn't feasible in cases where the optimization
> pass computes and maintains non-trivial internal state (like,
> say, tree-object-size or tree-ssa-strlen).

Well, as I said elsewhere, the goal is to modularize the "optimizations"
enough that you can run their analysis phases together from a
"warning pass" and get access to their internal state.  Like what
Jeff is doing with VRP.  I'm currently working on some region-based
value-numbering where the iteration scheme is supposed to allow
plugging in, say, a value-range lattice.  It doesn't allow path
separation (simulating jump-threading) yet, but you have to start
somewhere ...

> I think of these kinds of "basic" warnings as error checking and
> handling.  It makes no more sense to me to have a separate pass
> check for input errors than it would to have one to check the
> validity of other kinds of input within GCC (such as trees of
> the wrong kind being passed from one function to another due to
> a GCC programmer's error).  They go hand in hand.
> 
> To be sure, there are more sophisticated kinds of checkers that
> depend on an in-depth analysis of the program that's outside of
> what most existing optimization passes do.  The sprintf warning
> comes to mind as an example.  But even those can be (and the
> sprintf pass has been) used to do both: detect and diagnose
> errors _and_ optimize code.  And even these passes (or especially
> they) do or would benefit from data computed by other optimization
> passes to deliver better results (whether it be avoiding false
> positives or improving the code they generate).

Sure - no disagreement here.  But then those warnings have to
operate within the limitations of the GIMPLE IL, which was designed
for an optimizing compiler with a variety of input source languages.
That is, you simply can't implement every diagnostic you'd like
to have when coming from a source-level perspective.

It seems to me you're somewhat pushing the boundary here ;)

Richard.

> That said, it would make perfect sense to me to have the warning
> code live in separate files from the optimization code, and keep
> the separation of concerns and responsibilities crisp and well
> defined by their APIs.
