On Mon, 13 Nov 2017, Jeff Law wrote:

> On 11/10/2017 01:00 AM, Richard Biener wrote:
> > 
> > It's the usual issue with an optimizing compiler vs. a static analyzer.
> > We try to get rid of the little semantic details of the input languages
> > that in the end do not matter for code generation, but that makes
> > using those semantic details hard (sometimes the little details
> > are useful, like signed overflow being undefined).
> > 
> > For GIMPLE it's also often the case that we didn't really thoroughly
> > specify the semantics of the IL - like is an aggregate copy a
> > block copy (that's how we expand it to RTL) or a memberwise copy?
> > SRA treats it like the latter in some cases but memcpy folding
> > turns memcpy into aggregate assignments ... (now think about padding).
> Understood far too well.  In fact, I was looking at the aggregate copy
> stuff not terribly long ago and concluded that depending on either
> particular behavior was undesirable. Something (glibc IIRC) was
> depending on the padding being copied because they were actually shoving
> live data into the pad.  Ugh.
> 
> > 
> > It's not that GCC doesn't have its own set of existing issues with
> > respect to interpreting GIMPLE semantics as it sees fit, in one way
> > here and in another way there.  I'm just always nervous when adding
> > new "interpretations" where I know the non-existent formal definition
> > of GIMPLE leaves things unspecified.
> Right.  No disagreement from me.  We have these issues and address
> representation is just one of a class of things which we can represent
> in gimple that don't "properly" map back to the source language.  And
> that's probably inherent in the lowering from the source language to
> something like GIMPLE.
> 
> 
> > 
> > For example we _do_ use array bounds and array accesses (but _not_,
> > for now, anywhere they appear in address computations!)
> > to derive niter information.  At the same time, because of this
> > exploitation, we try very very hard to never (OK, PRE above is a
> > counter-example) create an actual array access when dereferencing
> > a pointer that is constructed by taking the address of an array-ref.
> > That's why Martin added the warning to forwprop: that pass,
> > when forwarding such addresses, gets rid of the array-ref.
> Right.  IIRC there are some BZs around this issue that come up
> release-to-release, related to how we've changed this code over the last
> few years.
> 
> 
> > 
> >>>> Or, if that's not it, what exactly is your concern with this
> >>>> enhancement?  If it's that it's implemented in forwprop, what
> >>>> would be a better place, e.g., earlier in the optimization
> >>>> phase?  If it's something else, I'd appreciate it
> >>>> if you could explain what.
> >>>
> >>> For one implementing this in forwprop looks like a move in the
> >>> wrong direction.  I'd like to have separate warning passes or
> >>> at most amend warnings from optimization passes, not add new ones.
> >> I tend to agree.  That's one of the reasons why I pushed Aldy away from
> >> doing this kind of stuff within VRP.
> >>
> >> What I envision is a pass which does a dominator walk through the
> >> blocks.  It gathers context sensitive range information as it does the 
> >> walk.
> >>
> >> As we encounter array references, we try to check them against the
> >> current range information.  We could also try to warn about certain
> >> pointer computations, though we have to be more careful with those.
> >>
> >> Though I certainly still worry that the false positive cases which led
> >> Aldy, Andrew and myself to look at path sensitive ranges aren't resolved
> >> and will limit the utility of doing more array range checking.
> > 
> > I fear that while this might be a little bit cleaner you'd still have to
> > do this very very early in the optimization pipeline (see all the
> > hard time we had with __builtin_object_size), and thus you won't catch
> > very many cases unless you start doing an IPA pass and handle propagating
> > through memory.  Which is when you have arrived at a full-blown static
> > analyzer.
> Could be.  We won't know until we give it a whirl.  FWIW, we saw far
> more problems due to the lack of path sensitivity than anything else.
> Nothing I'm suggesting in this thread addresses that problem.
> 
> Doing a really good job for warnings with path sensitivity shares a lot
> of properties with jump threading.  Specifically that you need to
> propagate range knowledge along a path, often past join points in the
> CFG (i.e., you're propagating beyond the dominance frontier).

Right.

> Once you do a good job there, I'd strongly suspect that IPA issues would
> then dominate.

I suspect that once you're dealing with C++ code you run into the issue
that even early inlining exposes code that has already had forwprop run
on it, before forwprop runs again on the inlined-into body.

So the IPA issues start very early.  Of course if you are doing
path-sensitive processing then processing call/return "edges" as if
they were CFG edges shouldn't be too hard.

Then the only remaining issue is that a program has very many more
paths than it has edges or blocks, which means you'll quickly run
into compile-time issues.

Static analyzers are hard ;)  But I appreciate somebody finally
trying that route.  Ideally we'd do the static analysis in parallel
with the compilation, given we'd need an "early" LTO phase before
early inlining.  Thus, do an LTO stream-out, then in parallel do WPA
static analysis with diagnostics.

Richard.