On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalc...@redhat.com>
> wrote:
> > > > The checker is implemented as a GCC plugin.
> > > > 
> > > > The patch kit adds support for "in-tree" plugins i.e. GCC
> > > > plugins
> > > > that
> > > > would live in the GCC source tree and be shipped as part of the
> > > > GCC
> > > > tarball,
> > > > with a new:
> > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > configure option, analogous to --enable-languages (the
> > > > Makefile/configure
> > > > machinery for handling in-tree GCC plugins is adapted from how
> > > > we
> > > > support
> > > > frontends).
> > > 
> > > I like that.  Implementing this as a plugin surely must help to
> > > either
> > > document the GCC plugin interface as powerful/mature for such a
> > > task.  Or
> > > make it so, if it isn't yet.  ;-)
> > 
> > Our plugin "interface" as such is very broad.
> 
> Just to sneak in here I don't like exposing our current plugin "non-
> API"
> more.  In fact I'd just build the analyzer into GCC with maybe an
> option to disable its build (in case it is very fat?).

My aim here is to provide a way for distributors to be able to disable
its build - indeed, for now, for it to be disabled by default,
requiring opting-in.

My reasoning here is that the analyzer is middle-end code, but isn't as
mature as the rest of the middle-end (but I'm working on getting it
more mature).

I want some way to label the code as a "technology preview", that
people may want to experiment with, but to set expectations that this
is a lot of new code and there will be bugs - but to make it available
to make it easier for adventurous users to try it out.

I hope that makes sense.

I went down the "in-tree plugin" path by seeing the analogy with
frontends, but yes, it would probably be simpler to just build it into
GCC, guarded with a configure-time variable.  It's many thousand lines
of non-trivial C++ code, and associated selftests and DejaGnu tests.

Building with --enable-checking=release, and stripping the binaries and
the plugin, I see:

$ ls -al cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david   645256 Dec  3 11:22
plugin/analyzer_plugin.so

$ ls -alh cc1 cc1plus plugin/analyzer_plugin.so 
-rwxrwxr-x. 1 david david  25M Dec  3 11:22 cc1
-rwxrwxr-x. 1 david david  27M Dec  3 11:22 cc1plus
-rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so

so the plugin is about 2.5% of the size of the existing compiler.

The analysis pass is very time-consuming when enabled via -fanalyzer. 
I'm aiming for "x2 compile-time in exchange for finding lots of bugs"
as a tradeoff that users will be happy to make (by supplying
-fanalyzer) - that's faster than comparable static analyzers I've been
playing with.

> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API ontop of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

We've gone back and forth about what a GCC plugin API should look like;
I'm not sure what the objectives are.

For example, are we hoping to offer some kind of ABI guarantee to
plugins so that we can patch GCC without plugins needing to be rebuilt?
If so, how strong is the ABI guarantee?  For example, do we directly
expose the tree code enums and the gimple code enums?

Or is it more ambitious, and hoping to be cross-compiler, in which case
are these enums themselves hidden?

This feels like opening a massive can of worms, and orthogonal to my
goal of giving GCC a static analysis framework.

> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.

FWIW the symbols consumed by the plugin can be seen at:
 https://dmalcolm.fedorapeople.org/gcc/2019-11-27/symbols-used.txt

This is the result of:
  eu-readelf -s plugin/analyzer_plugin.so |c++filt|grep UNDEF

Surveying that, the plugin:
- creates a pass
- views the callgraph and the functions (e.g. ipa_reverse_postorder)
- views CFGs and SSA representation (including statements)
- uses the diagnostic subsystem (which parts of the patch kit extend,
adding e.g. control flow paths), e.g. creating and subclassing
rich_locations, subclassing diagnostic_path and diagnostic_event
- calls into middle-end support functions like
useless_type_conversion_p
- uses GCC types such as bitmap, inchash, wideint
- creates temporary trees
- has selftests
...etc.

But there are inline uses of various functions that don't show up in
that list (e.g. the various gimple_* accessor functions - grepping the
source shows over a hundred uses of these, but they're all inlined and
so don't show up in the above view).

My gut feeling is that writing a plugin API and then rewriting the
analyzer to use it would be a huge amount of work: I'd strongly prefer
not to do so (and to use the existing API, either as a plugin, or
directly, dropping the plugin machinery from the analyzer).

Perhaps the best way forward is to build this directly into the
compiler, but guard it by a configure-time option?

Dave

Reply via email to