On Wed, 2019-11-20 at 11:18 +0100, Richard Biener wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm <dmalc...@redhat.com> wrote:
> > > > The checker is implemented as a GCC plugin.
> > > >
> > > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
> > > > that would live in the GCC source tree and be shipped as part of
> > > > the GCC tarball, with a new:
> > > >   --enable-plugins=[LIST OF PLUGIN NAMES]
> > > > configure option, analogous to --enable-languages (the
> > > > Makefile/configure machinery for handling in-tree GCC plugins is
> > > > adapted from how we support frontends).
> > >
> > > I like that.  Implementing this as a plugin surely must help to
> > > either document the GCC plugin interface as powerful/mature for
> > > such a task, or make it so, if it isn't yet. ;-)
> >
> > Our plugin "interface" as such is very broad.
>
> Just to sneak in here: I don't like exposing our current plugin
> "non-API" more.  In fact I'd just build the analyzer into GCC, with
> maybe an option to disable its build (in case it is very fat?).

My aim here is to provide a way for distributors to disable its build - indeed, for now, for it to be disabled by default, requiring opting in.

My reasoning is that the analyzer is middle-end code, but isn't as mature as the rest of the middle-end (though I'm working on getting it more mature).  I want some way to label the code as a "technology preview" that people may want to experiment with - to set expectations that this is a lot of new code and there will be bugs - while still making it easy for adventurous users to try it out.  I hope that makes sense.

I went down the "in-tree plugin" path by seeing the analogy with frontends, but yes, it would probably be simpler to just build it into GCC, guarded by a configure-time variable.

It's many thousands of lines of non-trivial C++ code, plus the associated selftests and DejaGnu tests.

Building with --enable-checking=release, and stripping the binaries and the plugin, I see:

  $ ls -al cc1 cc1plus plugin/analyzer_plugin.so
  -rwxrwxr-x. 1 david david 25921600 Dec  3 11:22 cc1
  -rwxrwxr-x. 1 david david 27473568 Dec  3 11:22 cc1plus
  -rwxrwxr-x. 1 david david   645256 Dec  3 11:22 plugin/analyzer_plugin.so

  $ ls -alh cc1 cc1plus plugin/analyzer_plugin.so
  -rwxrwxr-x. 1 david david 25M Dec  3 11:22 cc1
  -rwxrwxr-x. 1 david david 27M Dec  3 11:22 cc1plus
  -rwxrwxr-x. 1 david david 631K Dec  3 11:22 plugin/analyzer_plugin.so

so the plugin is about 2.5% of the size of the existing compiler.

The analysis pass is very time-consuming when enabled via -fanalyzer.  I'm aiming for "2x compile time in exchange for finding lots of bugs" as a tradeoff that users will be happy to make (by supplying -fanalyzer) - that's faster than comparable static analyzers I've been playing with.
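To make the opt-in usage concrete, here's a rough sketch of the kind of toy testcase I have in mind when I say "finding lots of bugs" - the filename and function are made up for illustration, and I'm deliberately not quoting exact diagnostic output here:

  /* leak.c: toy example of a path-sensitive problem: the allocation
     is freed on one path but leaked on the early-return path.  */
  #include <stdlib.h>

  extern int get_input (void);

  int test (void)
  {
    int *p = (int *) malloc (16 * sizeof (int));
    if (get_input () > 10)
      return -1;   /* 'p' is leaked on this path */
    free (p);
    return 0;
  }

Compiled normally this costs nothing extra; only a user who opts in, via something like:

  gcc -fanalyzer -c leak.c

pays the additional analysis time.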
> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API on top of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

We've gone back and forth about what a GCC plugin API should look like; I'm not sure what the objectives are.  For example, are we hoping to offer some kind of ABI guarantee to plugins, so that we can patch GCC without plugins needing to be rebuilt?  If so, how strong is that ABI guarantee?  For example, do we directly expose the tree code enums and the gimple code enums?  Or is it more ambitious, and hoping to be cross-compiler, in which case are these enums themselves hidden?

This feels like opening a massive can of worms, and it's orthogonal to my goal of giving GCC a static analysis framework.

> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.

FWIW the symbols consumed by the plugin can be seen at:
https://dmalcolm.fedorapeople.org/gcc/2019-11-27/symbols-used.txt

This is the result of:

  eu-readelf -s plugin/analyzer_plugin.so | c++filt | grep UNDEF

Surveying that, the plugin:
- creates a pass (see the sketch below for the rough shape of this)
- views the callgraph and the functions (e.g. ipa_reverse_postorder)
- views CFGs and the SSA representation (including statements)
- uses the diagnostic subsystem (which parts of the patch kit extend,
  adding e.g. control flow paths), e.g. creating and subclassing
  rich_locations, and subclassing diagnostic_path and diagnostic_event
- calls into middle-end support functions like useless_type_conversion_p
- uses GCC types such as bitmap, inchash, wide_int
- creates temporary trees
- has selftests
...etc.
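To give a feel for the API surface involved, the "creates a pass" item uses the same machinery any pass-creating plugin would use.  Roughly something like the following stripped-down sketch - this is not the analyzer's actual pass; the pass name, insertion point, and pass properties are just placeholders:

  /* Minimal sketch of a plugin that registers a GIMPLE pass and walks
     the statements of each function; names are placeholders.  */

  #include "gcc-plugin.h"
  #include "plugin-version.h"
  #include "context.h"
  #include "tree.h"
  #include "gimple.h"
  #include "tree-pass.h"
  #include "function.h"
  #include "basic-block.h"
  #include "gimple-iterator.h"

  int plugin_is_GPL_compatible;

  namespace {

  const pass_data sketch_pass_data = {
    GIMPLE_PASS,    /* type */
    "sketch",       /* name (placeholder) */
    OPTGROUP_NONE,  /* optinfo_flags */
    TV_NONE,        /* tv_id */
    PROP_ssa,       /* properties_required */
    0,              /* properties_provided */
    0,              /* properties_destroyed */
    0,              /* todo_flags_start */
    0               /* todo_flags_finish */
  };

  class sketch_pass : public gimple_opt_pass
  {
  public:
    sketch_pass (gcc::context *ctxt)
      : gimple_opt_pass (sketch_pass_data, ctxt) {}

    virtual unsigned int execute (function *fun)
    {
      basic_block bb;
      /* Walk the CFG, and the statements within each block.  */
      FOR_EACH_BB_FN (bb, fun)
        for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
             !gsi_end_p (gsi); gsi_next (&gsi))
          {
            gimple *stmt = gsi_stmt (gsi);
            /* ...inspect STMT here...  */
            (void) stmt;
          }
      return 0;
    }
  };

  } // anon namespace

  int
  plugin_init (struct plugin_name_args *plugin_info,
               struct plugin_gcc_version *version)
  {
    if (!plugin_default_version_check (version, &gcc_version))
      return 1;

    struct register_pass_info pass_info;
    pass_info.pass = new sketch_pass (g);
    pass_info.reference_pass_name = "ssa";
    pass_info.ref_pass_instance_number = 1;
    pass_info.pos_op = PASS_POS_INSERT_AFTER;

    register_callback (plugin_info->base_name, PLUGIN_PASS_MANAGER_SETUP,
                       NULL, &pass_info);
    return 0;
  }

The real pass then inspects the CFG/SSA form it's given and feeds what it finds into the diagnostic subsystem, per the list above.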
But there are inline uses of various functions that don't show up in that list (e.g. the various gimple_* accessor functions - grepping the source shows over a hundred uses of these, but they're all inlined and so don't appear in the view above).

My gut feeling is that writing a plugin API and then rewriting the analyzer to use it would be a huge amount of work; I'd strongly prefer not to do that, and instead to use the existing internal API, either as a plugin or directly, dropping the plugin machinery from the analyzer.

Perhaps the best way forward is to build this directly into the compiler, but guard it with a configure-time option?

Dave