RFC: "firehose" : an interchange format (and code) for static code analysis results

David Malcolm Wed, 30 Jan 2013 15:07:59 -0800

[I hope this is sufficiently on-topic for the gcc ML: it is likely to be
of interest to people using gcc plugins to do static analysis, and also
touches on gcc output; my apologies if it isn't]


I've been working on running various static analysis tools on a large
subset of the packages in Fedora, aiming to build a tracking tool so
that I can ask questions like "what new warnings were introduced by this
commit?" etc.

Doing this sanely means trying to coerce the results from different
static analyzers into a consistent data format.

One of the "analysis tools" is GCC itself, in that I'm capturing GCC
warnings as one of my sources of data.

Each tool seemed to have its own output format, and none seemed to have
all the data that I wanted, so I went ahead and created a new format,
which I'm calling "firehose" (since reading reports from some code
analysis tools can feel like "drinking from a firehose").

It can be seen at:
https://github.com/fedora-static-analysis/firehose
(Free Software, GPLv3 or later)

You can see some examples here:
https://github.com/fedora-static-analysis/firehose/tree/master/examples

It's XML so that it can be easily validated: there's a RELAX-NG schema
here:
https://github.com/fedora-static-analysis/firehose/blob/master/firehose.rng
which documents things somewhat.

Essentially a code issue is a message, with additional attributes (such
as file/line/column, optional CWE identifier, etc), and optionally a
trace of execution to reach the error (so that an analysis tool can
identify e.g. that a memory leak happens on a particular error-handling
path after one pass through a loop, or whatever, potentially including a
view of pertinent variables/expressions in the code).
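
As a rough illustration of the data model just described, here is a
minimal Python sketch. It is not the actual firehose API or schema: the
class names (Location, TraceState, Issue) and their fields are invented
for this example, and the file name and messages are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    """Where something was reported (hypothetical type, not firehose's)."""
    file: str
    line: int
    column: Optional[int] = None


@dataclass
class TraceState:
    """One step along the path of execution leading to the problem."""
    location: Location
    notes: Optional[str] = None


@dataclass
class Issue:
    """A message plus attributes, with an optional trace."""
    message: str
    location: Location
    cwe: Optional[int] = None                      # optional CWE identifier
    trace: List[TraceState] = field(default_factory=list)


# A leak that only happens on an error-handling path (illustrative data).
leak = Issue(
    message="memory leak of 'buf'",
    location=Location("example.c", 42, 5),
    cwe=401,  # CWE-401: missing release of memory after effective lifetime
    trace=[
        TraceState(Location("example.c", 30, 9), "allocated here"),
        TraceState(Location("example.c", 38, 13), "taking the error-handling path"),
        TraceState(Location("example.c", 42, 5), "returning without freeing 'buf'"),
    ],
)
```

A tool that has no path information would simply leave the trace empty.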

The format's not set in stone yet (hence this email) - anything I've
missed?  (e.g. I know the <sut> software-under-test metadata needs some
work).

I have parsers for importing into the format:
 * for GCC warnings (textual parsing of stderr, assuming LANG=C)
 * for clang-analyzer (its --plist output format)
 * for cppcheck (its XMLv2 output format).
There's also a Python API and extensive unit tests (though the API is
not set in stone yet).
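
To give a flavour of what textual parsing of GCC's stderr involves, here
is a small standalone sketch. It is not the parser from the firehose
repository: the regular expression, the Diagnostic type and the
parse_gcc_stderr function are all assumptions made for illustration, and
it only handles the common single-line "file:line:col: warning: ..."
form emitted under LANG=C.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as:
#   foo.c:12:5: warning: unused variable 'x' [-Wunused-variable]
# Assumes untranslated (LANG=C) output; the column and the trailing
# [-W...] switch are both optional.
GCC_DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?:(?P<column>\d+):)? "
    r"(?P<kind>warning|error): (?P<message>.*?)"
    r"(?: \[(?P<switch>-W[^\]]+)\])?$"
)


class Diagnostic(NamedTuple):
    """One parsed diagnostic (hypothetical record type)."""
    file: str
    line: int
    column: Optional[int]
    kind: str
    message: str
    switch: Optional[str]


def parse_gcc_stderr(text: str) -> List[Diagnostic]:
    """Extract warnings/errors from captured stderr, skipping other lines."""
    results = []
    for raw in text.splitlines():
        m = GCC_DIAGNOSTIC.match(raw)
        if m is None:
            continue  # e.g. "foo.c: In function 'main':" context lines
        column = m.group("column")
        results.append(Diagnostic(
            file=m.group("file"),
            line=int(m.group("line")),
            column=int(column) if column is not None else None,
            kind=m.group("kind"),
            message=m.group("message"),
            switch=m.group("switch"),
        ))
    return results
```

Context lines ("In function ...") and source snippets are ignored here;
a real importer would probably want to keep at least the function name.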

I also have a branch of my gcc-python-plugin in which the "cpychecker"
example uses the firehose API as its internal representation of errors,
using that to emit actual gcc warnings and generate HTML trace reports
(I plan to move the error trace visualization code out of my
gcc-python-plugin and into the firehose thing, so that other projects
can use it).

Hope this is of interest - would any other plugin authors be interested in
using any of the above?

Dave

FWIW you can see some notes on what I'm building with this at
http://lists.fedoraproject.org/pipermail/devel/2012-December/175232.html
http://lists.fedoraproject.org/pipermail/devel/2013-January/176872.html
http://lists.fedoraproject.org/pipermail/devel/2013-January/177633.html
