Hi, Joshua. First off, I really like that you submitted a /patch/ for a better
plugin API, including all the documentation and example changes that would have
otherwise been necessary to put into an e-mail. It shows that your proposal is
feasible and well-defined already, and gives us something to work with.
> Why would you need two copies of Basic/Lex/etc.? Clang is already built with
> all of this stuff, so the plugin would be able to use clang's version of
> everything here in normal codegen anyways, unless I'm misunderstanding your
> statement.
What I meant was that if we wanted to run some action instead of CodeGen, we
would need to use libTooling instead...but that means building a separate
executable with everything in it again. But I forgot about -fsyntax-only, and
plugins could require this in their clang_plugin_begin_file check.
(It might be nice if they could check this sooner, though...maybe they can get
the CompilerInvocation in clang_plugin_init, possibly even as non-const!
-fsyntax-only isn't the only interesting setting.)
On the flip side, it might be desirable for plugins to be able to produce
errors and a nonzero result code. Whether or not these plugin diagnostics can
halt CodeGen is something to consider as well.
Remaining comments inline (both on the spec/documentation and the
implementation).
On Jun 20, 2012, at 2:58 PM, Joshua Cranmer wrote:
> -<p>The main difference from writing normal FrontendActions is that you can
> -handle plugin command line options. The
> -PluginASTAction base class declares a ParseArgs method which you have to
> -implement in your plugin.
> -</p>
> +<p>The first function called when a plugin is loaded is clang_plugin_init.
> This
> +function returns a boolean indicating whether or not the plugin can run on
> the
> +code. As arguments, this function is given the arguments of the plugin as
> well
> +as the version of clang being run. An example looks like this:</p>
> <pre>
> - bool ParseArgs(const CompilerInstance &CI,
> - const std::vector<std::string>& args) {
> - for (unsigned i = 0, e = args.size(); i != e; ++i) {
> - if (args[i] == "-some-arg") {
> - // Handle the command line argument.
> + bool clang_plugin_init(int argc, plugin::PluginArg *argv,
> + plugin::Version *version) {
> + // Check the version of clang
> + if (version->major != CLANG_VERSION_MAJOR ||
> + version->minor != CLANG_VERSION_MINOR)
> + return false;
> +
> + for (int i = 0; i < argc; ++i) {
> + if (!strcmp(argv[i].name, "-some-arg")) {
> + // Handle the command line argument
> }
> }
> return true;
> }
> </pre>
>
Hooray for using a designated entry point instead of a static constructor! This
is something that was a problem on Windows, IIRC.
That said, I'd go even further with the version checks, and /require/ that the
major and minor versions match, at the very least. I don't think we can trust
objects to be layout-compatible or functions to have all the same arguments
even between SVN revisions or point fixes, let alone major/minor versions --
only libclang is stable enough for that. I implemented something like this for
static analyzer plugins by requiring plugins to export a symbol that just
contains the version string they were compiled against. You can see how it
works in CheckerRegistry.h and CheckerRegistration.cpp.
Also, you might want to mention here that these entry points are extern "C", in
case someone decides to ignore the advice to include Plugin.h.
> +<p>To run a plugin, you merely need to specify the plugin to be loaded using
> the
> +-fplugin command line options. Additional parameters for the plugins can be
> passed with -fplugin-arg-<plugin-name>-<arg>=<value> or
> +-fplugin-arg<plugin-name>-<arg>. These options work the same when both
> run
> +with the normal driver and the cc1 process.</p>
The second one is missing a dash between the literal 'arg' and the plugin name.
Also, personally I would switch the order of these two to put the flag-only
version first, since it looks simpler.
This is STILL rather unwieldy, but I guess the idea was that if you really want
to write your own command line, you're supposed to use libTooling? Or a wrapper
script like the one that appeared awhile back.
> diff --git a/examples/PrintFunctionNames/PrintFunctionNames.exports
> b/examples/PrintFunctionNames/PrintFunctionNames.exports
> --- a/examples/PrintFunctionNames/PrintFunctionNames.exports
> +++ b/examples/PrintFunctionNames/PrintFunctionNames.exports
> @@ -1,1 +1,1 @@
> -_ZN4llvm8Registry*
> +clang_plugin_*
:-)
> +#ifndef LLVM_CLANG_BASIC_PLUGIN_H
> +#define LLVM_CLANG_BASIC_PLUGIN_H
> +
> +#include "llvm/ADT/StringRef.h"
> +
> +namespace llvm {
> + class StringRef;
> +}
You already imported StringRef, so this is unnecessary. If you'd like to use
StringRef without the llvm prefix, import "clang/Basic/LLVM.h".
> +namespace clang {
> +
> + class ASTConsumer;
> + class CompilerInstance;
> + class DiagnosticConsumer;
> + class PPCallbacks;
We typically don't indent forward-declared classes, just like we don't indent
regular classes.
> +namespace plugin {
> +
> +/**
> + * An individual argument for a plugin.
> + *
> + * Plugin arguments are specified as -fplugin-arg-<plugin>-<name>=<value> or
> + * -fplugin-arg-<plugin>-<name> attributes. In the latter case, the value
> + * attribute would be an empty string.
> + */
> +struct PluginArg {
> + const char *name;
> + const char *value;
> +};
Any reason why 'const char *' and not StringRef?
> +// The following functions are not implemented within clang itself, but are
> +// prototypes of the various functions that may be implemented by plugins.
> +// Including it in a header file allows these hooks to be documented with
> +// doxygen, and the use of extern "C" on declarations allows people who
> forget
> +// to define these functions with extern "C" to still have them work.
> +extern "C" {
> +/**
> + * Callback to initialize the plugin.
> + *
> + * \note This is the only function whose ABI compatibility is guaranteed to
> + * remain stable. All plugins should verify that the version passed
> into
> + * this function is the same version that they were compiled with and
> + * abort if this is not the case. To detect the version compiled with,
> + * include clang/Basic/Version.h and check CLANG_VERSION_MAJOR and
> + * CLANG_VERSION_MINOR.
> + *
> + * \arg argc The number of arguments in the argv array
> + * \arg argv An array of the arguments passed to this plugin from the
> + * command line.
> + * \arg version The version of the compiler being called.
> + * \return Whether or not the plugin initialization succeeded. If
> initialization
> + * fails, the compilation process is terminated.
> + */
> +extern bool clang_plugin_init(int argc, clang::plugin::PluginArg *argv,
> + clang::plugin::Version *version);
Suggestion: replace argc/argv with ArrayRef.
Very nice doc comment. Some coding standards issues here, to match up with the
rest of LLVM:
- We prefer C++-style comments even for Doxygen blocks, even for extern "C"
blocks (unless the entire file is C-compatible): ///
- Anything that's a noun (types, variables, and parameters) should start with
an uppercase letter. The exceptions are types that look like standard library
types. Functions and methods should start with lowercase letters. This
convention is fairly new and so there are large chunks of the codebase that
don't look like this, but we're trying to be consistent going forwards.
- Parameter lists that are too long should line up under the open paren.
Most of these are already in http://llvm.org/docs/CodingStandards.html; we
should probably update the indentation bit as well.
I'm on the fence about whether or not the namespaces should be included. On the
one hand, we don't explicitly write "clang" /anywhere/ in our header files
(exaggerating but only slightly). On the other, this is going to be referenced
by plugin writers who might not know to write "using namespace clang". And
probably won't be "using namespace clang::plugin".
> +/**
> + * Callback that is called before a file is compiled.
> + *
> + * Before this function is called, the callbacks object will have all of its
> + * pointers set to null. All non-null callbacks will be deleted by the
> compiler
> + * internally, so one should always return a new pointer for every file.
> + *
> + * \arg fileName The name of the function being called.
> + * \arg CI The compiler instance, which contains information about how
> + * the file is being compiled and other utility classes.
> + * \arg callbacks The pointer to the callbacks that this plugin will define
> for
> + * this file.
> + */
> +extern void clang_plugin_begin_file(llvm::StringRef fileName,
> + const clang::CompilerInstance &CI,
> + clang::plugin::PluginFileCallbacks *callbacks);
Since the 'callbacks' argument is assured to be present, it should be a
reference. (This actually applies to all plugin entry points.)
I know I for one feel a little squeamish about putting reference types in
extern "C" functions, but there's nothing in the standard that prohibits it,
and you already have "const CompilerInstance &CI".
Finally, I would suggest changing "file" to "compilation" or "translation
unit", since this isn't called for includes and such.
> +/**
> + * Callback that is called after a file has been compiled.
> + *
> + * \arg fileName The name of the file that was just compiled.
> + */
> +extern void clang_plugin_end_file(llvm::StringRef fileName);
Does this need the file name? It should always be the same as the matching
clang_plugin_begin_file.
> diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/Options.td
> --- a/include/clang/Driver/Options.td
> +++ b/include/clang/Driver/Options.td
> @@ -586,16 +586,18 @@ def fpack_struct_EQ : Joined<"-fpack-str
> HelpText<"Specify the default maximum struct packing alignment">;
> def fpascal_strings : Flag<"-fpascal-strings">, Group<f_Group>,
> Flags<[CC1Option]>,
> HelpText<"Recognize and construct Pascal-style string literals">;
> def fpch_preprocess : Flag<"-fpch-preprocess">, Group<f_Group>;
> def fpic : Flag<"-fpic">, Group<f_Group>;
> def fno_pic : Flag<"-fno-pic">, Group<f_Group>;
> def fpie : Flag<"-fpie">, Group<f_Group>;
> def fno_pie : Flag<"-fno-pie">, Group<f_Group>;
> +def fplugin : Joined<"-fplugin=">, Group<f_Group>, Flags<[CC1Option]>;
> +def fplugin_arg : Joined<"-fplugin-arg-">, Group<f_Group>,
> Flags<[CC1Option]>;
Take a look at the old plugin and plugin_arg flags in CC1Options.td for
examples on how to make these look nicer. (In particular, fplugin should
probably be fplugin_EQ, and fplugin_arg can be more descriptive.) Also, you'll
have to decide whether we want to accept a split "-fplugin Plugin".
> --- a/include/clang/Frontend/FrontendOptions.h
> +++ b/include/clang/Frontend/FrontendOptions.h
> @@ -160,16 +160,23 @@ public:
> std::vector<std::string> AddPluginActions;
>
> /// Args to pass to the additional plugins
> std::vector<std::vector<std::string> > AddPluginArgs;
Presumably these are deprecated? And would go away eventually?
> /// The list of plugins to load.
> std::vector<std::string> Plugins;
But this probably has to stay, since it includes LLVM plugins.
> + /// The list of compiler plugins to load.
> + std::vector<std::string> CompilerPlugins;
> +
> + /// Argments for the above compiler plugins
> + std::vector<std::vector<std::pair<std::string, std::string> > >
> + CompilerPluginArgs;
Yuck. If these two vectors are supposed to be kept in parallel, can we just
make a struct like this?
struct Plugin {
std::string Name;
SmallVector<std::pair<std::string, std::string> >, 4> Args;
};
That may not be how the old way did it, but it's a lot nicer IMHO.
> --- a/lib/Frontend/CompilerInvocation.cpp
> +++ b/lib/Frontend/CompilerInvocation.cpp
> @@ -1476,16 +1476,56 @@ static InputKind ParseFrontendArgs(Front
> for (int i = 0, e = Opts.AddPluginActions.size(); i != e; ++i) {
> for (arg_iterator it = Args.filtered_begin(OPT_plugin_arg),
> end = Args.filtered_end(); it != end; ++it) {
> if ((*it)->getValue(Args, 0) == Opts.AddPluginActions[i])
> Opts.AddPluginArgs[i].push_back((*it)->getValue(Args, 1));
> }
> }
>
> + Opts.CompilerPlugins = Args.getAllArgValues(OPT_fplugin);
> + Opts.CompilerPluginArgs.resize(Opts.CompilerPlugins.size());
> +
> + // Parse the plugin arguments
> + std::vector<std::pair<std::string, std::string> > ParsedPluginArgs;
> + for (arg_iterator it = Args.filtered_begin(OPT_fplugin_arg),
> + end = Args.filtered_end(); it != end; ++it) {
> + std::string value = (*it)->getValue(Args);
This should probably use StringRef instead of std::string, so that we don't
copy the value just yet. Also, you can probably just say it->getValue; most of
our smart pointers will do the right thing here.
> + // Add the plugin arguments to a parsedPluginArgs
> + for (int i = 0, e = Opts.CompilerPlugins.size(); i != e; ++i) {
> + std::string base = llvm::sys::path::stem(Opts.CompilerPlugins[i]);
Ditto StringRef.
> + for (std::vector<std::pair<std::string, std::string> >::iterator
> + it = ParsedPluginArgs.begin(), end = ParsedPluginArgs.end();
> + it != end; ++it) {
> + if (it->first == base) {
> + std::string argpair = it->second;
Ditto StringRef. The only times we use std::string is when we actually need to
own the string.
> namespace {
> + // A class that holds data associated with a single plugin.
> + class Plugin {
No indents for namespaces...we run out of our 80 chars too fast. :-)
> + llvm::sys::DynamicLibrary PluginLibrary;
> + clang::plugin::PluginFileCallbacks callbacks;
Here you definitely should not include the 'clang' namespace.
plugin::PluginFileCallbacks is fine. I'd even be okay with "using
llvm::sys::DynamicLibrary".
> + public:
> + Plugin(llvm::sys::DynamicLibrary library) : PluginLibrary(library) {}
> + clang::plugin::PluginFileCallbacks *getCallbacks() { return &callbacks; }
Return by reference, please.
> + intptr_t getFunctionCall(const char *name) {
> + return (intptr_t)PluginLibrary.getAddressOfSymbol(name);
> + }
Why is this an intptr_t? That's not really any better than void * for casting
to and from function pointers. If we're on a platform where function pointers
are bigger than object pointers, getAddressOfSymbol is already going to break.
> +/// Helper function to call a given function in a plugin.
> +#define CALL_PLUGIN_FUNCTION(plugin, fname, args) \
> + do { \
> + Plugin::fname func_ =
> (Plugin::fname)(plugin)->getFunctionCall(#fname); \
> + if (func_) \
> + func_ args; \
> + } while (0)
Technically not a function. :-) More importantly, please show example usage;
this macro is not at all straightfoward.
Also, I would like if the macro took a reference rather than a pointer, but
that's just me. (That way it could in theory be called for plugin objects on
the stack...but that probably never happens, huh.)
> + // These typedefs must have the same signatures as the ones in
> + // clang/Basic/Plugin.h, including the same name (it makes the helper
> macro
> + // above work properly).
> + typedef void (*clang_plugin_begin_file)(StringRef, const
> CompilerInstance &,
> + clang::plugin::PluginFileCallbacks*);
> + typedef void (*clang_plugin_end_file)(StringRef);
> + typedef void (*clang_plugin_destroy)();
> + };
Why no init? Ah, the boolean flag.
> + // The wrapper action that manages all plugin invocations. This is what we
> use
> + // as the main FrontendAction for the CompilerInstance.
> + class PluginWrapperAction : public WrapperFrontendAction {
> + std::vector<Plugin> plugins;
> + std::string CurrentFile;
Again, I'm in favor of plugins each keeping CurrentFile around if they need it;
it's a StringRef that's safe to store, IIRC. The master copy of this
information is held in FrontendAction anyway.
> + public:
> + PluginWrapperAction(FrontendAction *WrappedAction,
> + std::vector<Plugin> &plugins)
ArrayRef, since it's free here.
> + for (std::vector<Plugin>::iterator it = plugins.begin(); it !=
> plugins.end();
> + ++it) {
> + // Retrieve the callbacks, and clear all of the values to NULL
> + clang::plugin::PluginFileCallbacks *callbacks = it->getCallbacks();
> + memset(callbacks, 0, sizeof(clang::plugin::PluginFileCallbacks));
This is silly. Plugin should just default-initialize the struct when it's
created. (Empty parens will do the right thing.)
> + // Get the callbacks from the plugin
> + CALL_PLUGIN_FUNCTION(it, clang_plugin_begin_file, (Filename, CI,
> callbacks));
> +
> + // ASTConsumer is handled by GetASTConsumer; others are handled now
> + if (callbacks->ppCallback)
> + CI.getPreprocessor().addPPCallbacks(callbacks->ppCallback);
> + if (callbacks->diagnostics)
> + {
> + // The front end has already run this, so let's do this now.
> + callbacks->diagnostics->BeginSourceFile(CI.getLangOpts(),
> + &CI.getPreprocessor());
> + CI.getDiagnostics().setClient(new ChainedDiagnosticConsumer(
> + &CI.getDiagnosticClient(), callbacks->diagnostics));
> + }
> + }
If we initialize plugins before calling the underlying action, is this
necessary?
> + return true;
No way for plugins to signal an error?
> +
> +ASTConsumer *PluginWrapperAction::CreateASTConsumer(CompilerInstance &CI,
> + StringRef InFile) {
> + std::vector<ASTConsumer*> Consumers(1,
> + WrapperFrontendAction::CreateASTConsumer(CI, InFile));
> + for (std::vector<Plugin>::iterator it = plugins.begin(); it !=
> plugins.end();
> + ++it) {
> + if (ASTConsumer *consumer = it->getCallbacks()->astConsumer) {
> + Consumers.push_back(consumer);
> + }
> + }
> + return new MultiplexConsumer(Consumers);
> +}
TinyPtrVector or SmallVector? Most of the time we will not have plugins, and
even when we do there are probably very few.
(This is not a performance-sensitive piece of code, I know, but we might as
well get it right.)
> + // Construct the arguments
> + std::vector<std::pair<std::string, std::string> > &pluginArgs =
> + Clang->getFrontendOpts().CompilerPluginArgs[i];
ArrayRef...?
> + plugin::PluginArg *args = new plugin::PluginArg[pluginArgs.size()];
SmallVector<PluginArg, 4> makes sense here -- it's a vector that has a certain
number of slots allocated on the stack, but switches to acting like std::vector
if it needs to grow. Also, no forgetting to delete[] if this gets refactored
some day.
And finally, if you do decide to allow plugins to mess with the command line
options, this will have to happen sooner. :-)
Really I think this is a great refactoring. I do have my concerns about the
amount of control plugins can leverage over the compilation, but maybe that
will turn out to be immaterial. Thanks for working on this.
Jordan
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits