On 18/12/2012, at 9:47 PM, Jay Berkenbilt wrote:

> 
> The idea of code generation should be pretty familiar to people who are
> used to gradle or maven since it is a common thing to have to do in Java
> builds.  Abuild supports code generation for C++ builds with support for
> two important cases:
> 
> * The code generator is built by an earlier part of the build
> 
> * The code generator itself may not be available on all platforms or
>   environments, but the generated code is portable
> 
> The first case of the code generator being built by an earlier part of
> the build is important because it means that you can't have a naive
> approach to the build like saying that all sources are compiled to
> object code before any executables are built.  Hopefully this is pretty
> obvious anyway.  In abuild, the build is divided into "build items"
> which are roughly equivalent to projects in gradle if my understanding
> is correct.  At the build item level, dependencies are explicitly
> specified to other build items.  Within the build item, dependencies can
> be specified at a more granular level.  (File-level dependencies, like
> the dependency of an object file on a header file, are automatic and
> work fine across build item boundaries, but that's another topic.)  So with
> abuild, you can have build item A depend on build item B and have one of
> B's artifacts be a code generator.  B's build creates the code generator
> (if needed -- maybe it's a script) and adds rules to the build
> environment for using that generator.  When A is built, by virtue of the
> fact that A depends on B, those rules are available.  Let's say A builds
> some "y" files out of "x" files and B provides rules to generate "y"
> files from "x" files.  All A has to do is depend on B and list the y
> files in its list of targets.  Abuild would fully build B before
> starting to build A (because of the A -> B dependency), so that when A
> is ready to be built, the code generator is in place and abuild will
> know automatically how to build y files from x files.

Generally, Gradle deals with this in a similar way - you declare a dependency 
on the code generator and by the time the task that needs to use the generator 
is executed, the generator has been built or downloaded. In practice, however, 
it's not quite as simple as that:

The approach doesn't work well if the code generator is packaged as a Gradle 
plugin or Gradle task: the projects are configured and the task graph is 
assembled before any tasks are executed, which means that all the plugin and 
task implementations must be available up front and so cannot be built as part 
of the current build. You can use the buildSrc project, but then you can't 
publish the code generator for use outside the build, plus buildSrc has some 
other issues (IDE integration, etc.).
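
For a generator that runs as an ordinary executable rather than as a plugin, 
the wiring works today much as you describe for abuild. A rough sketch, with 
all project, class and path names hypothetical:

    // build.gradle of the consuming project (names hypothetical)
    task generateSources(type: JavaExec) {
        // use the generator built by the ':generator' project; wiring the
        // classpath this way also gives us the task dependency, so the
        // generator is built before this task runs
        classpath = project(':generator').sourceSets.main.runtimeClasspath
        main = 'org.example.Generator'          // assumed main class
        args file('src/x'), file("$buildDir/generated")
        inputs.dir 'src/x'
        outputs.dir "$buildDir/generated"
    }

    // compile the generated sources along with the hand-written ones
    sourceSets.main.java.srcDir "$buildDir/generated"
    compileJava.dependsOn generateSources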

We do plan to change the configuration phase of Gradle so that we can build the 
things that are required at configuration time on demand. A project would 
declare a configuration-time dependency on a code generator produced in another 
project, and Gradle would take care of configuring the code generator project 
and building the code generator before configuring the consuming project. Right 
now, we're working on some experimental support for configuring projects on 
demand, so that projects are configured only as we discover that their outputs 
will be needed by the build, rather than configuring every project in every 
build. This should be available in Gradle 1.4. We can then extend this to also 
build, on demand, the things that project configuration needs.
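
For the curious, opting in to the incubating mode should look like this, 
assuming it ships in 1.4 in its current form:

    # gradle.properties
    org.gradle.configureondemand=true

    # or per invocation
    gradle --configure-on-demand build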

Another awkwardness we have at the moment is that dependency resolution is not 
aware of platform, so you can't declare a dependency on 'my-code-generator' 
and have Gradle just pick up the right binaries for the current platform. 
Instead, you have to declare a dependency on 'the windows 64 bit variant of 
my-code-generator'. For some code generators, of course, this doesn't matter, 
but for others it does.
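
Concretely, the build script has to work out the variant itself today, 
something like this (coordinates and classifiers hypothetical):

    // select the platform classifier by hand
    def os   = System.getProperty('os.name').toLowerCase().contains('windows') ? 'windows' : 'linux'
    def arch = System.getProperty('os.arch').contains('64') ? 'amd64' : 'x86'

    configurations { codeGenerator }
    dependencies {
        codeGenerator "org.example:my-code-generator:1.2:${os}-${arch}@zip"
    }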

Similarly, dependency resolution is not aware of artefacts that need to be 
installed in some way after being downloaded from a repository - shared 
libraries that need to live in certain locations, platform-specific naming 
schemes for binaries, execute bits that need to be set, ZIP archives that need 
to be expanded, some tool that needs to be executed over the binaries before 
they can be used, and so on. At the moment, you deal with this by adding an 
install task of some kind that takes care of installing the downloaded 
binaries, and by declaring a dependency on the install task rather than on the 
artefact itself.
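
In build script terms the workaround looks roughly like this, continuing the 
hypothetical codeGenerator configuration from above:

    // unpack the downloaded archive and restore the execute bits it lost
    task installGenerator(type: Copy) {
        from zipTree(configurations.codeGenerator.singleFile)
        into "$buildDir/tools/my-code-generator"
        fileMode = 0755
    }

    task generate(type: Exec) {
        // depend on the install task, not on the downloaded binary itself
        dependsOn installGenerator
        executable "$buildDir/tools/my-code-generator/bin/mygen"
        args file('src/x'), file("$buildDir/generated")
    }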


> 
> The second case of the code generator not being available all the time
> is probably more common.  There are two common use cases of this: tools
> that aren't as portable as the output they generate and tools that are
> controlled by some sort of restrictive license.  For example, abuild
> itself uses flex and bison to implement the lexer and parser for its
> interface language.  (And, actually, it uses its own custom generator to
> generate the flex code.)  Abuild can be built for Windows using the
> Microsoft compilers and does not require flex and bison to be present on
> the system as long as the outputs of flex and bison are present.  For
> another example, an enterprise might use a commercially licensed tool to
> create C++ and Java code from a language-independent specification (like
> idl), and that code generator may be node-locked, may only run on one
> platform, or may be limited in the number of concurrent uses, but the
> code it generates may build everywhere that the system builds.  For
> either of these scenarios, you want the following behavior:
> 
> * The generated code must be controlled in the version control system
>   and must be present in a clean checkout or source distribution
> 
> * The build system must be able to tell using a mechanism that does not
>   rely on the generators (which may not be available at build time) or
>   on file modification times (which are generally not preserved by
>   version control systems) whether or not the generated files are up to
>   date
> 
> * If the build determines the generated files to be up to date, they
>   can be used as is.  If not, then if the generators are available,
>   they can be used.  If they are not available, the build has to fail.

This is an interesting use case.

I'd think about solving this by reusing the approach we want to take for build 
avoidance. The idea is that we want to be able to avoid building artefacts that 
have been built elsewhere and that are up-to-date with respect to the local 
source, configuration and environment. When resolving a dependency on something 
in the local build, Gradle would first check whether a compatible pre-built 
artefact is available remotely and use that instead of building it locally.

The code generator problem could be solved in the same way, with a different 
fallback when the pre-built artefacts cannot be found - in the build avoidance 
case, we just build the artefacts if they are not available. In the 
platform-specific code generator case, we fail the build.
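
Until then, a build can approximate the 'use as-is, regenerate if possible, 
otherwise fail' rule with a digest-based check rather than timestamps. A 
minimal sketch, with every file name and tool location hypothetical:

    import java.security.MessageDigest

    // regenerate only when the recorded input digest no longer matches;
    // fail if regeneration is needed but the generator is not installed
    task generateParser {
        def input  = file('src/parser.x')            // generator input
        def output = file('src/gen/parser.y')        // checked-in generated file
        def stamp  = file('src/gen/parser.x.sha1')   // checked-in digest of the input

        doLast {
            def current = MessageDigest.getInstance('SHA-1')
                    .digest(input.bytes).encodeHex().toString()
            if (stamp.file && stamp.text.trim() == current) {
                return   // up to date: use the checked-in output as-is
            }
            def generator = file('/opt/tools/bin/mygen')  // node-locked tool
            if (!generator.file) {
                throw new GradleException("$output is stale and the generator is not available")
            }
            exec { commandLine generator, input, output }
            stamp.text = current
        }
    }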

Later we can add more alternatives to 'fail the build' - we might go looking 
for a machine where the code generator can be used and run the code generation 
there instead of locally, syncing the inputs and outputs between the local and 
remote machine.

We were planning on using the binary artefact repository as the initial source 
for pre-built artefacts, so that the CI build publishes snapshot builds to the 
repository along with meta-data about the inputs used to build the artefacts. 
Gradle would use this meta-data to locate compatible artefacts to use. We might 
also use the daemon here, so that Gradle can broadcast to nearby daemons asking 
them whether they have recently built the artefacts with compatible inputs. I 
guess version control is another place we could go looking for compatible 
artefacts.

This approach can also be reused for the case where we want to take binaries 
built for a number of different platforms and assemble them into a single 
multi-platform distribution of some kind (say, a jar that bundles a JNI library 
for a bunch of platforms and extracts the appropriate one at runtime). The 
build describes how to build everything for every platform, but on a given 
platform, only a subset of things can be built. For the remaining things, we 
have to use compatible pre-built binaries.
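
A sketch of the assembly end of that, assuming hypothetical configurations 
that resolve the pre-built native variants:

    // bundle pre-built native libraries for several platforms into one jar;
    // the runtime side extracts the variant matching os.name/os.arch
    task bundleJar(type: Jar) {
        baseName = 'mylib-all-platforms'
        from sourceSets.main.output
        into('META-INF/native') {
            from configurations.windowsBinaries
            from configurations.linuxBinaries
            from configurations.osxBinaries
        }
    }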

The meta-data that we need to capture to decide whether a pre-built artefact is 
compatible and up-to-date can also be used for other things, such as 
reproducible builds, and for improving dependency resolution so that it can 
automatically choose a compatible variant of some dependency.


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com
