On 12/19/2012 04:24 PM, Adam Murdoch wrote:
>
> On 18/12/2012, at 9:47 PM, Jay Berkenbilt wrote:
>
>>
>> The idea of code generation should be pretty familiar to people who are
>> used to Gradle or Maven since it is a common thing to have to do in Java
>> builds.  Abuild supports code generation for C++ builds, covering two
>> important cases:
>>
>> * The code generator is built by an earlier part of the build
>>
>> * The code generator itself may not be available on all platforms or
>>   environments, but the generated code is portable
>>
>> The first case of the code generator being built by an earlier part of
>> the build is important because it means that you can't have a naive
>> approach to the build like saying that all sources are compiled to
>> object code before any executables are built.  Hopefully this is pretty
>> obvious anyway.  In abuild, the build is divided into "build items"
>> which are roughly equivalent to projects in Gradle if my understanding
>> is correct.  At the build item level, dependencies are explicitly
>> specified to other build items.  Within the build item, dependencies can
>> be specified at a more granular level.  (File-level dependencies like
>> the dependency of an object file on a header file are automatic and
>> work fine across build item boundaries, but that's another topic.)  So
>> with
>> abuild, you can have build item A depend on build item B and have one of
>> B's artifacts be a code generator.  B's build creates the code generator
>> (if needed -- maybe it's a script) and adds rules to the build
>> environment for using that generator.  When A is built, by virtue of the
>> fact that A depends on B, those rules are available.  Let's say A builds
>> some "y" files out of "x" files and B provides rules to generate "y"
>> files from "x" files.  All A has to do is depend on B and list the y
>> files in its list of targets.  Abuild would fully build B before
>> starting to build A (because of the A -> B dependency), so that when A
>> is ready to be built, the code generator is in place and abuild will
>> know automatically how to build y files from x files.
>
> Generally, Gradle deals with this in a similar way - you declare a
> dependency on the code generator and by the time the task that needs
> to use the generator is executed, the generator has been built or
> downloaded. In practice, however, it's not quite as simple as that:
>
> The approach doesn't work well if the code generator is packaged as a
> Gradle plugin or Gradle task, because the projects are configured and
> the task graph is assembled before any tasks are executed. This means
> that all the plugin and task implementations must be available up
> front, so they cannot be built as part of the current build. You can
> use the buildSrc project, but then you can't publish the code
> generator for use outside the build, plus buildSrc has some other
> issues (IDE integration, etc.).

Yeah, I ran into problems like this too.  Abuild has support for
plugins, which are basically nothing more than make rules or the
equivalent with abuild's Groovy-based Java support, but plugins get
resolved too early in the build process for them to have their own
dependencies, which makes it hard (not impossible, but you have to jump
through hoops) to have plugins use anything that's automatically
generated.  My feeling is that strictly defined build phases will
probably always lead to this problem, which is basically what you are
saying below.
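
In Gradle terms, if I have the details right, the task-level version of
this pattern works today, since resolving a project dependency builds
that project's artifacts before the consuming task runs.  A minimal
sketch, assuming the java plugin is applied (project, class, and path
names are all made up):

    // build.gradle of the consuming project
    configurations { codeGenerator }

    dependencies {
        // built earlier in the same build by the ':generator' project
        codeGenerator project(':generator')
    }

    task generateSources(type: JavaExec) {
        classpath = configurations.codeGenerator
        main = 'com.example.XToYGenerator'   // hypothetical generator class
        args 'src/x', "$buildDir/generated/y"
        inputs.dir 'src/x'
        outputs.dir "$buildDir/generated/y"
    }

    compileJava.dependsOn generateSources

It's the plugin case that falls apart, as you say, because the class has
to be on the build script classpath before configuration even starts.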

>
> We do plan to change the configuration phase of Gradle so that we can
> build the things that are required at configuration time on demand. A
> project would declare a configuration time dependency on a code
> generator produced in another project, and Gradle would take care of
> configuring the code generator project and building the code generator
> before configuring the consuming project. Right now, we're working on
> some experimental support for configuring projects on demand, so that
> projects are configured only as we discover that their outputs will be
> needed by the build, rather than configuring every project in every
> build. This should be available in Gradle 1.4. We can then extend this
> to also build the things that project configuration needs on demand.

As I was suggesting, I suspect that ultimately this needs to be
"recursive" for lack of a better term...in other words, the part of the
build that is restrictive because it's bootstrapping the build has to be
kept to a minimum.  With abuild, dividing the build into discrete build
items such that each item gets completely built before anything that
depends on it even starts getting built (which seems similar to what
you're saying about on-demand configuration) helps a lot.  The only real
problem I had with abuild was for plugins that may alter the shape of
the dependency graph.  Those necessarily have to be handled before
abuild can form its full dependency graph, which causes some
complication.  This is tangled up in the way abuild handles platforms,
which will be the topic of one of my next two posts, but the
one-sentence summary is that abuild generates the dependency graph in
two phases: one platform-agnostic and the other platform-aware.
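
For anyone following along: my understanding is that the incubating
on-demand configuration mode is switched on with a property (assuming
the property name holds for the 1.4 release):

    # gradle.properties
    org.gradle.configureondemand=true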

>
> Another awkwardness we have at the moment is that dependency
> resolution is not aware of platform, so that you can't declare a
> dependency on 'my-code-generator' and have Gradle just pick up the
> right binaries for the current platform. Instead, you have to declare
> a dependency on 'the windows 64 bit variant of my-code-generator'. For
> some code generators, of course, this doesn't matter, but for others
> it does.

Yup.  I'll post about abuild's platform handling.  It's pretty good
though not without some loose ends.  It was built into abuild at the
core from the start, so it's a deep part of abuild's design.
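
To make the current Gradle awkwardness concrete for other readers, the
platform variant has to be spelled out by hand today, roughly like this
(the coordinates and classifier are invented):

    configurations { codeGenerator }

    dependencies {
        // no platform awareness: the build script picks the variant itself
        codeGenerator 'com.example:my-code-generator:1.2:windows-amd64@zip'
    }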

>
> Similarly, dependency resolution is not aware of artefacts that need
> to be installed in some way after being downloaded from a repository -
> shared libraries that need to live in certain locations,
> platform-specific naming schemes for binaries, execute bits that need
> to be set, ZIP archives that need to be expanded, some tool that needs
> to be executed over the binaries before they can be used, and so on.
> You deal with this at the moment by adding an install task of some
> kind that takes care of installing the downloaded binaries and
> declaring a dependency on the install task, rather than the thing itself.

In abuild, I handle this by having specific build items that serve as
glue between external dependencies and things that depend on them.  It
works nicely for the build, but it's a big hassle for the developer, and
sort of breaks down a bit when you deal with multiple versions of
dependencies.
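
For reference, the install-task workaround you describe would look
something like this in a Gradle build, if I understand it correctly
(coordinates, paths, and the tool layout are all invented):

    configurations { toolArchive }

    dependencies {
        toolArchive 'com.example:some-tool:1.0@zip'
    }

    // unpack the downloaded archive into a known location
    task installTool(type: Copy) {
        from { zipTree(configurations.toolArchive.singleFile) }
        into "$buildDir/tools/some-tool"
    }

    // consumers depend on the install task, not on the archive itself
    task generate(type: Exec, dependsOn: installTool) {
        executable "$buildDir/tools/some-tool/bin/some-tool"
        args 'spec.idl'
    }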

>
>>
>> The second case of the code generator not being available all the time
>> is probably more common.  There are two common use cases of this: tools
>> that aren't as portable as the output they generate and tools that are
>> controlled by some sort of restrictive license.  For example, abuild
>> itself uses flex and bison to implement the lexer and parser for its
>> interface language.  (And, actually, it uses its own custom generator to
>> generate the flex code.)  Abuild can be built for Windows using the
>> Microsoft compilers and does not require flex and bison to be present on
>> the system as long as the outputs of flex and bison are present.  For
>> another example, an enterprise might use a commercially licensed tool to
>> create C++ and Java code from a language-independent specification (like
>> IDL), and that code generator may be node-locked, may only run on one
>> platform, or may be limited in the number of concurrent uses, but the
>> code it generates may build everywhere that the system builds.  For
>> either of these scenarios, you want the following behavior:
>>
>> * The generated code must be controlled in the version control system
>>   and must be present in a clean checkout or source distribution
>>
>> * The build system must be able to tell, using a mechanism that does
>>   not rely on the generators (which may not be available at build
>>   time) or on file modification times (which are generally not
>>   preserved by version control systems), whether or not the generated
>>   files are up to date
>>
>> * If the build determines the generated files to be up to date, they
>>   can be used as is.  If not, then if the generators are available,
>>   they can be used.  If they are not available, the build has to fail.
>
> This is an interesting use case.
>
> I'd think about solving this by reusing the approach we want to take
> for build avoidance. The idea is that we want to be able to avoid
> building artifacts that have been built elsewhere and that are
> up-to-date with respect to the local source, configuration, and
> environment. When resolving a dependency on something in the local
> build, Gradle would first check whether a compatible pre-built
> artefact is available remotely and use that instead of building it
> locally.

This is also important when you have platform-dependent things that
depend on platform-independent things.  The biggest system that uses
abuild has a mix of compilation for four different platforms as well as
some Java code and some static code generation.  Lots of the
platform-independent stuff ends up just being built multiple times,
mostly because the build wasn't specified well enough for the "null
build" (building when everything is up to date) to be empty.  In other
words, the build was not idempotent.  Some things always seemed to be
out of date.  Abuild also has the ability to run in a mode that
suppresses builds of all items on a certain platform or of all
platform-independent items, but I don't know of anyone who ever used it
that way in practice.
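
One thing that should help here, as I understand it, is that Gradle's
up-to-date checks hash file contents rather than trusting modification
times, so a task with fully declared inputs and outputs drops out of
the null build.  A sketch (task name and paths invented):

    task generatePortable {
        inputs.dir 'src/spec'
        outputs.dir "$buildDir/generated"
        doLast {
            // run the platform-independent generator here
        }
    }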

>
> The code generator problem could be solved in the same way, with a
> different fallback when the prebuilt artefacts cannot be found - in
> the build avoidance case, we just build the artefacts if they are not
> available. In the platform-specific code generator case, we fail the
> build.
>
> Later we can add more alternatives to 'fail the build' - we might go
> looking for a machine where the code generator can be used and run the
> code generation there instead of locally, syncing the inputs and
> outputs between the local and remote machine.
>
> We were planning on using the binary artefact repository as the
> initial source for pre-built artefacts, so that the CI build publishes
> snapshot builds to the repository along with meta-data about the
> inputs used to build the artefacts. Gradle would use this meta-data to
> locate compatible artefacts to use. We might also use the daemon
> here too, so that Gradle can broadcast to nearby daemons asking them
> if they have recently built the artefacts with compatible inputs. I
> guess version control is another place we could go looking for
> compatible artefacts.

Abuild doesn't have any concept of an artifact repository, but it
needed one.  It would have solved a number of big problems.  This
concept was not present in any build system I had encountered up to
that time.
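
The publishing half of what you describe presumably looks like today's
upload step, with the input meta-data being the new part.  A sketch
with an invented repository URL (the meta-data format itself is pure
speculation on my part, so I've left it out):

    apply plugin: 'maven'

    uploadArchives {
        repositories.mavenDeployer {
            repository(url: 'http://repo.example.com/snapshots')
        }
    }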

>
> This approach can also be reused for the case where we want to take
> binaries built for a number of different platforms and assemble them
> into a single multi-platform distribution of some kind (say, a jar
> that bundles a jni library for a bunch of platforms and extracts the
> appropriate one at runtime). The build describes how to build
> everything for every platform, but on a given platform, only a subset
> of things can be built. For the remaining things, we have to use
> compatible pre-built binaries.
>
> The meta-data that we need to capture to decide whether a pre-built
> artefact is compatible and up-to-date can also be used for other
> things, such as reproducible builds, and for improving dependency
> resolution to automatically choose a compatible variant of some
> dependency.

I'll describe how abuild dealt with this in another post.
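
For the jni-bundling example, the assembly step itself is simple once
the per-platform binaries exist.  A sketch, assuming the java plugin is
applied and the pre-built libraries have already been fetched into
prebuilt/:

    task bundledJar(type: Jar) {
        baseName = 'mylib-all-platforms'
        from sourceSets.main.output
        // pre-built native libraries for each platform
        from('prebuilt') { into 'native' }
    }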

--Jay

