[gradle-dev] C++ concept: code generation

Jay Berkenbilt Tue, 18 Dec 2012 02:47:09 -0800

The idea of code generation should be pretty familiar to people who are
used to gradle or maven since it is a common thing to have to do in Java
builds.  Abuild supports code generation for C++ builds with support for
two important cases:


 * The code generator is built by an earlier part of the build

 * The code generator itself may not be available on all platforms or
   environments, but the generated code is portable

The first case of the code generator being built by an earlier part of
the build is important because it means that you can't have a naive
approach to the build like saying that all sources are compiled to
object code before any executables are built.  Hopefully this is pretty
obvious anyway.  In abuild, the build is divided into "build items"
which are roughly equivalent to projects in gradle if my understanding
is correct.  At the build item level, dependencies are explicitly
specified to other build items.  Within the build item, dependencies can
be specified at a more granular level.  (File-level dependencies like
the dependency of an object file on a header file are automatic and
work fine across build item boundaries, but that's another topic.)  So with
abuild, you can have build item A depend on build item B and have one of
B's artifacts be a code generator.  B's build creates the code generator
(if needed -- maybe it's a script) and adds rules to the build
environment for using that generator.  When A is built, by virtue of the
fact that A depends on B, those rules are available.  Let's say A builds
some "y" files out of "x" files and B provides rules to generate "y"
files from "x" files.  All A has to do is depend on B and list the y
files in its list of targets.  Abuild would fully build B before
starting to build A (because of the A -> B dependency), so that when A
is ready to be built, the code generator is in place and abuild will
know automatically how to build y files from x files.
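
To make the ordering concrete, here is a toy model in Python.  It is not
abuild's implementation or its file format: the BuildItem class, the
build function, and the rule table are all invented for this sketch, and
the "generator" is just str.upper.  It only shows that building
dependencies first means B's x -> y rule is in place by the time A asks
for its y file.

```python
from dataclasses import dataclass, field


@dataclass
class BuildItem:
    """Hypothetical stand-in for a build item; not abuild's data model."""
    name: str
    deps: list = field(default_factory=list)
    # Rules exported to dependents: (source suffix, target suffix) -> function
    provides_rules: dict = field(default_factory=dict)
    targets: list = field(default_factory=list)   # e.g. ["thing.y"]
    sources: dict = field(default_factory=dict)   # file name -> contents


def build(item, built=None, log=None):
    """Build the item's dependencies first, then the item itself."""
    built = {} if built is None else built
    log = [] if log is None else log
    if item.name in built:
        return built, log
    rules = {}
    for dep in item.deps:
        build(dep, built, log)
        # Rules become visible only by virtue of the dependency.
        rules.update(dep.provides_rules)
    outputs = {}
    for target in item.targets:
        stem, _, suffix = target.rpartition(".")
        for (src_suffix, tgt_suffix), generate in rules.items():
            source = f"{stem}.{src_suffix}"
            if tgt_suffix == suffix and source in item.sources:
                outputs[target] = generate(item.sources[source])
                break
        else:
            raise RuntimeError(f"{item.name}: no rule to build {target}")
    built[item.name] = outputs
    log.append(item.name)
    return built, log


# B provides an x -> y rule; A just depends on B and lists its y target.
b = BuildItem("B", provides_rules={("x", "y"): str.upper})
a = BuildItem("A", deps=[b], targets=["thing.y"], sources={"thing.x": "hello"})
built, order = build(a)
```

If A did not list B in its deps, the rule table would be empty and the
build of thing.y would fail with "no rule to build", which is the
behavior you want.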

The second case of the code generator not being available all the time
is probably more common.  There are two common use cases of this: tools
that aren't as portable as the output they generate and tools that are
controlled by some sort of restrictive license.  For example, abuild
itself uses flex and bison to implement the lexer and parser for its
interface language.  (And, actually, it uses its own custom generator to
generate the flex code.)  Abuild can be built for Windows using the
Microsoft compilers and does not require flex and bison to be present on
the system as long as the output of flex and bison is present.  For
another example, an enterprise might use a commercially licensed tool to
create C++ and Java code from a language-independent specification (like
IDL), and that code generator may be node-locked, may only run on one
platform, or may be limited in the number of concurrent uses, but the
code it generates may build everywhere that the system builds.  For
either of these scenarios, you want the following behavior:

 * The generated code must be controlled in the version control system
   and must be present in a clean checkout or source distribution

 * The build system must be able to tell using a mechanism that does not
   rely on the generators (which may not be available at build time) or
   on file modification times (which are generally not preserved by
   version control systems) whether or not the generated files are up to
   date

 * If the build determines the generated files to be up to date, they
   can be used as is.  If not, then if the generators are available,
   they can be used.  If they are not available, the build has to fail.

Abuild implements this functionality in a particular way which may or
may not be the way gradle should do it, but I'll describe abuild's
implementation.  Abuild provides a script called "codegen-wrapper".
codegen-wrapper takes options that specify the source directory, the
cache directory, the list of input files, the list of output files, a
flag indicating whether line endings are significant, and the command to
run.  In the cache directory, codegen-wrapper stores a copy of each
output file and a record of the MD5 checksum of every input file.  This
cache directory is controlled in version control and distributed as part
of the source distribution.  When codegen-wrapper is run, it checks the
checksum of every input file against its cached value.  This can be a
straight checksum, or optionally, it can be a checksum on the file with
line endings normalized (important since some version control systems
replace line endings based on platform).  If any file does not have a
cached checksum or has a cached checksum that doesn't match, then the
cache is considered out of date.  If all files' checksums match, then
the cache is considered valid.  If the cache is valid, the cached output
files are copied into the output directory without preserving
modification time.  Otherwise, an attempt is made to run the code
generator.  If running the code generator fails, the build fails.  If it
succeeds, then codegen-wrapper creates or updates the checksums of the
input files and caches new copies of the output files.

The assumption is that builds that modify the cache are relatively
rare.  These builds modify controlled files, which is generally a bad
idea, but which is necessary in this case.  One might argue that the
build should fail after successfully updating controlled files, or that
this should at least be an option, so that you can still assert that a
full build followed by a full clean returns you to the initial state.
This way, if you did a build that included regeneration of the cached
files, you would have an opportunity to check in those changes and
rebuild.

Abuild also allows you to run in a mode that forces regeneration of the
cached files or that leaves them alone but ignores them.  These modes
can be useful for testing or upgrading to new versions of the code
generators.  For example, maybe the code generator runs on both Linux
and Windows but, when run on Windows, the generated files only work on
Windows, and when run on Linux, the generated files work in both places.
I might want to test that I can build on Windows with files generated on
Windows but not replace my cached files.  In that case, I would tell
abuild to ignore and not touch the cache.  Or maybe a new version of
flex comes out and I want to force regeneration of my lexer even though
I haven't changed any of the sources.  In that case, I could tell abuild
to discard the old cache and regenerate everything.  At present, for
abuild, these are both manual operations, which is probably okay, though
having a rule that could automatically tell the build system whether or
not it should disregard/replace the cache could also be useful.  Such a
rule could, for example, notice that the version of flex on the
development system is newer than the version used to generate the cached
files.
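
As a sketch of what such a rule might look like (abuild does not do
this today, and everything here is hypothetical, including the idea of
storing the generator's version string alongside the cache), one could
compare the installed generator's version against the one recorded when
the cache was last written:

```python
import re


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def should_regenerate(installed_version_output, cached_version_record):
    """True if the installed generator is newer than the one that made the cache.

    With no recorded version there is nothing to compare against, so this
    sketch leaves the cache alone; that default is an arbitrary choice.
    """
    if cached_version_record is None:
        return False
    installed = parse_version(installed_version_output)
    cached = parse_version(cached_version_record)
    return installed > cached
```

Comparing tuples of integers rather than raw strings matters here: as
strings, "2.10.1" sorts before "2.9.0".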

Here's a fragment of GNU make code that generates C or C++ files from
flex input files using codegen-wrapper.  In this case, all the developer
has to do to enable codegen-wrapper is set the variable FLEX_CACHE to
the directory in which they want the cached files to be stored.

# --------
FLEX := flex
ifdef FLEX_CACHE
 define flex_to_c
        @$(PRINT) Generating $@ from $< with $(FLEX)
        $(RM) $@
        $(CODEGEN_WRAPPER) --cache $(FLEX_CACHE) \
            --input $< --output $@ --command \
            $(FLEX) -o$@ $<
 endef
else
 define flex_to_c
        @$(PRINT) Generating $@ from $< with $(FLEX)
        $(RM) $@
        $(FLEX) -o$@ $<
 endef
endif

%.fl.cc: %.fl
        $(flex_to_c)

%.fl.cpp: %.fl
        $(flex_to_c)

%.fl.c: %.l
        $(flex_to_c)
# --------

Abuild's documentation includes additional examples of using
codegen-wrapper for custom code generators.

--Jay

