Re: [gradle-dev] source sets and non-jvm languages

Adam Murdoch Tue, 22 May 2012 18:31:29 -0700

On 21/05/2012, at 11:08 PM, Luke Daley wrote:

> 
> On 21/05/2012, at 7:24 AM, Adam Murdoch wrote:
> 
>> Hi,
>> 
>> Now that we're looking at adding some experimental javascript support, we
>> have 2 non-JVM languages we need to model. The current SourceSet model
>> does not really work for these new languages (and there are problems with 
>> the JVM languages, too).
>> 
>> Some issues with SourceSet:
>> 
>> * Has a runtimeClasspath. This doesn't make sense when we're not targeting 
>> the JVM. C++ and javascript source do have runtime dependencies, but these 
>> dependencies do not end up as a classpath. For both C++ and javascript, 
>> we're also interested in building different variants, where each variant can 
>> potentially have a different set of resolved dependencies. Supporting 
>> variants also makes a lot of sense in the JVM world too (groovy-1.7 vs 
>> groovy-1.8, for example).
>> 
>> * SourceSetOutput effectively represents a set of JVM byte code. This is the 
>> same as the issue above. Modelling the compiled source as byte code doesn't 
>> make sense when we're not targeting the JVM. Also, each variant of the same 
>> source will generally end up with different compiled output.
>> 
>> * Has a compileClasspath. As above. Also assumes that we're actually 
>> compiling something. And that all languages share the same compile classpath.
>> 
>> * Has a (possibly empty) set of Java source to be compiled and included in 
>> the runtime classpath. This doesn't make any sense if there's no Java source 
>> in the project.
>> 
>> * Has a set of resources to be included in the runtime classpath. This 
>> doesn't make any sense if we're not targeting the JVM.
>> 
>> There are also some language specific issues:
>> 
>> * Java should have a source language level, and an annotation-processor 
>> classpath.
>> 
>> * Groovy should have a source language level, and separate compile, 
>> language-runtime, compile-implementation, and transformer class paths. Scala 
>> should have something similar.
>> 
>> * The ANTLR plugin assumes we're generating a parser to run on the JVM. The 
>> tooling may run on the JVM, but the generated source may not.
>> 
>> * For each language, we should distinguish between generated and 
>> non-generated source.
>> 
>> I think I'd like to turn the current 
>> SourceSet/SourceSetOutput/GroovySourceSet/ScalaSourceSet/CppSourceSet into 
>> something like this:
>> 
>> * Interfaces that represent a language-specific set of source, and specify
>> output-independent meta-data about the source: things like source 
>> directories and include/exclude patterns, compile and runtime dependencies, 
>> language level, and so on. So, we'd have a JavaSourceSet, GroovySourceSet, 
>> ScalaSourceSet, CppSourceSet, JavaScriptSourceSet and so on.
>> 
>> * An interface that represents a composite set of source files. This would 
>> be used to group language-specific sets to describe their purpose. This type 
>> would be used for 'sourceSets.main' and 'sourceSets.test'.
>> 
>> * Interfaces that represent runtime-specific set of executable files. These 
>> would be used to represent the output of the source sets, one per variant 
>> that we can build. For JVM languages, we'd use something that represents a 
>> tree of class and resource files. For native languages, we'd use something 
>> that represents a set of object files. For javascript, we'd use a
>> JavaScriptSourceSet.
>> 
>> * All of the above would extend Buildable. This better models generated 
>> source (but doesn't quite solve the problem on its own), allows a separate 
>> processing pipeline to be assembled to build the output for each variant, 
>> and allows us to handle executable files that we don't build, but need to 
>> bundle or publish.
>> 
>> * There would be some way to navigate between the outputs of a source set 
>> and the source set itself. Not sure exactly how that should look. Each 
>> language source set ends up built into one or more outputs. Each runtime
>> output is built from one or more language source sets. Maybe the association 
>> is only by name.
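A minimal Java sketch of the proposed split follows. It uses hypothetical type names (LanguageSourceSet, CompositeSourceSet, RuntimeOutput, SourceModelDemo), none of which are Gradle API, and its variant names are illustrative. Language-specific sets carry only output-independent metadata. A composite set groups them by purpose. Each runtime output is a single variant, linked back to its source set purely by name, which is one of the options floated above. Buildable is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical types sketching the proposal; none of these are real Gradle API.

// A language-specific set of source with output-independent metadata only.
interface LanguageSourceSet {
    String getName();
    List<String> getSrcDirs();
}

class JavaSourceSet implements LanguageSourceSet {
    final List<String> srcDirs = new ArrayList<>();
    String languageLevel = "1.6"; // Java-specific metadata
    public String getName() { return "java"; }
    public List<String> getSrcDirs() { return srcDirs; }
}

class CppSourceSet implements LanguageSourceSet {
    final List<String> srcDirs = new ArrayList<>();
    public String getName() { return "cpp"; }
    public List<String> getSrcDirs() { return srcDirs; }
}

// Groups language-specific sets by purpose, e.g. "main" or "test".
class CompositeSourceSet {
    final String name;
    final List<LanguageSourceSet> languages = new ArrayList<>();
    CompositeSourceSet(String name) { this.name = name; }
}

// One runtime-specific output per variant, tied to its source set only by name.
class RuntimeOutput {
    final String sourceSetName;
    final String kind;     // e.g. "jvm-classes" or "object-files"
    final String variant;  // illustrative variant name
    RuntimeOutput(String sourceSetName, String kind, String variant) {
        this.sourceSetName = sourceSetName;
        this.kind = kind;
        this.variant = variant;
    }
}

class SourceModelDemo {
    // Navigate from a composite source set to its outputs by matching names.
    static List<RuntimeOutput> outputsOf(CompositeSourceSet set, List<RuntimeOutput> all) {
        return all.stream()
                  .filter(o -> o.sourceSetName.equals(set.name))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CompositeSourceSet mainSet = new CompositeSourceSet("main");
        JavaSourceSet javaSrc = new JavaSourceSet();
        javaSrc.srcDirs.add("src/main/java");
        CppSourceSet cppSrc = new CppSourceSet();
        cppSrc.srcDirs.add("src/main/cpp");
        mainSet.languages.add(javaSrc);
        mainSet.languages.add(cppSrc);

        List<RuntimeOutput> outputs = List.of(
                new RuntimeOutput("main", "jvm-classes", "groovy-1.7"),
                new RuntimeOutput("main", "jvm-classes", "groovy-1.8"),
                new RuntimeOutput("main", "object-files", "debug"),
                new RuntimeOutput("test", "jvm-classes", "groovy-1.8"));
        for (RuntimeOutput o : outputsOf(mainSet, outputs)) {
            System.out.println(o.kind + " " + o.variant);
        }
    }
}
```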
> 
> I'm just repeating you here, but trying to distill this down…
> 
> A SourceSet as we know it now couples source with one transform operation. We 
> need to bust them apart and also make it one-to-many. That is, we need to 
> isolate the concept of a “set of source” and “the operations that transform 
> it”. The “set of source” bit should be easy to generalise across languages. The
> model of a transform operation will likely have language/runtime specific 
> characteristics (e.g. Java has a compile classpath, JavaScript does not).


Exactly right.
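The following hedged Java sketch makes the distinction concrete. A plain set of source files is kept separate from the operations that transform it. SourceFiles, Transform, JavaCompile, JsCombine and TransformDemo are hypothetical names, not Gradle API. Each transform carries its own runtime-specific settings: a compile classpath for Java, and only an output file for JavaScript. The same source set can be handed to several transforms, for example one compile per variant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Gradle API: a "set of source" kept separate from the
// operations that transform it, so one source set can feed many transforms.
final class SourceFiles {
    final String name;
    final List<String> files;
    SourceFiles(String name, List<String> files) {
        this.name = name;
        this.files = files;
    }
}

// A transform operation; each implementation carries its own runtime-specific settings.
interface Transform {
    String apply(SourceFiles source);
}

// Targets the JVM, so it needs a compile classpath.
final class JavaCompile implements Transform {
    final List<String> compileClasspath;
    JavaCompile(List<String> compileClasspath) { this.compileClasspath = compileClasspath; }
    public String apply(SourceFiles source) {
        return "compile " + source.files.size() + " files against " + compileClasspath;
    }
}

// JavaScript has no classpath; this transform only needs an output file name.
final class JsCombine implements Transform {
    final String outputFile;
    JsCombine(String outputFile) { this.outputFile = outputFile; }
    public String apply(SourceFiles source) {
        return "combine " + source.files.size() + " files into " + outputFile;
    }
}

final class TransformDemo {
    // One-to-many: apply every transform to the same, unchanged source set.
    static List<String> applyAll(SourceFiles source, List<Transform> transforms) {
        List<String> results = new ArrayList<>();
        for (Transform t : transforms) {
            results.add(t.apply(source));
        }
        return results;
    }

    public static void main(String[] args) {
        SourceFiles jsSrc = new SourceFiles("main", List.of("app.js", "util.js"));
        applyAll(jsSrc, List.of(new JsCombine("app-all.js"))).forEach(System.out::println);

        // Two variants of the same Java source, each compiled against a different classpath.
        SourceFiles javaSrc = new SourceFiles("main", List.of("Foo.java", "Bar.java"));
        applyAll(javaSrc, List.of(new JavaCompile(List.of("groovy-1.7.jar")),
                                  new JavaCompile(List.of("groovy-1.8.jar"))))
            .forEach(System.out::println);
    }
}
```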

> 
> It's not clear to me what the concepts of “main” and “test” become. Is the 
> “main” source set the “main” Java and cpp code? Or does it only make sense to 
> talk about the “main java”? I think we are probably going to need both. At 
> the wiring level, I need the main “java” but it's feasible we will want to 
> ask questions of all the “main” source. 

"main" becomes a set that contains all the production source for the project, 
and "test" becomes a set that contains all the test source. There's useful 
stuff we can do with this information: build a production source zip, configure 
the IDE to point to all the production and test source directories, run 
inspections on the production source, etc. One question is whether it is 
worthwhile keeping an explicit grouping, or whether we just infer it.

In other words, do we have this structure:

sourceSets {
    main {
        cpp { … }
        java { … }
    }
}

or this structure:

java {
    sourceSets {
        main { … }
    }
}

groovy {
    sourceSets {
        main { … }
    }
}

cpp {
    sourceSets {
        …
    }
}

The first option allows you to easily do common stuff with, say, production 
code or test code. The second option allows you to easily do common stuff for a 
given language, which is probably more useful. I guess there's no reason why we 
couldn't provide both views over the source.
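Providing both views need not mean modelling the source twice. The minimal Java sketch below assumes that each language source set is registered once and tagged with both a purpose and a language. LanguageSource and SourceViews are hypothetical names. The by-purpose view corresponds to the first structure above, and the by-language view corresponds to the second.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical: each language source set is registered once, tagged with purpose and language.
final class LanguageSource {
    final String purpose;   // e.g. "main", "test"
    final String language;  // e.g. "java", "cpp"
    final String srcDir;
    LanguageSource(String purpose, String language, String srcDir) {
        this.purpose = purpose;
        this.language = language;
        this.srcDir = srcDir;
    }
}

final class SourceViews {
    // First structure: sourceSets { main { cpp { … } java { … } } }
    static Map<String, List<String>> byPurpose(List<LanguageSource> all) {
        return all.stream().collect(Collectors.groupingBy(s -> s.purpose, TreeMap::new,
                Collectors.mapping(s -> s.language, Collectors.toList())));
    }

    // Second structure: java { sourceSets { main { … } } }
    static Map<String, List<String>> byLanguage(List<LanguageSource> all) {
        return all.stream().collect(Collectors.groupingBy(s -> s.language, TreeMap::new,
                Collectors.mapping(s -> s.purpose, Collectors.toList())));
    }

    // All source directories for one purpose, e.g. for a production source zip.
    static List<String> dirsFor(List<LanguageSource> all, String purpose) {
        return all.stream()
                  .filter(s -> s.purpose.equals(purpose))
                  .map(s -> s.srcDir)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LanguageSource> all = List.of(
                new LanguageSource("main", "java", "src/main/java"),
                new LanguageSource("main", "cpp", "src/main/cpp"),
                new LanguageSource("test", "java", "src/test/java"),
                new LanguageSource("main", "groovy", "src/main/groovy"));
        System.out.println(byPurpose(all));  // {main=[java, cpp, groovy], test=[java]}
        System.out.println(byLanguage(all)); // {cpp=[main], groovy=[main], java=[main, test]}
    }
}
```

The same tagging also answers "all production source directories" (via dirsFor), which is what a production source zip or IDE configuration would need.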


> 
> An implied challenge in this is creating a more powerful model that scales, 
> but stays simple and understandable for the “compile my java code and run 
> tests” case.

Indeed. This probably doesn't meet that challenge yet (but 'has many concepts' 
is not quite the same as 'is not understandable'):

My thought is that we want to model the transformed output as concrete things, 
rather than abstract 'transformed source' things. That is, we should model 
things like jvm libraries, javascript libraries, reports, web applications, 
native binaries, and so on, each with type specific meta-data and configuration.

Each of these things should have a name, and a type. They would be Buildable, 
which gives us a place to define the processing pipeline to build the thing. 
You should be able to navigate to the inputs of each thing. I'd say a source 
set would be a kind of this thing, too.

This way, we have a graph of buildable things, and we know the inputs and 
outputs of each thing in the graph, plus the associated tasks to execute to 
transform the input things into the output thing. Sounds kinda familiar :)
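A graph of buildable things can be sketched in a few lines of Java. Thing and ThingGraph are hypothetical names. The things and task names in main (myLib, myJs, myWebApp, compileJava, jar, combineJs, war) are illustrative only. Walking the input edges depth-first yields an order in which every input thing is built before the thing that consumes it. This is the familiar task-graph idea, applied one level up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Gradle API: a named, typed buildable thing that knows
// its input things and the tasks that transform those inputs into it.
final class Thing {
    final String name;
    final String type;
    final List<Thing> inputs = new ArrayList<>();
    final List<String> tasks = new ArrayList<>();
    Thing(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

final class ThingGraph {
    // Depth-first post-order over input edges: the tasks for every input thing are
    // scheduled before the tasks of the thing built from it. Assumes the graph is acyclic.
    static List<String> buildOrder(Thing target) {
        List<String> order = new ArrayList<>();
        visit(target, new HashSet<>(), order);
        return order;
    }

    private static void visit(Thing thing, Set<Thing> visited, List<String> order) {
        if (!visited.add(thing)) {
            return; // already scheduled via another path
        }
        for (Thing input : thing.inputs) {
            visit(input, visited, order);
        }
        order.addAll(thing.tasks);
    }

    public static void main(String[] args) {
        Thing javaSource = new Thing("mainJava", "java source set"); // plain source: no tasks
        Thing lib = new Thing("myLib", "jvm library");
        lib.inputs.add(javaSource);
        lib.tasks.add("compileJava");
        lib.tasks.add("jar");
        Thing jsLib = new Thing("myJs", "javascript library");
        jsLib.tasks.add("combineJs");
        Thing webApp = new Thing("myWebApp", "web application");
        webApp.inputs.add(lib);
        webApp.inputs.add(jsLib);
        webApp.tasks.add("war");

        System.out.println(buildOrder(webApp)); // [compileJava, jar, combineJs, war]
    }
}
```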

This gives us a bunch of potential goodness:
* We introduce some concrete models for the things we actually build. This in 
turn leads to lots of goodness.
* We can short-circuit the execution and configuration of the tasks that build 
a thing whose inputs and outputs are up-to-date.
* We can log progress in terms of these things, rather than at the task level 
(eg. just log 'myBinary UP-TO-DATE' instead of each of the tasks that build 
myBinary).
* We can define the inputs and output of a project in terms of these things. 
Which means we can short-circuit the configuration of a project whose outgoing 
things are up-to-date wrt their input things. Or we can substitute in things 
pre-built elsewhere, and short-circuit configuring the target project.
* Incoming dependencies can be expressed in terms of these concrete things, 
instead of abstract 'modules' and 'configurations'. This, I think, helps with 
understandability.
* Works nicely with the 'keep this thing continuously up-to-date' feature we 
were discussing a few days ago.
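The first few benefits can be illustrated with a hedged Java sketch of a thing-level up-to-date check. UpToDateChecker and the task names are hypothetical. The sketch assumes that a thing is up-to-date when its outputs exist and its input fingerprint matches the one from the previous build. List.hashCode() stands in for a real content hash. When the check passes, none of the thing's tasks run, and a single "myBinary UP-TO-DATE" line is logged instead of one line per task.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Gradle API: decide up-to-date-ness per thing rather than
// per task, and log a single line for a skipped thing.
final class UpToDateChecker {
    // Fingerprint of each thing's inputs as of its last successful build.
    private final Map<String, Integer> lastFingerprints = new HashMap<>();
    final List<String> log = new ArrayList<>();

    // Returns true if the thing's tasks ran, false if the whole thing was skipped.
    boolean build(String thing, List<String> inputContents, boolean outputsPresent, List<String> tasks) {
        // List.hashCode() stands in for a real content hash; collisions are ignored here.
        int fingerprint = inputContents.hashCode();
        Integer previous = lastFingerprints.get(thing);
        if (outputsPresent && previous != null && previous == fingerprint) {
            log.add(thing + " UP-TO-DATE");
            return false;
        }
        for (String task : tasks) {
            log.add(":" + task); // placeholder for actually executing the task
        }
        lastFingerprints.put(thing, fingerprint);
        return true;
    }

    public static void main(String[] args) {
        UpToDateChecker checker = new UpToDateChecker();
        List<String> tasks = List.of("compileMyBinary", "linkMyBinary");
        List<String> sources = List.of("int main() { return 0; }");
        checker.build("myBinary", sources, true, tasks); // first build: runs both tasks
        checker.build("myBinary", sources, true, tasks); // unchanged: skipped as a whole
        checker.log.forEach(System.out::println);
    }
}
```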

So, the Gradle model becomes a graph of nodes that represent the things you 
want to build or use, with edges representing dependencies on other things. 
Things can either be built locally, or can come from somewhere else.


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com
