Re: [gradle-dev] spiking incremental java compilation

Luke Daley Sat, 21 Dec 2013 05:50:15 -0800


On 20 Dec 2013, at 3:37, Adam Murdoch wrote:

Hi,
Just some thoughts on how we might spike a solution for incremental Java compilation, to see if it’s worthwhile and what the effort might be:
The goal is to improve the Java compile tasks, so that they do less work for certain kinds of changes. Here, ‘less work’ means compiling fewer source files, and also touching fewer output files so that consumers of the task output can also do less work. It doesn’t mean compiling the *fewest* possible number of source files - just fewer than we do now.
The basic approach comes down to keeping track of dependencies between source files and the other compilation inputs - where inputs are source files, the compile classpath, the compile settings, and so on. Then, when an input changes, we would recompile the source files that depend on that input. Currently, we assume that every source file depends on every input, so that when an input changes we recompile everything.
Note that we don’t necessarily need to track dependencies at a fine-grained level. For example, we may track dependencies between packages rather than classes, or we may continue to assume that every source file depends on every class in the compile classpath.
A basic solution would look something like:

1. Determine which inputs have changed.
2. If the compile settings have changed, or if we don’t have any history, then schedule every source file for compilation, and skip to #5.
3. If a class in the compile classpath has changed, then schedule for compilation every source file that depends on this class.
4. If a source file has changed, then schedule for compilation every source file that depends on the classes of the source file.
5. For each source file scheduled for compilation, remove the previous output for that source file.
6. Invoke the compiler.
7. For each successfully compiled source file, extract the dependency information for the classes in the source file and persist this for next time.
For the above, “depends on” includes indirect dependencies.
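
As a rough illustration of steps #2 to #4, here is a minimal sketch in Python (chosen only for brevity, not something proposed in this thread). It assumes the dependency history is available as a hypothetical `dependents` map from each input (a source file or a classpath class) to the things that directly depend on it. The function name and the data shapes are assumptions made for the example.

```python
from collections import deque

def schedule(changed, dependents, sources, full_rebuild=False):
    """Return the source files to recompile for a set of changed inputs.

    changed      -- inputs (source files or classpath classes) that changed
    dependents   -- hypothetical history: input -> set of direct dependents
    sources      -- all source files of the compile task
    full_rebuild -- True when settings changed or there is no history (step 2)
    """
    if full_rebuild:
        return set(sources)
    seen = set(changed)
    queue = deque(changed)
    # Breadth-first walk so that indirect dependents are included too.
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    # Only source files can be compiled; classpath classes are dropped here.
    return seen & set(sources)
```

The walk passes through classpath classes as well as source files, so a chain that alternates between the two is still followed to the end.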
Steps #1 and #2 are already covered by the incremental task API, at least enough to spike this.
Step #3 isn’t quite as simple as it is described above:
- Firstly, we can ignore changes for a class with a given name, if a class with the same name appears before it in the classpath (this includes the source files).
- If a class is removed, this counts as a ‘change’, so that we recompile any source files that used to depend on this class.
- If a class is added before some other class with the same name in the classpath, then we recompile any source files that used to depend on the old class.
- Dependencies can travel through other classes in the classpath, or source files, or a combination of both (e.g. a source class depends on a classpath class depends on a source class depends on a classpath class).
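
The first three rules above can be sketched by resolving each class name to the first entry that provides it, and then comparing the resolved view before and after. This is only a sketch under assumptions: the classpath is modelled as an ordered list of hypothetical `{class name: content hash}` maps, and both function names are made up for the example.

```python
def effective_classes(classpath):
    """Resolve each class name to the hash of its first occurrence."""
    resolved = {}
    for entry in classpath:
        for name, digest in entry.items():
            # setdefault keeps the earliest entry, so later duplicates are shadowed.
            resolved.setdefault(name, digest)
    return resolved

def changed_classes(old_classpath, new_classpath):
    """Names whose effective class was changed, removed, added or replaced."""
    old = effective_classes(old_classpath)
    new = effective_classes(new_classpath)
    return {name for name in old.keys() | new.keys()
            if old.get(name) != new.get(name)}
```

A change to a shadowed copy produces no entry, a removed class shows up because its name no longer resolves, and a class added earlier in the classpath shows up when its content differs from the copy it now hides.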
Step #4 is similar to step #3.
For a spike, it might be worth simply invalidating everything when the compile classpath changes, and just deal with changes in the source files.
For step #7 we have three basic approaches for extracting the dependencies:
The first approach is to use asm to extract the dependencies from the byte code after compilation. The upside is that this is very simple to implement and very fast. We have an implementation already that we use in the tooling API (ClasspathInferer - but it’s mixed in with some other stuff). It also works for things that we only have the byte code for.
The downside is that it’s lossy: the compiler inlines constants into the byte code and discards source-only annotations. We also don’t easily know what type of dependency it is (is it an implementation detail or is it visible in the API of the class?)
Both these downsides can be addressed: For example we might treat a class with a constant field or a class for a source-only annotation as a dependency of every source file, so that when one of these things changes, we would recompile everything. And to determine the type of dependency, we just need to dig deeper into the byte code.
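
The conservative fallback for constants and source-only annotations could look something like the sketch below. It assumes a hypothetical `universal` set holding the classes that declare a constant field or a source-only annotation; how that set gets populated is not covered here, and the function name is invented for the example.

```python
def dependents_of(changed_class, dependents, sources, universal):
    """Direct dependents of a changed class, with a conservative fallback.

    universal -- hypothetical set of classes whose uses may not be visible
                 in the byte code (constant holders, source-only annotations)
    """
    if changed_class in universal:
        # The byte code can't tell us who uses this class: recompile everything.
        return set(sources)
    return set(dependents.get(changed_class, ()))
```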
The second approach is to use the compiler API that we are already using to invoke the compiler to query the dependencies during compilation. The upside is that we get the full source dependency information. The downsides are that we have to use a Sun-specific extension of the compiler API to do this and it’s a very complicated API, which means it’s fiddly to get right.
The third approach is to parse and analyse the source separately from compilation.
I’d probably try out the first option, as it’s the simplest to implement and probably the fastest at execution time.
There are some issues around making this efficient.
First, we need to make the persistence mechanism fast. For the spike, let’s assume we can do this. I would just keep the state in some static field somewhere and not bother with persistence.
Second, we need to make the calculation of affected source files fast. One option is to calculate this when something changes rather than each time we run the compilation task, so that we keep, basically, a map from input file to the closure of all source files affected by that input file.
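
A minimal sketch of such a precomputed map, assuming the same hypothetical `dependents` shape as before (input to its direct dependents). The function name is an assumption for the example.

```python
def closure_map(dependents):
    """Map each input to everything that transitively depends on it."""
    closure = {}
    for start in dependents:
        seen = set()
        stack = list(dependents[start])
        # Depth-first walk; the seen set also guards against cycles.
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(dependents.get(node, ()))
        closure[start] = seen
    return closure
```

With this in place, finding the affected files for a change is a lookup and a union rather than a graph walk, at the cost of rebuilding the affected entries whenever the dependencies themselves change.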


This is a direction we are no doubt going to go into anyway.

Third, we need to keep the dependency graph as small as we can. So, we might play around with tracking dependencies between packages rather than classes.
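
To give a feel for the package-level variant, the sketch below collapses a class-level map into a package-level one. It assumes fully qualified class names as keys and values, and the function names are invented for the example.

```python
def package_of(class_name):
    """Package part of a fully qualified name ('' for the default package)."""
    return class_name.rpartition(".")[0]

def package_dependents(class_dependents):
    """Collapse class -> dependents into package -> dependent packages."""
    result = {}
    for cls, users in class_dependents.items():
        pkg = package_of(cls)
        for user in users:
            user_pkg = package_of(user)
            # Edges inside a package vanish: the whole package is one node.
            if user_pkg != pkg:
                result.setdefault(pkg, set()).add(user_pkg)
    return result
```

The graph is smaller, but a change to any class now invalidates its whole package and every package that depends on it.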

Will be interesting to see how this works in the real world on nasty code bases where packages are monolithic and have lots of dependencies.

We should also ignore dependencies that are not visible to the consumer, so that we don’t traverse the dependencies of method bodies, or private elements.


What do you mean here?

Finally, we should ignore changes that are not visible to the consumer, so that we ignore changes to method bodies, private elements of a class, the annotations of classes, debug info and so on. This is relatively easy for changes to the compile classpath. For changes to source files, it’s a bit trickier, as we don’t know what’s changed until we compile the source file. We could, potentially, compile in two passes - first source files that have changed and then second source files that have not changed but depend on those that have. Something, potentially, to play with as part of a spike.
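
One way to picture "visible to the consumer" is a signature computed only from the non-private declarations of a class. The sketch below assumes a hypothetical, already-extracted member list of `(visibility, declaration, body)` tuples. It only covers method bodies and private elements, not annotations or debug info, and the function name is an assumption.

```python
import hashlib

def api_signature(members):
    """Hash of the declarations a consumer can see.

    members -- iterable of (visibility, declaration, body) tuples;
               bodies and private members are left out of the hash
    """
    visible = sorted(decl for visibility, decl, _body in members
                     if visibility != "private")
    return hashlib.sha256("\n".join(visible).encode("utf-8")).hexdigest()
```

If the signature of a recompiled class is the same as before, its dependents would not need a second pass.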

I'm pretty dubious about all of this. Looks to me like a difficult thing to pull off outside of the compiler. I'm sure we can get something working, but whether it's reliable enough and fast enough is another question (hopefully answered by the spike). I also wonder whether investing into more fine-grained parallelism and coarser avoidance (e.g. ignoring non visible classpath changes) wouldn't be more fruitful and more generally applicable.

