On 10 April 2013 15:33, Adam Murdoch <[email protected]> wrote:
> > On 10/04/2013, at 8:53 AM, Daz DeBoer <[email protected]> > wrote: > > On 9 April 2013 05:25, Luke Daley <[email protected]> wrote: > > > On 09/04/2013, at 11:55 AM, Luke Daley <[email protected]> wrote: > > > On 27/03/2013, at 3:43 PM, Daz DeBoer <[email protected]> > wrote: > > G'day > > Now in master is a pretty cool new feature: you can now implement an > 'incremental' task that is informed about exactly which input files > have changed when the task is out of date. > This is very useful for something like a C++ compile task, as it means > that only the changed files need to be recompiled, rather than the > entire set of inputs. > > I've got a 'draft' DSL functioning, and would appreciate any feedback > you guys have. Here's a sample: > > class IncrementalSync extends DefaultTask { > @InputFiles > def FileCollection src > > @OutputDirectory > def File destination > > @TaskAction > void execute(TaskInputChanges inputs) { > if (inputs.allOutOfDate) { > FileUtils.forceDelete(destination) > } > > inputs.outOfDate({ > FileUtils.copyFile(change.file, targetFile(change.file)) > } as Action) > .removed({ > FileUtils.forceDelete(targetFile(change.file)) > } as Action) > .process() > } > > def targetFile(def inputFile) { > new File(destination, change.file.name) > } > } > > > Thinking more about this, I don't think what we have is right. I think we > can come up with a simpler API. > > interface TaskExecutionContext { > > boolean isHasPriorOutputs(); // this is an incremental execution > > void eachInputFileChange(Action<InputFileChange> action); > > interface InputFileChange { > boolean isAdded(); > boolean isModified(); > boolean isRemoved(); > File getFile(); > } > > } > > @TaskAction > void execute(TaskExecutionContext executionContext) { > if (executionContext.hasPriorOutputs) { > executionContext.eachInputFileChange { > > } > } else { > // full rebuild > } > } > > Some issues I have with the existing design: > > * I'm not sure "outOfDate" is the right term here. It doesn't quite > capture it and can imply more than what we actually mean. You could say > that an output is out of date, but it doesn't quite work for inputs. > * why do we need the outOfDate() and removed() methods? After reading the > Javadoc I'm not sure how I would use this API. What's the difference > between out-of-date and removed? I could guess, but it's not clear. > * I think we'll want to provide information that is not strictly an aspect > of the inputs or outputs, but of something emergent from the input/output > concept. That's why I think modelling an encompassing type is better. > > > Sorry, accidentally hit send… > > What I think we are trying to do is make it easy for the user to decide > whether they should do a fine grained incremental execution or a coarse > grained full. > > > The existing isAllOutOfDate() method goes too far, based on its Javadoc: > > /** > * Specifies if incremental build is not possible due to changed Input > Properties, Output Files, etc. > * In this case, every file will be considered to be 'out-of-date'. > */ > boolean isAllOutOfDate(); > > It is too presumptuous of us to say that all outputs are invalid just > because an input property changed. The task author may know better and can > only rebuild what is effected by the input property change. > > We can tell if a full build is needed with relative confidence based on > the lack of task history, or by verifying that the declared outputs don't > exist (if that's fast enough), hence the isHasPriorOutputs() method. It > would be nice to be able to tell the task that all of the input files have > changed in some way, but this seems like it would be very expensive to > compute. > > -- > Luke Daley > Principal Engineer, Gradleware > http://gradleware.com > > Join me at the Gradle Summit 2013, June 13th and 14th in Santa Clara, CA: > http://www.gradlesummit.com > > > For our usual "up-to-date" check, these are the things that can cause > a task to be out-of-date (in order of checking): > 1. Task has no declared outputs (always out-of-date with no declared > outputs) > 2. Task.upToDateWhen() returns false > 3. Task has no record of previous execution > 4. Task class has changed since previous execution > 5. Task input property has changed > 6. Task output files have changed > 7. Task input files have changed > > So the interesting thing is how to deal with all of these cases with > the incremental API: > > (We are currently not honouring 1. for incremental tasks: this rule is > mostly to make lifecycle tasks (and similar) fast, by not > persisting/checking state. By making your task 'incremental' you're > also telling Gradle not to make this check.) > > The initial approach was that the InputFileChanges parameter on > TaskAction was an indication that the task could handle changes to > input files: anything other sort of state change could not be handled > by the task. The "out-of-date" files would be the set of files that > you needed to re-evaluate, based on _all_ changes, including those > that you cannot handle directly (like, say, input property changes). > We would later add other parameter types to declare that the task > could handle other incremental changes: properties, output files, etc. > > Having spent more time on this, I'm not really liking this approach so > much: I think that a simple "tell me the changed inputs" method is > more useful and less confusing than a clever "tell me what I should > re-evaluate" method (this is what Luke is suggesting, too). > > We then need a way to tell an incremental task that other > (non-input-file) changes have occurred since the prior execution, > leaving the task implementation to decide what to do in this case. > This API needs to be able to adapt to the future when a task can > handle input-file and input-property changes, but still needs to know > if task.upToDate() is false or outputs have changed since previous > execution. Right now, we need to be able to ask "has anything other > than 'input-files' changed since the last execution"? > > Perhaps something like: > > IncrementalExecutionContext { > boolean isIncremental(); // false for cases 2,3,4 above > > // We can pretty easily implement these right now > boolean isHasInputPropertyChanges() > boolean isHasInputFileChanges() > boolean isHasOutputFileChanges() > > void eachInputFileChange(Action<InputFileChange> action) > > // And to add to the API in the future > void eachInputPropertyChange(Action<InputPropertyChange> action) > void eachOutputFileChange(Action<OutputFileChange> action) > } > > > How does this actually make it easier to implement an incremental task, > given that most tasks don't care about output file changes and > approximately none care about property changes? > It seems like it's more understandable and obvious, and not that much less easy to use. People already know how to implement a non-incremental task, and it might be easier for them to choose to do it the "old way" when the task is not incremental, or there are changes that the task cannot handle. There's certainly a trade-off here between elegance and obviousness. The fact the current DSL has some non-obvious logic involved is definitely a cost that should be considered. > > What happens when we add new types of things that we detect changes for? > I guess the issue is that the task wants to ask: "are there any changes that are not included in [X,Y]" where X,Y are the set of changes that the task _is_ aware of. So a method like: hasChangesOtherThan([InputFileChange,OutputFileChange]) would give this functionality. Stepping back, there are several things a task requires from this API. 1) Task can declare the changeset(s) it can handle, and stream these changes 2) Task needs to know if the changeset(s) are "incomplete": are there other changes outside of the ones I can handle. The case where there is no prior history is a type of "incomplete". 3) Task needs to be able to handle the "incomplete" case in a sensible way 4) Task needs to be able to handle the "complete" case in a sensible way With the current DSL, this is done by: 1) @TaskAction execute(InputFileChanges inputs, OutputFileChanges outputs) // I can handle changes to input files and output files 2) inputs.allOutOfDate() == "incomplete". Not sure if there is an equivalent on outputs? 3) inputs.outOfDate({}) // Every file is considered "out-of-date" outputs.changedSincePreviousExecution({}) // No outputs files considered changed? 4) inputs.outOfDate({}) // Only the changed files inputs.removed({}) outputs.outOfDate({}) // Only files that have changed outputs.removed({}) // Output files that have been removed eg: def execute(InputFileChanges inputs, OutputFileChanges outputs) { outputs.outOfDate { // This may be a set of actual changes: not sure what to send when no task history, changed input property, etc } inputs.outOfDate { // This may be a set of changes, or it may be every input (no task history, changed input property, etc) } } An alternative mechanism could be: 1) Explicitly ask the context for the changesets you care about: def changeset = context.getChanges([InputFileChange, OutputFileChange]) 2) Ask the context (or the changeset) if it is "complete" changeset.complete 3) Task author iterates over all inputs (just like a non-incremental task): cannot use changeset 4) Task author iterates over changes eg: def execute(IncrementalTaskContext context) { def changeset = context.getChanges([InputFileChange, OutputFileChange]) if (changeset.complete) { changeset.eachChange(OutputFileChange) { ... } changeset.eachChange(InputFileChange) { ... } } else { // Regular non-incremental task action task.getInputs().eachFile { ... } } The latter is definitely more verbose, and probably a bit less convenient. But it's also more explicit and I think probably more understandable. -- Darrell (Daz) DeBoer Principal Engineer, Gradleware http://www.gradleware.com Join us at the Gradle Summit 2013, June 13th and 14th in Santa Clara, CA: http://www.gradlesummit.com
