On 12/04/2013, at 7:33 PM, Luke Daley <luke.da...@gradleware.com> wrote:
> > On 11/04/2013, at 4:37 PM, Daz DeBoer <darrell.deb...@gradleware.com> wrote: > >> On 11 April 2013 04:47, Luke Daley <luke.da...@gradleware.com> wrote: >>> >>> On 11/04/2013, at 8:29 AM, Adam Murdoch <adam.murd...@gradleware.com> wrote: >>> >>>> >>>> On 11/04/2013, at 2:55 PM, Daz DeBoer <darrell.deb...@gradleware.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On 10 April 2013 15:33, Adam Murdoch <adam.murd...@gradleware.com> wrote: >>>>> >>>>> On 10/04/2013, at 8:53 AM, Daz DeBoer <darrell.deb...@gradleware.com> >>>>> wrote: >>>>> >>>>>> On 9 April 2013 05:25, Luke Daley <luke.da...@gradleware.com> wrote: >>>>>>> >>>>>>> On 09/04/2013, at 11:55 AM, Luke Daley <luke.da...@gradleware.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> On 27/03/2013, at 3:43 PM, Daz DeBoer <darrell.deb...@gradleware.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> G'day >>>>>>>>> >>>>>>>>> Now in master is a pretty cool new feature: you can now implement an >>>>>>>>> 'incremental' task that is informed about exactly which input files >>>>>>>>> have changed when the task is out of date. >>>>>>>>> This is very useful for something like a C++ compile task, as it means >>>>>>>>> that only the changed files need to be recompiled, rather than the >>>>>>>>> entire set of inputs. >>>>>>>>> >>>>>>>>> I've got a 'draft' DSL functioning, and would appreciate any feedback >>>>>>>>> you guys have. Here's a sample: >>>>>>>>> >>>>>>>>> class IncrementalSync extends DefaultTask { >>>>>>>>> @InputFiles >>>>>>>>> def FileCollection src >>>>>>>>> >>>>>>>>> @OutputDirectory >>>>>>>>> def File destination >>>>>>>>> >>>>>>>>> @TaskAction >>>>>>>>> void execute(TaskInputChanges inputs) { >>>>>>>>> if (inputs.allOutOfDate) { >>>>>>>>> FileUtils.forceDelete(destination) >>>>>>>>> } >>>>>>>>> >>>>>>>>> inputs.outOfDate({ >>>>>>>>> FileUtils.copyFile(change.file, >>>>>>>>> targetFile(change.file)) >>>>>>>>> } as Action) >>>>>>>>> .removed({ >>>>>>>>> FileUtils.forceDelete(targetFile(change.file)) >>>>>>>>> } as Action) >>>>>>>>> .process() >>>>>>>>> } >>>>>>>>> >>>>>>>>> def targetFile(def inputFile) { >>>>>>>>> new File(destination, change.file.name) >>>>>>>>> } >>>>>>>>> } >>>>>>>> >>>>>>>> Thinking more about this, I don't think what we have is right. I think >>>>>>>> we can come up with a simpler API. >>>>>>>> >>>>>>>> interface TaskExecutionContext { >>>>>>>> >>>>>>>> boolean isHasPriorOutputs(); // this is an incremental execution >>>>>>>> >>>>>>>> void eachInputFileChange(Action<InputFileChange> action); >>>>>>>> >>>>>>>> interface InputFileChange { >>>>>>>> boolean isAdded(); >>>>>>>> boolean isModified(); >>>>>>>> boolean isRemoved(); >>>>>>>> File getFile(); >>>>>>>> } >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> @TaskAction >>>>>>>> void execute(TaskExecutionContext executionContext) { >>>>>>>> if (executionContext.hasPriorOutputs) { >>>>>>>> executionContext.eachInputFileChange { >>>>>>>> >>>>>>>> } >>>>>>>> } else { >>>>>>>> // full rebuild >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> Some issues I have with the existing design: >>>>>>>> >>>>>>>> * I'm not sure "outOfDate" is the right term here. It doesn't quite >>>>>>>> capture it and can imply more than what we actually mean. You could >>>>>>>> say that an output is out of date, but it doesn't quite work for >>>>>>>> inputs. >>>>>>>> * why do we need the outOfDate() and removed() methods? After reading >>>>>>>> the Javadoc I'm not sure how I would use this API. What's the >>>>>>>> difference between out-of-date and removed? I could guess, but it's >>>>>>>> not clear. >>>>>>>> * I think we'll want to provide information that is not strictly an >>>>>>>> aspect of the inputs or outputs, but of something emergent from the >>>>>>>> input/output concept. That's why I think modelling an encompassing >>>>>>>> type is better. >>>>>>> >>>>>>> Sorry, accidentally hit send… >>>>>>> >>>>>>>> What I think we are trying to do is make it easy for the user to >>>>>>>> decide whether they should do a fine grained incremental execution or >>>>>>>> a coarse grained full. >>>>>>> >>>>>>> The existing isAllOutOfDate() method goes too far, based on its Javadoc: >>>>>>> >>>>>>> /** >>>>>>> * Specifies if incremental build is not possible due to changed Input >>>>>>> Properties, Output Files, etc. >>>>>>> * In this case, every file will be considered to be 'out-of-date'. >>>>>>> */ >>>>>>> boolean isAllOutOfDate(); >>>>>>> >>>>>>> It is too presumptuous of us to say that all outputs are invalid just >>>>>>> because an input property changed. The task author may know better and >>>>>>> can only rebuild what is effected by the input property change. >>>>>>> >>>>>>> We can tell if a full build is needed with relative confidence based on >>>>>>> the lack of task history, or by verifying that the declared outputs >>>>>>> don't exist (if that's fast enough), hence the isHasPriorOutputs() >>>>>>> method. It would be nice to be able to tell the task that all of the >>>>>>> input files have changed in some way, but this seems like it would be >>>>>>> very expensive to compute. >>>>>>> >>>>>>>> -- >>>>>>>> Luke Daley >>>>>>>> Principal Engineer, Gradleware >>>>>>>> http://gradleware.com >>>>>>>> >>>>>>>> Join me at the Gradle Summit 2013, June 13th and 14th in Santa Clara, >>>>>>>> CA: http://www.gradlesummit.com >>>>>>>> >>>>>> >>>>>> For our usual "up-to-date" check, these are the things that can cause >>>>>> a task to be out-of-date (in order of checking): >>>>>> 1. Task has no declared outputs (always out-of-date with no declared >>>>>> outputs) >>>>>> 2. Task.upToDateWhen() returns false >>>>>> 3. Task has no record of previous execution >>>>>> 4. Task class has changed since previous execution >>>>>> 5. Task input property has changed >>>>>> 6. Task output files have changed >>>>>> 7. Task input files have changed >>>>>> >>>>>> So the interesting thing is how to deal with all of these cases with >>>>>> the incremental API: >>>>>> >>>>>> (We are currently not honouring 1. for incremental tasks: this rule is >>>>>> mostly to make lifecycle tasks (and similar) fast, by not >>>>>> persisting/checking state. By making your task 'incremental' you're >>>>>> also telling Gradle not to make this check.) >>>>>> >>>>>> The initial approach was that the InputFileChanges parameter on >>>>>> TaskAction was an indication that the task could handle changes to >>>>>> input files: anything other sort of state change could not be handled >>>>>> by the task. The "out-of-date" files would be the set of files that >>>>>> you needed to re-evaluate, based on _all_ changes, including those >>>>>> that you cannot handle directly (like, say, input property changes). >>>>>> We would later add other parameter types to declare that the task >>>>>> could handle other incremental changes: properties, output files, etc. >>>>>> >>>>>> Having spent more time on this, I'm not really liking this approach so >>>>>> much: I think that a simple "tell me the changed inputs" method is >>>>>> more useful and less confusing than a clever "tell me what I should >>>>>> re-evaluate" method (this is what Luke is suggesting, too). >>>>>> >>>>>> We then need a way to tell an incremental task that other >>>>>> (non-input-file) changes have occurred since the prior execution, >>>>>> leaving the task implementation to decide what to do in this case. >>>>>> This API needs to be able to adapt to the future when a task can >>>>>> handle input-file and input-property changes, but still needs to know >>>>>> if task.upToDate() is false or outputs have changed since previous >>>>>> execution. Right now, we need to be able to ask "has anything other >>>>>> than 'input-files' changed since the last execution"? >>>>>> >>>>>> Perhaps something like: >>>>>> >>>>>> IncrementalExecutionContext { >>>>>> boolean isIncremental(); // false for cases 2,3,4 above >>>>>> >>>>>> // We can pretty easily implement these right now >>>>>> boolean isHasInputPropertyChanges() >>>>>> boolean isHasInputFileChanges() >>>>>> boolean isHasOutputFileChanges() >>>>>> >>>>>> void eachInputFileChange(Action<InputFileChange> action) >>>>>> >>>>>> // And to add to the API in the future >>>>>> void eachInputPropertyChange(Action<InputPropertyChange> action) >>>>>> void eachOutputFileChange(Action<OutputFileChange> action) >>>>>> } >>>>> >>>>> How does this actually make it easier to implement an incremental task, >>>>> given that most tasks don't care about output file changes and >>>>> approximately none care about property changes? >>>>> >>>>> It seems like it's more understandable and obvious, and not that much >>>>> less easy to use. People already know how to implement a non-incremental >>>>> task, and it might be easier for them to choose to do it the "old way" >>>>> when the task is not incremental, or there are changes that the task >>>>> cannot handle. >>>> >>>> Why would it be easier? Do you have a concrete example? Why would the task >>>> not just go: for each input file that needs work, do the work. >>>> >>>>> >>>>> There's certainly a trade-off here between elegance and obviousness. The >>>>> fact the current DSL has some non-obvious logic involved is definitely a >>>>> cost that should be considered. >>>>> >>>>> >>>>> What happens when we add new types of things that we detect changes for? >>>>> >>>>> I guess the issue is that the task wants to ask: "are there any changes >>>>> that are not included in [X,Y]" where X,Y are the set of changes that the >>>>> task _is_ aware of. >>>>> So a method like: >>>>> hasChangesOtherThan([InputFileChange,OutputFileChange]) would give this >>>>> functionality. >>>>> >>>>> >>>>> Stepping back, there are several things a task requires from this API. >>>>> 1) Task can declare the changeset(s) it can handle, and stream these >>>>> changes >>>>> 2) Task needs to know if the changeset(s) are "incomplete": are there >>>>> other changes outside of the ones I can handle. The case where there is >>>>> no prior history is a type of "incomplete". >>>>> 3) Task needs to be able to handle the "incomplete" case in a sensible way >>>>> 4) Task needs to be able to handle the "complete" case in a sensible way >>>>> >>>>> With the current DSL, this is done by: >>>>> 1) @TaskAction execute(InputFileChanges inputs, OutputFileChanges >>>>> outputs) // I can handle changes to input files and output files >>>>> 2) inputs.allOutOfDate() == "incomplete". Not sure if there is an >>>>> equivalent on outputs? >>>>> 3) inputs.outOfDate({}) // Every file is considered "out-of-date" >>>>> outputs.changedSincePreviousExecution({}) // No outputs files >>>>> considered changed? >>>>> 4) inputs.outOfDate({}) // Only the changed files >>>>> inputs.removed({}) >>>>> outputs.outOfDate({}) // Only files that have changed >>>>> outputs.removed({}) // Output files that have been removed >>>>> >>>>> eg: >>>>> def execute(InputFileChanges inputs, OutputFileChanges outputs) { >>>>> outputs.outOfDate { >>>>> // This may be a set of actual changes: not sure what to send when >>>>> no task history, changed input property, etc >>>>> } >>>>> inputs.outOfDate { >>>>> // This may be a set of changes, or it may be every input (no task >>>>> history, changed input property, etc) >>>>> } >>>>> } >>>>> >>>>> An alternative mechanism could be: >>>>> 1) Explicitly ask the context for the changesets you care about: >>>>> def changeset = context.getChanges([InputFileChange, >>>>> OutputFileChange]) >>>>> 2) Ask the context (or the changeset) if it is "complete" >>>>> changeset.complete >>>>> 3) Task author iterates over all inputs (just like a non-incremental >>>>> task): cannot use changeset >>>>> 4) Task author iterates over changes >>>>> >>>>> eg: >>>>> def execute(IncrementalTaskContext context) { >>>>> def changeset = context.getChanges([InputFileChange, OutputFileChange]) >>>>> if (changeset.complete) { >>>>> changeset.eachChange(OutputFileChange) { >>>>> ... >>>>> } >>>>> changeset.eachChange(InputFileChange) { >>>>> ... >>>>> } >>>>> } else { >>>>> // Regular non-incremental task action >>>>> task.getInputs().eachFile { >>>>> ... >>>>> } >>>>> } >>>>> >>>>> The latter is definitely more verbose, and probably a bit less >>>>> convenient. But it's also more explicit and I think probably more >>>>> understandable. >>>> >>>> Maybe. It's better than your other proposal above, in that it doesn't >>>> force the task implementation to know about every type of change when it >>>> doesn't want to. This is important because very few task implementations >>>> are going to care about anything other than input file changes. >>>> >>>> However, the above also rules out certain performance optimisations as we >>>> don't know until the task executes what it will actually ask for. Instead, >>>> we have to be able to deal with the task asking for anything. Exactly what >>>> a task can handle, however, isn't dynamic - it's baked into the code and >>>> very much static. So, the API needs to expose this statically, so that we >>>> can do useful things with this information. >>>> >>>> This also make evolution easier, as we can inject various different views >>>> instead of a single context that exposes everything. >>>> >>>> As far as 'understandability' goes, I think the basic problem here is that >>>> we're defining the API in terms of changes, whereas the task >>>> implementations are interested in the work they have to do, and (very >>>> infrequently) the reason why. And while a change is a type of reason, but >>>> not all reasons are changes (eg --rerun-tasks and missing history are >>>> reasons, but they're not changes). >>>> >>>> So, I think the API needs to express things in terms of what work the task >>>> needs to do, not in terms of what has changed. This has correctness >>>> implications, as well, as the task doesn't need to (and possibly cannot) >>>> be involved in converting changes to work. And for this API, correctness >>>> is the primary goal, followed by performance, followed by >>>> understandability. It's not a beginner API. >>>> >>>> Here are the different kinds of work we're interested in: >>>> >>>> - Process all input files for which there is no corresponding output file. >>>> - Process all input files which have changed since their corresponding >>>> output files were generated. >>>> - Process all input files for which the corresponding output file has >>>> changed since they were generated. >>>> - Process all input files for which we don't have any history. >>>> - Process all input files for which the corresponding output file is >>>> incorrect due to a property change. >>>> - Remove outputs for all input files that have been removed. >>>> - Remove existing outputs before processing input files, for when we're >>>> rebuilding the outputs (for whatever reason). >>>> >>>> This says to me we want an API that >>>> >>>> - Allows a task to invalidate output files given the set of property >>>> changes. Default is to invalidate all output files for any property change. >>>> - Allows a task to invalidate input files given the set of output file >>>> changes. Default is to invalidate all input files for any output file >>>> change. >>>> - Allows a task to ask for the input files it needs to do work on. >>> >>> And you're saying that the current API, with just some renames, provides >>> this? >>> >> >> Well the current API provides for the third option, with the default >> behaviour for changed properties and changed outputs. >> We could actually remove TaskInputChanges.allOutOfDate, as it doesn't >> serve any of the purposes above. > > I originally thought we'd need to support telling the task that it needs to > do everything explicitly, so that it could potentially do its work more > efficiently (e.g. batching). I don't think so. There are certainly those sorts of tasks, such as an incremental Java compile task. For this kind of task, you'd just collect the files into a collection and then chop that up however you like at the end. So, the semantics of the action aren't 'do the work on this file', rather it means 'here's a file you need to do work on'. -- Adam Murdoch Gradle Co-founder http://www.gradle.org VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting http://www.gradleware.com Join us at the Gradle Summit 2013, June 13th and 14th in Santa Clara, CA: http://www.gradlesummit.com