On 13/04/2013, at 10:56 PM, Adam Murdoch <adam.murd...@gradleware.com> wrote:
> > On 14/04/2013, at 7:22 AM, Adam Murdoch <adam.murd...@gradleware.com> wrote: > >> >> On 12/04/2013, at 1:37 AM, Daz DeBoer <darrell.deb...@gradleware.com> wrote: >> >>> On 11 April 2013 04:47, Luke Daley <luke.da...@gradleware.com> wrote: >>>> >>>> On 11/04/2013, at 8:29 AM, Adam Murdoch <adam.murd...@gradleware.com> >>>> wrote: >>>> >>>>> >>>>> On 11/04/2013, at 2:55 PM, Daz DeBoer <darrell.deb...@gradleware.com> >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 10 April 2013 15:33, Adam Murdoch <adam.murd...@gradleware.com> wrote: >>>>>> >>>>>> On 10/04/2013, at 8:53 AM, Daz DeBoer <darrell.deb...@gradleware.com> >>>>>> wrote: >>>>>> >>>>>>> On 9 April 2013 05:25, Luke Daley <luke.da...@gradleware.com> wrote: >>>>>>>> >>>>>>>> On 09/04/2013, at 11:55 AM, Luke Daley <luke.da...@gradleware.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On 27/03/2013, at 3:43 PM, Daz DeBoer <darrell.deb...@gradleware.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> G'day >>>>>>>>>> >>>>>>>>>> Now in master is a pretty cool new feature: you can now implement an >>>>>>>>>> 'incremental' task that is informed about exactly which input files >>>>>>>>>> have changed when the task is out of date. >>>>>>>>>> This is very useful for something like a C++ compile task, as it >>>>>>>>>> means >>>>>>>>>> that only the changed files need to be recompiled, rather than the >>>>>>>>>> entire set of inputs. >>>>>>>>>> >>>>>>>>>> I've got a 'draft' DSL functioning, and would appreciate any feedback >>>>>>>>>> you guys have. Here's a sample: >>>>>>>>>> >>>>>>>>>> class IncrementalSync extends DefaultTask { >>>>>>>>>> @InputFiles >>>>>>>>>> def FileCollection src >>>>>>>>>> >>>>>>>>>> @OutputDirectory >>>>>>>>>> def File destination >>>>>>>>>> >>>>>>>>>> @TaskAction >>>>>>>>>> void execute(TaskInputChanges inputs) { >>>>>>>>>> if (inputs.allOutOfDate) { >>>>>>>>>> FileUtils.forceDelete(destination) >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> inputs.outOfDate({ >>>>>>>>>> FileUtils.copyFile(change.file, >>>>>>>>>> targetFile(change.file)) >>>>>>>>>> } as Action) >>>>>>>>>> .removed({ >>>>>>>>>> FileUtils.forceDelete(targetFile(change.file)) >>>>>>>>>> } as Action) >>>>>>>>>> .process() >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> def targetFile(def inputFile) { >>>>>>>>>> new File(destination, change.file.name) >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>> >>>>>>>>> Thinking more about this, I don't think what we have is right. I >>>>>>>>> think we can come up with a simpler API. >>>>>>>>> >>>>>>>>> interface TaskExecutionContext { >>>>>>>>> >>>>>>>>> boolean isHasPriorOutputs(); // this is an incremental execution >>>>>>>>> >>>>>>>>> void eachInputFileChange(Action<InputFileChange> action); >>>>>>>>> >>>>>>>>> interface InputFileChange { >>>>>>>>> boolean isAdded(); >>>>>>>>> boolean isModified(); >>>>>>>>> boolean isRemoved(); >>>>>>>>> File getFile(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> } >>>>>>>>> >>>>>>>>> @TaskAction >>>>>>>>> void execute(TaskExecutionContext executionContext) { >>>>>>>>> if (executionContext.hasPriorOutputs) { >>>>>>>>> executionContext.eachInputFileChange { >>>>>>>>> >>>>>>>>> } >>>>>>>>> } else { >>>>>>>>> // full rebuild >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> Some issues I have with the existing design: >>>>>>>>> >>>>>>>>> * I'm not sure "outOfDate" is the right term here. It doesn't quite >>>>>>>>> capture it and can imply more than what we actually mean. You could >>>>>>>>> say that an output is out of date, but it doesn't quite work for >>>>>>>>> inputs. >>>>>>>>> * why do we need the outOfDate() and removed() methods? After reading >>>>>>>>> the Javadoc I'm not sure how I would use this API. What's the >>>>>>>>> difference between out-of-date and removed? I could guess, but it's >>>>>>>>> not clear. >>>>>>>>> * I think we'll want to provide information that is not strictly an >>>>>>>>> aspect of the inputs or outputs, but of something emergent from the >>>>>>>>> input/output concept. That's why I think modelling an encompassing >>>>>>>>> type is better. >>>>>>>> >>>>>>>> Sorry, accidentally hit send… >>>>>>>> >>>>>>>>> What I think we are trying to do is make it easy for the user to >>>>>>>>> decide whether they should do a fine grained incremental execution or >>>>>>>>> a coarse grained full. >>>>>>>> >>>>>>>> The existing isAllOutOfDate() method goes too far, based on its >>>>>>>> Javadoc: >>>>>>>> >>>>>>>> /** >>>>>>>> * Specifies if incremental build is not possible due to changed >>>>>>>> Input Properties, Output Files, etc. >>>>>>>> * In this case, every file will be considered to be 'out-of-date'. >>>>>>>> */ >>>>>>>> boolean isAllOutOfDate(); >>>>>>>> >>>>>>>> It is too presumptuous of us to say that all outputs are invalid just >>>>>>>> because an input property changed. The task author may know better and >>>>>>>> can only rebuild what is effected by the input property change. >>>>>>>> >>>>>>>> We can tell if a full build is needed with relative confidence based >>>>>>>> on the lack of task history, or by verifying that the declared outputs >>>>>>>> don't exist (if that's fast enough), hence the isHasPriorOutputs() >>>>>>>> method. It would be nice to be able to tell the task that all of the >>>>>>>> input files have changed in some way, but this seems like it would be >>>>>>>> very expensive to compute. >>>>>>>> >>>>>>>>> -- >>>>>>>>> Luke Daley >>>>>>>>> Principal Engineer, Gradleware >>>>>>>>> http://gradleware.com >>>>>>>>> >>>>>>>>> Join me at the Gradle Summit 2013, June 13th and 14th in Santa Clara, >>>>>>>>> CA: http://www.gradlesummit.com >>>>>>>>> >>>>>>> >>>>>>> For our usual "up-to-date" check, these are the things that can cause >>>>>>> a task to be out-of-date (in order of checking): >>>>>>> 1. Task has no declared outputs (always out-of-date with no declared >>>>>>> outputs) >>>>>>> 2. Task.upToDateWhen() returns false >>>>>>> 3. Task has no record of previous execution >>>>>>> 4. Task class has changed since previous execution >>>>>>> 5. Task input property has changed >>>>>>> 6. Task output files have changed >>>>>>> 7. Task input files have changed >>>>>>> >>>>>>> So the interesting thing is how to deal with all of these cases with >>>>>>> the incremental API: >>>>>>> >>>>>>> (We are currently not honouring 1. for incremental tasks: this rule is >>>>>>> mostly to make lifecycle tasks (and similar) fast, by not >>>>>>> persisting/checking state. By making your task 'incremental' you're >>>>>>> also telling Gradle not to make this check.) >>>>>>> >>>>>>> The initial approach was that the InputFileChanges parameter on >>>>>>> TaskAction was an indication that the task could handle changes to >>>>>>> input files: anything other sort of state change could not be handled >>>>>>> by the task. The "out-of-date" files would be the set of files that >>>>>>> you needed to re-evaluate, based on _all_ changes, including those >>>>>>> that you cannot handle directly (like, say, input property changes). >>>>>>> We would later add other parameter types to declare that the task >>>>>>> could handle other incremental changes: properties, output files, etc. >>>>>>> >>>>>>> Having spent more time on this, I'm not really liking this approach so >>>>>>> much: I think that a simple "tell me the changed inputs" method is >>>>>>> more useful and less confusing than a clever "tell me what I should >>>>>>> re-evaluate" method (this is what Luke is suggesting, too). >>>>>>> >>>>>>> We then need a way to tell an incremental task that other >>>>>>> (non-input-file) changes have occurred since the prior execution, >>>>>>> leaving the task implementation to decide what to do in this case. >>>>>>> This API needs to be able to adapt to the future when a task can >>>>>>> handle input-file and input-property changes, but still needs to know >>>>>>> if task.upToDate() is false or outputs have changed since previous >>>>>>> execution. Right now, we need to be able to ask "has anything other >>>>>>> than 'input-files' changed since the last execution"? >>>>>>> >>>>>>> Perhaps something like: >>>>>>> >>>>>>> IncrementalExecutionContext { >>>>>>> boolean isIncremental(); // false for cases 2,3,4 above >>>>>>> >>>>>>> // We can pretty easily implement these right now >>>>>>> boolean isHasInputPropertyChanges() >>>>>>> boolean isHasInputFileChanges() >>>>>>> boolean isHasOutputFileChanges() >>>>>>> >>>>>>> void eachInputFileChange(Action<InputFileChange> action) >>>>>>> >>>>>>> // And to add to the API in the future >>>>>>> void eachInputPropertyChange(Action<InputPropertyChange> action) >>>>>>> void eachOutputFileChange(Action<OutputFileChange> action) >>>>>>> } >>>>>> >>>>>> How does this actually make it easier to implement an incremental task, >>>>>> given that most tasks don't care about output file changes and >>>>>> approximately none care about property changes? >>>>>> >>>>>> It seems like it's more understandable and obvious, and not that much >>>>>> less easy to use. People already know how to implement a non-incremental >>>>>> task, and it might be easier for them to choose to do it the "old way" >>>>>> when the task is not incremental, or there are changes that the task >>>>>> cannot handle. >>>>> >>>>> Why would it be easier? Do you have a concrete example? Why would the >>>>> task not just go: for each input file that needs work, do the work. >>>>> >>>>>> >>>>>> There's certainly a trade-off here between elegance and obviousness. The >>>>>> fact the current DSL has some non-obvious logic involved is definitely a >>>>>> cost that should be considered. >>>>>> >>>>>> >>>>>> What happens when we add new types of things that we detect changes for? >>>>>> >>>>>> I guess the issue is that the task wants to ask: "are there any changes >>>>>> that are not included in [X,Y]" where X,Y are the set of changes that >>>>>> the task _is_ aware of. >>>>>> So a method like: >>>>>> hasChangesOtherThan([InputFileChange,OutputFileChange]) would give >>>>>> this functionality. >>>>>> >>>>>> >>>>>> Stepping back, there are several things a task requires from this API. >>>>>> 1) Task can declare the changeset(s) it can handle, and stream these >>>>>> changes >>>>>> 2) Task needs to know if the changeset(s) are "incomplete": are there >>>>>> other changes outside of the ones I can handle. The case where there is >>>>>> no prior history is a type of "incomplete". >>>>>> 3) Task needs to be able to handle the "incomplete" case in a sensible >>>>>> way >>>>>> 4) Task needs to be able to handle the "complete" case in a sensible way >>>>>> >>>>>> With the current DSL, this is done by: >>>>>> 1) @TaskAction execute(InputFileChanges inputs, OutputFileChanges >>>>>> outputs) // I can handle changes to input files and output files >>>>>> 2) inputs.allOutOfDate() == "incomplete". Not sure if there is an >>>>>> equivalent on outputs? >>>>>> 3) inputs.outOfDate({}) // Every file is considered "out-of-date" >>>>>> outputs.changedSincePreviousExecution({}) // No outputs files >>>>>> considered changed? >>>>>> 4) inputs.outOfDate({}) // Only the changed files >>>>>> inputs.removed({}) >>>>>> outputs.outOfDate({}) // Only files that have changed >>>>>> outputs.removed({}) // Output files that have been removed >>>>>> >>>>>> eg: >>>>>> def execute(InputFileChanges inputs, OutputFileChanges outputs) { >>>>>> outputs.outOfDate { >>>>>> // This may be a set of actual changes: not sure what to send >>>>>> when no task history, changed input property, etc >>>>>> } >>>>>> inputs.outOfDate { >>>>>> // This may be a set of changes, or it may be every input (no task >>>>>> history, changed input property, etc) >>>>>> } >>>>>> } >>>>>> >>>>>> An alternative mechanism could be: >>>>>> 1) Explicitly ask the context for the changesets you care about: >>>>>> def changeset = context.getChanges([InputFileChange, >>>>>> OutputFileChange]) >>>>>> 2) Ask the context (or the changeset) if it is "complete" >>>>>> changeset.complete >>>>>> 3) Task author iterates over all inputs (just like a non-incremental >>>>>> task): cannot use changeset >>>>>> 4) Task author iterates over changes >>>>>> >>>>>> eg: >>>>>> def execute(IncrementalTaskContext context) { >>>>>> def changeset = context.getChanges([InputFileChange, OutputFileChange]) >>>>>> if (changeset.complete) { >>>>>> changeset.eachChange(OutputFileChange) { >>>>>> ... >>>>>> } >>>>>> changeset.eachChange(InputFileChange) { >>>>>> ... >>>>>> } >>>>>> } else { >>>>>> // Regular non-incremental task action >>>>>> task.getInputs().eachFile { >>>>>> ... >>>>>> } >>>>>> } >>>>>> >>>>>> The latter is definitely more verbose, and probably a bit less >>>>>> convenient. But it's also more explicit and I think probably more >>>>>> understandable. >>>>> >>>>> Maybe. It's better than your other proposal above, in that it doesn't >>>>> force the task implementation to know about every type of change when it >>>>> doesn't want to. This is important because very few task implementations >>>>> are going to care about anything other than input file changes. >>>>> >>>>> However, the above also rules out certain performance optimisations as we >>>>> don't know until the task executes what it will actually ask for. >>>>> Instead, we have to be able to deal with the task asking for anything. >>>>> Exactly what a task can handle, however, isn't dynamic - it's baked into >>>>> the code and very much static. So, the API needs to expose this >>>>> statically, so that we can do useful things with this information. >>>>> >>>>> This also make evolution easier, as we can inject various different views >>>>> instead of a single context that exposes everything. >>>>> >>>>> As far as 'understandability' goes, I think the basic problem here is >>>>> that we're defining the API in terms of changes, whereas the task >>>>> implementations are interested in the work they have to do, and (very >>>>> infrequently) the reason why. And while a change is a type of reason, but >>>>> not all reasons are changes (eg --rerun-tasks and missing history are >>>>> reasons, but they're not changes). >>>>> >>>>> So, I think the API needs to express things in terms of what work the >>>>> task needs to do, not in terms of what has changed. This has correctness >>>>> implications, as well, as the task doesn't need to (and possibly cannot) >>>>> be involved in converting changes to work. And for this API, correctness >>>>> is the primary goal, followed by performance, followed by >>>>> understandability. It's not a beginner API. >>>>> >>>>> Here are the different kinds of work we're interested in: >>>>> >>>>> - Process all input files for which there is no corresponding output file. >>>>> - Process all input files which have changed since their corresponding >>>>> output files were generated. >>>>> - Process all input files for which the corresponding output file has >>>>> changed since they were generated. >>>>> - Process all input files for which we don't have any history. >>>>> - Process all input files for which the corresponding output file is >>>>> incorrect due to a property change. >>>>> - Remove outputs for all input files that have been removed. >>>>> - Remove existing outputs before processing input files, for when we're >>>>> rebuilding the outputs (for whatever reason). >>>>> >>>>> This says to me we want an API that >>>>> >>>>> - Allows a task to invalidate output files given the set of property >>>>> changes. Default is to invalidate all output files for any property >>>>> change. >>>>> - Allows a task to invalidate input files given the set of output file >>>>> changes. Default is to invalidate all input files for any output file >>>>> change. >>>>> - Allows a task to ask for the input files it needs to do work on. >>>> >>>> And you're saying that the current API, with just some renames, provides >>>> this? >>>> >>> >>> Well the current API provides for the third option, with the default >>> behaviour for changed properties and changed outputs. >>> We could actually remove TaskInputChanges.allOutOfDate, as it doesn't >>> serve any of the purposes above. >> >> The list above wasn't complete. We still need to deal with the case where we >> don't know what old state you need to clean up. For example, say we make our >> dsl meta data task incremental. This task produces a single merged output >> file from a bunch of input files. If we don't have the history for the task >> or if this output file has been changed or if running with --rerun-tasks or >> if the task implementation has changed, then we need to make sure that >> before the task works on the input files, this output file has been cleaned >> of all existing state (eg deleted). >> >> Right now we use the allOutOfDate flag to let the task know this. Instead we >> might: >> >> - delete the output files of the task before calling the task action. >> - let the task know which output files have been invalidated. >> - keep the flag and rename it something like 'allOutputsInvalid' or >> 'rebuildRequired'. > > Another option would be to split the task action into 2 separate methods, one > which is called on rebuild and one which is called on incremental execution. This doesn't feel right to me. For 1:1 transforming tasks this would be a nuisance. > Or add an optional method that is called on rebuild before the task action > method is called. I can't see what this buys us over telling the user via the info we give the task action. -- Luke Daley Principal Engineer, Gradleware http://gradleware.com Join me at the Gradle Summit 2013, June 13th and 14th in Santa Clara, CA: http://www.gradlesummit.com --------------------------------------------------------------------- To unsubscribe from this list, please visit: http://xircles.codehaus.org/manage_email