On 11/04/2013, at 8:29 AM, Adam Murdoch <[email protected]> wrote:
> > On 11/04/2013, at 2:55 PM, Daz DeBoer <[email protected]> wrote: > >> >> >> >> On 10 April 2013 15:33, Adam Murdoch <[email protected]> wrote: >> >> On 10/04/2013, at 8:53 AM, Daz DeBoer <[email protected]> wrote: >> >>> On 9 April 2013 05:25, Luke Daley <[email protected]> wrote: >>>> >>>> On 09/04/2013, at 11:55 AM, Luke Daley <[email protected]> wrote: >>>> >>>>> >>>>> On 27/03/2013, at 3:43 PM, Daz DeBoer <[email protected]> >>>>> wrote: >>>>> >>>>>> G'day >>>>>> >>>>>> Now in master is a pretty cool new feature: you can now implement an >>>>>> 'incremental' task that is informed about exactly which input files >>>>>> have changed when the task is out of date. >>>>>> This is very useful for something like a C++ compile task, as it means >>>>>> that only the changed files need to be recompiled, rather than the >>>>>> entire set of inputs. >>>>>> >>>>>> I've got a 'draft' DSL functioning, and would appreciate any feedback >>>>>> you guys have. Here's a sample: >>>>>> >>>>>> class IncrementalSync extends DefaultTask { >>>>>> @InputFiles >>>>>> def FileCollection src >>>>>> >>>>>> @OutputDirectory >>>>>> def File destination >>>>>> >>>>>> @TaskAction >>>>>> void execute(TaskInputChanges inputs) { >>>>>> if (inputs.allOutOfDate) { >>>>>> FileUtils.forceDelete(destination) >>>>>> } >>>>>> >>>>>> inputs.outOfDate({ >>>>>> FileUtils.copyFile(change.file, targetFile(change.file)) >>>>>> } as Action) >>>>>> .removed({ >>>>>> FileUtils.forceDelete(targetFile(change.file)) >>>>>> } as Action) >>>>>> .process() >>>>>> } >>>>>> >>>>>> def targetFile(def inputFile) { >>>>>> new File(destination, change.file.name) >>>>>> } >>>>>> } >>>>> >>>>> Thinking more about this, I don't think what we have is right. I think we >>>>> can come up with a simpler API. >>>>> >>>>> interface TaskExecutionContext { >>>>> >>>>> boolean isHasPriorOutputs(); // this is an incremental execution >>>>> >>>>> void eachInputFileChange(Action<InputFileChange> action); >>>>> >>>>> interface InputFileChange { >>>>> boolean isAdded(); >>>>> boolean isModified(); >>>>> boolean isRemoved(); >>>>> File getFile(); >>>>> } >>>>> >>>>> } >>>>> >>>>> @TaskAction >>>>> void execute(TaskExecutionContext executionContext) { >>>>> if (executionContext.hasPriorOutputs) { >>>>> executionContext.eachInputFileChange { >>>>> >>>>> } >>>>> } else { >>>>> // full rebuild >>>>> } >>>>> } >>>>> >>>>> Some issues I have with the existing design: >>>>> >>>>> * I'm not sure "outOfDate" is the right term here. It doesn't quite >>>>> capture it and can imply more than what we actually mean. You could say >>>>> that an output is out of date, but it doesn't quite work for inputs. >>>>> * why do we need the outOfDate() and removed() methods? After reading the >>>>> Javadoc I'm not sure how I would use this API. What's the difference >>>>> between out-of-date and removed? I could guess, but it's not clear. >>>>> * I think we'll want to provide information that is not strictly an >>>>> aspect of the inputs or outputs, but of something emergent from the >>>>> input/output concept. That's why I think modelling an encompassing type >>>>> is better. >>>> >>>> Sorry, accidentally hit send… >>>> >>>>> What I think we are trying to do is make it easy for the user to decide >>>>> whether they should do a fine grained incremental execution or a coarse >>>>> grained full. >>>> >>>> The existing isAllOutOfDate() method goes too far, based on its Javadoc: >>>> >>>> /** >>>> * Specifies if incremental build is not possible due to changed Input >>>> Properties, Output Files, etc. >>>> * In this case, every file will be considered to be 'out-of-date'. >>>> */ >>>> boolean isAllOutOfDate(); >>>> >>>> It is too presumptuous of us to say that all outputs are invalid just >>>> because an input property changed. The task author may know better and can >>>> only rebuild what is effected by the input property change. >>>> >>>> We can tell if a full build is needed with relative confidence based on >>>> the lack of task history, or by verifying that the declared outputs don't >>>> exist (if that's fast enough), hence the isHasPriorOutputs() method. It >>>> would be nice to be able to tell the task that all of the input files have >>>> changed in some way, but this seems like it would be very expensive to >>>> compute. >>>> >>>>> -- >>>>> Luke Daley >>>>> Principal Engineer, Gradleware >>>>> http://gradleware.com >>>>> >>>>> Join me at the Gradle Summit 2013, June 13th and 14th in Santa Clara, CA: >>>>> http://www.gradlesummit.com >>>>> >>> >>> For our usual "up-to-date" check, these are the things that can cause >>> a task to be out-of-date (in order of checking): >>> 1. Task has no declared outputs (always out-of-date with no declared >>> outputs) >>> 2. Task.upToDateWhen() returns false >>> 3. Task has no record of previous execution >>> 4. Task class has changed since previous execution >>> 5. Task input property has changed >>> 6. Task output files have changed >>> 7. Task input files have changed >>> >>> So the interesting thing is how to deal with all of these cases with >>> the incremental API: >>> >>> (We are currently not honouring 1. for incremental tasks: this rule is >>> mostly to make lifecycle tasks (and similar) fast, by not >>> persisting/checking state. By making your task 'incremental' you're >>> also telling Gradle not to make this check.) >>> >>> The initial approach was that the InputFileChanges parameter on >>> TaskAction was an indication that the task could handle changes to >>> input files: anything other sort of state change could not be handled >>> by the task. The "out-of-date" files would be the set of files that >>> you needed to re-evaluate, based on _all_ changes, including those >>> that you cannot handle directly (like, say, input property changes). >>> We would later add other parameter types to declare that the task >>> could handle other incremental changes: properties, output files, etc. >>> >>> Having spent more time on this, I'm not really liking this approach so >>> much: I think that a simple "tell me the changed inputs" method is >>> more useful and less confusing than a clever "tell me what I should >>> re-evaluate" method (this is what Luke is suggesting, too). >>> >>> We then need a way to tell an incremental task that other >>> (non-input-file) changes have occurred since the prior execution, >>> leaving the task implementation to decide what to do in this case. >>> This API needs to be able to adapt to the future when a task can >>> handle input-file and input-property changes, but still needs to know >>> if task.upToDate() is false or outputs have changed since previous >>> execution. Right now, we need to be able to ask "has anything other >>> than 'input-files' changed since the last execution"? >>> >>> Perhaps something like: >>> >>> IncrementalExecutionContext { >>> boolean isIncremental(); // false for cases 2,3,4 above >>> >>> // We can pretty easily implement these right now >>> boolean isHasInputPropertyChanges() >>> boolean isHasInputFileChanges() >>> boolean isHasOutputFileChanges() >>> >>> void eachInputFileChange(Action<InputFileChange> action) >>> >>> // And to add to the API in the future >>> void eachInputPropertyChange(Action<InputPropertyChange> action) >>> void eachOutputFileChange(Action<OutputFileChange> action) >>> } >> >> How does this actually make it easier to implement an incremental task, >> given that most tasks don't care about output file changes and approximately >> none care about property changes? >> >> It seems like it's more understandable and obvious, and not that much less >> easy to use. People already know how to implement a non-incremental task, >> and it might be easier for them to choose to do it the "old way" when the >> task is not incremental, or there are changes that the task cannot handle. > > Why would it be easier? Do you have a concrete example? Why would the task > not just go: for each input file that needs work, do the work. > >> >> There's certainly a trade-off here between elegance and obviousness. The >> fact the current DSL has some non-obvious logic involved is definitely a >> cost that should be considered. >> >> >> What happens when we add new types of things that we detect changes for? >> >> I guess the issue is that the task wants to ask: "are there any changes that >> are not included in [X,Y]" where X,Y are the set of changes that the task >> _is_ aware of. >> So a method like: >> hasChangesOtherThan([InputFileChange,OutputFileChange]) would give this >> functionality. >> >> >> Stepping back, there are several things a task requires from this API. >> 1) Task can declare the changeset(s) it can handle, and stream these changes >> 2) Task needs to know if the changeset(s) are "incomplete": are there other >> changes outside of the ones I can handle. The case where there is no prior >> history is a type of "incomplete". >> 3) Task needs to be able to handle the "incomplete" case in a sensible way >> 4) Task needs to be able to handle the "complete" case in a sensible way >> >> With the current DSL, this is done by: >> 1) @TaskAction execute(InputFileChanges inputs, OutputFileChanges outputs) >> // I can handle changes to input files and output files >> 2) inputs.allOutOfDate() == "incomplete". Not sure if there is an equivalent >> on outputs? >> 3) inputs.outOfDate({}) // Every file is considered "out-of-date" >> outputs.changedSincePreviousExecution({}) // No outputs files considered >> changed? >> 4) inputs.outOfDate({}) // Only the changed files >> inputs.removed({}) >> outputs.outOfDate({}) // Only files that have changed >> outputs.removed({}) // Output files that have been removed >> >> eg: >> def execute(InputFileChanges inputs, OutputFileChanges outputs) { >> outputs.outOfDate { >> // This may be a set of actual changes: not sure what to send when >> no task history, changed input property, etc >> } >> inputs.outOfDate { >> // This may be a set of changes, or it may be every input (no task >> history, changed input property, etc) >> } >> } >> >> An alternative mechanism could be: >> 1) Explicitly ask the context for the changesets you care about: >> def changeset = context.getChanges([InputFileChange, >> OutputFileChange]) >> 2) Ask the context (or the changeset) if it is "complete" >> changeset.complete >> 3) Task author iterates over all inputs (just like a non-incremental task): >> cannot use changeset >> 4) Task author iterates over changes >> >> eg: >> def execute(IncrementalTaskContext context) { >> def changeset = context.getChanges([InputFileChange, OutputFileChange]) >> if (changeset.complete) { >> changeset.eachChange(OutputFileChange) { >> ... >> } >> changeset.eachChange(InputFileChange) { >> ... >> } >> } else { >> // Regular non-incremental task action >> task.getInputs().eachFile { >> ... >> } >> } >> >> The latter is definitely more verbose, and probably a bit less convenient. >> But it's also more explicit and I think probably more understandable. > > Maybe. It's better than your other proposal above, in that it doesn't force > the task implementation to know about every type of change when it doesn't > want to. This is important because very few task implementations are going to > care about anything other than input file changes. > > However, the above also rules out certain performance optimisations as we > don't know until the task executes what it will actually ask for. Instead, we > have to be able to deal with the task asking for anything. Exactly what a > task can handle, however, isn't dynamic - it's baked into the code and very > much static. So, the API needs to expose this statically, so that we can do > useful things with this information. > > This also make evolution easier, as we can inject various different views > instead of a single context that exposes everything. > > As far as 'understandability' goes, I think the basic problem here is that > we're defining the API in terms of changes, whereas the task implementations > are interested in the work they have to do, and (very infrequently) the > reason why. And while a change is a type of reason, but not all reasons are > changes (eg --rerun-tasks and missing history are reasons, but they're not > changes). > > So, I think the API needs to express things in terms of what work the task > needs to do, not in terms of what has changed. This has correctness > implications, as well, as the task doesn't need to (and possibly cannot) be > involved in converting changes to work. And for this API, correctness is the > primary goal, followed by performance, followed by understandability. It's > not a beginner API. > > Here are the different kinds of work we're interested in: > > - Process all input files for which there is no corresponding output file. > - Process all input files which have changed since their corresponding output > files were generated. > - Process all input files for which the corresponding output file has changed > since they were generated. > - Process all input files for which we don't have any history. > - Process all input files for which the corresponding output file is > incorrect due to a property change. > - Remove outputs for all input files that have been removed. > - Remove existing outputs before processing input files, for when we're > rebuilding the outputs (for whatever reason). > > This says to me we want an API that > > - Allows a task to invalidate output files given the set of property changes. > Default is to invalidate all output files for any property change. > - Allows a task to invalidate input files given the set of output file > changes. Default is to invalidate all input files for any output file change. > - Allows a task to ask for the input files it needs to do work on. And you're saying that the current API, with just some renames, provides this? -- Luke Daley Principal Engineer, Gradleware http://gradleware.com Join me at the Gradle Summit 2013, June 13th and 14th in Santa Clara, CA: http://www.gradlesummit.com --------------------------------------------------------------------- To unsubscribe from this list, please visit: http://xircles.codehaus.org/manage_email
