On 30/03/2012, at 8:44 PM, Luke Daley wrote:

> 
> On 30/03/2012, at 5:53 AM, Adam Murdoch wrote:
> 
>> 
>> On 30/03/2012, at 8:35 AM, Daz DeBoer wrote:
>> 
>>> After a little pondering, I'd favour an approach that is simple to describe 
>>> and doesn't result in unexpected behaviour; I think an extra HEAD request 
>>> here or there is ok.
>>> 
>>> How about we perform a HEAD request whenever we have any cache candidates, 
>>> be they local files or previous accesses to this URL?
>>> So the logic would be (sketched in code below):
>>> 1. Do we have any cache candidates? If not, just HTTP GET the resource and 
>>> we're done.
>>> 2. HTTP HEAD to get the resource meta-data (and possibly the SHA1).
>>> 3. If we got a 404, the resource is missing and we're done.
>>> 4. If we match a cached URL resource, just use it and we're done.
>>> 5. If we have a local file candidate, HTTP GET the published SHA1. If it 
>>> was found and matches, we can cache the URL resource and we're done.
>>> 6. Otherwise, HTTP GET the actual resource.
>>> Pros:
>>> - We can get the SHA1 from the headers if available, and avoid the GET-SHA1 
>>> call.
>>> - If a local file matches, we can cache the URL resolution as if we did an 
>>> HTTP GET, since we have the full HTTP headers + the content. We never have 
>>> a cached resource without an origin.
>>> - After initially using a file from say .m2/repo to satisfy a request, from 
>>> then on it will be just like we actually downloaded it from the URL. So 
>>> there are no residual effects of using a local file in place of a 
>>> downloaded one. Use of local files is a pure optimisation.
>>> - If the artifact is missing altogether, we get a single 404 for the HEAD, 
>>> rather than a 404 for the SHA1 plus a 404 for the GET.
>>> - It's simpler to understand, I think.
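
A rough sketch in Java of the flow described above. Every type here 
(Transport, ResourceCache, ResourceMetaData, and so on) is a hypothetical 
stand-in, not one of Gradle's actual classes:

    import java.io.File;
    import java.io.IOException;
    import java.net.URI;

    // Hypothetical collaborators, not Gradle's real types.
    interface Resource {}

    interface ResourceMetaData {
        String getSha1(); // SHA1 published in the response headers, or null
    }

    interface Transport {
        Resource get(URI uri) throws IOException;          // full GET
        ResourceMetaData head(URI uri) throws IOException; // HEAD; null on 404
        String getText(URI uri) throws IOException;        // small GET; null on 404
    }

    interface ResourceCache {
        boolean hasCandidatesFor(URI uri);
        Resource findCachedUrlResource(URI uri, ResourceMetaData meta);
        File findLocalFileCandidate(URI uri); // e.g. from ~/.m2/repository
        String sha1Of(File file);
        Resource store(URI uri, ResourceMetaData meta, File content);
    }

    class HeadFirstResolver {
        Resource resolve(URI uri, Transport transport, ResourceCache cache)
                throws IOException {
            // 1. No cache candidates at all: a single GET and we're done.
            if (!cache.hasCandidatesFor(uri)) {
                return transport.get(uri);
            }
            // 2./3. HEAD for the meta-data; a 404 means the resource is missing.
            ResourceMetaData meta = transport.head(uri);
            if (meta == null) {
                return null;
            }
            // 4. A previous download of this URL still matches: reuse it.
            Resource cached = cache.findCachedUrlResource(uri, meta);
            if (cached != null) {
                return cached;
            }
            // 5. A local file candidate: check it against the published SHA1,
            //    preferring a SHA1 carried in the HEAD response headers.
            File local = cache.findLocalFileCandidate(uri);
            if (local != null) {
                String sha1 = meta.getSha1() != null
                        ? meta.getSha1()
                        : transport.getText(URI.create(uri + ".sha1"));
                if (sha1 != null && sha1.equals(cache.sha1Of(local))) {
                    // Cache it as if we had downloaded it from this URL.
                    return cache.store(uri, meta, local);
                }
            }
            // 6. Fall back to downloading the actual resource.
            return transport.get(uri);
        }
    }
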
>> 
>> - This approach works nicely as a decoration over all the transports we're 
>> interested in (http, sftp, webdav, local file, network file). These all 
>> offer a way to get at least (content-length + last-modified-time) without 
>> fetching the entire content. So, we could have a number of Resource 
>> implementations that sit directly on top of the transport and which don't 
>> care about caching, and a single Resource implementation that sits on top of 
>> these to apply the caching algorithm. This would allow us, for example, to 
>> start efficiently caching file resources, regardless of whether they sit on 
>> a local or network file system.
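
That decoration might look something like the following; the interfaces are 
hypothetical (reusing the stand-in types from the sketch above), shown only to 
illustrate the layering, and Gradle's real types differ:

    import java.net.URI;

    // The minimal capability every transport of interest can offer: meta-data
    // (content-length + last-modified-time) without fetching the content.
    interface ResourceAccessor {
        ResourceMetaData getMetaData(URI uri);
        Resource getResource(URI uri);
    }

    // One accessor per transport (http, sftp, webdav, local file, network
    // file), each sitting directly on the transport and knowing nothing about
    // caching, e.g.:
    //   class HttpResourceAccessor implements ResourceAccessor { ... }
    //   class SftpResourceAccessor implements ResourceAccessor { ... }

    // A single caching decoration that works over any of them.
    class CachingResourceAccessor implements ResourceAccessor {
        private final ResourceAccessor delegate;
        private final ResourceCache cache; // hypothetical, as above

        CachingResourceAccessor(ResourceAccessor delegate, ResourceCache cache) {
            this.delegate = delegate;
            this.cache = cache;
        }

        @Override
        public ResourceMetaData getMetaData(URI uri) {
            return delegate.getMetaData(uri);
        }

        @Override
        public Resource getResource(URI uri) {
            ResourceMetaData meta = delegate.getMetaData(uri); // cheap everywhere
            Resource cached = meta == null
                    ? null
                    : cache.findCachedUrlResource(uri, meta);
            // Cache miss: go through the transport; this layer never cares
            // whether the delegate is http, sftp, or a file system.
            return cached != null ? cached : delegate.getResource(uri);
        }
    }
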
> 
> I've actually made this kind of thing NOT the responsibility of the 
> ExternalResource object (so named to differentiate it from Ivy's Resource 
> type).
> 
> https://github.com/gradle/gradle/blob/master/subprojects/core-impl/src/main/groovy/org/gradle/api/internal/externalresource/transfer/ExternalResourceAccessor.java
> 
> My thinking was that we're likely to have different cache optimisation 
> strategies here for different transports. It's starting to look like that 
> won't be the case.

I don't think it matters too much at this stage.

We want to keep the transports and the caching as separate as possible, so we 
can reuse the caching across transports. This doesn't necessarily mean that 
every caching strategy will work with every transport, but it would be nice to 
have at least one strategy that can work across any transport. And it looks 
like the 'best' option we've come up with for http also happens to be a 
generic option (except for the etag check, but I'm sure we can deal with 
that), so perhaps we only need one strategy. At some point we might end up 
with some other transport-specific strategies, but ideally we can base these 
on optional abstract capabilities of the transport (e.g. 'can you provide 
content-length + last-modified-time efficiently?', 'can you do a 
get-content-if-sha1-does-not-match?' and so on) rather than on concrete 
transports.
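
Those optional capabilities could be modelled as interfaces that the single 
generic strategy probes for at runtime. A minimal sketch, with hypothetical 
names, reusing the ResourceAccessor stand-in from above:

    import java.net.URI;

    // An optional capability a transport may advertise, over and above plain
    // GET; the caching strategy probes for it rather than for a concrete
    // transport.
    interface Sha1ConditionalGet {
        // Returns null when the remote content already matches the given SHA1.
        Resource getIfSha1DoesNotMatch(URI uri, String sha1);
    }

    class GenericCachingStrategy {
        Resource fetch(ResourceAccessor transport, URI uri,
                       String cachedSha1, Resource cachedCopy) {
            // Use the richer capability when this transport happens to offer it...
            if (cachedSha1 != null && transport instanceof Sha1ConditionalGet) {
                Resource fresh = ((Sha1ConditionalGet) transport)
                        .getIfSha1DoesNotMatch(uri, cachedSha1);
                return fresh != null ? fresh : cachedCopy;
            }
            // ...otherwise fall back to the one generic strategy: meta-data
            // check via getMetaData(), then a full getResource() on a miss.
            return transport.getResource(uri);
        }
    }
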


--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
VP of Engineering, Gradleware Inc. - Gradle Training, Support, Consulting
http://www.gradleware.com
