On 30/03/2012, at 9:24 PM, Adam Murdoch <[email protected]> wrote:

> 
> On 30/03/2012, at 8:44 PM, Luke Daley wrote:
> 
>> 
>> On 30/03/2012, at 5:53 AM, Adam Murdoch wrote:
>> 
>>> 
>>> On 30/03/2012, at 8:35 AM, Daz DeBoer wrote:
>>> 
>>>> After a little pondering, I'd favour an approach that is simple to 
>>>> describe and doesn't result in unexpected behaviour; I think an extra HEAD 
>>>> request here or there is ok.
>>>> 
>>>> How about we perform a HEAD request if we have any cache candidates, be 
>>>> they local files or previous accesses to this URL.
>>>> So the logic would be:
>>>> 1. Do we have any cache candidates? If not, just HTTP GET the resource 
>>>> and we're done.
>>>> 2. HTTP HEAD to get the resource meta-data (and possibly the SHA1). If we 
>>>> got a 404, the resource is missing, we're done.
>>>> 3. If we match a cached URL resource, just use it and we're done.
>>>> 4. If we have a local file candidate, HTTP GET the SHA1. If the published 
>>>> SHA1 was found and matches, we can cache the URL resource and we're done.
>>>> 5. Otherwise, HTTP GET the actual resource.
>>>> Pros:
>>>> - We can get the SHA1 from the headers if available, and avoid the 
>>>> GET-SHA1 call.
>>>> - If a local file matches, we can cache the URL resolution as if we did an 
>>>> HTTP GET, since we have the full HTTP headers + the content. We never have 
>>>> a cached resource without an origin.
>>>> - After initially using a file from say .m2/repo to satisfy a request, 
>>>> from then on it will be just like we actually downloaded it from the URL. 
>>>> So there are no residual effects of using a local file in place of a 
>>>> downloaded one. Use of local files is a pure optimisation.
>>>> - If the artifact is missing altogether, we get a single 404 for the HEAD, 
>>>> rather than 404 for the SHA1 + 404 for the GET
>>>> - It's simpler to understand, I think.
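
To make that flow concrete, here's a rough sketch of the lookup in Java (the 
names are invented for illustration; it's just the shape of the algorithm, not 
our actual classes):

import java.util.List;

// Sketch of the HEAD-first lookup described above; all types are placeholders.
class HeadFirstResourceLookup {

    interface Transport {
        ResourceMetaData head(String url);          // null on 404
        CachedResource get(String url);             // full download, null on 404
        String getSha1(String url);                 // GET <url>.sha1, null if not published
    }

    interface ResourceMetaData {
        String getSha1();                           // from the headers/etag if available, else null
        long getContentLength();
        long getLastModified();
    }

    interface CachedResource {
        boolean matches(ResourceMetaData metaData); // same length + last-modified (or sha1)
        String getSha1();
    }

    interface LocalCandidate {
        String getSha1();
        CachedResource copyIntoCacheAs(String url, ResourceMetaData metaData);
    }

    private final Transport transport;

    HeadFirstResourceLookup(Transport transport) {
        this.transport = transport;
    }

    CachedResource resolve(String url, List<CachedResource> cachedForUrl,
                           List<LocalCandidate> localCandidates) {
        // 1. No candidates at all: just GET the resource.
        if (cachedForUrl.isEmpty() && localCandidates.isEmpty()) {
            return transport.get(url);
        }

        // 2. HEAD to get the meta-data (and possibly the sha1 from the headers).
        ResourceMetaData metaData = transport.head(url);
        if (metaData == null) {
            return null; // 404: the resource is missing, we're done.
        }

        // 3. A previous download of this URL still matches: reuse it.
        for (CachedResource cached : cachedForUrl) {
            if (cached.matches(metaData)) {
                return cached;
            }
        }

        // 4. Try to match a local file candidate against the published sha1.
        String sha1 = metaData.getSha1() != null ? metaData.getSha1() : transport.getSha1(url);
        if (sha1 != null) {
            for (LocalCandidate candidate : localCandidates) {
                if (sha1.equals(candidate.getSha1())) {
                    // Cache it as if we had downloaded it from this URL.
                    return candidate.copyIntoCacheAs(url, metaData);
                }
            }
        }

        // 5. Otherwise, download the resource.
        return transport.get(url);
    }
}
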
>>> 
>>> - This approach works nicely as a decoration over all the transports we're 
>>> interested in (http, sftp, webdav, local file, network file). These all 
>>> offer a way to get at least (content-length + last-modified-time) without 
>>> fetching the entire content. So, we could have a number of Resource 
>>> implementations that sit directly on top of the transport and which don't 
>>> care about caching, and a single Resource implementation that sits on top 
>>> of this to apply this caching algorithm. This would allow us, for example, 
>>> to start efficiently caching file resources, regardless of whether they are 
>>> sitting on a local or a network file system.
>> 
>> I've actually made this kind of thing NOT the responsibility of the 
>> ExternalResource (named so to differentiate from Ivy's Resource type) object.
>> 
>> https://github.com/gradle/gradle/blob/master/subprojects/core-impl/src/main/groovy/org/gradle/api/internal/externalresource/transfer/ExternalResourceAccessor.java
>> 
>> My thinking was that we are likely to have different strategies for cache 
>> optimisations here for different transports. It's starting to look like 
>> that won't be the case.
> 
> I don't think it matters too much at this stage.
> 
> We want to keep the transports and the caching as separate as possible, so we 
> can reuse the caching across transports. This may not necessarily mean that 
> every caching strategy will work with every transport, but it would be nice 
> to have at least one strategy that can work across any transport. And it 
> looks like the 'best' option we've come up with for http also happens to be a 
> generic option (except for the etag check, but I'm sure we can deal with 
> that), so perhaps we only need one strategy. At some point we might end up 
> with some other transport-specific strategies, but ideally we can base these 
> on optional abstract capabilities of the transport (e.g. 'can you provide 
> content-length+last-modified-time efficiently?', 'can you do a 
> get-content-if-sha1-does-not-match?' and so on) rather than on concrete 
> transports.

This is more or less what we have now.

https://github.com/gradle/gradle/blob/master/subprojects/core-impl/src/main/groovy/org/gradle/api/internal/externalresource/transfer/DefaultCacheAwareExternalResourceAccessor.java

The ExternalResourceAccessor contract is kinda flexible, so I think we could 
make it work for most transports and still use this general caching algorithm.
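
For example, the optional capabilities could be modelled roughly like this 
(illustrative names only, not the real interfaces):

// Illustrative only: optional capabilities a transport might advertise, which
// the caching layer can test for and degrade gracefully when they're absent.
interface ResourceTransport {
    ExternalResourceStub getResource(String uri);
}

// "Can you provide content-length + last-modified-time efficiently?"
// (HTTP HEAD, sftp stat, File.length()/lastModified(), ...)
interface MetaDataCapableTransport extends ResourceTransport {
    ResourceMetaDataStub getMetaData(String uri); // null if the resource is missing
}

// "Can you do a get-content-if-sha1-does-not-match?"
interface Sha1CapableTransport extends ResourceTransport {
    ExternalResourceStub getResourceIfSha1DoesNotMatch(String uri, String sha1);
}

// Placeholders so the sketch stands alone.
interface ExternalResourceStub {}
interface ResourceMetaDataStub {
    long getContentLength();
    long getLastModified();
}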

At the moment this is built inside ExternalResourceRepository, but we could allow 
injection of a custom one easily enough.
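
Something like this, purely to illustrate the injection point (not the real 
constructors):

// Illustrative only: rather than the repository building its cache-aware
// accessor internally, it could accept one, so a different caching strategy
// can be plugged in per transport if we ever need to.
interface CacheAwareAccessor {
    Resource getResource(String location);
}

class ResourceRepositorySketch {
    private final CacheAwareAccessor accessor;

    // Injected rather than hard-wired; the default would still be the general
    // HEAD-based caching accessor discussed above.
    ResourceRepositorySketch(CacheAwareAccessor accessor) {
        this.accessor = accessor;
    }

    Resource get(String location) {
        return accessor.getResource(location);
    }
}

interface Resource {} // placeholder so the sketch stands alone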

There are still four things I want to do before wrapping this up:

* treat 403 and 405 HEAD requests as "metadata unknown"
* if the server is googlecode, treat 404 HEAD requests as "metadata unknown"
* when reusing a locally found resource, store the real metadata in the index.
* where it's safe to, extract the sha1 from the etag (e.g. Artifactory).

All of these things are small.
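
As a rough illustration of the first and last items (a hypothetical helper; 
treating a 40-character hex etag as the SHA1 is an assumption that would need 
verifying per server, e.g. for Artifactory):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only, not the real code. Assumptions are noted inline.
class HeadResponseInterpreter {

    // Assumption: a 40-char hex etag is the SHA1 of the content (as some
    // servers, e.g. Artifactory, publish it). Anything else means "no sha1".
    private static final Pattern SHA1_ETAG = Pattern.compile("\"?([a-fA-F0-9]{40})\"?");

    static String sha1FromEtag(String etag) {
        if (etag == null) {
            return null;
        }
        Matcher matcher = SHA1_ETAG.matcher(etag.trim());
        return matcher.matches() ? matcher.group(1).toLowerCase() : null;
    }

    // Some servers reject or mishandle HEAD, so these responses mean
    // "metadata unknown" rather than "the resource is missing".
    static boolean metaDataUnknown(int statusCode, boolean serverIsGoogleCode) {
        if (statusCode == 403 || statusCode == 405) {
            return true;
        }
        return statusCode == 404 && serverIsGoogleCode;
    }
}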
