On 30/03/2012, at 5:53 AM, Adam Murdoch wrote:

>
> On 30/03/2012, at 8:35 AM, Daz DeBoer wrote:
>
>> After a little pondering, I'd favour an approach that is simple to describe
>> and doesn't result in unexpected behaviour; I think an extra HEAD request
>> here or there is ok.
>>
>> How about we perform a HEAD request if we have any cache candidates, be they
>> local files or previous accesses to this URL.
>> So the logic would be:
>>  1. Do we have any cache candidates?
>>     - If not, just HTTP GET the resource and we're done.
>>  2. HTTP HEAD to get the resource meta-data (and possibly the SHA1)
>>     - If we got a 404, the resource is missing, we're done.
>>     - If we match a cached URL resource, just use it and we're done.
>>  3. If we have a local file candidate, HTTP GET SHA1.
>>     - If published SHA1 was found and matches then we can cache the URL
>>       resource and we're done.
>>  4. HTTP GET the actual resource
>>
>> Pros:
>> - We can get the SHA1 from the headers if available, and avoid the GET-SHA1
>> call.
>> - If a local file matches, we can cache the URL resolution as if we did an
>> HTTP GET, since we have the full HTTP headers + the content. We never have a
>> cached resource without an origin.
>> - After initially using a file from say .m2/repo to satisfy a request, from
>> then on it will be just like we actually downloaded it from the URL. So
>> there are no residual effects of using a local file in place of a downloaded
>> one. Use of local files is a pure optimisation.
>> - If the artifact is missing altogether, we get a single 404 for the HEAD,
>> rather than 404 for the SHA1 + 404 for the GET
>> - It's simpler to understand, I think.
>
> - This approach works nicely as a decoration over all the transports we're
> interested in (http, sftp, webdav, local file, network file). These all offer
> a way to get at least (content-length + last-modified-time) without fetching
> the entire content. So, we could have a number of Resource implementations
> that sit directly on top of the transport and which don't care about caching,
> and a single Resource implementation that sits on top of this to apply this
> caching algorithm. This would allow us, for example, to start efficiently
> caching file resources, regardless of whether they are sitting on local or
> network file system.
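
For reference, the lookup order quoted above could be sketched roughly as
follows. This is only an illustration: CachedLookupSketch, Transport,
Candidates, Metadata and Outcome are made-up names, not existing Gradle API.
It also assumes that a cached URL resource "matches" when content length and
last-modified time agree, and that a GET reports a 404 by returning false;
the thread does not spell out either detail.

```java
import java.util.Optional;

/** Hypothetical sketch of the quoted lookup order; none of these names are Gradle API. */
final class CachedLookupSketch {

    enum Outcome { MISSING, USED_CACHED_URL_RESOURCE, USED_LOCAL_FILE, DOWNLOADED }

    /** Metadata a HEAD-style request is assumed to return. */
    static final class Metadata {
        final long contentLength;
        final long lastModified;
        final Optional<String> sha1; // present only if the server sends it in a header

        Metadata(long contentLength, long lastModified, Optional<String> sha1) {
            this.contentLength = contentLength;
            this.lastModified = lastModified;
            this.sha1 = sha1;
        }

        /** Assumption: equal length and modification time mean "same resource". */
        boolean matches(Metadata other) {
            return contentLength == other.contentLength && lastModified == other.lastModified;
        }
    }

    /** Hypothetical transport (http, sftp, file, ...). */
    interface Transport {
        Optional<Metadata> head(String url);           // empty means 404
        Optional<String> getPublishedSha1(String url); // empty means no SHA1 published
        boolean get(String url);                       // false means 404
    }

    /** Hypothetical view of what we already have on disk. */
    interface Candidates {
        Optional<Metadata> cachedUrlResource(String url); // from a previous access to this URL
        Optional<String> localFileSha1(String url);       // e.g. a file found in .m2/repo
        void cacheUrlResolution(String url, Metadata meta);
    }

    static Outcome resolve(String url, Transport transport, Candidates candidates) {
        Optional<Metadata> cached = candidates.cachedUrlResource(url);
        Optional<String> localSha1 = candidates.localFileSha1(url);

        // 1. No cache candidates: just GET the resource.
        if (!cached.isPresent() && !localSha1.isPresent()) {
            return transport.get(url) ? Outcome.DOWNLOADED : Outcome.MISSING;
        }

        // 2. HEAD for the metadata (and possibly the SHA1).
        Optional<Metadata> remote = transport.head(url);
        if (!remote.isPresent()) {
            return Outcome.MISSING;
        }
        if (cached.isPresent() && cached.get().matches(remote.get())) {
            return Outcome.USED_CACHED_URL_RESOURCE;
        }

        // 3. Local file candidate: prefer the SHA1 from the headers, else GET it.
        if (localSha1.isPresent()) {
            Optional<String> published = remote.get().sha1.isPresent()
                    ? remote.get().sha1
                    : transport.getPublishedSha1(url);
            if (published.isPresent() && published.get().equalsIgnoreCase(localSha1.get())) {
                // Record the resolution as if the file had been downloaded from the URL.
                candidates.cacheUrlResolution(url, remote.get());
                return Outcome.USED_LOCAL_FILE;
            }
        }

        // 4. Nothing usable locally: GET the actual resource.
        return transport.get(url) ? Outcome.DOWNLOADED : Outcome.MISSING;
    }
}
```

Keeping the transport behind a small interface like this is what would let one
caching layer sit on top of every transport, which is the decoration idea in
the last quoted point.
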

I've actually made this kind of thing NOT the responsibility of the
ExternalResource object (named so to differentiate it from Ivy's Resource
type).

https://github.com/gradle/gradle/blob/master/subprojects/core-impl/src/main/groovy/org/gradle/api/internal/externalresource/transfer/ExternalResourceAccessor.java

My thinking was that we are likely to have different strategies for cache
optimisations here for different transports. It's starting to look like
that's not going to be the case.

-- 
Luke Daley
Principal Engineer, Gradleware
http://gradleware.com
