On 22/03/2012, at 7:15 AM, Daz DeBoer wrote:

> It looks like storing the artifact lastModified date is required for the
> url->artifact cache only, and the artifact url is not required for either the
> artifactId->artifact cache or the url->artifact cache.

Right, in the earlier implementations it was used but not anymore.

> I guess one question is whether the lastModified date can actually be
> considered some sort of meta-data about the artifact itself, or if it's
> actually more like meta-data about the artifact retrieval. I kind of think
> the lastModified date should be only used when checking if a particular URL
> retrieval is up-to-date: and that this value should not leak into the model
> further.

I've got no problem conceptualising it as metadata about the resolution. Say we provided some kind of report on the contents of the cache relevant to the particular build you are working on. I can see the URL and last modified date of the artifacts, as they were resolved in a particular repository, being useful information there.

> At the moment, if we get the same artifact from 2 different URLs in the same
> repository, it looks like we will overwrite the first retrieval (lastModified
> + url) when we cache the second. This doesn't feel great to me.

For the “by-repository” cache, yes. But that in itself is interesting information (i.e. given the same repository and module vectors we got something at a different URL). It becomes more interesting if the repository is actually serving back redirects too. We could potentially then do some short circuiting here by using this info (need to think about that more).

> So while I quite like the fact that these 2 caches are consistent in API, I
> don't know if they accurately model the information available. The issue with
> diverging them, of course, is that you may lose that nice shared
> implementation.

Conceptually it fits nicely for me. We are caching an artifact resolution, indexed by different attributes, and that resolution is the same thing in both cases. There are certain constraints on how you can reliably use that information for optimisations based on external factors (e.g. with HTTP, once an entity's URL changes there is no guarantee that its last modified date has any relation to the one at its previous URL), but that's not a characteristic of the metadata itself.

I guess my opinion is that the shared abstraction is nice and coherent, but we are storing data that we are not using right now.

So our options are:

#1 - leave it the way it is and live with the cost of storing this extra metadata (i.e. cache file size and extra serialisation cost)
#2 - optimise to only store what is strictly needed and introduce more concepts and types

My vote is for #1, but given the criticality of this section of code I don't feel comfortable making that decision on my own.

-- 
Luke Daley
Principal Engineer, Gradleware
http://gradleware.com
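
To make the shared abstraction concrete, here is a rough sketch of what "one resolution record, indexed by different attributes" could look like. All of the names here (CachedResolution, ResolutionIndex, ResolutionCacheSketch, the "repositoryId:artifactId" key format) are made up for illustration and are not the actual types in the code base. It also shows the overwrite behaviour discussed above for the by-repository index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical record of one artifact resolution; not the real type. */
final class CachedResolution {
    final String file;        // local path of the downloaded artifact
    final String url;         // URL the artifact was fetched from
    final long lastModified;  // last modified value reported for that URL

    CachedResolution(String file, String url, long lastModified) {
        this.file = file;
        this.url = url;
        this.lastModified = lastModified;
    }
}

/** One index implementation reused by both caches; only the key differs. */
final class ResolutionIndex<K> {
    private final Map<K, CachedResolution> entries = new HashMap<>();

    /** Stores a resolution and returns the entry it replaced, or null. */
    CachedResolution store(K key, CachedResolution resolution) {
        return entries.put(Objects.requireNonNull(key), resolution);
    }

    CachedResolution lookup(K key) {
        return entries.get(key);
    }
}

/** Keeps the same record in both indexes (assumed key formats). */
final class ResolutionCacheSketch {
    // Keyed by "repositoryId:artifactId" (illustrative format only).
    final ResolutionIndex<String> byRepository = new ResolutionIndex<>();
    // Keyed by the URL the artifact was retrieved from.
    final ResolutionIndex<String> byUrl = new ResolutionIndex<>();

    /** Records a resolution; returns the by-repository entry it overwrote, if any. */
    CachedResolution record(String repositoryId, String artifactId, CachedResolution r) {
        byUrl.store(r.url, r);
        return byRepository.store(repositoryId + ":" + artifactId, r);
    }
}
```

With this shape, resolving the same artifact from a second URL in the same repository replaces the by-repository entry (and record returns the old one, so the change of URL is observable), while the by-URL index keeps both retrievals. Option #2 would mean splitting CachedResolution into two narrower types, one per index.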
