On 22/03/2012, at 7:15 AM, Daz DeBoer wrote:

> It looks like storing the artifact lastModified date is required for the 
> url->artifact cache only, and the artifact url is not required for either the 
> artifactId->artifact cache or the url->artifact cache.

Right, in the earlier implementations it was used but not anymore.

>  I guess one question is whether the lastModified date can actually be 
> considered some sort of meta-data about the artifact itself, or if it's 
> actually more like meta-data about the artifact retrieval. I kind of think 
> the lastModified date should be only used when checking if a particular URL 
> retrieval is up-to-date: and that this value should not leak into the model 
> further.

I've got no problem conceptualising it as metadata about the resolution. Say 
we provided some kind of report on the contents of the cache relevant to the 
particular build you are working on. I can see the url and last modified date 
of each artifact, as resolved from a particular repository, being useful 
information in that report.

> At the moment, if we get the same artifact from 2 different URLs in the same 
> repository, it looks like we will overwrite the first retrieval (lastModified 
> + url) when we cache the second. This doesn't feel great to me.

For the “by-repository” cache, yes. But that in itself is interesting 
information (i.e. given the same repository and module vectors we got something 
at a different URL). It becomes more interesting if the repository is actually 
serving back redirects too. We could potentially then do some short circuiting 
here by using this info (need to think about that more).
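
A minimal sketch of the overwrite being discussed, assuming the 
“by-repository” cache keeps one slot per repository and artifact id. All 
class and method names here are hypothetical, not the actual Gradle types:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; not the actual Gradle classes.
final class Resolution {
    final String url;
    final long lastModified;

    Resolution(String url, long lastModified) {
        this.url = url;
        this.lastModified = lastModified;
    }
}

final class ByRepositoryCache {
    // Keyed by repository id + artifact id: one slot per artifact per repository.
    private final Map<String, Resolution> entries = new HashMap<>();

    // Returns the entry that was replaced, or null if the slot was empty.
    Resolution store(String repositoryId, String artifactId, Resolution resolution) {
        return entries.put(repositoryId + "|" + artifactId, resolution);
    }

    Resolution lookup(String repositoryId, String artifactId) {
        return entries.get(repositoryId + "|" + artifactId);
    }
}
```

Because `store` hands back the replaced entry, a caller could notice that the 
same artifact has turned up at a different URL rather than silently losing 
the first retrieval.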

> So while I quite like the fact that these 2 caches are consistent in API, I 
> don't know if they accurately model the information available. The issue with 
> diverging them, of course, is that you may lose that nice shared 
> implementation.


Conceptually it fits nicely for me. We are caching an artifact resolution, 
indexed by different attributes, and in both caches that resolution is the 
same thing. There are certain constraints on how reliably you can use that 
information for optimisations, based on external factors (e.g. with HTTP, once 
an entity's url changes there is no guarantee that its last modified date has 
any relation to the one at its previous url), but that's not a characteristic 
of the metadata itself.
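
To illustrate the shared abstraction, here is a rough sketch: one resolution 
record, one index implementation, instantiated once per key type. The names 
are hypothetical, and the `canReuseLastModifiedFor` check is only an 
assumption about how the HTTP constraint could be expressed:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical types for illustration only; not the actual Gradle classes.
final class CachedResolution {
    final String artifactFile;
    final String url;
    final long lastModified;

    CachedResolution(String artifactFile, String url, long lastModified) {
        this.artifactFile = artifactFile;
        this.url = url;
        this.lastModified = lastModified;
    }

    // lastModified is only a meaningful up-to-date check against the
    // URL it was recorded for.
    boolean canReuseLastModifiedFor(String requestedUrl) {
        return url.equals(requestedUrl);
    }
}

// One index implementation, reused for each kind of key.
final class ResolutionIndex<K> {
    private final Map<K, CachedResolution> entries = new HashMap<>();

    void store(K key, CachedResolution resolution) {
        entries.put(key, resolution);
    }

    CachedResolution lookup(K key) {
        return entries.get(key);
    }
}
```

The same `CachedResolution` can then be stored in a 
`ResolutionIndex<String>` keyed by url and in another keyed by artifact id. 
The constraint on trusting last modified lives in how the record is used, not 
in the record itself.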

I guess my opinion is that the shared abstraction is nice and coherent, but we 
are storing data that we are not using right now.

So our options are:

#1 - leave it the way it is and live with the cost of storing this extra 
metadata (i.e. cache file size and extra serialisation cost)
#2 - optimise to only store what is strictly needed and introduce more concepts 
and types
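
For comparison, option #2 might look roughly like the sketch below, with each 
cache storing only the fields it reads. The type names are hypothetical. The 
field split follows the observation quoted at the top: last modified is 
needed for the url cache only, and the url is needed by neither.

```java
// Hypothetical types; names are illustrative only.

// Entry for the artifactId->artifact cache: just the cached file.
final class ByArtifactIdEntry {
    final String artifactFile;

    ByArtifactIdEntry(String artifactFile) {
        this.artifactFile = artifactFile;
    }
}

// Entry for the url->artifact cache: the cached file plus the last modified
// date used to check whether the URL retrieval is up to date.
final class ByUrlEntry {
    final String artifactFile;
    final long lastModified;

    ByUrlEntry(String artifactFile, long lastModified) {
        this.artifactFile = artifactFile;
        this.lastModified = lastModified;
    }
}
```

The entries get smaller, but the two caches no longer share a value type, so 
the single shared implementation goes away.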

My vote is for #1, but given the criticality of this section of code I don't 
feel comfortable making that decision on my own.

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com
