Re: Flexible Cache Management discussion (was Re: [jira] Commented: (IVY-399) Flexible Cache Management)

Xavier Hanin Sun, 06 May 2007 06:18:08 -0700

On 5/6/07, Stephane Bailliez <[EMAIL PROTECTED]> wrote:

Xavier Hanin wrote:
> [...]
> IvyFileInCache, ArchiveFileInCache, CachedDataFile
> For the moment ArchiveFileInCache already supports a mechanism to
> avoid copying it to the cache, called use origin. This mechanism
> relies on the original location of the artifact which is stored in the
> CachedDataFile. But this mechanism is not flexible enough, and not as
> clean as what is suggested in IVY-399. Therefore I think we have to
> review this mechanism as part of IVY-399.


+1 too much duplication in this artifact origin.
All this information should be maintained in every artifact (source,
lastmodified) metadata when 'resolved'.

>   [...]Hence I see another
> solution: make BasicResolver delegate all its cache related operation
> to a class implementing an interface we will have to define, but which
> could be called ResolverCacheManager. Then we would have an
> implementation called ResolverNoCacheManager, which would simply do no
> cache management at all.

What operations are in a ResolverCacheManager? Can you elaborate?

The operations I see for the moment are very basic, and would be very
similar to part of what can currently be found in CacheManager. For
instance:
File getArchiveFileInCache(Artifact artifact)
File getIvyFileInCache(ModuleRevisionId mrid)
ArtifactOrigin getSavedArtifactOrigin(Artifact artifact)

I don't know if getSavedArtifactOrigin(Artifact artifact) will
actually be necessary. Maybe we could make this interface simpler with
something like:

File getArchiveFile(Artifact)
=> returns the location of the artifact as a File, which can be either
in the cache or at its original location if the artifact is not
cached but used directly. We could use this method also for Ivy files
(using DefaultArtifact.newIvyArtifact(ModuleRevisionId mrid, Date
pubDate) as artifact).

String getOriginLocation(Artifact)
=> returns the location of an artifact in the repository. This is
usually a URL, but depends on the DependencyResolver implementation.
This would usually be used for reporting only.

To make BasicResolver actually able to delegate to the
ResolverCacheManager, we should also add methods like:
void cacheArtifact(Artifact, InputStream)
=> copies the input stream to the cache file for the given artifact
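
A rough sketch of how the operations listed above might fit together, in
Java. Only the names ResolverCacheManager, ResolverNoCacheManager,
getArchiveFile, getOriginLocation and cacheArtifact come from this
message; SketchArtifact and DirectoryCacheManager are invented stand-ins,
and returning null for "unknown" is an assumption, not a decided contract.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for Ivy's Artifact; only a file name is needed here.
final class SketchArtifact {
    final String fileName;
    SketchArtifact(String fileName) { this.fileName = fileName; }
}

// The proposed operations gathered in one interface (signatures simplified).
interface ResolverCacheManager {
    File getArchiveFile(SketchArtifact artifact);      // null when unknown (assumption)
    String getOriginLocation(SketchArtifact artifact); // null when unknown (assumption)
    void cacheArtifact(SketchArtifact artifact, InputStream in) throws IOException;
}

// Hypothetical caching variant: one flat directory, no origin tracking.
final class DirectoryCacheManager implements ResolverCacheManager {
    private final File cacheDir;
    DirectoryCacheManager(File cacheDir) { this.cacheDir = cacheDir; }

    public File getArchiveFile(SketchArtifact artifact) {
        File f = new File(cacheDir, artifact.fileName);
        return f.isFile() ? f : null;
    }

    public String getOriginLocation(SketchArtifact artifact) {
        return null; // origins are not recorded in this sketch
    }

    public void cacheArtifact(SketchArtifact artifact, InputStream in) throws IOException {
        Files.createDirectories(cacheDir.toPath());
        // Copy the stream to the cache file for this artifact.
        Files.copy(in, new File(cacheDir, artifact.fileName).toPath(),
                StandardCopyOption.REPLACE_EXISTING);
    }
}

// Variant that does no cache management at all.
final class ResolverNoCacheManager implements ResolverCacheManager {
    public File getArchiveFile(SketchArtifact artifact) { return null; }
    public String getOriginLocation(SketchArtifact artifact) { return null; }
    public void cacheArtifact(SketchArtifact artifact, InputStream in) {
        // Intentionally empty: nothing is stored.
    }
}
```

In this form the no-cache variant has nothing it can return from
getArchiveFile, since it never learns where the artifact lives.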

I have a hard time seeing the difference between a cache and a nullcache.

Indeed, now that I think this through further, I have trouble clearly
seeing the separation of responsibilities between the resolver and the
cache. To implement the method getArchiveFile(Artifact), the cache
manager can know the answer only if it actually caches the artifact
file. If it doesn't cache it, only the resolver can know it.

So maybe a solution is to cache the origin location (as is currently
done in CachedDataFile) to be able to return this location. This means
that even a nullcache would have to persistently store the origin
location of artifacts.
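
To illustrate this first solution, here is a minimal sketch of a store
that persists only origin locations, so that even a no-cache setup can
answer getArchiveFile. OriginStore, its string keys, and the use of a
java.util.Properties file are assumptions made for the example; this is
not how CachedDataFile is implemented. It also assumes the origin is a
local file path, which holds only for file-system based resolvers.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical store: persists origin locations only, never artifact bytes.
final class OriginStore {
    private final File dataFile;

    OriginStore(File dataFile) { this.dataFile = dataFile; }

    // Record where an artifact was found, and write it to disk immediately.
    void saveOrigin(String artifactKey, String location) throws IOException {
        Properties props = load();
        props.setProperty(artifactKey, location);
        try (OutputStream out = new FileOutputStream(dataFile)) {
            props.store(out, "artifact origins (sketch)");
        }
    }

    String getOriginLocation(String artifactKey) throws IOException {
        return load().getProperty(artifactKey);
    }

    // Assumes the stored origin is a local path; returns null otherwise.
    File getArchiveFile(String artifactKey) throws IOException {
        String origin = getOriginLocation(artifactKey);
        if (origin == null) {
            return null;
        }
        File f = new File(origin);
        return f.isFile() ? f : null;
    }

    // Re-read the file on each call so a fresh instance sees earlier writes.
    private Properties load() throws IOException {
        Properties props = new Properties();
        if (dataFile.isFile()) {
            try (InputStream in = new FileInputStream(dataFile)) {
                props.load(in);
            }
        }
        return props;
    }
}
```

The cost is that a "no cache" configuration still writes a small file per
cache location, which is the trade-off described above.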

Another solution is to make the BasicResolver aware of the nature of
the cache manager (for example, the cache manager could have to
implement something like boolean doCacheArtifact(Artifact)). In this
case the resolver could take responsibility for implementing
getArchiveFile(Artifact), delegating to the CacheManager only if it
returns true from doCacheArtifact(Artifact) and returns an existing
file from getArchiveFile(Artifact). Otherwise it would have to go
through its patterns (or its usual process for finding an artifact
location in the repository) to find the actual file. This means that
it wouldn't perform very well if you have a lot of possible locations.
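
This second solution could be sketched as follows. CachePolicy and
PatternScanningResolver are hypothetical names, artifacts are reduced to
file names, and a list of candidate directories stands in for the
resolver's patterns; the probe counter exists only to make the lookup
cost visible.

```java
import java.io.File;
import java.util.List;

// Hypothetical view of the cache manager as seen by the resolver.
interface CachePolicy {
    boolean doCacheArtifact(String artifactName);
    File getArchiveFile(String artifactName);
}

// Hypothetical resolver that owns getArchiveFile and falls back to scanning.
final class PatternScanningResolver {
    private final CachePolicy cache;
    private final List<File> candidateDirs; // stand-in for resolver patterns
    int probes;                             // number of locations checked so far

    PatternScanningResolver(CachePolicy cache, List<File> candidateDirs) {
        this.cache = cache;
        this.candidateDirs = candidateDirs;
    }

    File getArchiveFile(String artifactName) {
        // Ask the cache only if it claims to cache this artifact.
        if (cache.doCacheArtifact(artifactName)) {
            File cached = cache.getArchiveFile(artifactName);
            if (cached != null && cached.isFile()) {
                return cached;
            }
        }
        // Otherwise probe every possible location in order.
        for (File dir : candidateDirs) {
            probes++;
            File candidate = new File(dir, artifactName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }
}
```

Every call that misses the cache repeats the scan, so the cost grows with
the number of candidate locations.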

Therefore I think I'm in favor of the first solution: store the
origin location in a way similar to what we do for the moment. But I'd
be happy to change my mind if someone has a better idea, because for
the moment things are still not fully clear to me. I think I need to
start the implementation to better see all the implications.


> [...]
> With such a change, one thing that wouldn't make sense any more is the
> saved resolver and artifact resolver, so the only information we would
> have in CachedDataFile would be artifact origin (for resolvers using a
> cache)
mmm ? Well the artifact data and artifact information are intended to
be cached.
The source (ie origin) should be part of the metadata (ie attributes) of
the artifact and follow it all the time.

What do you mean by the attributes of the artifact? Are you speaking
about something in memory, or persistent?

In memory, the Artifact object, when it is created, does not know
anything about its actual location. Moreover, retrieving the source
location of an Artifact can be a costly operation, because you have to
delegate to the dependency resolver. That's why I think that keeping
the ArtifactOrigin object separate from the Artifact makes sense.

Concerning the persistent storage of information about the
artifact, the artifact file itself cannot be modified to store this
metadata; that's why we have to store it in a separate file. Do you
see another option?

BTW, now that I think more about the problem, I think we will even
still need to store the resolver and artifact resolver information, at
least when the resolver directly associated with the Module is a
compound resolver. Otherwise we would have to go through the compound
resolver itself whenever we want to reload the module from cache. This
should be the responsibility of the compound resolver's cache, so that
we can avoid storing this information when it is not necessary.

Xavier
