Flexible Cache Management discussion (was Re: [jira] Commented: (IVY-399) Flexible Cache Management)

Xavier Hanin Sun, 06 May 2007 02:44:04 -0700

Hi All,

As you can see from my comment on IVY-399, I've started working on
this issue by first listing all the files which are currently put in
the cache, with their default location, their content and their use. I
think this is a very good starting point for a discussion about how we
can rework the cache management to make it more flexible and address
IVY-399 (my comment is included at the end of this mail).


For the moment I think we can distinguish two main kinds of cached data:
* 'resolve' related data, mainly used to be able to reuse the result
of the resolve process
This includes all files which are for the moment stored by default at
the root of the cache: ResolvedIvyFileInCache,
ResolvedIvyPropertiesInCache, ConfigurationResolveReportInCache
I think the need expressed in IVY-399 doesn't concern those files,
which could thus be kept as is for the moment.
* repository related data, cached to avoid downloading it from the
repository each time it is required. Those files are by default
located in [organisation]/[module]. I think the intent of IVY-399 is
to put those files under the responsibility of the dependency
resolvers only, letting each resolver decide whether to cache them
and where to put them. Here is the list of files which fall in this
category:
IvyFileInCache, ArchiveFileInCache, CachedDataFile
For the moment ArchiveFileInCache already supports a mechanism to
avoid copying it to the cache, called use origin. This mechanism
relies on the original location of the artifact which is stored in the
CachedDataFile. But this mechanism is not flexible enough, and not as
clean as what is suggested in IVY-399. Therefore I think we have to
review this mechanism as part of IVY-399.
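
As an illustration of the use origin idea, the sketch below picks the
file to hand to callers from the artifact origin (is local + location)
recorded for an artifact. The ArtifactOriginSketch class, the property
keys and the fallback rule are assumptions made for this example; they
are not the actual keys or logic of the cache manager.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical helper: the key names below are invented for the example.
class ArtifactOriginSketch {
    static final String KEY_IS_LOCAL = "artifact.is-local";
    static final String KEY_LOCATION = "artifact.location";

    /**
     * Returns the original file when use origin is enabled and the origin
     * was recorded as local; otherwise returns the copy expected in the cache.
     */
    static File fileToUse(Properties cachedData, File archiveInCache, boolean useOrigin) {
        boolean isLocal = Boolean.parseBoolean(cachedData.getProperty(KEY_IS_LOCAL, "false"));
        String location = cachedData.getProperty(KEY_LOCATION);
        if (useOrigin && isLocal && location != null) {
            return new File(location);
        }
        return archiveInCache;
    }
}
```

The sketch also shows why this is not very flexible: the decision is
driven by a single flag and by data stored next to the cached files,
instead of being owned by the resolver itself.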

I'm still not sure how IVY-399 should be implemented, but based on
this first analysis, I see two main parts:
1) make the classes which currently read from the cache (listed under
'used by' in my analysis) get this information from something related
to the resolver instead. This can be done either by delegating to the
resolver directly, or by delegating to a per resolver cache manager.
2) review BasicResolver, which is currently the main producer of
cached data, and also the main user, to make its cache management more
flexible. The first approach I had in mind was to make it cache
independent (not relying on a cache at all), and provide a wrapper
handling the cache and delegating to another resolver when the cache
doesn't contain enough information. Then directly using an unwrapped
resolver would be enough to avoid any caching. The problem is that I
think the wrapping won't be easy, since cache management is deeply
tied to the BasicResolver algorithm. Hence I see another solution:
make BasicResolver delegate all its cache related operations to a
class implementing an interface we will have to define, which could
be called ResolverCacheManager. Then we would have an implementation
called ResolverNoCacheManager, which would simply do no cache
management at all. Another implementation would do something very
similar to the current way of dealing with the cache (and that would
be the default, to preserve backward compatibility). The advantage of
this solution is that it's far easier to implement than the first one
starting from the current code base. We could also easily ask the
resolver for its cache manager to implement point #1.
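
To make the second option more concrete, here is a minimal Java sketch
of what such a delegate could look like. Everything in it is
hypothetical: the interface has not been defined yet, so the method
names (findCachedDescriptor, storeDescriptor), the in-memory
DefaultResolverCacheManager and the SketchResolver standing in for
BasicResolver are assumptions made for illustration only, not existing
Ivy code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface: the real one still has to be defined.
interface ResolverCacheManager {
    /** Returns the cached descriptor for the given module revision, if any. */
    Optional<String> findCachedDescriptor(String mrid);

    /** Gives the cache manager a chance to store a freshly fetched descriptor. */
    void storeDescriptor(String mrid, String descriptor);
}

// Does no cache management at all: never finds anything, never stores anything.
class ResolverNoCacheManager implements ResolverCacheManager {
    public Optional<String> findCachedDescriptor(String mrid) {
        return Optional.empty();
    }

    public void storeDescriptor(String mrid, String descriptor) {
        // intentionally empty
    }
}

// Stand-in for the default behaviour; a map replaces the cache directory.
class DefaultResolverCacheManager implements ResolverCacheManager {
    private final Map<String, String> descriptors = new HashMap<>();

    public Optional<String> findCachedDescriptor(String mrid) {
        return Optional.ofNullable(descriptors.get(mrid));
    }

    public void storeDescriptor(String mrid, String descriptor) {
        descriptors.put(mrid, descriptor);
    }
}

// Stand-in for BasicResolver: every cache decision goes through the delegate.
class SketchResolver {
    private final ResolverCacheManager cacheManager;
    private int repositoryHits = 0;

    SketchResolver(ResolverCacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }

    // Lets other classes ask the resolver for its cache manager (point #1).
    ResolverCacheManager getCacheManager() {
        return cacheManager;
    }

    String getDependency(String mrid) {
        Optional<String> cached = cacheManager.findCachedDescriptor(mrid);
        if (cached.isPresent()) {
            return cached.get();
        }
        String descriptor = fetchFromRepository(mrid);
        cacheManager.storeDescriptor(mrid, descriptor);
        return descriptor;
    }

    // Simulates a download and counts how often the repository is accessed.
    private String fetchFromRepository(String mrid) {
        repositoryHits++;
        return "<ivy-module for " + mrid + ">";
    }

    int getRepositoryHits() {
        return repositoryHits;
    }
}
```

With ResolverNoCacheManager the sketch resolver goes to the repository
on every call, while with DefaultResolverCacheManager it goes there
once per module revision; the resolver code itself is the same in both
cases.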
The settings could reflect that, by providing a way to configure the
resolver cache manager for each resolver. This could for example be an
inner element. For instance:
<ibiblio name="public">
 <cache ref="default" />
</ibiblio>
=> to get the default cache management (this would be the default)
---
<ibiblio name="public">
 <cache ref="nocache" />
</ibiblio>
=> to avoid cache management for the resolver
---
<ibiblio name="public">
 <cache name="myPublicCache" location="${user.home}/.ivy/cache/ibiblio">
   <ivy pattern="[organisation]/[module]/[revision]/ivy.xml" />
 </cache>
</ibiblio>
=> to use a cache manager configured specifically, caching files in
${user.home}/.ivy/cache/ibiblio, and with a customized pattern for ivy
files.
---
<ibiblio name="public">
 <mycache name="myCustomCache" cacheArtifact="false" />
</ibiblio>
=> to use a custom resolver cache manager implementation, previously
typedefed as 'mycache'
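
To illustrate what the customized pattern in the third example would
mean in practice, here is a small Java sketch that expands [token]
placeholders into a path relative to the cache location. It is a
self-contained stand-in written for this mail, not Ivy's own pattern
handling; the class name PatternSketch and its substitute method are
made up, and optional parts such as (.[ext]) are deliberately not
handled.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper expanding [token] placeholders in a cache pattern.
class PatternSketch {
    private static final Pattern TOKEN = Pattern.compile("\\[(\\w+)\\]");

    /** Replaces each [token] with its value; unknown tokens are left untouched. */
    static String substitute(String pattern, Map<String, String> tokens) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = tokens.get(m.group(1));
            String replacement = value != null ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With organisation=apache, module=ivy and revision=1.4.1, the pattern
[organisation]/[module]/[revision]/ivy.xml expands to
apache/ivy/1.4.1/ivy.xml, which the cache manager would then resolve
against its configured location.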


With such a change, the saved resolver and artifact resolver would no
longer make sense, so the only information we would keep in
CachedDataFile would be the artifact origin (for resolvers using a
cache).

Overall, I'm beginning to feel comfortable with this idea of using a
delegate ResolverCacheManager. I think this would answer the needs
expressed in IVY-399 without requiring too much refactoring of the
current code base. I think I can find time to start working on this
implementation next week.

WDYT?

Xavier

On 5/6/07, Xavier Hanin (JIRA) <[EMAIL PROTECTED]> wrote:


    [ https://issues.apache.org/jira/browse/IVY-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493914 ]

Xavier Hanin commented on IVY-399:
----------------------------------

To start working on this issue with a clean view of how the cache is currently 
implemented in Ivy, I've listed all files which are cached, with their 
location, content and use:
{noformat}
All default locations are relative to the cache root.

ResolvedIvyFileInCache
======================
default location:
    resolved-[organisation]-[module]-[revision].xml
content:
    the representation as an ivy file of the module descriptor used for 
dependency resolution
created by:
    ResolveEngine#resolve(ModuleDescriptor md, ResolveOptions options)
used by:
    PublishEngine#publish(ModuleRevisionId mrid, Collection srcArtifactPattern, 
String resolverName, PublishOptions options)
        => when no src ivy pattern is provided, the cached file is used
    DeliverEngine#deliver(ModuleRevisionId mrid, String revision, String 
destIvyPattern, DeliverOptions options)
        => used as basis for the ivy file to deliver
    RetrieveEngine#getConfs(ModuleRevisionId mrid, RetrieveOptions options)
        => to list the configurations of a module to retrieve, when they are 
not provided in the RetrieveOptions, or provided as '*' wildcard

ResolvedIvyPropertiesInCache
============================
default location:
    resolved-[organisation]-[module]-[revision].properties
content:
    a map of resolved revision and status by dependency, stored as properties
created by:
    ResolveEngine#resolve(ModuleDescriptor md, ResolveOptions options)
used by:
    DeliverEngine#deliver(ModuleRevisionId mrid, String revision, String 
destIvyPattern, DeliverOptions options)
        => used to replace dynamic versions by static ones, and do recursive 
delivery according to the dependencies status

ConfigurationResolveReportInCache
=================================
default location (not configurable):
    resolveId + "-" + conf + ".xml" (usually resolveId is 
[organisation]-[module])
content:
    xml resolution report for one configuration
created by:
    XmlReportOutputter#output(ConfigurationResolveReport report, String 
resolveId, String[] confs, File destDir)
used by:
    ConfigurationResolveReport
        => to load previous dependencies to know if they have changed
    RetrieveEngine#determineArtifactsToCopy(ModuleRevisionId mrid, String 
destFilePattern, RetrieveOptions options)
    IvyReport
    IvyRepositoryReport
    IvyArtifactProperty
    IvyCacheTask (basis of cachepath and cachefileset)
    Main#outputCachePath
        => to know the details of the last resolve operation

IvyFileInCache
==============
default location:
    [organisation]/[module]/ivy-[revision].xml
content:
    an ivy file representation of a downloaded module descriptor, converted 
into the system namespace
created by:
    BasicResolver#getDependency(DependencyDescriptor dd, ResolveData data)
used by:
    CacheManager#findModuleInCache(ModuleRevisionId mrid, boolean validate)
        => common method used to reload a module information from cache. This 
method is in turn used by:
            * BasicResolver#parse => to avoid downloading a module descriptor 
before parsing it (used when the version matcher requires metadata parsing)
            * BasicResolver#getDependency => to avoid downloading a module 
descriptor when resolving a dependency
            * CacheResolver#getDependency => it's the purpose of the 
CacheResolver to reuse module descriptors from cache
    BasicResolver#parse
        => to check that the resolver does not point directly to the cache, 
which is forbidden
    RetrieveEngine#retrieve(ModuleRevisionId mrid, String destFilePattern, 
RetrieveOptions options)
        => when the retrieve operation is asked to retrieve module descriptors, 
files from the cache are used

ArchiveFileInCache
==================
default location:
    [organisation]/[module]/[type]s/[artifact]-[revision](.[ext])
content:
    a module artifact
created by:
    BasicResolver#download(Artifact[] artifacts, DownloadOptions options)
used by:
    IvyCachePath#execute()
    IvyCacheFileset#execute()
        => to know which file should be put in the path or fileset
    RetrieveEngine#retrieve(ModuleRevisionId mrid, String destFilePattern, 
RetrieveOptions options)
        => to know which files should be copied
    Main#invoke
        => to build the classloader to use for launching
    Main#outputCachePath
        => to report the paths of artifacts
    IvyArtifactReport#writeCacheLocation
        => to add the artifact cache location in the artifact report
    CacheResolver#download
        => to check the file exists in cache
    BasicResolver#parse
        => to delete old artifacts when downloading a module revision which has 
changed

CachedDataFile
==============
default location:
    [organisation]/[module]/ivydata-[revision].properties
content:
    Artifact Origin (is local + location)
    Resolver Name
    Artifacts Resolver Name (different from Resolver in case of dual resolver)
created by:
    the cache manager to store the information listed in content
used by:
    the cache manager to load information listed in content.
    This information is used by:
    ArtifactOrigin
        => used by the same classes as ArchiveFileInCache
    Resolver and Artifacts Resolver
        => used by findModuleInCache, to reassociate a cached module to the 
resolvers which originally resolved it
        Side Note: this causes problems when trying to reuse the same cache with 
different settings, causing warning messages 'unknown resolver xxx'
{noformat}

> Flexible Cache Management
> -------------------------
>
>                 Key: IVY-399
>                 URL: https://issues.apache.org/jira/browse/IVY-399
>             Project: Ivy
>          Issue Type: Improvement
>         Environment: ALL
>            Reporter: Eric Crahen
>
> Creating an issue at Xavier's request for improving the approach to cache
> management
> On 1/29/07, Xavier Hanin <[EMAIL PROTECTED]> wrote:
>     Supporting this kind of graph
>     could be interesting, and what makes it difficult for Ivy is that Ivy
>     heavily relies on its cache mechanism, which makes it impossible to do
>     what you want (i.e. never put anything from your local repository to
>     the cache).
> This would be a very powerful feature to add. In 2.0, is there any reason for
> the cache to have to be so baked into everything? In other words, why not
> implement every resolver and all of the internal management w/ no caching
> whatsoever baked in anywhere? Instead all caching is done in a decorator
> fashion by wrapping a caching resolver around any other resolver? In other
> words, the core of Ivy only knows about resolvers, no concept of cache exists
> in the heart of Ivy.
> It seems to me this would be much more flexible, and it would still be very
> possible to provide the syntactic sugar to make it very simple and even
> seamless to configure these wrappers by default. At the same time, people who
> will use the flexibility have the power to set up chains that might go
> something like.
> (logical chain)
>   localresolver
>   cacheresolver
>     httpresolver url="..."
>   cacheresolver
>     httpresolver url="..."
> There is no longer any need to have things like useLocal flags. It's already
> expressed that the local resolver is not cached because it's just not wrapped
> in a caching resolver.
> I think this idiom should be applied to both artifact and metadata resolution.
> One cool thing about this, is that in this way, since all caching is simply a
> type of resolver we'd provide, people who don't like the particular method we
> use to perform caching in the resolver we provide are free to provide their
> own. This would address lots of the issues that have been raised about
> caching, consistency, doing anything remotely fancy with local resolvers -
> right now it's very hard to address any of that because caching is not very
> pluggable as it stands.
> I think the only drawback is that it seems like it's harder to configure out
> of the box because most people by default would want to wrap every resolver
> with a cacheresolver - but like I said, this is easily solvable by providing
> some simple syntactic sugar. For instance the simplehttpresolver might be the
> name of an undecorated resolver for power users, and the things named
> httpresolver would simply be an alias for the cacheresolver wrapped around the
> simplehttpresolver (or subclass, whatever is the most sensible choice)




--
Xavier Hanin - Independent Java Consultant
Manage your dependencies with Ivy!
http://incubator.apache.org/ivy/
