[ 
https://issues.apache.org/jira/browse/MNG-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472093#comment-17472093
 ] 

Thomas Skjølberg commented on MNG-7389:
---------------------------------------

[~michael-o] please elaborate. Is the dependencies component not the right home 
for managing the cache?

> Incremental .m2 cache cleanup for CI
> ------------------------------------
>
>                 Key: MNG-7389
>                 URL: https://issues.apache.org/jira/browse/MNG-7389
>             Project: Maven
>          Issue Type: New Feature
>          Components: Dependencies
>            Reporter: Thomas Skjølberg
>            Priority: Minor
>
> One or more popular continuous integration services are unable to properly 
> manage the .m2 repository cache, resulting in wasted resources in the form of 
> increased CI runtime and bandwidth consumption.
> *CircleCI cache behaviour:*
>  - immutable cache entries
>  - default behaviour is to wipe the cache each time a pom file is modified 
> (i.e. using pom hash as a cache key)
>  - cache entries TTL > weeks
> So CircleCI always has a cache containing only the necessary artifacts, but 
> has to download all dependencies every time the pom file changes.
> *GitHub Actions cache behaviour:*
>  - (effectively) mutable cache entries
>  - incremental cache (if it gets too big, it is wiped).
>  - cache entries TTL 1 week
> So GitHub Actions works well if the cache entries expire from time to time; 
> otherwise the cache keeps growing.
> *Summary*
> Perhaps this does not look so bad at first glance, but for a project under 
> active development, with a lot of artifacts, the pom file changes often. For 
> example we have apps with 100 dependencies and automatic dependency bumping 
> via Renovate, in addition to a hierarchy of libraries.
> Key takeaway: time is wasted on
>  - saving caches in CI
>  - loading cache in CI
>  - loading artifacts from external artifact store
> This happens quite a lot. From the artifact store perspective, this probably 
> multiplies the load by a factor of 10.
> Possible solution: A way to define a "transaction" for artifact use, i.e.
> 1. run command to mark start of transaction 
> 2. run one or more maven commands
> 3. run command to mark end of transaction, deleting artifacts not in use.
> For reference, Gradle has the same problem.
> Proof of concept:
>  * CircleCI : [https://github.com/entur/maven-orb]
>  * Github actions: [https://github.com/skjolber/tidy-cache-github-action]
> The implementation uses instrumentation to record artifact access, then 
> deletes the artifacts not recorded.
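
The transaction steps and the record-then-delete idea above can be sketched roughly as follows. This is only an illustration, not the code from the linked proofs of concept: the function name `prune_unrecorded` is hypothetical, the set of accessed paths is assumed to have been collected by some instrumentation between the start and end of the transaction, and tracking at the level of individual files (rather than whole artifact directories) is an assumption.

```python
import os
from pathlib import Path
from typing import Iterable, List


def prune_unrecorded(repo_root: Path, accessed: Iterable[Path]) -> List[Path]:
    """End of "transaction": delete every file under repo_root that was not
    recorded as accessed, and return the removed paths (hypothetical helper)."""
    accessed_resolved = {Path(p).resolve() for p in accessed}
    removed = []
    # Walk bottom-up so that directories emptied by the cleanup can be removed.
    for dirpath, _dirnames, filenames in os.walk(repo_root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            if path.resolve() not in accessed_resolved:
                path.unlink()
                removed.append(path)
        directory = Path(dirpath)
        # Keep the repository root itself, drop any other directory left empty.
        if directory != repo_root and not any(directory.iterdir()):
            directory.rmdir()
    return sorted(removed)
```

A CI job would call something like this once, after the last Maven command and before saving the cache, so that the saved cache holds only what the build actually read.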
> *Alternatives:*
> I tried the last-accessed file timestamp first, but it turns out most CI 
> filesystems are mounted without that option. However, it should also be 
> possible to update the modified timestamp and/or record read access in some 
> existing metadata file.
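
The modified-timestamp alternative could look something like the sketch below. It assumes that whatever reads an artifact also calls a hook such as the hypothetical `mark_used`, and that cleanup runs with the time at which the transaction started; neither function exists in Maven today.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def mark_used(path: Path, now: Optional[float] = None) -> None:
    """Bump the timestamps of an artifact file when it is read (hypothetical hook).
    An explicit utime call works even where automatic atime updates are disabled."""
    ts = time.time() if now is None else now
    os.utime(path, (ts, ts))  # sets both access and modification time


def prune_older_than(repo_root: Path, started_at: float) -> List[Path]:
    """Delete files whose modification time predates the start of the transaction."""
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(repo_root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < started_at:
            path.unlink()
            removed.append(path)
    return sorted(removed)
```

Unlike an external access log, this keeps the bookkeeping inside the repository itself, at the cost of a metadata write for every artifact read.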



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
