We have a large dependency with 300+ transitive dependencies; let's call it
BigDep1.
A large number of our libraries depend on BigDep1, and we sometimes add
exclusions when we use these libraries in our project:
<dependency>
  <groupId>com.company...</groupId>
  <artifactId>Lib1</artifactId>
  <exclusions>
    <exclusion>
      <groupId>some_group_id</groupId>
      <artifactId>some_artifact_id</artifactId>
    </exclusion>
  </exclusions>
</dependency>
Building the project took a long time and a huge amount of memory. We saw that
BigDep1 was resolved thousands of times without ever hitting the in-memory
cache.
Looking at the code, Maven does try to load the resolved result of BigDep1 from
the cache, but when I debugged it the lookup always failed.
The key is built from the GAV, repositories, childSelector, childManager,
childTraverser and childFilter, which means the exclusions in effect are part
of the key.
https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L504
    Object key =
        args.pool.toKey( d.getArtifact(), childRepos, childSelector, childManager,
                         childTraverser, childFilter );

    List<DependencyNode> children = args.pool.getChildren( key );
    // => always null: need to recalculate and save to the cache again,
    //    which takes a long time and consumes a lot of memory
    if ( children == null )
    {
        args.pool.putChildren( key, child.getChildren() );

        args.nodes.push( child );

        process( args, results, descriptorResult.getDependencies(), childRepos,
                 childSelector, childManager, childTraverser, childFilter );

        args.nodes.pop();
    }
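To make the miss concrete, here is a minimal Java sketch. It is not the resolver's code: `key(...)` is a hypothetical stand-in for `args.pool.toKey(...)` that keeps only the GAV and the exclusions in effect (my simplification of what childSelector carries), and the `com.example:BigDep1:1.0` coordinates are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PoolKeyDemo {
    // Hypothetical simplified key: the GAV plus the exclusions in effect.
    // The real key also includes repositories, manager, traverser and filter.
    static List<Object> key(String gav, String... activeExclusions) {
        Set<String> exclusions = new TreeSet<>(Arrays.asList(activeExclusions));
        return Arrays.asList(gav, exclusions);
    }

    public static void main(String[] args) {
        // Same artifact, reached once without and once with an exclusion in effect.
        List<Object> plain = key("com.example:BigDep1:1.0");
        List<Object> withExclusion =
            key("com.example:BigDep1:1.0", "some_group_id:some_artifact_id");

        System.out.println(plain.equals(withExclusion)); // prints "false"
    }
}
```

Because the two keys are not equal, a result stored under the first one can never be found with the second, even though the artifact is the same.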
Let me use a simple pattern to describe the problem:
Lib1 -> BigDep1
Lib2 -> Lib3 (has exclusion) -> BigDep1
Lib4 -> Lib2
...
In our project we use Lib1, Lib2 and Lib4, the last one with exclusions:
Project -> Lib1
Project -> Lib2
Project -> Lib4 (has exclusion)
Here is how Maven resolves the dependencies:
Maven starts to resolve Lib1 (Lib1 -> BigDep1). It resolves BigDep1 first and
caches the result in memory.
Maven starts to resolve Lib2 (Lib2 -> Lib3 (has exclusion) -> BigDep1). Because
Lib3 has an exclusion, Maven cannot load BigDep1 from the cache and calculates
it again.
Maven starts to resolve Lib4 (has exclusion) (Lib4 -> Lib2 -> Lib3 -> BigDep1).
Because Lib4 has an exclusion, Maven cannot load Lib2, Lib3 or BigDep1 from the
cache, so all of them are recalculated.
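The walk-through above can be reproduced with a toy model. This is only a sketch under my own assumptions, not the maven-resolver API: `ContextKeyedCollector`, `Dep`, and the exclusion names `x:excluded` and `y:excluded` are hypothetical, and the cache key is reduced to (GAV, exclusions in effect).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model, not the resolver: the cache key is (GAV, exclusions in effect). */
class ContextKeyedCollector {
    /** A declared dependency, optionally carrying exclusions. */
    static final class Dep {
        final String gav;
        final Set<String> exclusions;

        Dep(String gav, String... exclusions) {
            this.gav = gav;
            this.exclusions = new TreeSet<>(Arrays.asList(exclusions));
        }
    }

    final Map<String, List<Dep>> graph = new HashMap<>();
    final Set<List<Object>> pool = new HashSet<>();            // keys already calculated
    final Map<String, Integer> calculations = new TreeMap<>(); // GAV -> times calculated

    void declare(String gav, Dep... deps) {
        graph.put(gav, Arrays.asList(deps));
    }

    void collect(Dep dep, Set<String> inherited) {
        if (inherited.contains(dep.gav)) {
            return; // excluded by an ancestor
        }
        // Exclusions accumulate on the way down and become part of the key.
        Set<String> active = new TreeSet<>(inherited);
        active.addAll(dep.exclusions);
        List<Object> key = Arrays.asList(dep.gav, active);
        if (!pool.add(key)) {
            return; // cache hit: same GAV under the same exclusions
        }
        calculations.merge(dep.gav, 1, Integer::sum);
        for (Dep child : graph.getOrDefault(dep.gav, Collections.<Dep>emptyList())) {
            collect(child, active);
        }
    }

    /** Builds the Lib1/Lib2/Lib4 pattern and collects the project's dependencies. */
    static Map<String, Integer> runExample() {
        ContextKeyedCollector c = new ContextKeyedCollector();
        c.declare("Lib1", new Dep("BigDep1"));
        c.declare("Lib2", new Dep("Lib3", "x:excluded")); // Lib3 (has exclusion)
        c.declare("Lib3", new Dep("BigDep1"));
        c.declare("Lib4", new Dep("Lib2"));
        c.declare("Project",
                  new Dep("Lib1"), new Dep("Lib2"), new Dep("Lib4", "y:excluded"));
        for (Dep dep : c.graph.get("Project")) {
            c.collect(dep, Collections.<String>emptySet());
        }
        return c.calculations;
    }

    public static void main(String[] args) {
        // prints {BigDep1=3, Lib1=1, Lib2=2, Lib3=2, Lib4=1}
        System.out.println(runExample());
    }
}
```

In this model BigDep1 is calculated three times for three paths, once per distinct set of exclusions in effect, which matches what we see on a larger scale in our build.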
I'm wondering whether we could use the GAV alone as the cache key and apply the
exclusions later. Maven could then resolve the dependencies this way:
Maven starts to resolve Lib1. It first resolves BigDep1 and caches the result
using BigDep1's GAV as the key.
Maven starts to resolve Lib2 (Lib2 -> Lib3 (has exclusion) -> BigDep1). It gets
BigDep1 from the cache, then calculates Lib3 without applying the exclusion and
caches the result under Lib3's GAV.
When Maven comes back to Lib2, it applies Lib3's exclusion to Lib3, adds Lib3
(with the exclusion applied) as a child of Lib2, and then caches Lib2.
Maven starts to resolve Lib4 (has exclusion) (Lib4 -> Lib2 -> Lib3 -> BigDep1).
It gets Lib2 from the cache, then calculates Lib4 without applying the
exclusion and caches Lib4.
When Maven comes to resolve the current project, it applies Lib4's exclusion,
adds Lib4 (with the exclusion applied) as a child of the project module, and
then caches the project's resolved result.
Does this make sense?
This means every library's resolved result is cached under its GAV.
Only the dependent that uses it needs to load the result from the cache and
apply its exclusions, if any.
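Here is a rough Java sketch of the idea, only to show the shape of it. It is a toy model and not a patch against maven-resolver: `GavKeyedCollector`, `Dep`, `Node` and `prune` are hypothetical names, T1 and T2 are made-up transitive dependencies of BigDep1, and the sketch assumes an acyclic graph and ignores dependency management, scopes and repositories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the proposal: cache by GAV only, apply exclusions afterwards. */
class GavKeyedCollector {
    static final class Dep {
        final String gav;
        final Set<String> exclusions;

        Dep(String gav, String... exclusions) {
            this.gav = gav;
            this.exclusions = new TreeSet<>(Arrays.asList(exclusions));
        }
    }

    static final class Node {
        final String gav;
        final List<Node> children;

        Node(String gav, List<Node> children) {
            this.gav = gav;
            this.children = children;
        }
    }

    final Map<String, List<Dep>> graph = new HashMap<>();
    final Map<String, Node> cache = new HashMap<>();           // keyed by GAV only
    final Map<String, Integer> calculations = new TreeMap<>(); // GAV -> times calculated

    void declare(String gav, Dep... deps) {
        graph.put(gav, Arrays.asList(deps));
    }

    /** Resolves a GAV with no inherited exclusions; assumes the graph has no cycles. */
    Node resolve(String gav) {
        Node cached = cache.get(gav);
        if (cached != null) {
            return cached;
        }
        calculations.merge(gav, 1, Integer::sum);
        List<Node> children = new ArrayList<>();
        for (Dep dep : graph.getOrDefault(gav, Collections.<Dep>emptyList())) {
            // The dependent applies its own exclusions to the cached subtree.
            children.add(prune(resolve(dep.gav), dep.exclusions));
        }
        Node node = new Node(gav, children);
        cache.put(gav, node);
        return node;
    }

    /** Returns a copy of the subtree without excluded artifacts; shares it if nothing is excluded. */
    static Node prune(Node node, Set<String> exclusions) {
        if (exclusions.isEmpty()) {
            return node;
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children) {
            if (!exclusions.contains(child.gav)) {
                kept.add(prune(child, exclusions));
            }
        }
        return new Node(node.gav, kept);
    }

    static GavKeyedCollector example() {
        GavKeyedCollector c = new GavKeyedCollector();
        c.declare("BigDep1", new Dep("T1"), new Dep("T2")); // made-up transitive deps
        c.declare("Lib1", new Dep("BigDep1"));
        c.declare("Lib2", new Dep("Lib3", "T1"));           // Lib3 (has exclusion)
        c.declare("Lib3", new Dep("BigDep1"));
        c.declare("Lib4", new Dep("Lib2"));
        c.declare("Project", new Dep("Lib1"), new Dep("Lib2"), new Dep("Lib4", "T2"));
        return c;
    }
}
```

Calling `GavKeyedCollector.example().resolve("Project")` calculates each GAV once in this model, while BigDep1 still appears with different children on each path because the exclusions are applied when the subtree is attached. The pruning step still copies the affected subtree, so whether this actually saves memory in the real collector is something I have not measured.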
Thanks,
Eric