We have a large dependency with 300+ transitive dependencies. Let's call it 
BigDep1.


A large number of our libraries depend on BigDep1, and we may add exclusions 
when we use these libraries in our project:
<dependency>
  <groupId>com.company...</groupId>
  <artifactId>Lib1</artifactId>
  <exclusions>
    <exclusion>
      <groupId>some_group_id</groupId>
      <artifactId>some_artifact_id</artifactId>
    </exclusion>
  </exclusions>
</dependency>


The project takes a long time and a huge amount of memory to build. We saw 
that BigDep1 is resolved thousands of times without ever hitting the in-memory 
cache.


Looking at the code, we can see that Maven tries to load the resolved result 
of BigDep1 from the cache, but in the debugger the lookup always failed to 
return a cached result.
The cache key is built from the GAV, repositories, childSelector, 
childManager, childTraverser, and childFilter, which means the exclusions are 
part of the key:
https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L504
    Object key =
        args.pool.toKey( d.getArtifact(), childRepos, childSelector, childManager,
                         childTraverser, childFilter );

    List<DependencyNode> children = args.pool.getChildren( key );
    // Always null in our case, so the children are recalculated and saved to
    // the cache again, which takes a long time and consumes a lot of memory.
    if ( children == null )
    {
        args.pool.putChildren( key, child.getChildren() );

        args.nodes.push( child );

        process( args, results, descriptorResult.getDependencies(), childRepos,
                 childSelector, childManager, childTraverser, childFilter );

        args.nodes.pop();
    }
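
To make the cache miss concrete, here is a minimal sketch. It is not the 
resolver's actual code: PoolKey and ExclusionSelector are hypothetical 
stand-ins for the key that args.pool.toKey(...) builds, reduced to the GAV 
plus the exclusions in effect. It only illustrates that the same GAV looked up 
under different exclusions never matches an earlier entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the selector derived from the exclusions in effect.
record ExclusionSelector(Set<String> excluded) {}

// Hypothetical stand-in for the pool key: the GAV plus the selector context.
record PoolKey(String gav, ExclusionSelector selector) {}

class CacheKeyDemo {
    // Looks up the same GAV once per exclusion context and counts cache misses.
    static int countMisses(String gav, List<Set<String>> exclusionContexts) {
        Map<PoolKey, List<String>> pool = new HashMap<>();
        int misses = 0;
        for (Set<String> exclusions : exclusionContexts) {
            PoolKey key = new PoolKey(gav, new ExclusionSelector(exclusions));
            if (pool.get(key) == null) {
                // Miss: the subtree would be recalculated and stored again.
                misses++;
                pool.put(key, List.of("children of " + gav));
            }
        }
        return misses;
    }
}
```

With three different exclusion sets for the same GAV, countMisses reports 
three misses; with the same exclusion set repeated, only the first lookup 
misses.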


Let me use a simple pattern to describe the problem:


Lib1 -> BigDep1
Lib2 -> Lib3 (has exclusion) -> BigDep1
Lib4 -> Lib2
...


Now our project uses Lib1, Lib2, and Lib4, with an exclusion declared on Lib4:


Project -> Lib1
Project -> Lib2
Project -> Lib4 (has exclusion)


Here is how Maven resolves the dependencies:
1. Maven resolves Lib1 (Lib1 -> BigDep1). It resolves BigDep1 first and caches 
   the result in memory.
2. Maven resolves Lib2 (Lib2 -> Lib3 (has exclusion) -> BigDep1). Because Lib3 
   has an exclusion, the cache key for BigDep1 differs, so Maven cannot load 
   BigDep1 from the cache and calculates it again.
3. Maven resolves Lib4 (has exclusion) (Lib4 -> Lib2 -> Lib3 -> BigDep1). 
   Because Lib4 has an exclusion, Maven cannot load Lib2, Lib3, or BigDep1 
   from the cache, and all of them are recalculated.
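
The walk above can be reproduced with a small simulation. This is only a 
sketch under simplifying assumptions: Edge, ContextKey, and ContextKeyWalk are 
hypothetical names, the excluded artifacts "x:a" and "x:b" are made up, and 
the real collector's key contains more than the exclusions. It counts how 
often each library's subtree is computed when the key includes the exclusions 
inherited along the path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A dependency edge: the target library plus the exclusions declared on it.
record Edge(String target, Set<String> exclusions) {}

// Cache key that includes the exclusions in effect for the subtree.
record ContextKey(String lib, Set<String> exclusionsInEffect) {}

class ContextKeyWalk {
    // The example graph from this mail; "x:a" and "x:b" are made-up exclusions.
    static final Map<String, List<Edge>> GRAPH = Map.of(
        "Project", List.of(new Edge("Lib1", Set.of()),
                           new Edge("Lib2", Set.of()),
                           new Edge("Lib4", Set.of("x:b"))),
        "Lib1", List.of(new Edge("BigDep1", Set.of())),
        "Lib2", List.of(new Edge("Lib3", Set.of("x:a"))),
        "Lib3", List.of(new Edge("BigDep1", Set.of())),
        "Lib4", List.of(new Edge("Lib2", Set.of())),
        "BigDep1", List.<Edge>of());

    final Set<ContextKey> cache = new HashSet<>();
    final Map<String, Integer> computations = new HashMap<>();

    // Collects lib under the exclusions inherited from the path above it.
    void collect(String lib, Set<String> inherited) {
        if (!cache.add(new ContextKey(lib, inherited))) {
            return; // cache hit: same library, same exclusion context
        }
        computations.merge(lib, 1, Integer::sum);
        for (Edge edge : GRAPH.get(lib)) {
            Set<String> childContext = new TreeSet<>(inherited);
            childContext.addAll(edge.exclusions());
            collect(edge.target(), childContext);
        }
    }
}
```

Calling collect("Project", Set.of()) on a fresh ContextKeyWalk computes 
BigDep1 three times and Lib2 and Lib3 twice each, which matches the three 
steps described above.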


I'm wondering whether we could use only the GAV as the cache key and apply the 
exclusions later. Maven could then resolve the dependencies this way:
1. Maven resolves Lib1. It resolves BigDep1 first and caches the result using 
   BigDep1's GAV as the key.
2. Maven resolves Lib2 (Lib2 -> Lib3 (has exclusion) -> BigDep1). Maven gets 
   BigDep1 from the cache, calculates Lib3 without applying the exclusion, and 
   caches the result under Lib3's GAV.
3. When Maven comes back to Lib2, it applies Lib3's exclusion to Lib3, adds 
   the filtered Lib3 as a child of Lib2, and then caches Lib2.
4. Maven resolves Lib4 (has exclusion) (Lib4 -> Lib2 -> Lib3 -> BigDep1). 
   Maven gets Lib2 from the cache, calculates Lib4 without applying the 
   exclusion, and then caches Lib4.
5. When Maven comes back to the current project, it applies Lib4's exclusion, 
   adds the filtered Lib4 as a child of the Project module, and then caches 
   the Project's resolved result.


Does this make sense?


In other words, each library's resolved result is cached under its own GAV. 
Only a library that depends on it needs to load that result from the cache and 
apply its exclusions, if any.
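
Here is a rough sketch of the idea, to show what I mean rather than how it 
would have to be implemented in the resolver. Dep, Node, and GavKeyedCollector 
are hypothetical names, libraries are identified by plain strings, and the 
sketch ignores everything except exclusions (no dependency management, scopes, 
or conflict resolution). Each subtree is computed once per GAV and cached 
unfiltered; exclusions are applied only when the subtree is attached to the 
parent that declares them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A declared dependency: the target plus the exclusions declared on it.
record Dep(String target, Set<String> exclusions) {}

// A resolved tree node.
record Node(String gav, List<Node> children) {}

class GavKeyedCollector {
    final Map<String, List<Dep>> descriptors;            // declared deps per GAV
    final Map<String, Node> cache = new HashMap<>();     // keyed by GAV only
    final Map<String, Integer> computations = new HashMap<>();

    GavKeyedCollector(Map<String, List<Dep>> descriptors) {
        this.descriptors = descriptors;
    }

    // The example graph from this mail; BigDep1's children "a" and "b" are made up.
    static GavKeyedCollector example() {
        return new GavKeyedCollector(Map.of(
            "Project", List.of(new Dep("Lib1", Set.of()), new Dep("Lib2", Set.of()),
                               new Dep("Lib4", Set.of("b"))),
            "Lib1", List.of(new Dep("BigDep1", Set.of())),
            "Lib2", List.of(new Dep("Lib3", Set.of("a"))),
            "Lib3", List.of(new Dep("BigDep1", Set.of())),
            "Lib4", List.of(new Dep("Lib2", Set.of())),
            "BigDep1", List.of(new Dep("a", Set.of()), new Dep("b", Set.of()))));
    }

    // Resolves the subtree of gav once, without the exclusions of whoever asks.
    Node resolve(String gav) {
        Node cached = cache.get(gav);
        if (cached != null) {
            return cached;
        }
        computations.merge(gav, 1, Integer::sum);
        List<Node> children = new ArrayList<>();
        for (Dep dep : descriptors.getOrDefault(gav, List.of())) {
            // Load the child by GAV, then apply this edge's exclusions.
            children.add(prune(resolve(dep.target()), dep.exclusions()));
        }
        Node node = new Node(gav, children);
        cache.put(gav, node);
        return node;
    }

    // Returns a copy of the tree without the excluded artifacts.
    static Node prune(Node node, Set<String> excluded) {
        if (excluded.isEmpty()) {
            return node; // nothing to filter, share the cached tree
        }
        List<Node> kept = new ArrayList<>();
        for (Node child : node.children()) {
            if (!excluded.contains(child.gav())) {
                kept.add(prune(child, excluded));
            }
        }
        return new Node(node.gav(), kept);
    }
}
```

With GavKeyedCollector.example().resolve("Project"), BigDep1 is computed only 
once, while the three paths still see different children: both "a" and "b" 
under Lib1, only "b" under Lib2 -> Lib3, and neither under Lib4. The cost 
moves from recomputing subtrees to copying cached trees when exclusions apply.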


Thanks,
Eric 
