[ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
------------------------------
    Attachment:     (was: skip-duplicate.png)

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> -----------------------------------------------------------------------
>
>                 Key: MRESOLVER-247
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-247
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>    Affects Versions: 1.7.3
>            Reporter: wei cai
>            Priority: Major
>         Attachments: exclusion-matters.png, skip-duplicate-1.png, 
> skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children (D & E in this case) need to be 
> recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 
> is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as 
> it is more left than the last resolved R#3 node. This is because maven picks 
> up widest scope present among conflicting dependencies (scopes of the 
> conflicts may differ), we need to *retain the conflict paths* by using a 
> strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
> R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
> R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve 
> R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
> already resolved at depth 2. This is because the same node with deeper depth 
> won't be picked up by maven as maven employs a "nearest transitive dependency 
> in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
> know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is 
> having a different version and it conflicts with D1, D2 will be skipped as it 
> won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> dependency:tree* and *mvn dependency:list* result with/without the 
> performance patch.
> Please help review the PR.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to