[ 
https://issues.apache.org/jira/browse/MRESOLVER-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17466764#comment-17466764
 ] 

wei cai commented on MRESOLVER-228:
-----------------------------------

[~ibabiankou] 

Thanks for your comments!  Kudos for the excellent work on MRESOLVER-133  and 
MRESOLVER-7.

I agree combining MRESOLVER-133, MRESOLVER-7 and MRESOLVER-228, there would be 
huge performance improvements, however I don't think BFS + Skip can simply work 
as it may introduce scope incompatibilities.

Before I submitted the PR of skip & reconcile approach in this ticket, I 
actually noticed the MRESOLVER-133 and tried breadth-first approach but no 
vain, here is what I've tried:
 * DFS + Skip
1% of 500+ apps failed because some low level transitive dependencies are no 
longer available in the dependency tree. This is because for a skipped node, I 
need set children as empty and reuse the result of the node with same GAV but a 
lower depth.
When there are version conflicts in the parent paths, a skipped node might 
should not be skipped and need reconcile somehow.
 * BFS + Skip
This time, I'm not seeing any failures because of missing some low level 
transitive dependencies. Instead I witnessed the scope of certain dependency 
nodes in the dependency tree were incorrect, especially for "provided" or 
"compile" scopes. Check [this 
code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/transformer/JavaScopeSelector.java#L87]
 , when there are conflict items with different scopes, maven will pick 
"compile" scope as long as there is a conflict item using "compile" scope. 
 * DFS + Skip + Reconcile
Then I come up with this solution. Tested with 2000+ apps in our company, all 
dependency:tree and dependency:list result remains the same which proves it is 
enough compatible. If go with BFS + Skip approach, as the scopes of 
dependencies are no longer the same, we should do some reconcile, however it is 
hard to implement such reconcile logic to make scope correct with BFS solution.

> Improve the maven dependency resolution speed by a skip & reconcile approach
> ----------------------------------------------------------------------------
>
>                 Key: MRESOLVER-228
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-228
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: Resolver
>    Affects Versions: 1.7.2
>            Reporter: wei cai
>            Priority: Major
>             Fix For: 1.8.0
>
>         Attachments: Screen Shot 2021-11-27 at 12.58.26 PM.png, Screen Shot 
> 2021-11-27 at 12.58.59 PM.png, Screen Shot 2021-11-27 at 12.59.32 PM.png
>
>
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}.
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
> !Screen Shot 2021-11-27 at 12.58.26 PM.png!
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children needs to be recalculated.  Recalculating 
> all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip & reconcile approach. Here is the *skip* part.
> !Screen Shot 2021-11-27 at 12.58.59 PM.png!
> From above figure, the 1st R is resolved at depth 3, and the 2nd R is 
> resolved again because the depth is at 2 which is lower, the 3rd R at depth 3 
> and the 4th R at depth 4 are simply skipped as R is already resolved at depth 
> 2. This is because the same node with deeper depth is most likely won't be 
> picked up by maven as maven employs a "{*}nearest{*} transitive dependency in 
> the tree depth and the *first* in resolution" strategy.
> The 3rd R and 4th R will have children set as zero and marked as skipped by 
> the R at depth 2 in 2nd tree path.
>  
> Here is the *reconcile* part:
> !Screen Shot 2021-11-27 at 12.59.32 PM.png!
> When there are dependency conflicts, some of the skipped nodes need to be 
> reconciled.
> In above figure, there are 4 tree paths.
>  * The D1 (D with version 1) in the 1st tree path is get resolved, children 
> of E and R at depth 3 are resolved and cached.
>  * In the 2nd tree path, when resolving E & R of H, we simply skip these 2 
> nodes as they are in deeper depth (depth: 4) than the E & R in 1st tree path.
>  * In the 3rd tree path, a R node with lower path is resolved, and a E node 
> at depth 5 is skipped.
>  * In the 4th path, a D2 (D with version 2) node is resolved, as the depth is 
> lower than D1, so maven will pick D2, this means the E & R's children cached 
> in tree depth 1 should be {*}discarded{*}. 
> Thus we might need to reconcile the E & R nodes in 2nd, 3rd and 4th tree 
> paths. Here only E in 2nd tree path needs to be reconciled. This is because:
>  * R in 3rd tree path won't be picked up as there is already a R in 2nd tree 
> path with a lower depth.
>  * E in 3rd tree path won't be picked up as it is enough to reconcile the E 
> in 2nd tree path as the E in 2nd tree path is deeper than E in 3rd tree path.
> Here is what we've updated in the maven-resolver logic:
>  # Resolve dependencies by leveraging a skip approach. The node in deeper 
> depth will be skipped if a node with same GAV has been resolved with a lower 
> depth.
>  # Cloned the nodes in step 1. 
> Use maven's ConflictResolver (Transformer) to transform the cloned nodes and 
> find out the conflict winners & the nodes to reconcile. 
> Ex, D1 conflicts with D2 in above digram.
> E in 2nd tree path will be reconciled.
>  # Reconcile all skipped nodes identified in above step.
>  
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> dependency:tree* and *mvn dependency:list* result with/without the 
> performance patch.
> Please help review the PR.
> [https://github.com/apache/maven-resolver/pull/136]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to