[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-08 Thread Michael Osipov (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Osipov updated MRESOLVER-247:
-
Fix Version/s: 1.8.0

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> ---
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Assignee: Michael Osipov
>Priority: Major
> Fix For: 1.8.0
>
> Attachments: exclusion-matters.png, skip-duplicate.png, 
> skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children (D & E in this case) need to be 
> recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 
> is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as 
> it is more left than the last resolved R#3 node. This is because maven picks 
> up widest scope present among conflicting dependencies (scopes of the 
> conflicts may differ), we need to *retain the conflict paths* by using a 
> strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
> R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
> R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve 
> R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
> already resolved at depth 2. This is because the same node with deeper depth 
> won't be picked up by maven as maven employs a "nearest transitive dependency 
> in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
> know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is 
> having a different version and it conflicts with D1, D2 will be skipped as it 
> won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-08 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children (D & E in this case) need to be 
recalculated.  Recalculating all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

h1. Solution

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach to avoid unnecessary dependency resolution.
h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*

!skip-duplicate.png!

>From above figure:
 * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 is 
the winner.
 * The R#5 won't be the winner, however here we force resolved this node as it 
is more left than the last resolved R#3 node. This is because maven picks up 
widest scope present among conflicting dependencies (scopes of the conflicts 
may differ), we need to *retain the conflict paths* by using a strategy like:
 ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
R#5.
 ** If there is a R#6 which is more left than R#5, then need to force resolve 
R#6.

 * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
already resolved at depth 2. This is because the same node with deeper depth 
won't be picked up by maven as maven employs a "nearest transitive dependency 
in the tree depth and the first in resolution" strategy.

h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*

*!skip-version-conflicts.png!*

In above figure.
 * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
know D1 is the winner
 * When comes to resolve D2 (D with version 2, after #4), we know D2 is having 
a different version and it conflicts with D1, D2 will be skipped as it won't be 
picked up by maven,  all D2's children *won't be resolved* then. 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

https://github.com/apache/maven-resolver/pull/158

 

  was:
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children (D & E in this case) need to be 
recalculated.  Recalculating all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

h1. Solution

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach to avoid unnecessary dependency resolution.
h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*

!skip-duplicate.png!

>From above figure:
 * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 is 
the winner.
 * The R#5 won't be the winner, however here we force resolved this node as it 
is more left than the last resolved R#3 node. This is because maven picks up 
widest scope present among conflicting dependencies (scopes of the conflicts 
may differ), we need to *retain the conflict paths* by using a strategy like:
 ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
R#5.
 ** If there is a R#6 which is more left than R#5, then need to force resolve 
R#6.

 * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
already resolved at depth 2. This is because the same node with deeper depth 
won't be picked up by maven as maven employs a "nearest transitive dependency 
in the tree depth and the first in resolution" strategy.

h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*

*!skip-version-conflicts.png!*

In above figure.
 * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
know D1 is the winner
 * When comes to resolve D2 (D with version 2, after #4), we know D2 is having 
a different version and it conflicts with D1, D2 will be skipped as it won't be 
picked up by maven,  all D2's children *won't be resolved* then. 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

 

  was:
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: (was: skip-duplicate-1.png)

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> ---
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: exclusion-matters.png, skip-duplicate.png, 
> skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children (D & E in this case) need to be 
> recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 
> is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as 
> it is more left than the last resolved R#3 node. This is because maven picks 
> up widest scope present among conflicting dependencies (scopes of the 
> conflicts may differ), we need to *retain the conflict paths* by using a 
> strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
> R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
> R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve 
> R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
> already resolved at depth 2. This is because the same node with deeper depth 
> won't be picked up by maven as maven employs a "nearest transitive dependency 
> in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
> know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is 
> having a different version and it conflicts with D1, D2 will be skipped as it 
> won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: skip-duplicate.png

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> ---
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: exclusion-matters.png, skip-duplicate.png, 
> skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children (D & E in this case) need to be 
> recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 
> is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as 
> it is more left than the last resolved R#3 node. This is because maven picks 
> up widest scope present among conflicting dependencies (scopes of the 
> conflicts may differ), we need to *retain the conflict paths* by using a 
> strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
> R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
> R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve 
> R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
> already resolved at depth 2. This is because the same node with deeper depth 
> won't be picked up by maven as maven employs a "nearest transitive dependency 
> in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
> know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is 
> having a different version and it conflicts with D1, D2 will be skipped as it 
> won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> dependency:tree* 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: skip-duplicate-1.png

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> ---
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: exclusion-matters.png, skip-duplicate-1.png, 
> skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children (D & E in this case) need to be 
> recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 
> is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as 
> it is more left than the last resolved R#3 node. This is because maven picks 
> up widest scope present among conflicting dependencies (scopes of the 
> conflicts may differ), we need to *retain the conflict paths* by using a 
> strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
> R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
> R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve 
> R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
> already resolved at depth 2. This is because the same node with deeper depth 
> won't be picked up by maven as maven employs a "nearest transitive dependency 
> in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
> know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is 
> having a different version and it conflicts with D1, D2 will be skipped as it 
> won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: (was: skip-duplicate.png)

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> ---
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: exclusion-matters.png, skip-duplicate-1.png, 
> skip-version-conflicts.png
>
>
> h1. *Background*
> This Jira is related with MRESOLVER-240 and MRESOLVER-228
> There were discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
> h1. *The phenomenon*
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children (D & E in this case) need to be 
> recalculated.  Recalculating all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> h1. Solution
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach to avoid unnecessary dependency resolution.
> h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
> !skip-duplicate.png!
> From above figure:
>  * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 
> is the winner.
>  * The R#5 won't be the winner, however here we force resolved this node as 
> it is more left than the last resolved R#3 node. This is because maven picks 
> up widest scope present among conflicting dependencies (scopes of the 
> conflicts may differ), we need to *retain the conflict paths* by using a 
> strategy like:
>  ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
> R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
> R#5.
>  ** If there is a R#6 which is more left than R#5, then need to force resolve 
> R#6.
>  * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
> already resolved at depth 2. This is because the same node with deeper depth 
> won't be picked up by maven as maven employs a "nearest transitive dependency 
> in the tree depth and the first in resolution" strategy.
> h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*
> *!skip-version-conflicts.png!*
> In above figure.
>  * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
> know D1 is the winner
>  * When comes to resolve D2 (D with version 2, after #4), we know D2 is 
> having a different version and it conflicts with D1, D2 will be skipped as it 
> won't be picked up by maven,  all D2's children *won't be resolved* then. 
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children (D & E in this case) need to be 
recalculated.  Recalculating all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

h1. Solution

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach to avoid unnecessary dependency resolution.
h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*

!skip-duplicate.png!

>From above figure:
 * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 is 
the winner.
 * The R#5 won't be the winner, however here we force resolved this node as it 
is more left than the last resolved R#3 node. This is because maven picks up 
widest scope present among conflicting dependencies (scopes of the conflicts 
may differ), we need to *retain the conflict paths* by using a strategy like:
 ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
R#5.


 ** If there is a R#6 which is more left than R#5, then need to force resolve 
R#6.

 * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
already resolved at depth 2. This is because the same node with deeper depth 
won't be picked up by maven as maven employs a "nearest transitive dependency 
in the tree depth and the first in resolution" strategy.

h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*

*!skip-version-conflicts.png!*

In above figure.
 * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
know D1 is the winner
 * When comes to resolve D2 (D with version 2, after #4), we know D2 is having 
a different version and it conflicts with D1, D2 will be skipped as it won't be 
picked up by maven,  all D2's children *won't be resolved* then. 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

 

  was:
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children (D & E in this case) need to be 
recalculated.  Recalculating all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

h1. Solution

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach to avoid unnecessary dependency resolution.
h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*

!skip-duplicate.png!

>From above figure:
 * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 is 
the winner.
 * The R#5 won't be the winner, however here we force resolved this node as it 
is more left than the last resolved R#3 node. This is because maven picks up 
widest scope present among conflicting dependencies (scopes of the conflicts 
may differ), we need to *retain the conflict paths* by using a strategy like:
 ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
R#5.

 * 
 ** If there is a R#6 which is more left than R#5, then need to force resolve 
R#6.

 * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
already resolved at depth 2. This is because the same node with deeper depth 
won't be picked up by maven as maven employs a "nearest transitive dependency 
in the tree depth and the first in resolution" strategy.

h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*

*!skip-version-conflicts.png!*

In above figure.
 * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
know D1 is the winner
 * When comes to resolve D2 (D with version 2, after #4), we know D2 is having 
a different version and it conflicts with D1, D2 will be skipped as it won't be 
picked up by maven,  all D2's children *won't be resolved* then. 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

 

  was:
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip approach - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children (D & E in this case) need to be 
recalculated.  Recalculating all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

h1. Solution

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach to avoid unnecessary dependency resolution.
h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*

!skip-duplicate.png!

>From above figure:
 * The R#3 is resolved at depth 2, in the BFS solution, we already know R#3 is 
the winner.
 * The R#5 won't be the winner, however here we force resolved this node as it 
is more left than the last resolved R#3 node.

This is because maven picks up widest scope present among conflicting 
dependencies (scopes of the conflicts may differ), we need to *retain the 
conflict paths* by using a strategy like:

 * 
 ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth) while 
R#5 locates in (3,1), R#5 is more left than R#3, so we need to force resolve 
R#5.
 ** If there is a R#6 which is more left than R#5, then need to force resolve 
R#6.

 * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
already resolved at depth 2. This is because the same node with deeper depth 
won't be picked up by maven as maven employs a "nearest transitive dependency 
in the tree depth and the first in resolution" strategy.

h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*

*!skip-version-conflicts.png!*

In above figure.
 * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
know D1 is the winner
 * When comes to resolve D2 (D with version 2, after #4), we know D2 is having 
a different version and it conflicts with D1, D2 will be skipped as it won't be 
picked up by maven,  all D2's children *won't be resolved* then. 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

 

  was:
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
h1. *Background*

This Jira is related with MRESOLVER-240 and MRESOLVER-228

There were discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

h1. *The phenomenon*

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children (D & E in this case) need to be 
recalculated.  Recalculating all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

h1. Solution

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach to avoid unnecessary dependency resolution.
h2. *CASE 1: Skip duplicate node (omitted duplicate case in dependency tree)*
!skip-duplicate.png!

>From above figure:
 * The R#3 is resolved at depth 2, in a BFS solution, we already know R#3 is 
the winner.
 * The R#5 won't be the winner, however here we force resolved this node as it 
is more left than the last resolved R#3 node. 

As maven picks up widest scope present among conflicting dependencies (scopes 
of the conflicts may differ), we need to retain the conflict paths by using a 
strategy like:
 ** R#3 locates in coordinate (2 - depth, 2 - sequence in given depth), R#5 
locates in (3,1), R#5 is more left so need to resolve R#5.
 ** If there is a R#6 which is more left than R#5, then need to resolve R#6 
again.


 * The R#8 is at depth 3 and the R#12 at depth 4 are simply skipped as R#3 is 
already resolved at depth 2. This is because the same node with deeper depth 
won't be picked up by maven as maven employs a "nearest transitive dependency 
in the tree depth and the first in resolution" strategy.

h2. *CASE 2: Skip version conflict (omitted conflict case in dependency tree)*

*!skip-version-conflicts.png!*

In above figure.
 * The D1 (D with version 1, #4) is resolved, in the BFS solution, we already 
know D1 is the winner
 * When comes to resolve D2 (D with version 2, after #4), we know D2 is having 
a different version and it conflicts with D1, D2 will be skipped as it won't be 
picked up by maven,  all D2's children {*}won't be resolved then{*}. 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

 

  was:
This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
and https://issues.apache.org/jira/browse/MRESOLVER-228

There was discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

 

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution by a Skip solution based on BFS

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Summary: Avoid unnecessary dependency resolution by a Skip solution based 
on BFS  (was: Avoid unnecessary dependency resolution )

> Avoid unnecessary dependency resolution by a Skip solution based on BFS
> ---
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: exclusion-matters.png, skip-duplicate.png, 
> skip-version-conflicts.png
>
>
> This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
> and https://issues.apache.org/jira/browse/MRESOLVER-228
> There was discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
>  
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 
>  
> !exclusion-matters.png!
>  
> From above figure, we know:
>  * In 1st case, D will be resolved only once as there are no exclusions/same 
> exclusions up the tree.
>  * In 2nd case, the B and C have different exclusions and D needs to be 
> recalculated, if D is a heavy dependency which introduce many transitive 
> dependencies, all D and its children needs to be recalculated.  Recalculating 
> all of these nodes introduces 2 issues:
>  ** Slow in resolving dependencies.
>  ** Lots of DependencyNodes cached (all calculated/recalculated nodes would 
> be cached) and will consume huge memory.
> To improve the speed of maven resolver's dependency resolution,  I 
> implemented a skip approach.
> *CASE 1: Skip duplicate node*
> !skip-duplicate.png!
> From above figure, the 1st R is resolved at depth 3, and the 2nd R is 
> resolved again because the depth is at 2 which is lower, the 3rd R at depth 3 
> and the 4th R at depth 4 are simply skipped as R is already resolved at depth 
> 2. This is because the same node with deeper depth is most likely won't be 
> picked up by maven as maven employs a "nearest transitive dependency in the 
> tree depth and the first in resolution" strategy.
> The 3rd R and 4th R will have children set as zero and marked as skipped by 
> the R at depth 2 in 2nd tree path.
> *CASE 2: Skip version conflict*
> *!skip-version-conflicts.png!*
>  
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build 
> time reduced for different projects depend on how complex the dependencies 
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain 
> the same.
> We've verified the resolver performance patch leveraging an automation 
> solution to certify 2000+ apps of our company by comparing the  *mvn 
> dependency:tree* and *mvn dependency:list* result with/without the 
> performance patch.
> Please help review the PR.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Description: 
This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
and https://issues.apache.org/jira/browse/MRESOLVER-228

There was discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

 

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 

 

!exclusion-matters.png!

 

>From above figure, we know:
 * In 1st case, D will be resolved only once as there are no exclusions/same 
exclusions up the tree.
 * In 2nd case, the B and C have different exclusions and D needs to be 
recalculated, if D is a heavy dependency which introduce many transitive 
dependencies, all D and its children needs to be recalculated.  Recalculating 
all of these nodes introduces 2 issues:
 ** Slow in resolving dependencies.
 ** Lots of DependencyNodes cached (all calculated/recalculated nodes would be 
cached) and will consume huge memory.

To improve the speed of maven resolver's dependency resolution,  I implemented 
a skip approach.

*CASE 1: Skip duplicate node*
!skip-duplicate.png!

>From above figure, the 1st R is resolved at depth 3, and the 2nd R is resolved 
>again because the depth is at 2 which is lower, the 3rd R at depth 3 and the 
>4th R at depth 4 are simply skipped as R is already resolved at depth 2. This 
>is because the same node with deeper depth is most likely won't be picked up 
>by maven as maven employs a "nearest transitive dependency in the tree depth 
>and the first in resolution" strategy.

The 3rd R and 4th R will have children set as zero and marked as skipped by the 
R at depth 2 in 2nd tree path.

*CASE 2: Skip version conflict*

*!skip-version-conflicts.png!*

 

After we enabled the resolver patch in maven, we are seeing 10% ~70% build time 
reduced for different projects depend on how complex the dependencies are, and 
the result of *mvn dependency:tree* and *mvn dependency:list* remain the same.

We've verified the resolver performance patch leveraging an automation solution 
to certify 2000+ apps of our company by comparing the  *mvn dependency:tree* 
and *mvn dependency:list* result with/without the performance patch.

Please help review the PR.

 

  was:
This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
and https://issues.apache.org/jira/browse/MRESOLVER-228

There was discussions about the DFS or BFS algorithm for maven resolver in 
MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
easier to implement. Here is the plan for multiple changes requested recently:
 * DFS > BFS - preparation for parallel download
 * Skip - avoid unnecessary version resolution (Covered in this JIRA)
 * Download descriptors in parallel (MRESOLVER-7)

 

When comes to resolve the huge amount of dependencies of an enterprise level 
project, the maven resolver is very slow to resolve the dependency graph/tree. 
Take one of our app as example, it could take *10minutes+ and 16G memory* to 
print out the result of {*}mvn dependency:tree{*}. 

This is because there are many dependencies declared in the project, and some 
of the dependencies would introduce *600+* transitive dependencies, and 
exclusions are widely used to solve dependency conflicts. 

By checking the 
[code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
 we know the exclusion is also part of the cache key. This means when the 
exclusions up the tree differs, the cached resolution result for the same GAV 
won't be picked up and need s to be recalculated. 


> Avoid unnecessary dependency resolution 
> 

[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: exclusion-matters.png
skip-version-conflicts.png
skip-duplicate.png

> Avoid unnecessary dependency resolution 
> 
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: exclusion-matters.png, skip-duplicate.png, 
> skip-version-conflicts.png
>
>
> This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
> and https://issues.apache.org/jira/browse/MRESOLVER-228
> There was discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
>  
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: (was: Blank diagram (53).png)

> Avoid unnecessary dependency resolution 
> 
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
>
> This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
> and https://issues.apache.org/jira/browse/MRESOLVER-228
> There was discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
>  
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: (was: Blank diagram (55).png)

> Avoid unnecessary dependency resolution 
> 
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
>
> This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
> and https://issues.apache.org/jira/browse/MRESOLVER-228
> There was discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
>  
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: (was: Blank diagram (54).png)

> Avoid unnecessary dependency resolution 
> 
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
>
> This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
> and https://issues.apache.org/jira/browse/MRESOLVER-228
> There was discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
>  
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (MRESOLVER-247) Avoid unnecessary dependency resolution

2022-03-07 Thread wei cai (Jira)


 [ 
https://issues.apache.org/jira/browse/MRESOLVER-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wei cai updated MRESOLVER-247:
--
Attachment: Blank diagram (55).png
Blank diagram (54).png
Blank diagram (53).png

> Avoid unnecessary dependency resolution 
> 
>
> Key: MRESOLVER-247
> URL: https://issues.apache.org/jira/browse/MRESOLVER-247
> Project: Maven Resolver
>  Issue Type: Improvement
>  Components: Resolver
>Affects Versions: 1.7.3
>Reporter: wei cai
>Priority: Major
> Attachments: Blank diagram (53).png, Blank diagram (54).png, Blank 
> diagram (55).png
>
>
> This Jira is related with https://issues.apache.org/jira/browse/MRESOLVER-240 
> and https://issues.apache.org/jira/browse/MRESOLVER-228
> There was discussions about the DFS or BFS algorithm for maven resolver in 
> MRESOLVER-228, Changing to BFS would make MRESOLVER-228 & MRESOLVER-7 much 
> easier to implement. Here is the plan for multiple changes requested recently:
>  * DFS > BFS - preparation for parallel download
>  * Skip - avoid unnecessary version resolution (Covered in this JIRA)
>  * Download descriptors in parallel (MRESOLVER-7)
>  
> When comes to resolve the huge amount of dependencies of an enterprise level 
> project, the maven resolver is very slow to resolve the dependency 
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G 
> memory* to print out the result of {*}mvn dependency:tree{*}. 
> This is because there are many dependencies declared in the project, and some 
> of the dependencies would introduce *600+* transitive dependencies, and 
> exclusions are widely used to solve dependency conflicts. 
> By checking the 
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
>  we know the exclusion is also part of the cache key. This means when the 
> exclusions up the tree differs, the cached resolution result for the same GAV 
> won't be picked up and need s to be recalculated. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)