[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop

2023-12-20 Thread caijialiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caijialiang updated HDFS-17287:
---
Attachment: patch11-HDFS-17287.diff
Status: Patch Available  (was: Open)

> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HDFS-17287
> URL: https://issues.apache.org/jira/browse/HDFS-17287
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.3.6
>Reporter: caijialiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: patch11-HDFS-17287.diff
>
>
> Why compilation is slow: The Hadoop project has many modules, and they 
> cannot be compiled in parallel. For instance, a first build of Hadoop can 
> take several hours, and even with the Maven dependencies already cached 
> locally, a subsequent build can still take close to 40 minutes.
> How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to 
> investigate the dependency issues that prevent parallel compilation.
>  * Investigate the dependencies between project modules.
>  * Analyze the dependencies in multi-module Maven projects.
>  * Download {{maven-to-plantuml}}:
>  
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
>  * Generate a dependency tree:
>  
> {{mvn dependency:tree > dep.txt}}
>  * Generate a UML diagram from the dependency tree:
>  
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
> For more information, see the [maven-to-plantuml GitHub 
> repository|https://github.com/phxql/maven-to-plantuml/tree/master].
> The following is an English translation of the Hadoop PR description:
> *Hadoop Parallel Compilation Submission Logic*
>  # Reasons for Parallel Compilation Failure
>  ** In a sequential build, modules are compiled one at a time in a fixed 
> order, so no errors occur.
>  ** In a parallel build, modules are compiled concurrently, and the order is 
> determined by the declared inter-module dependencies: if Module A depends on 
> Module B, Module B is compiled before Module A.
> When Hadoop is built in parallel, for example when compiling 
> {{hadoop-yarn-project}}, the declared dependencies between modules are 
> correct. The failure occurs in the dist packaging stage, because {{dist}} 
> packages the output of all the other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
>  ** A serial build compiles the modules listed in the pom one by one, in 
> sequence, and compiles {{hadoop-yarn-project}} only after all of them are 
> done. During the {{prepare-package}} phase, {{maven-assembly-plugin}} runs 
> and repackages all module outputs as described in 
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
>  ** A parallel build compiles modules in dependency order. Modules that do 
> not declare a {{dependency}} on each other are compiled in parallel. 
> Following the dependencies declared in the pom of {{hadoop-yarn-project}}, 
> those dependencies are compiled first, then {{hadoop-yarn-project}} itself, 
> which runs its {{maven-assembly-plugin}}.
>  ** However, not all of the modules whose files are packaged by 
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} are 
> declared as a {{dependency}} of {{hadoop-yarn-project}}. When 
> {{hadoop-yarn-project}} is compiled and {{maven-assembly-plugin}} runs, some 
> required modules may therefore not be built yet, which causes the parallel 
> build to fail.
> *Solution:*
>  ** Collect all modules referenced in 
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and 
> declare them as dependencies in the pom of {{hadoop-yarn-project}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop

2023-12-20 Thread caijialiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caijialiang updated HDFS-17287:
---
Description: 
Why compilation is slow: The Hadoop project has many modules, and they cannot 
be compiled in parallel. For instance, a first build of Hadoop can take several 
hours, and even with the Maven dependencies already cached locally, a 
subsequent build can still take close to 40 minutes.

How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to 
investigate the dependency issues that prevent parallel compilation.
 * Investigate the dependencies between project modules.
 * Analyze the dependencies in multi-module Maven projects.
 * Download {{maven-to-plantuml}}:

{{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
 * Generate a dependency tree:

{{mvn dependency:tree > dep.txt}}
 * Generate a UML diagram from the dependency tree:

{{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}

For more information, see the [maven-to-plantuml GitHub 
repository|https://github.com/phxql/maven-to-plantuml/tree/master].
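
As a rough illustration of the investigation step above, the following Python sketch (not part of the patch) extracts the direct intra-project dependencies of each module from text in the usual {{mvn dependency:tree}} console layout. The function name {{direct_module_deps}}, the {{org.example}} coordinates and the embedded sample are hypothetical; real output may contain extra log lines, which the sketch simply skips.

```python
import re
from collections import defaultdict

# Illustrative sample only: hand-written in the usual `mvn dependency:tree`
# console layout. The coordinates are hypothetical, not real Hadoop output.
SAMPLE = r"""[INFO] org.example:app-dist:pom:1.0
[INFO] +- org.example:app-api:jar:1.0:compile
[INFO] |  \- com.google.guava:guava:jar:32.0.0-jre:compile
[INFO] \- org.example:app-server:jar:1.0:compile
[INFO] org.example:app-api:jar:1.0
[INFO] \- com.google.guava:guava:jar:32.0.0-jre:compile
"""

# Matches the leading "groupId:artifactId:" of a Maven coordinate.
COORD = re.compile(r"^([\w.\-]+):([\w.\-]+):")


def direct_module_deps(text, group_prefix):
    """Map each module's artifactId to the artifactIds of its direct
    dependencies whose groupId starts with group_prefix."""
    deps = defaultdict(set)
    current = None
    for raw in text.splitlines():
        line = raw[len("[INFO] "):] if raw.startswith("[INFO] ") else raw
        if line.startswith("+- ") or line.startswith("\\- "):
            # First-level tree entry: a direct dependency of the current module.
            match = COORD.match(line[3:])
            if match and current and match.group(1).startswith(group_prefix):
                deps[current].add(match.group(2))
        elif line.startswith("|") or line.startswith(" "):
            # Deeper tree levels are transitive dependencies; ignore them.
            continue
        else:
            # A bare coordinate line (group:artifact:packaging:version)
            # starts the tree of a new module; other log lines are skipped.
            match = COORD.match(line)
            if match and line.count(":") >= 3:
                current = match.group(2)
                deps[current]  # register the module even if it has no deps
    return dict(deps)


if __name__ == "__main__":
    for module, module_deps in direct_module_deps(SAMPLE, "org.example").items():
        print(module, "->", sorted(module_deps))
```

The resulting map is the same information that the PlantUML diagram visualizes, in a form that can be compared against other module lists.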




The following is an English translation of the Hadoop PR description:

*Hadoop Parallel Compilation Submission Logic*
 # Reasons for Parallel Compilation Failure

 ** In a sequential build, modules are compiled one at a time in a fixed 
order, so no errors occur.
 ** In a parallel build, modules are compiled concurrently, and the order is 
determined by the declared inter-module dependencies: if Module A depends on 
Module B, Module B is compiled before Module A.
When Hadoop is built in parallel, for example when compiling 
{{hadoop-yarn-project}}, the declared dependencies between modules are correct. 
The failure occurs in the dist packaging stage, because {{dist}} packages the 
output of all the other compiled modules.

*Behavior of {{hadoop-yarn-project}} in Serial Compilation:*

 ** A serial build compiles the modules listed in the pom one by one, in 
sequence, and compiles {{hadoop-yarn-project}} only after all of them are done. 
During the {{prepare-package}} phase, {{maven-assembly-plugin}} runs and 
repackages all module outputs as described in 
{{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
*Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*

 ** A parallel build compiles modules in dependency order. Modules that do not 
declare a {{dependency}} on each other are compiled in parallel. Following the 
dependencies declared in the pom of {{hadoop-yarn-project}}, those dependencies 
are compiled first, then {{hadoop-yarn-project}} itself, which runs its 
{{maven-assembly-plugin}}.
 ** However, not all of the modules whose files are packaged by 
{{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} are 
declared as a {{dependency}} of {{hadoop-yarn-project}}. When 
{{hadoop-yarn-project}} is compiled and {{maven-assembly-plugin}} runs, some 
required modules may therefore not be built yet, which causes the parallel 
build to fail.
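
The race described above can be modeled with a toy scheduler. This is a simplified sketch, not Maven's actual scheduling code: it assumes unlimited threads, assumes each module starts as soon as its declared dependencies have finished, and treats the packaging step as running when the dist module starts. The module names and durations are made up.

```python
def start_times(durations, declared_deps):
    """Earliest start time of each module when a module starts as soon as
    all of its *declared* dependencies have finished (acyclic graph assumed)."""
    start = {}

    def begin(module):
        if module not in start:
            start[module] = max(
                (begin(dep) + durations[dep]
                 for dep in declared_deps.get(module, ())),
                default=0,
            )
        return start[module]

    for module in durations:
        begin(module)
    return start


def unready_inputs(module, needed, durations, declared_deps):
    """Modules whose output `module` packages but that are still building
    when `module` starts."""
    start = start_times(durations, declared_deps)
    return sorted(n for n in needed if start[n] + durations[n] > start[module])


# Hypothetical modules and build durations (arbitrary time units).
DURATIONS = {"api": 2, "server": 5, "ui": 9, "dist": 1}
NEEDED_BY_DIST = ["api", "server", "ui"]

# "ui" is packaged by dist but not declared as one of its dependencies.
BROKEN = {"server": ["api"], "dist": ["api", "server"]}
# Same graph with the missing dependency declared.
FIXED = {"server": ["api"], "dist": ["api", "server", "ui"]}

if __name__ == "__main__":
    print(unready_inputs("dist", NEEDED_BY_DIST, DURATIONS, BROKEN))
    print(unready_inputs("dist", NEEDED_BY_DIST, DURATIONS, FIXED))
```

With the undeclared dependency, {{dist}} starts while {{ui}} is still building. Once the dependency is declared, the scheduler delays {{dist}} until every packaged module is finished.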
*Solution:*

 ** Collect all modules referenced in 
{{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and 
declare them as dependencies in the pom of {{hadoop-yarn-project}}.
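
One way to find the missing declarations is to compare the two files mechanically. The sketch below is an illustration rather than the method used in the patch. It assumes the assembly descriptor names modules as {{groupId:artifactId}} entries under {{moduleSet/includes}}; a descriptor that references modules through {{fileSet}} directory paths would need a different extraction step. The XML samples and all names in them are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal stand-ins for an assembly descriptor and a pom.
DESCRIPTOR = """
<assembly>
  <moduleSets>
    <moduleSet>
      <includes>
        <include>org.example:app-api</include>
        <include>org.example:app-ui</include>
      </includes>
    </moduleSet>
  </moduleSets>
</assembly>
"""

POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>app-dist</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>app-api</artifactId>
    </dependency>
  </dependencies>
</project>
"""


def local(tag):
    """Strip an XML namespace such as '{http://...}project' -> 'project'."""
    return tag.rsplit("}", 1)[-1]


def assembly_modules(descriptor_xml):
    """artifactIds listed directly under <moduleSet><includes>."""
    found = set()
    for element in ET.fromstring(descriptor_xml).iter():
        if local(element.tag) != "moduleSet":
            continue
        for child in element:
            if local(child.tag) != "includes":
                continue
            for include in child:
                if local(include.tag) == "include" and include.text:
                    parts = include.text.strip().split(":")
                    found.add(parts[1] if len(parts) > 1 else parts[0])
    return found


def declared_dependencies(pom_xml):
    """artifactIds in the pom's top-level <dependencies> section only."""
    declared = set()
    for child in ET.fromstring(pom_xml):
        if local(child.tag) != "dependencies":
            continue
        for dependency in child:
            for field in dependency:
                if local(field.tag) == "artifactId" and field.text:
                    declared.add(field.text.strip())
    return declared


def missing_dependencies(descriptor_xml, pom_xml):
    """Modules the assembly packages that the pom does not depend on."""
    return sorted(assembly_modules(descriptor_xml) - declared_dependencies(pom_xml))


if __name__ == "__main__":
    print(missing_dependencies(DESCRIPTOR, POM))
```

Each artifactId this reports is a candidate for a new {{dependency}} entry in the pom of the packaging module.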

  was:
The reason for the slow compilation: The Hadoop project has many modules, and 
the inability to compile them in parallel results in a slow process. For 
instance, the first compilation of Hadoop might take several hours, and even 
with local Maven dependencies, a subsequent compilation can still take close to 
40 minutes, which is very slow.

How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to 
investigate the dependency issues that prevent parallel compilation.
 * Investigate the dependencies between project modules.
 * Analyze the dependencies in multi-module Maven projects.
 * Download {{maven-to-plantuml}}:

 
{{wget 
https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
 * Generate a dependency tree:

 
{{mvn dependency:tree > dep.txt}}
 * Generate a UML diagram from the dependency tree:

 
{{java -jar maven-to-plantuml.jar --input dep.txt --output dep.puml}}

For more information, visit: [maven-to-plantuml GitHub 
repository|https://github.com/phxql/maven-to-plantuml/tree/master].


> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HDFS-17287
> URL: https://issues.apache

[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17287:
--
Labels: pull-request-available  (was: )

> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HDFS-17287
> URL: https://issues.apache.org/jira/browse/HDFS-17287
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.3.6
>Reporter: caijialiang
>Priority: Major
>  Labels: pull-request-available
>


