[ 
https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reassigned HADOOP-19019:
------------------------------------

    Assignee: caijialiang

> Parallel Maven Build Support for Apache Hadoop
> ----------------------------------------------
>
>                 Key: HADOOP-19019
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19019
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>            Reporter: caijialiang
>            Assignee: caijialiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: patch11-HDFS-17287.diff
>
>
> Why compilation is slow: The Hadoop project has many modules, and because 
> they cannot currently be compiled in parallel, the build is slow. For 
> instance, the first compilation of Hadoop might take several hours, and even 
> with the dependencies already in the local Maven repository, a subsequent 
> compilation can still take close to 40 minutes.
> How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to 
> investigate the dependency issues that prevent parallel compilation.
>  * Investigate the dependencies between project modules.
>  * Analyze the dependencies in multi-module Maven projects.
>  * Download {{maven-to-plantuml}}:
>  
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
>  * Generate a dependency tree:
>  
> {{mvn dependency:tree > dep.txt}}
>  * Generate a UML diagram from the dependency tree:
>  
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
> For more information, visit: [maven-to-plantuml GitHub 
> repository|https://github.com/phxql/maven-to-plantuml/tree/master].
>  
> *Hadoop Parallel Compilation Submission Logic*
>  # Reasons for Parallel Compilation Failure
>  * 
>  ** In sequential compilation, modules are compiled one by one in module 
> order, so every module is finished before a later module needs its output, 
> and no errors occur.
>  ** In parallel compilation, modules are compiled concurrently wherever the 
> declared inter-module dependencies allow. If Module A depends on Module B, 
> then Module B is still compiled before Module A, but modules with no 
> declared dependency between them may be compiled at the same time.
> When Hadoop compiles in parallel, for example when compiling 
> {{hadoop-yarn-project}}, the declared dependencies between modules are 
> sufficient for compilation itself. The issue arises during the dist 
> packaging stage, because {{dist}} packages the output of all the other 
> compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
>  * 
>  ** In serial compilation, Maven compiles the modules listed in the pom one 
> by one in sequence. After all of them are compiled, it builds 
> {{hadoop-yarn-project}} itself. During the {{prepare-package}} stage, the 
> {{maven-assembly-plugin}} is executed for packaging, and all packages are 
> repackaged according to the descriptor in 
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
>  * 
>  ** Parallel compilation builds modules according to the dependency order 
> among them. Modules that do not declare a {{dependency}} on each other are 
> compiled in parallel. Following the dependencies declared in the pom of 
> {{hadoop-yarn-project}}, those dependencies are compiled first, and then 
> {{hadoop-yarn-project}} is built and its {{maven-assembly-plugin}} is 
> executed.
>  ** However, not every module whose files are needed for packaging in 
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} is 
> declared as a {{dependency}} of {{hadoop-yarn-project}}. Therefore, when 
> {{hadoop-yarn-project}} is built and {{maven-assembly-plugin}} runs, some of 
> the required modules may not have been built yet, which leads to errors in 
> parallel compilation.
> *Solution:*
>  * 
>  ** The solution is relatively straightforward: collect all modules 
> referenced in 
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}, 
> and declare each of them as a dependency in the pom of 
> {{hadoop-yarn-project}}, so that they are all built before packaging starts.
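
The ordering problem described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the module names are hypothetical placeholders, not the real Hadoop module list, and grouping modules into "waves" is a simplification of dependency-ordered parallel scheduling, not Maven's actual implementation. It shows that an assembly module can be scheduled alongside a module it packages when that module is not a declared dependency, and that declaring it removes the overlap.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_waves(deps):
    """Group modules into waves; modules in one wave may build concurrently.

    deps maps each module to the set of modules it declares as dependencies.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


def wave_of(waves, module):
    """Return the index of the wave in which a module is built."""
    for index, wave in enumerate(waves):
        if module in wave:
            return index
    raise KeyError(module)


def unsafe_inputs(deps, assembler, packaged):
    """List packaged modules that are not guaranteed to finish before assembly."""
    waves = build_waves(deps)
    assembly_wave = wave_of(waves, assembler)
    return sorted(m for m in packaged if wave_of(waves, m) >= assembly_wave)


# Hypothetical modules (assumption): "yarn-dist" stands in for the module that
# runs the assembly plugin. It packages "yarn-server" without declaring it.
declared = {
    "yarn-api": set(),
    "yarn-common": {"yarn-api"},
    "yarn-server": {"yarn-common"},
    "yarn-ui": set(),
    "yarn-dist": {"yarn-common"},
}
packaged = {"yarn-common", "yarn-server"}

# The fix from the issue, applied to the model: declare every packaged module.
fixed = dict(declared)
fixed["yarn-dist"] = declared["yarn-dist"] | packaged

print(unsafe_inputs(declared, "yarn-dist", packaged))  # ['yarn-server']
print(unsafe_inputs(fixed, "yarn-dist", packaged))     # []
```

In the first graph the assembly module lands in the same wave as the undeclared module, which corresponds to the failure described above. After the missing dependency is declared, it moves to a later wave.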



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
