[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HADOOP-19019:
---------------------------------
    Fix Version/s: 3.5.0
     Hadoop Flags: Reviewed
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Parallel Maven Build Support for Apache Hadoop
> ----------------------------------------------
>
>                 Key: HADOOP-19019
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19019
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>            Reporter: caijialiang
>            Assignee: caijialiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: patch11-HDFS-17287.diff
>
>
> The reason for the slow compilation: The Hadoop project has many modules,
> and because they cannot be compiled in parallel, the build is slow. For
> instance, the first compilation of Hadoop might take several hours, and even
> with dependencies already cached in the local Maven repository, a subsequent
> compilation can still take close to 40 minutes.
> How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to
> investigate the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
>
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
>
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
>
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
> For more information, visit: [maven-to-plantuml GitHub
> repository|https://github.com/phxql/maven-to-plantuml/tree/master].
>
> *Hadoop Parallel Compilation Submission Logic*
> # Reasons for Parallel Compilation Failure
> ** In sequential compilation, modules are compiled one by one in the order
> listed, so everything a module needs has already been built by the time it
> runs, and no errors occur.
> ** In parallel compilation, however, modules are compiled concurrently, and
> the order is determined only by the inter-module dependencies: if Module A
> depends on Module B, then Module B is compiled before Module A.
> When Hadoop is compiled in parallel, for example when compiling
> {{hadoop-yarn-project}}, the declared dependencies between modules are
> correct. The issue arises during the dist package stage, because {{dist}}
> packages all the other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
> ** Serial compilation builds the modules listed in the pom one by one, in
> sequence. After all modules are compiled, it compiles
> {{hadoop-yarn-project}}. During the {{prepare-package}} stage, the
> {{maven-assembly-plugin}} is executed for packaging. All packages are
> repackaged according to the description in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
> ** Parallel compilation builds modules according to the dependency order
> among them. Modules that do not declare dependencies on each other through
> {{dependency}} are compiled in parallel. According to the dependency
> definitions in the pom of {{hadoop-yarn-project}}, its declared dependencies
> are compiled first, followed by {{hadoop-yarn-project}} itself, which then
> executes its {{maven-assembly-plugin}}.
> ** However, the files needed for packaging in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} do
> not all come from modules declared as a {{dependency}} of
> {{hadoop-yarn-project}}. Therefore, when {{hadoop-yarn-project}} is compiled
> and {{maven-assembly-plugin}} runs, some of the required modules may not
> have been built yet, leading to errors in parallel compilation.
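
The failure mode described above can be illustrated with a small scheduling sketch. This is not how Maven's parallel builder is implemented: it is a simplified model that groups modules into "waves" of work that may run at the same time, and the module names (`yarn-api`, `yarn-common`, `yarn-server`, `yarn-dist`) are hypothetical stand-ins rather than the real Hadoop artifact IDs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_waves(declared_deps):
    """Group modules into waves; modules in the same wave may build in parallel.

    declared_deps maps each module to the set of modules it declares as
    dependencies. This is a simplified model, not Maven's actual scheduler.
    """
    sorter = TopologicalSorter(declared_deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


def unsafe_inputs(declared_deps, assembly_module, assembly_inputs):
    """Return assembly inputs that are not guaranteed to be built first."""
    waves = build_waves(declared_deps)
    wave_of = {m: i for i, wave in enumerate(waves) for m in wave}
    return {m for m in assembly_inputs
            if wave_of[m] >= wave_of[assembly_module]}


# Hypothetical modules: yarn-dist packages yarn-server but does not declare it.
declared = {
    "yarn-api": set(),
    "yarn-common": {"yarn-api"},
    "yarn-server": {"yarn-common"},
    "yarn-dist": {"yarn-common"},
}
assembly_inputs = {"yarn-api", "yarn-common", "yarn-server"}

print(build_waves(declared))
print(unsafe_inputs(declared, "yarn-dist", assembly_inputs))

# Declaring the missing dependency forces yarn-server to finish first.
fixed = dict(declared)
fixed["yarn-dist"] = {"yarn-common", "yarn-server"}
print(unsafe_inputs(fixed, "yarn-dist", assembly_inputs))
```

In this model the packaging module and the undeclared module land in the same wave, so nothing prevents the assembly step from running before its input exists. Adding the declaration moves the packaging module into a later wave.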
> *Solution:*
> ** The solution is relatively straightforward: collect all modules
> referenced in
> {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}},
> and then declare each of them as a dependency in the pom of
> {{hadoop-yarn-project}}.
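
One way to find the modules that still need to be declared is to compare the assembly descriptor against the pom. The sketch below is only an illustration, not the method used in the patch. It assumes the descriptor lists modules as `<include>groupId:artifactId</include>` entries directly under a `<moduleSet>`'s `<includes>` element, and it runs on small inline XML samples with hypothetical coordinates (`org.example:mod-api`, `org.example:mod-server`) rather than the real Hadoop files.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}dependency'."""
    return tag.rsplit("}", 1)[-1]


def assembly_modules(descriptor_xml):
    """Collect 'groupId:artifactId' entries from <moduleSet><includes>."""
    root = ET.fromstring(descriptor_xml.strip())
    found = set()
    for module_set in root.iter():
        if local(module_set.tag) != "moduleSet":
            continue
        for child in module_set:  # direct children only, skips nested fileSets
            if local(child.tag) != "includes":
                continue
            for inc in child:
                if local(inc.tag) == "include" and inc.text:
                    found.add(inc.text.strip())
    return found


def declared_dependencies(pom_xml):
    """Collect 'groupId:artifactId' for every <dependency> in the pom.

    Simplification: this also picks up entries under dependencyManagement
    or plugin configuration if the pom has them.
    """
    root = ET.fromstring(pom_xml.strip())
    found = set()
    for dep in root.iter():
        if local(dep.tag) != "dependency":
            continue
        fields = {local(c.tag): (c.text or "").strip() for c in dep}
        found.add(f"{fields.get('groupId', '')}:{fields.get('artifactId', '')}")
    return found


# Hypothetical inline samples standing in for the real descriptor and pom.
DESCRIPTOR = """
<assembly>
  <moduleSets>
    <moduleSet>
      <includes>
        <include>org.example:mod-api</include>
        <include>org.example:mod-server</include>
      </includes>
    </moduleSet>
  </moduleSets>
</assembly>
"""

POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>mod-api</artifactId>
    </dependency>
  </dependencies>
</project>
"""

missing = assembly_modules(DESCRIPTOR) - declared_dependencies(POM)
print(sorted(missing))  # prints ['org.example:mod-server']
```

Each coordinate reported as missing is a candidate for a new `<dependency>` entry in the packaging module's pom, which gives the parallel build the ordering constraint that serial compilation provided implicitly.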