[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caijialiang updated HDFS-17287:
---
    Attachment: patch11-HDFS-17287.diff
    Status: Patch Available (was: Open)

> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HDFS-17287
> URL: https://issues.apache.org/jira/browse/HDFS-17287
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: build
> Affects Versions: 3.3.6
> Reporter: caijialiang
> Priority: Major
> Labels: pull-request-available
> Attachments: patch11-HDFS-17287.diff
>
> Why compilation is slow: the Hadoop project has many modules, and because they cannot be compiled in parallel, the build is slow. For instance, the first compilation of Hadoop might take several hours, and even with the Maven dependencies available locally, a subsequent compilation can still take close to 40 minutes.
>
> How to solve it: use {{mvn dependency:tree}} and {{maven-to-plantuml}} to investigate the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
>
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
>
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
>
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
>
> For more information, visit: [maven-to-plantuml GitHub repository|https://github.com/phxql/maven-to-plantuml/tree/master].
>
> The following is an English translation of the Hadoop PR description:
>
> *Hadoop Parallel Compilation Submission Logic*
> # Reasons for Parallel Compilation Failure
> ** In sequential compilation, modules are compiled one at a time in module order, so no errors occur.
> ** In parallel compilation, however, modules are compiled concurrently, and the build order is determined by the inter-module dependencies: if Module A depends on Module B, Module B is compiled before Module A. This ensures that the compilation order follows the dependencies between modules.
>
> When Hadoop is compiled in parallel, for example when compiling {{hadoop-yarn-project}}, the declared dependencies between modules are correct. The issue arises during the dist packaging stage, because {{dist}} packages all the other compiled modules.
>
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
> ** Serial compilation builds the modules listed in the pom one by one, in sequence. After all modules are compiled, it builds {{hadoop-yarn-project}}. During the {{prepare-package}} phase, {{maven-assembly-plugin}} is executed for packaging, and all packages are repackaged according to the descriptor {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
>
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
> ** Parallel compilation builds modules in dependency order. Modules that do not declare dependencies on each other through {{dependency}} are compiled in parallel. According to the dependencies declared in the pom of {{hadoop-yarn-project}}, those dependencies are compiled first, followed by {{hadoop-yarn-project}}, which then executes its {{maven-assembly-plugin}}.
> ** However, not all of the modules whose files are needed for packaging by {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} are declared as a {{dependency}} of {{hadoop-yarn-project}}. Therefore, when {{hadoop-yarn-project}} is compiled and {{maven-assembly-plugin}} is executed, some required modules may not have been built yet, which leads to errors in parallel compilation.
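
To make the failure mode described above concrete, here is a minimal Python sketch of dependency-ordered scheduling. It is a simplified model and not Maven's actual scheduler: modules are grouped into "waves", where each wave holds the modules whose declared dependencies have already been built. The module names ({{module-a}}, {{module-b}}, {{module-c}}, {{dist}}) are hypothetical stand-ins, with {{dist}} playing the role of the module that runs {{maven-assembly-plugin}}.

```python
def parallel_build_waves(deps):
    """Group modules into waves; modules in one wave may build concurrently.

    deps maps every module name to the set of modules it declares as
    dependencies. This is a simplified model, not Maven's real scheduler.
    """
    remaining = {module: set(required) for module, required in deps.items()}
    built = set()
    waves = []
    while remaining:
        # A module is ready once all of its declared dependencies are built.
        ready = sorted(m for m, required in remaining.items() if required <= built)
        if not ready:
            raise ValueError("cycle or unknown module in declared dependencies")
        waves.append(ready)
        built.update(ready)
        for module in ready:
            del remaining[module]
    return waves


# Hypothetical modules: "dist" packages a, b and c, but declares only a.
declared = {
    "module-a": set(),
    "module-b": {"module-a"},
    "module-c": {"module-b"},
    "dist": {"module-a"},
}

# Same modules after declaring every packaged module as a dependency of "dist".
fixed = dict(declared, dist={"module-a", "module-b", "module-c"})

broken_waves = parallel_build_waves(declared)
fixed_waves = parallel_build_waves(fixed)
print("declared only module-a:", broken_waves)
print("all packaged modules declared:", fixed_waves)
```

With only {{module-a}} declared, {{dist}} becomes ready in the same wave as {{module-b}}, before {{module-c}} has been built, which mirrors the packaging error described in the issue. Once every packaged module is declared, {{dist}} moves to the last wave.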
> *Solution:*
> ** The solution is relatively straightforward: collect all the modules referenced in {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and declare them as dependencies in the pom of {{hadoop-yarn-project}}.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
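
As a rough illustration of the solution described in the issue, the following Python sketch lists the modules that an assembly descriptor references but a pom does not declare as dependencies. Both XML fragments are hypothetical and heavily simplified: the {{org.example}} coordinates and the {{moduleSets}}/{{include}} layout assumed for the descriptor are illustrative assumptions, not copies of the real Hadoop files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified assembly descriptor (assumed layout).
ASSEMBLY_XML = """
<assembly>
  <moduleSets>
    <moduleSet>
      <includes>
        <include>org.example:module-a</include>
        <include>org.example:module-b</include>
        <include>org.example:module-c</include>
      </includes>
    </moduleSet>
  </moduleSets>
</assembly>
"""

# Hypothetical, simplified pom that declares only one of those modules.
POM_XML = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>module-a</artifactId>
    </dependency>
  </dependencies>
</project>
"""


def local_name(tag):
    """Drop an XML namespace such as '{http://...}dependency'."""
    return tag.rsplit("}", 1)[-1]


def assembly_modules(xml_text):
    """Collect the groupId:artifactId entries found in <include> elements."""
    root = ET.fromstring(xml_text.strip())
    return {
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "include" and el.text
    }


def declared_dependencies(xml_text):
    """Collect groupId:artifactId for every <dependency> element in the pom."""
    root = ET.fromstring(xml_text.strip())
    found = set()
    for el in root.iter():
        if local_name(el.tag) != "dependency":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in el}
        found.add(f"{fields.get('groupId', '')}:{fields.get('artifactId', '')}")
    return found


missing = sorted(assembly_modules(ASSEMBLY_XML) - declared_dependencies(POM_XML))
for coordinate in missing:
    print("not declared as a dependency:", coordinate)
```

Each coordinate reported this way is a candidate for an additional {{dependency}} entry in the packaging module's pom, so that the parallel build waits for it before running the assembly step.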
[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caijialiang updated HDFS-17287:
---
    Description:
Why compilation is slow: the Hadoop project has many modules, and because they cannot be compiled in parallel, the build is slow. For instance, the first compilation of Hadoop might take several hours, and even with the Maven dependencies available locally, a subsequent compilation can still take close to 40 minutes.

How to solve it: use {{mvn dependency:tree}} and {{maven-to-plantuml}} to investigate the dependency issues that prevent parallel compilation.
* Investigate the dependencies between project modules.
* Analyze the dependencies in multi-module Maven projects.
* Download {{maven-to-plantuml}}:

{{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
* Generate a dependency tree:

{{mvn dependency:tree > dep.txt}}
* Generate a UML diagram from the dependency tree:

{{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}

For more information, visit: [maven-to-plantuml GitHub repository|https://github.com/phxql/maven-to-plantuml/tree/master].

The following is an English translation of the Hadoop PR description:

*Hadoop Parallel Compilation Submission Logic*
# Reasons for Parallel Compilation Failure
** In sequential compilation, modules are compiled one at a time in module order, so no errors occur.
** In parallel compilation, however, modules are compiled concurrently, and the build order is determined by the inter-module dependencies: if Module A depends on Module B, Module B is compiled before Module A. This ensures that the compilation order follows the dependencies between modules.

When Hadoop is compiled in parallel, for example when compiling {{hadoop-yarn-project}}, the declared dependencies between modules are correct. The issue arises during the dist packaging stage, because {{dist}} packages all the other compiled modules.

*Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
** Serial compilation builds the modules listed in the pom one by one, in sequence. After all modules are compiled, it builds {{hadoop-yarn-project}}. During the {{prepare-package}} phase, {{maven-assembly-plugin}} is executed for packaging, and all packages are repackaged according to the descriptor {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.

*Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
** Parallel compilation builds modules in dependency order. Modules that do not declare dependencies on each other through {{dependency}} are compiled in parallel. According to the dependencies declared in the pom of {{hadoop-yarn-project}}, those dependencies are compiled first, followed by {{hadoop-yarn-project}}, which then executes its {{maven-assembly-plugin}}.
** However, not all of the modules whose files are needed for packaging by {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} are declared as a {{dependency}} of {{hadoop-yarn-project}}. Therefore, when {{hadoop-yarn-project}} is compiled and {{maven-assembly-plugin}} is executed, some required modules may not have been built yet, which leads to errors in parallel compilation.

*Solution:*
** The solution is relatively straightforward: collect all the modules referenced in {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and declare them as dependencies in the pom of {{hadoop-yarn-project}}.

  was:
Why compilation is slow: the Hadoop project has many modules, and because they cannot be compiled in parallel, the build is slow. For instance, the first compilation of Hadoop might take several hours, and even with the Maven dependencies available locally, a subsequent compilation can still take close to 40 minutes.

How to solve it: use {{mvn dependency:tree}} and {{maven-to-plantuml}} to investigate the dependency issues that prevent parallel compilation.
* Investigate the dependencies between project modules.
* Analyze the dependencies in multi-module Maven projects.
* Download {{maven-to-plantuml}}:

{{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
* Generate a dependency tree:

{{mvn dependency:tree > dep.txt}}
* Generate a UML diagram from the dependency tree:

{{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}

For more information, visit: [maven-to-plantuml GitHub repository|https://github.com/phxql/maven-to-plantuml/tree/master].

> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HDFS-17287
> URL: https://issues.apache
[jira] [Updated] (HDFS-17287) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HDFS-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-17287:
--
    Labels: pull-request-available (was: )

> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HDFS-17287
> URL: https://issues.apache.org/jira/browse/HDFS-17287
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: build
> Affects Versions: 3.3.6
> Reporter: caijialiang
> Priority: Major
> Labels: pull-request-available
>
> Why compilation is slow: the Hadoop project has many modules, and because they cannot be compiled in parallel, the build is slow. For instance, the first compilation of Hadoop might take several hours, and even with the Maven dependencies available locally, a subsequent compilation can still take close to 40 minutes.
>
> How to solve it: use {{mvn dependency:tree}} and {{maven-to-plantuml}} to investigate the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
>
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
>
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
>
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
>
> For more information, visit: [maven-to-plantuml GitHub repository|https://github.com/phxql/maven-to-plantuml/tree/master].
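
The dependency-investigation steps quoted above write the output of {{mvn dependency:tree}} to {{dep.txt}}. As a complement to the UML diagram, here is a small Python sketch that pulls the direct module-to-module edges out of such a file. It assumes the default text layout of {{dependency:tree}} ({{[INFO]}} prefix, {{+-}} and {{\-}} branch markers, three characters of indentation per level); the sample input and the {{org.example}} group ID are hypothetical.

```python
import re

# One tree line: "[INFO] ", an indentation prefix, then a Maven coordinate.
LINE = re.compile(
    r"^\[INFO\] (?P<prefix>(?:[| ]  )*(?:[+\\]- )?)"
    r"(?P<coord>[^\s:]+(?::[^\s:]+){3,5})$"
)


def module_edges(lines, group_prefix):
    """Return (parent, child) artifactId pairs where both belong to group_prefix."""
    edges = set()
    path = []  # path[d] is the in-group artifactId at depth d, or None
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if not match:
            continue  # skip banners, separators and other log noise
        depth = len(match.group("prefix")) // 3
        group_id, artifact_id = match.group("coord").split(":")[:2]
        del path[depth:]  # drop entries from deeper or sibling branches
        if group_id.startswith(group_prefix):
            if depth > 0 and path and path[-1] is not None:
                edges.add((path[-1], artifact_id))
            path.append(artifact_id)
        else:
            path.append(None)  # third-party artifact, not a project module
    return edges


# Hypothetical sample of dependency:tree output.
SAMPLE = r"""
[INFO] Scanning for projects...
[INFO] org.example:dist:pom:1.0
[INFO] +- org.example:module-a:jar:1.0:compile
[INFO] |  \- org.example:module-b:jar:1.0:compile
[INFO] |     \- com.google.guava:guava:jar:32.0:compile
[INFO] \- junit:junit:jar:4.13.2:test
[INFO] BUILD SUCCESS
""".splitlines()

edges = module_edges(SAMPLE, "org.example")
for parent, child in sorted(edges):
    print(f"{parent} -> {child}")
```

For a real run, the lines would come from the generated file, for example {{module_edges(open("dep.txt"), "org.apache.hadoop")}}; filtering by group ID keeps only edges between the project's own modules, which are the ones that constrain the parallel build order.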