[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809761#comment-17809761 ] ASF GitHub Bot commented on HADOOP-19019:

Hexiaoqiao commented on PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1905390687

Committed to trunk. Thanks @JiaLiangC and @steveloughran.

> Parallel Maven Build Support for Apache Hadoop
> --
>
> Key: HADOOP-19019
> URL: https://issues.apache.org/jira/browse/HADOOP-19019
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Reporter: caijialiang
> Priority: Major
> Labels: pull-request-available
> Attachments: patch11-HDFS-17287.diff
>
>
> The reason for the slow compilation: the Hadoop project has many modules, and because they cannot be compiled in parallel, the build is slow. For instance, the first compilation of Hadoop can take several hours, and even with all Maven dependencies already available locally, a subsequent compilation can still take close to 40 minutes.
> How to solve it: use {{mvn dependency:tree}} and {{maven-to-plantuml}} to find the dependency issues that prevent parallel compilation.
> * Investigate the dependencies between project modules.
> * Analyze the dependencies in multi-module Maven projects.
> * Download {{maven-to-plantuml}}:
> {{wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar}}
> * Generate a dependency tree:
> {{mvn dependency:tree > dep.txt}}
> * Generate a UML diagram from the dependency tree:
> {{java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml}}
> For more information, see the [maven-to-plantuml GitHub repository|https://github.com/phxql/maven-to-plantuml/tree/master].
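You can also mine the `dep.txt` produced by `mvn dependency:tree` directly for inter-module edges, which is the information that decides ordering in a parallel build. Below is a minimal Python sketch. It rests on several assumptions:

* It assumes the default plain-text layout of `mvn dependency:tree`: direct dependencies are marked `+- ` or `\- ` immediately after the `[INFO] ` prefix, and deeper levels are indented further.
* It assumes that in-project modules share the `org.apache.hadoop` groupId.
* `module_edges` is a hypothetical helper. It is not part of Maven or maven-to-plantuml.
* The version strings in the sample are made up.

```python
from __future__ import annotations

import re

# Root of one module's tree: "[INFO] groupId:artifactId:packaging:version"
ROOT_RE = re.compile(r"^\[INFO\] ([\w.\-]+):([\w.\-]+):[\w\-]+:\S+$")
# Direct (depth-1) dependency: "[INFO] +- g:a:..." or "[INFO] \- g:a:..."
DIRECT_RE = re.compile(r"^\[INFO\] [+\\]- ([\w.\-]+):([\w.\-]+):")


def module_edges(tree_output: str,
                 group: str = "org.apache.hadoop") -> dict[str, set[str]]:
    """Map each in-project module to the in-project modules it declares
    directly, based on the text printed by `mvn dependency:tree`."""
    edges: dict[str, set[str]] = {}
    current = None  # module whose tree is being read; None means skip
    for raw in tree_output.splitlines():
        line = raw.rstrip()
        root = ROOT_RE.match(line)
        if root:
            current = root.group(2) if root.group(1) == group else None
            if current is not None:
                edges.setdefault(current, set())
            continue
        direct = DIRECT_RE.match(line)
        # Deeper (transitive) lines start with "|" or spaces and never match.
        if direct and current is not None and direct.group(1) == group:
            edges[current].add(direct.group(2))
    return edges


if __name__ == "__main__":
    # Hypothetical excerpt of dep.txt; the versions are illustrative only.
    sample = "\n".join([
        "[INFO] org.apache.hadoop:hadoop-yarn-client:jar:3.4.0-SNAPSHOT",
        "[INFO] +- org.apache.hadoop:hadoop-yarn-api:jar:3.4.0-SNAPSHOT:compile",
        "[INFO] |  \\- com.google.protobuf:protobuf-java:jar:2.5.0:compile",
        "[INFO] \\- org.apache.hadoop:hadoop-yarn-common:jar:3.4.0-SNAPSHOT:compile",
    ])
    print({m: sorted(d) for m, d in module_edges(sample).items()})
```

To run it against the real file, pass the file contents in, for example `module_edges(open("dep.txt", encoding="utf-8").read())`. The resulting map lists only what each module actually declares, and declarations are the only thing Maven's parallel builder uses for ordering.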
>
>
> *Hadoop Parallel Compilation Submission Logic*
> # Reasons for Parallel Compilation Failure
> ** In sequential compilation, modules are built one at a time in order, so no errors occur: by the time a module is packaged, every module that precedes it in the build order has already been built.
> ** In parallel compilation, modules are built concurrently, and the order is determined only by the declared inter-module dependencies. If module A depends on module B, B is built before A, while modules with no declared dependency between them may be built at the same time.
> When Hadoop is compiled in parallel, for example when building {{hadoop-yarn-project}}, the declared compile-time dependencies between modules are correct. The problem arises in the dist packaging stage, because {{dist}} packages the output of all the other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
> ** In serial compilation, the modules in the pom are built one by one in sequence, and {{hadoop-yarn-project}} is built only after all of them have finished. During the {{prepare-package}} phase, the {{maven-assembly-plugin}} runs and repackages all artifacts according to the descriptor {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
> ** Parallel compilation builds modules in dependency order, and modules that do not declare a {{dependency}} on each other are built in parallel. Following the dependencies declared in the pom of {{hadoop-yarn-project}}, those modules are built first, then {{hadoop-yarn-project}} itself, which runs its {{maven-assembly-plugin}}.
> ** However, not every module whose files are packaged by {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} is declared as a {{dependency}} of {{hadoop-yarn-project}}. As a result, when {{hadoop-yarn-project}} runs {{maven-assembly-plugin}}, some of the required modules may not have been built yet, and the parallel build fails.
> *Solution:*
> ** The fix is relatively straightforward: collect all modules referenced in {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and declare them as dependencies in the pom of {{hadoop-yarn-project}}.

-- This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
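The failure mode in the issue description, and the proposed fix, can be illustrated with a small Python model of dependency-ordered scheduling. It is a sketch under simplifying assumptions:

* The module graph is a hypothetical, heavily trimmed subset, not Hadoop's real dependency graph.
* `build_waves` groups modules into levels using `graphlib.TopologicalSorter` (Python 3.9+). Maven's `-T` builder instead starts each module as soon as its own upstream modules finish, but the ordering constraint it honours is the same.
* `build_waves` and `undeclared_inputs` are illustrative names, not Maven features.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def build_waves(declared: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into waves: every declared dependency of a module
    sits in an earlier wave, so one wave may be built concurrently."""
    sorter = TopologicalSorter(declared)  # values are predecessors
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def undeclared_inputs(module: str, declared: dict[str, set[str]],
                      packaged: set[str]) -> set[str]:
    """Modules that `module` packages but cannot reach through its
    declared (direct or transitive) dependencies."""
    reachable: set[str] = set()
    stack = list(declared.get(module, ()))
    while stack:
        dep = stack.pop()
        if dep not in reachable:
            reachable.add(dep)
            stack.extend(declared.get(dep, ()))
    return packaged - reachable


if __name__ == "__main__":
    # Hypothetical, trimmed graph: module -> declared dependencies.
    declared = {
        "hadoop-yarn-api": set(),
        "hadoop-yarn-common": {"hadoop-yarn-api"},
        "hadoop-yarn-server-nodemanager": {"hadoop-yarn-common"},
        "hadoop-yarn-applications-catalog-webapp": {"hadoop-yarn-common"},
        "hadoop-yarn-project": {"hadoop-yarn-common"},
    }
    # Modules whose output the dist assembly packages (hypothetical subset).
    packaged = {"hadoop-yarn-api", "hadoop-yarn-common",
                "hadoop-yarn-server-nodemanager",
                "hadoop-yarn-applications-catalog-webapp"}

    # Before the fix: hadoop-yarn-project shares a wave with modules it packages.
    print(build_waves(declared))
    missing = undeclared_inputs("hadoop-yarn-project", declared, packaged)
    print(sorted(missing))

    # The fix: declare the missing modules so the assembly runs last.
    declared["hadoop-yarn-project"] |= missing
    print(build_waves(declared)[-1])
```

Before the fix, `hadoop-yarn-project` lands in the same wave as `hadoop-yarn-server-nodemanager`, so its assembly can run while that module is still being built. After the missing modules are added to its declared dependencies, it moves into a wave of its own at the end.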
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809760#comment-17809760 ] ASF GitHub Bot commented on HADOOP-19019:

Hexiaoqiao merged PR #6373:
URL: https://github.com/apache/hadoop/pull/6373
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809216#comment-17809216 ] ASF GitHub Bot commented on HADOOP-19019:

Hexiaoqiao commented on PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1902975164

If there are no other concerns, I will commit this PR to trunk shortly. @steveloughran
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809161#comment-17809161 ] ASF GitHub Bot commented on HADOOP-19019:

steveloughran commented on PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1902729436

Who is going to merge this? @Hexiaoqiao?
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807721#comment-17807721 ] ASF GitHub Bot commented on HADOOP-19019:

hadoop-yetus commented on PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1895784931

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 34s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 50s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 35m 22s | | trunk passed |
| +1 :green_heart: | compile | 18m 15s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 16m 39s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | mvnsite | 4m 48s | | trunk passed |
| +1 :green_heart: | javadoc | 4m 49s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 4m 20s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | shadedclient | 135m 36s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 4m 7s | | the patch passed |
| +1 :green_heart: | compile | 17m 52s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 17m 52s | | the patch passed |
| +1 :green_heart: | compile | 16m 9s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 16m 9s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | mvnsite | 4m 42s | | the patch passed |
| +1 :green_heart: | javadoc | 4m 51s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 4m 27s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | shadedclient | 48m 41s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 241m 44s | | hadoop-yarn-project in the patch passed. |
| +1 :green_heart: | unit | 162m 20s | | hadoop-mapreduce-project in the patch passed. |
| +1 :green_heart: | asflicense | 1m 14s | | The patch does not generate ASF License warnings. |
| | | 633m 12s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6373 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint |
| uname | Linux 08a9c77cafe4 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / b9656c76142a19c5a1b71fcb91dd75d404f00c0a |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/3/testReport/ |
| Max. process+thread count | 2702 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project hadoop-mapreduce-project U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/3/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807516#comment-17807516 ] ASF GitHub Bot commented on HADOOP-19019:

JiaLiangC commented on code in PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#discussion_r1454392094

## hadoop-yarn-project/pom.xml: ##
@@ -90,6 +91,56 @@ hadoop-yarn-applications-catalog-webapp war + + org.apache.hadoop

Review Comment:
@steveloughran all these dependencies have already been changed to provided scope.
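The review comment above notes that the newly declared dependencies use `provided` scope. This scope still makes the reactor build those modules first, but it does not push them onto the classpath of downstream consumers, because `provided` dependencies are not transitive. Below is a minimal Python sketch that renders such entries for a list of module names. Points to check before relying on it:

* `dependency_block` and the example module list are hypothetical.
* `<version>` is left out on the assumption that the parent pom's dependencyManagement supplies it. Check this against the real `hadoop-yarn-project/pom.xml`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def dependency_block(modules: list[tuple[str, str]],
                     group: str = "org.apache.hadoop") -> str:
    """Render provided-scope <dependency> entries for (artifactId, type) pairs."""
    deps = ET.Element("dependencies")
    for artifact_id, dep_type in modules:
        dep = ET.SubElement(deps, "dependency")
        ET.SubElement(dep, "groupId").text = group
        ET.SubElement(dep, "artifactId").text = artifact_id
        # <version> omitted: assumed to come from a parent dependencyManagement.
        if dep_type != "jar":  # "jar" is Maven's default dependency type
            ET.SubElement(dep, "type").text = dep_type
        # provided: the reactor still builds this module first, but the
        # dependency is not passed on transitively.
        ET.SubElement(dep, "scope").text = "provided"
    ET.indent(deps)  # pretty-printing; requires Python 3.9+
    return ET.tostring(deps, encoding="unicode")


if __name__ == "__main__":
    # Example list only; the real list comes from hadoop-yarn-dist.xml.
    print(dependency_block([
        ("hadoop-yarn-server-nodemanager", "jar"),
        ("hadoop-yarn-applications-catalog-webapp", "war"),
    ]))
```

Copy the `<dependency>` children from the output into the pom's existing `<dependencies>` section rather than adding a second wrapper element.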
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807511#comment-17807511 ] ASF GitHub Bot commented on HADOOP-19019:

Hexiaoqiao commented on PR #6373:
URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1894833569

Great! Thanks @JiaLiangC. Let's wait and see whether anyone else would like to review this.
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802353#comment-17802353 ] ASF GitHub Bot commented on HADOOP-19019: - JiaLiangC commented on PR #6373: URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1876166196 @Hexiaoqiao Test environment: CentOS 8 x86_64, 16GB RAM, SSD. Tested on Hadoop 3.3.6. The initial serial compilation took almost 3 hours due to slow dependency downloads. With parallel compilation (-2C), the initial compilation took about 1 hour, approximately 2 times faster. For subsequent compilations, with dependencies already downloaded locally, the overall parallel compilation time for Hadoop was 13 minutes, while serial compilation took 37 minutes. > Parallel Maven Build Support for Apache Hadoop > -- > > Key: HADOOP-19019 > URL: https://issues.apache.org/jira/browse/HADOOP-19019 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Reporter: caijialiang >Priority: Major > Labels: pull-request-available > Attachments: patch11-HDFS-17287.diff > > > The reason for the slow compilation: The Hadoop project has many modules, and > the inability to compile them in parallel results in a slow process. For > instance, the first compilation of Hadoop might take several hours, and even > with local Maven dependencies, a subsequent compilation can still take close > to 40 minutes, which is very slow. > How to solve it: Use {{mvn dependency:tree}} and {{maven-to-plantuml}} to > investigate the dependency issues that prevent parallel compilation. > * Investigate the dependencies between project modules. > * Analyze the dependencies in multi-module Maven projects. 
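The `-T2C` flag asks Maven for two builder threads per available CPU core (a `C` suffix scales by core count; a bare number is an absolute thread count). The test machine's core count is not stated, so the 8-core figure below is purely an assumption; the speedup arithmetic uses the reported 37-minute serial and 13-minute parallel incremental builds.

```python
def maven_thread_count(spec, cores):
    """Threads for a Maven -T value; assumes integer multipliers like '2C' or '4'."""
    if spec.upper().endswith("C"):
        # 'NC' = N threads per core. Maven also accepts fractions such as 1.5C,
        # which this sketch does not handle.
        return int(spec[:-1]) * cores
    return int(spec)


def speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel build finished."""
    return serial_minutes / parallel_minutes


if __name__ == "__main__":
    print(maven_thread_count("2C", 8))   # hypothetical 8-core machine
    print(round(speedup(37, 13), 2))     # reported incremental build times
    print(37 - 13, "minutes saved per incremental build")
```

On the reported numbers, the incremental parallel build is about 2.85 times faster and saves 24 minutes per build.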
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802173#comment-17802173 ] ASF GitHub Bot commented on HADOOP-19019: - Hexiaoqiao commented on PR #6373: URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1875368262 @JiaLiangC Thanks for your work and for involving me here. This is a very interesting improvement. I would like to know how much build time is saved by switching to a parallel build. Also, besides the hadoop-yarn module, do any other modules need to declare their dependencies explicitly? Thanks again.
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801641#comment-17801641 ] ASF GitHub Bot commented on HADOOP-19019: - JiaLiangC commented on code in PR #6373: URL: https://github.com/apache/hadoop/pull/6373#discussion_r1439125229 ## hadoop-yarn-project/pom.xml: ## @@ -90,6 +91,56 @@ hadoop-yarn-applications-catalog-webapp war + + org.apache.hadoop Review Comment: Yes, these dependencies should be scoped as 'provided'.
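Since the review settled on `provided` scope for the newly added dependencies, a quick check for stragglers may help. Presumably `provided` was preferred because Maven's reactor still orders modules by these declarations, while `provided` dependencies are not passed on transitively to consumers of `hadoop-yarn-project`. The sketch below is hypothetical tooling, not part of the PR: it parses a POM with Python's standard `xml.etree.ElementTree`, looks only at the top-level `<dependencies>` section (not `<dependencyManagement>` or profiles), and flags `org.apache.hadoop` dependencies whose scope is not `provided`. A missing `<scope>` is treated as Maven's default, `compile`. The sample POM is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Maven 4.0.0 POM namespace; ElementTree needs it spelled out in queries.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def non_provided_deps(pom_xml, group_id="org.apache.hadoop"):
    """Return artifactIds in <dependencies> from `group_id` not scoped 'provided'."""
    root = ET.fromstring(pom_xml.strip())
    offenders = []
    for dep in root.findall("m:dependencies/m:dependency", NS):
        gid = dep.findtext("m:groupId", default="", namespaces=NS).strip()
        aid = dep.findtext("m:artifactId", default="", namespaces=NS).strip()
        # Maven's default scope is 'compile' when <scope> is absent.
        scope = dep.findtext("m:scope", default="compile", namespaces=NS).strip()
        if gid == group_id and scope != "provided":
            offenders.append(aid)
    return offenders


# Hypothetical, trimmed POM used only for illustration.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-server-tests</artifactId>
      <version>${project.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-applications-distributedshell</artifactId>
    </dependency>
  </dependencies>
</project>
"""

if __name__ == "__main__":
    print(non_provided_deps(SAMPLE_POM))
```

Against the sample, only `hadoop-yarn-applications-distributedshell` is flagged, because it has no `<scope>` element.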
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801640#comment-17801640 ] ASF GitHub Bot commented on HADOOP-19019: - JiaLiangC commented on code in PR #6373: URL: https://github.com/apache/hadoop/pull/6373#discussion_r1439125035 ## hadoop-yarn-project/pom.xml: ## @@ -90,6 +91,56 @@ hadoop-yarn-applications-catalog-webapp war + + org.apache.hadoop + hadoop-yarn-applications-distributedshell + + + org.apache.hadoop + hadoop-yarn-applications-unmanaged-am-launcher + ${project.version} + + + org.apache.hadoop + hadoop-yarn-server-tests + ${project.version} + + + org.apache.hadoop + hadoop-yarn-server-timelineservice-hbase-client + ${project.version} + Review Comment: Build command:
cd hadoop-yarn-project
mvn clean -T2C -Pnative -Pdist -Dtar -Psrc -Pyarn-ui -Dzookeeper.version=3.7.2 -Dhbase.profile=2.0 -DskipTests -DskipITs install
The exclusions were added to resolve the version conflicts that appear during compilation, as shown in the diagram below. I did not hit this conflict when testing with Hadoop 3.3.6; it only appears on the trunk branch. ![image](https://github.com/apache/hadoop/assets/18082602/4e558bee-6218-4036-baee-8cf26109c550)
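The exclusions discussed here address version conflicts that only showed up on trunk. One way to locate such conflicts, building on the `mvn dependency:tree` step from the issue description, is to run the tree in verbose mode (`mvn dependency:tree -Dverbose`), which annotates the losing version of a conflicting artifact with `omitted for conflict with`. The parser below is a sketch that assumes that annotation format (it may differ across maven-dependency-plugin versions). The sample tree uses made-up `org.example` coordinates rather than the actual Hadoop conflict, which the comment only shows as an image.

```python
import re

# Matches e.g. "(g:a:jar:1.2:compile - omitted for conflict with 2.0)".
CONFLICT = re.compile(r"\(([^()\s]+) - omitted for conflict with ([^()\s]+)\)")


def find_conflicts(tree_text):
    """Return (groupId:artifactId, losing version, winning version) tuples."""
    conflicts = []
    for line in tree_text.splitlines():
        match = CONFLICT.search(line)
        if match:
            # Coordinates are groupId:artifactId:type[:classifier]:version:scope.
            parts = match.group(1).split(":")
            conflicts.append((f"{parts[0]}:{parts[1]}", parts[-2], match.group(2)))
    return conflicts


# Hypothetical verbose tree; real output comes from `mvn dependency:tree -Dverbose`.
SAMPLE_TREE = r"""
[INFO] org.example:app:jar:1.0
[INFO] +- org.example:lib-a:jar:1.0:compile
[INFO] |  \- (org.example:shared:jar:1.2:compile - omitted for conflict with 2.0)
[INFO] \- org.example:shared:jar:2.0:compile
"""

if __name__ == "__main__":
    for artifact, lost, won in find_conflicts(SAMPLE_TREE):
        print(f"{artifact}: {lost} omitted in favour of {won}")
```

Each reported tuple names an artifact that reaches the module at two versions. An `<exclusion>` on the dependency that drags in the unwanted version is one way to settle such a conflict explicitly.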
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801596#comment-17801596 ] ASF GitHub Bot commented on HADOOP-19019: - steveloughran commented on code in PR #6373: URL: https://github.com/apache/hadoop/pull/6373#discussion_r1439080434 ## hadoop-yarn-project/pom.xml: ## @@ -90,6 +91,56 @@ hadoop-yarn-applications-catalog-webapp war + + org.apache.hadoop Review Comment: Shouldn't all of these be scoped as provided? ## hadoop-yarn-project/pom.xml: ## @@ -90,6 +91,56 @@ hadoop-yarn-applications-catalog-webapp war + + org.apache.hadoop + hadoop-yarn-applications-distributedshell + + + org.apache.hadoop + hadoop-yarn-applications-unmanaged-am-launcher + ${project.version} + + + org.apache.hadoop + hadoop-yarn-server-tests + ${project.version} + + + org.apache.hadoop + hadoop-yarn-server-timelineservice-hbase-client + ${project.version} + Review Comment: This worries me. Why is this exclusion needed?
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801406#comment-17801406 ] ASF GitHub Bot commented on HADOOP-19019: - JiaLiangC commented on PR #6373: URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1872647348 @Hexiaoqiao Could you help review this PR?
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801202#comment-17801202 ] ASF GitHub Bot commented on HADOOP-19019: - hadoop-yetus commented on PR #6373: URL: https://github.com/apache/hadoop/pull/6373#issuecomment-1872173866
:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 32s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 40s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 35m 22s | | trunk passed |
| +1 :green_heart: | compile | 16m 58s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 15m 4s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | mvnsite | 4m 48s | | trunk passed |
| +1 :green_heart: | javadoc | 4m 57s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 4m 23s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | shadedclient | 133m 44s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 3m 43s | | the patch passed |
| +1 :green_heart: | compile | 17m 23s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 17m 23s | | the patch passed |
| +1 :green_heart: | compile | 17m 6s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 17m 6s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | mvnsite | 5m 12s | | the patch passed |
| +1 :green_heart: | javadoc | 5m 2s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 4m 21s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | shadedclient | 54m 22s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 238m 49s | [/patch-unit-hadoop-yarn-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/2/artifact/out/patch-unit-hadoop-yarn-project.txt) | hadoop-yarn-project in the patch failed. |
| +1 :green_heart: | unit | 161m 3s | | hadoop-mapreduce-project in the patch passed. |
| +1 :green_heart: | asflicense | 1m 14s | | The patch does not generate ASF License warnings. |
| | | 632m 56s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6373 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint |
| uname | Linux d64615eda07c 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 941069621b58020f19fcee00bef8e97a61a6cf27 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/2/testReport/ |
| Max. process+thread count | 2703 (vs. ulimit of 5500) |
[jira] [Commented] (HADOOP-19019) Parallel Maven Build Support for Apache Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801093#comment-17801093 ] Xiaoqiao He commented on HADOOP-19019: -- Thanks [~jialiang] for your work. Moved this issue from HDFS to the COMMON project.
> ** In parallel compilation, however, modules are built concurrently, and the inter-module dependencies determine the build order. If module A depends on module B, B is built before A. Modules with no dependency relationship between them can be built at the same time.
> When Hadoop is compiled in parallel, for example when building {{hadoop-yarn-project}}, the dependencies between modules are correct for compilation itself. The problem arises during the dist packaging stage, because {{dist}} packages all the other compiled modules.
> *Behavior of {{hadoop-yarn-project}} in Serial Compilation:*
> ** In serial compilation, the modules in the pom are compiled one by one in sequence, and {{hadoop-yarn-project}} is compiled only after all of them have been built. During its {{prepare-package}} phase, the {{maven-assembly-plugin}} runs and repackages everything as described in {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}}.
> *Behavior of {{hadoop-yarn-project}} in Parallel Compilation:*
> ** Parallel compilation builds modules in dependency order. Modules that do not declare a {{dependency}} on each other are compiled in parallel. The dependencies declared in the pom of {{hadoop-yarn-project}} are compiled first. Then {{hadoop-yarn-project}} runs its {{maven-assembly-plugin}}.
> ** However, not every module that {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} packages is declared as a {{dependency}} of {{hadoop-yarn-project}}. When {{hadoop-yarn-project}} runs {{maven-assembly-plugin}}, some of the required modules may therefore not have been built yet, and the parallel build fails.
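The ordering problem can be illustrated with a simplified model. The Python sketch below groups modules into "waves" using only their declared dependencies. It then flags packaged modules that are not upstream of the packaging module, meaning nothing forces them to be built first. The module graph is hypothetical and heavily trimmed. The wave layering is only an approximation: Maven's parallel builder roughly starts each module once its upstream modules finish, rather than running strict waves. The transitive-upstream check holds under either scheduling model.

```python
# Hypothetical, heavily trimmed module graph: module -> modules it declares as dependencies.
DECLARED = {
    "hadoop-yarn-api": set(),
    "hadoop-yarn-common": {"hadoop-yarn-api"},
    "hadoop-yarn-server-nodemanager": {"hadoop-yarn-common"},
    # Packages the nodemanager in its dist assembly, but does not declare it (hypothetical).
    "hadoop-yarn-project": {"hadoop-yarn-api", "hadoop-yarn-common"},
}


def build_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Kahn-style layering: each wave holds modules whose declared deps are all in earlier waves."""
    remaining = {m: set(d) for m, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(m for m, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for m in ready:
            del remaining[m]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


def upstream(deps: dict[str, set[str]], module: str) -> set[str]:
    """Modules reachable through declared dependencies, i.e. guaranteed to be built first."""
    seen, stack = set(), list(deps[module])
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(deps[m])
    return seen


def unguaranteed(deps: dict[str, set[str]], module: str, packaged: set[str]) -> list[str]:
    """Packaged modules that may still be building when `module` runs its assembly."""
    return sorted(set(packaged) - upstream(deps, module))


if __name__ == "__main__":
    packaged = {"hadoop-yarn-api", "hadoop-yarn-common", "hadoop-yarn-server-nodemanager"}
    print(build_waves(DECLARED))
    print(unguaranteed(DECLARED, "hadoop-yarn-project", packaged))
    # Declaring the missing module as a dependency closes the gap.
    fixed = {**DECLARED,
             "hadoop-yarn-project": DECLARED["hadoop-yarn-project"] | {"hadoop-yarn-server-nodemanager"}}
    print(build_waves(fixed))
    print(unguaranteed(fixed, "hadoop-yarn-project", packaged))
```

With the sample graph, {{hadoop-yarn-project}} shares its wave with {{hadoop-yarn-server-nodemanager}}. Once the dependency is declared, {{hadoop-yarn-project}} moves to a wave of its own after the nodemanager.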
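One way to find which packaged modules are undeclared is to compare the assembly descriptor with {{mvn dependency:tree}} output. The Python sketch below does this under several simplifying assumptions:
* Both inputs are hypothetical, trimmed samples.
* The real {{hadoop-yarn-dist.xml}} is larger and may reference modules through file sets or wildcard patterns, which this sketch ignores.
* Only exact {{groupId:artifactId}} {{<include>}} entries are compared.
* Dependencies at any depth of the tree count as declared.

The printed {{<dependency>}} stubs deliberately omit version and scope, which would have to follow the project's own pom conventions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed `mvn dependency:tree` output for hadoop-yarn-project.
DEP_TREE = r"""
[INFO] org.apache.hadoop:hadoop-yarn-project:pom:3.4.0-SNAPSHOT
[INFO] +- org.apache.hadoop:hadoop-yarn-api:jar:3.4.0-SNAPSHOT:provided
[INFO] |  \- com.google.guava:guava:jar:27.0-jre:provided
[INFO] \- org.apache.hadoop:hadoop-yarn-common:jar:3.4.0-SNAPSHOT:provided
"""

# Hypothetical, simplified maven-assembly-plugin descriptor (not the real hadoop-yarn-dist.xml).
DESCRIPTOR = """
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0">
  <moduleSets>
    <moduleSet>
      <includes>
        <include>org.apache.hadoop:hadoop-yarn-api</include>
        <include>org.apache.hadoop:hadoop-yarn-common</include>
        <include>org.apache.hadoop:hadoop-yarn-server-nodemanager</include>
      </includes>
    </moduleSet>
  </moduleSets>
</assembly>
"""

# Tree entries look like "+- groupId:artifactId:type:version:scope" or "\- ...",
# preceded by "|  " or spaces when nested.
TREE_ENTRY = re.compile(r"[+\\]- ([^:\s]+):([^:\s]+):")


def tree_dependencies(tree_text: str) -> set[str]:
    """groupId:artifactId of every dependency in the tree, direct or transitive."""
    deps = set()
    for line in tree_text.splitlines():
        m = TREE_ENTRY.search(line)
        if m:
            deps.add(f"{m.group(1)}:{m.group(2)}")
    return deps


def packaged_modules(descriptor_xml: str) -> set[str]:
    """Text of every <include> element, matched on the local tag name so namespaces don't matter."""
    root = ET.fromstring(descriptor_xml.strip())
    return {
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "include" and el.text
    }


def missing_dependencies(tree_text: str, descriptor_xml: str) -> list[str]:
    """Modules the assembly packages but the module does not depend on."""
    return sorted(packaged_modules(descriptor_xml) - tree_dependencies(tree_text))


if __name__ == "__main__":
    for coord in missing_dependencies(DEP_TREE, DESCRIPTOR):
        group_id, artifact_id = coord.split(":")
        # Version and scope omitted on purpose; they must follow the project's pom conventions.
        print(f"<dependency><groupId>{group_id}</groupId>"
              f"<artifactId>{artifact_id}</artifactId></dependency>")
```

With the sample inputs it prints a single stub, for {{hadoop-yarn-server-nodemanager}}.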
> *Solution:*
> ** The solution is relatively straightforward: collect all modules referenced in {{hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml}} and declare them as dependencies in the pom of {{hadoop-yarn-project}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org