JiaLiangC opened a new pull request, #1226:
URL: https://github.com/apache/bigtop/pull/1226

   
   ### Description of PR
   Why the build is slow: Hadoop has many modules, and because they cannot be compiled in parallel, the build takes a long time. For instance, a first build of Hadoop can take several hours, and even with all Maven dependencies already in the local repository, a subsequent build still takes close to 40 minutes.
   
   How to solve it: use `mvn dependency:tree` and maven-to-plantuml to find the dependency problems that prevent parallel compilation.
   
   To investigate the dependencies between the modules of a multi-module Maven project:
   
   Download maven-to-plantuml:
   
   `wget https://github.com/phxql/maven-to-plantuml/releases/download/v1.0/maven-to-plantuml-1.0.jar`
   
   Generate a dependency tree:
   
   `mvn dependency:tree > dep.txt`
   
   Generate a PlantUML diagram from the dependency tree:
   
   `java -jar maven-to-plantuml-1.0.jar --input dep.txt --output dep.puml`
   
   For more information, see the [maven-to-plantuml GitHub repository](https://github.com/phxql/maven-to-plantuml/tree/master).
   
    
   
   #### Hadoop parallel compilation: rationale for this change
   
   ##### Why parallel compilation fails
   A serial build compiles modules one at a time, in the order they are listed. By the time a module is built, every module listed before it has already been built, whether or not it is declared as a dependency. This is why a serial build does not fail.
   In a parallel build, modules that do not depend on each other are built concurrently. The only ordering Maven enforces comes from the declared inter-module dependencies: if module A depends on module B, B is built before A.
   When Hadoop is built in parallel, the compile-time dependencies between modules (for example, those of hadoop-yarn-project) are declared correctly. The problem arises in the dist packaging stage, which bundles the output of many other modules.
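   The ordering rule can be modeled with a short, simplified sketch. It groups the modules into waves, where every module in a wave depends only on modules from earlier waves and so may run concurrently with the rest of its wave. `build_waves` is a hypothetical helper. The module names are Hadoop's, but the edges in the example are illustrative assumptions, not Hadoop's real dependency graph.

```python
def build_waves(deps):
    """Group modules into waves: each module depends only on modules in
    earlier waves, so all modules in one wave may be built concurrently.

    deps maps a module name to the set of reactor modules it declares
    as dependencies."""
    # Copy the graph and add modules that appear only as dependencies.
    remaining = {module: set(upstream) for module, upstream in deps.items()}
    for upstream in deps.values():
        for module in upstream:
            remaining.setdefault(module, set())

    built, waves = set(), []
    while remaining:
        # A module is ready once everything it declares has been built.
        ready = sorted(m for m, upstream in remaining.items() if upstream <= built)
        if not ready:
            raise ValueError(f"dependency cycle among {sorted(remaining)}")
        waves.append(ready)
        built.update(ready)
        for module in ready:
            del remaining[module]
    return waves


# Illustrative edges only (assumed, not Hadoop's real graph): the project
# module declares just hadoop-yarn-common.
example = {
    "hadoop-yarn-common": {"hadoop-yarn-api"},
    "hadoop-yarn-client": {"hadoop-yarn-common"},
    "hadoop-yarn-server-nodemanager": {"hadoop-yarn-common"},
    "hadoop-yarn-project": {"hadoop-yarn-common"},
}
for number, wave in enumerate(build_waves(example), start=1):
    print(number, wave)
```

   In this example, hadoop-yarn-project lands in the same wave as hadoop-yarn-server-nodemanager, so nothing forces its packaging step to wait for the NodeManager build. Maven enables parallel builds with `-T` (for example `mvn -T 1C install`, one thread per CPU core). It starts each module as soon as its own upstream modules finish rather than in lockstep waves, but the guarantee is the same: only declared dependencies are ordered.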
   ##### Behavior of hadoop-yarn-project in a serial build
   
   In a serial build, Maven builds the modules in the pom one by one, in order, and hadoop-yarn-project is built only after all the other modules. During its prepare-package phase, maven-assembly-plugin runs and repackages the already-built artifacts as described in hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml.
   ##### Behavior of hadoop-yarn-project in a parallel build
   A parallel build orders modules only by their declared dependencies. Modules that do not reference each other through a `<dependency>` element may be built concurrently. Following the dependencies declared in the pom of hadoop-yarn-project, Maven first builds those dependencies, then builds hadoop-yarn-project and runs its maven-assembly-plugin execution.
   However, hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml packages the output of some modules that hadoop-yarn-project does not declare as dependencies. As a result, when hadoop-yarn-project runs maven-assembly-plugin, some of the required modules may not have been built yet, and the parallel build fails.
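   One way to spot this gap before a parallel build hits it is to compare the modules an assembly descriptor packages with the dependencies the pom declares. The sketch below is hypothetical. It assumes the descriptor names modules as `groupId:artifactId` entries under `<moduleSet><includes>`, which may not be how hadoop-yarn-dist.xml actually references them (it could, for example, use fileSets pointing at module target directories), so the extraction step may need adapting. It reads only the pom's top-level `<dependencies>` and compares artifactIds, ignoring groupIds.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the XML namespace: '{http://...}include' -> 'include'."""
    return tag.rsplit("}", 1)[-1]


def _children(elem, name):
    """Direct children of elem with the given local name, in any namespace."""
    return [child for child in elem if _local(child.tag) == name]


def assembly_modules(descriptor_xml):
    """artifactIds listed as groupId:artifactId in <moduleSet><includes>.
    Wildcard patterns are returned verbatim and must be expanded by hand."""
    root = ET.fromstring(descriptor_xml)
    modules = set()
    for elem in root.iter():
        if _local(elem.tag) != "moduleSet":
            continue
        for includes in _children(elem, "includes"):
            for include in _children(includes, "include"):
                parts = (include.text or "").strip().split(":")
                if len(parts) == 2:
                    modules.add(parts[1])
    return modules


def declared_dependencies(pom_xml):
    """artifactIds under the pom's top-level <dependencies> element
    (not <dependencyManagement> or plugin dependencies)."""
    root = ET.fromstring(pom_xml)
    declared = set()
    for deps in _children(root, "dependencies"):
        for dep in _children(deps, "dependency"):
            for artifact in _children(dep, "artifactId"):
                declared.add((artifact.text or "").strip())
    return declared


def missing_dependencies(descriptor_xml, pom_xml):
    """Modules the assembly packages but the pom does not declare."""
    return sorted(assembly_modules(descriptor_xml) - declared_dependencies(pom_xml))
```

   Called with the contents of hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml and hadoop-yarn-project/pom.xml, a non-empty result lists the modules that the parallel build is not required to finish before packaging starts.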
   ##### Solution
   The fix is straightforward: collect every module referenced in hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml and declare each one as a dependency in the pom of hadoop-yarn-project. Maven then builds all of them before hadoop-yarn-project, so the assembly step finds every artifact it needs.
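   Once the missing modules are known, adding them is mechanical. The helper below is a hypothetical sketch, not the change made in this PR. It renders `<dependency>` entries for a list of artifactIds and omits `<version>`, on the assumption that versions are managed by a parent pom's `<dependencyManagement>`. The dependency scope is an optional parameter rather than a guessed value.

```python
import xml.etree.ElementTree as ET


def dependency_snippets(artifact_ids, group_id="org.apache.hadoop", scope=None):
    """Render one <dependency> element per artifactId, ready to paste into
    the <dependencies> section of a pom. No <version> is emitted: this
    assumes a parent <dependencyManagement> section manages versions."""
    snippets = []
    for artifact_id in artifact_ids:
        dep = ET.Element("dependency")
        ET.SubElement(dep, "groupId").text = group_id
        ET.SubElement(dep, "artifactId").text = artifact_id
        if scope is not None:  # e.g. "provided"; chosen by the caller
            ET.SubElement(dep, "scope").text = scope
        ET.indent(dep)  # Python 3.9+: two-space pretty-printing
        snippets.append(ET.tostring(dep, encoding="unicode"))
    return "\n".join(snippets)


print(dependency_snippets(["hadoop-yarn-server-nodemanager"]))
```

   Any dependency declared this way is enough to make Maven order the build, because the reactor sorts modules by declared dependencies regardless of scope.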
   
   ### How was this patch tested?
   Manually tested.
   
![image](https://github.com/apache/bigtop/assets/18082602/ad07e665-272e-4172-994f-bc515c09f582)
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
   - [ ] Make sure that newly added files do not have any licensing issues. When in doubt, refer to https://www.apache.org/licenses/


-- 
This is an automated message from the Apache Git Service.