Thank you very much for your valuable suggestions and feedback. The mvn patch that adds the Smart Builder extension is indeed an effective way to change Maven's default build order from breadth-first to depth-first, which significantly increases the concurrency of parallel compilation. As far as I know, the mvnd project already uses this extension. If needed, we can also revert to Maven's default build behavior simply by removing the extension.
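To illustrate why the scheduling order matters even with a fixed number of threads, here is a toy list-scheduling model in Python. It compares two policies for picking the next ready module: plain declaration order, and "critical path first" (prefer modules with the longest remaining downstream chain). This is a minimal sketch under stated assumptions. The module names, build times, and the two policies are hypothetical, and it does not reproduce the actual implementation of Maven's built-in multi-threaded builder or of the Smart Builder extension.

```python
import heapq

# Hypothetical multi-module project: module -> build time (arbitrary units).
# Names and timings are invented purely for illustration.
MODULES = {
    "api": 1,
    "util-a": 1,
    "util-b": 1,
    "util-c": 1,
    "engine": 1,
    "engine-x": 3,
    "dist": 1,
}
# module -> upstream modules it depends on
DEPS = {
    "util-a": ["api"],
    "util-b": ["api"],
    "util-c": ["api"],
    "engine": ["api"],
    "engine-x": ["engine"],
    "dist": ["util-a", "util-b", "util-c", "engine-x"],
}


def downstream_map(modules, deps):
    """Invert DEPS: module -> modules that depend on it."""
    down = {m: [] for m in modules}
    for module, ups in deps.items():
        for up in ups:
            down[up].append(module)
    return down


def critical_path(modules, deps):
    """Longest chain of build time from each module to the end of the build."""
    down = downstream_map(modules, deps)
    memo = {}

    def length(m):
        if m not in memo:
            memo[m] = modules[m] + max((length(d) for d in down[m]), default=0)
        return memo[m]

    return {m: length(m) for m in modules}


def simulate(modules, deps, workers, priority):
    """Total build time when `workers` threads always start the ready module
    with the smallest priority(module) value."""
    down = downstream_map(modules, deps)
    pending = {m: len(deps.get(m, [])) for m in modules}
    ready = [m for m in modules if pending[m] == 0]
    running = []  # min-heap of (finish_time, module)
    now = 0
    while ready or running:
        ready.sort(key=priority)
        # Fill every idle worker with the highest-priority ready module.
        while ready and len(running) < workers:
            m = ready.pop(0)
            heapq.heappush(running, (now + modules[m], m))
        # Advance time to the next module completion and release its dependents.
        now, finished = heapq.heappop(running)
        for d in down[finished]:
            pending[d] -= 1
            if pending[d] == 0:
                ready.append(d)
    return now


order = {m: i for i, m in enumerate(MODULES)}
cp = critical_path(MODULES, DEPS)

fifo_time = simulate(MODULES, DEPS, 2, lambda m: order[m])
cp_time = simulate(MODULES, DEPS, 2, lambda m: -cp[m])
print(f"declaration order: {fifo_time}, critical path first: {cp_time}")
```

With two threads, the declaration-order policy finishes at time 7 and the critical-path-first policy at time 6, because starting `engine` early lets the long `engine-x` chain overlap with the short `util-*` modules. The gap grows with deeper, more uneven module graphs such as Flink's.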
Indeed, it's crucial to submit the corresponding parallel compilation fixes to the respective upstream projects. So far, I have tested several components.

Hive, HBase, Flink, and ZooKeeper can be compiled in parallel without any modifications. Of these, Flink benefits the most, since it comprises around 200 modules.

For Ranger and Tez, I have submitted parallel compilation patches to their communities:
Ranger: https://issues.apache.org/jira/browse/RANGER-4511, https://github.com/apache/ranger/pull/289
Tez: https://issues.apache.org/jira/browse/TEZ-4520, https://github.com/apache/tez/pull/315

Spark 3.3 uses Maven Shade Plugin 3.2.4 by default, which triggers an infinite-loop bug during parallel builds (https://issues.apache.org/jira/browse/MSHADE-413). The patch for Spark 3.3 upgrades the plugin to version 3.5.0; the same upgrade has already been applied in Spark 3.5 (https://issues.apache.org/jira/browse/SPARK-44257), so no separate PR to the Spark community is needed.

Hadoop is the most time-consuming component to build and therefore a prime candidate for parallel compilation. I have made the necessary changes and have been using the improved Hadoop build for some time. However, because the Hadoop project is complex, I am still working out how to cleanly adjust its module dependencies for parallel compilation before submitting the changes to the community.

As for other components such as Alluxio, Zeppelin, Solr, and Phoenix, I have not yet had the opportunity to test them in depth.

By design, the first PR only adds parallel compilation patches for components that have been thoroughly validated. This ensures that the build behavior of components not yet optimized for parallel compilation remains unaffected.
This flexibility allows us to release the feature independently, without waiting for every component to be optimized for parallel compilation.

I look forward to your feedback and further discussion on speeding up builds in our community.

Best regards,
jiaLiang

> On Dec 15, 2023, at 10:33, Masatake Iwasaki wrote:
>
> Hi Jialiang,
>
> Thanks for bringing this up.
>
> It sounds good to improve developers' experience by reducing compilation time.
> I think we should apply it to the most time-consuming product first
> then evaluate how much time we can get.
>
> It would be better to contribute the patch to each project
> rather than make it Bigtop private since it is difficult
> to confirm if there is no problem by ourselves.
>
>> https://github.com/apache/bigtop/pull/1212
>
> The PR introduces 3rd party extension.
> Is the extension crucial?
> How about starting from using built-in feature of Maven?
> We can evaluate the gain from the extension alone after that.
> I prefer one patch for doing one thing.
>
> Regards,
> Masatake Iwasaki
>
> On 2023/12/15 10:51, Jialiang Cai wrote:
>> Subject: Discussion on Introducing Parallel Compilation Support for Bigtop
>> Components
>>
>> Dear Bigtop Community Members,
>>
>> I hope this message finds you well. I would like to initiate a discussion
>> regarding the introduction of parallel compilation support for Bigtop
>> components.
>>
>> Background:
>> Within the components maintained by Bigtop, a significant portion is built
>> using Java and relies on Maven as the build tool.
>>
>> Rationale:
>> Compiling components that consist of numerous modules can be a
>> time-consuming process. For instance, some components contain hundreds of
>> modules, and compiling them one by one consumes a substantial amount of
>> time. Even when all dependencies are pre-downloaded for a second
>> compilation, the process remains slow due to the sequential nature of
>> compilation. Additionally, compiling all components together still results
>> in sequential compilation, making it challenging to fully leverage CPU
>> resources and reduce compilation time significantly. Consequently,
>> repetitive compilation and testing phases impose prolonged waiting periods.
>>
>> Proposal:
>> I propose the introduction of a new parameter that allows users to toggle
>> parallel compilation for components built using Maven, thus empowering them
>> to align compilation practices with their specific needs.
>>
>> Related Pull Requests (PRs):
>> https://issues.apache.org/jira/browse/BIGTOP-4044
>> https://github.com/apache/bigtop/pull/1212
>>
>> This new feature can be divided into two main parts:
>>
>> The first part entails adding parallel compilation functionality and
>> enabling it for components that have undergone testing without encountering
>> additional issues related to parallel compilation. These components may
>> include Hive, HBase, Flink, ZooKeeper, among others.
>>
>> The second part involves enabling parallel compilation for components that
>> face challenges with parallel compilation and necessitate additional patches
>> to address Maven's parallel compilation capabilities. These components may
>> include Ranger, Tez, Hadoop, Spark, and more.
>>
>> I welcome your insights, opinions, and suggestions on this matter. Please
>> feel free to share your thoughts and concerns regarding the introduction of
>> parallel compilation support for Bigtop components.
>>
>> Your contributions to this discussion are highly valued as we work towards
>> enhancing the efficiency and performance of our build processes.
>>
>> Best regards,
>> jiaLiang
