Thank you for the information. I ran a test on my machine (16 cores, 16 GB of RAM) with all dependencies pre-downloaded: Hadoop takes approximately 37 minutes to compile in single-threaded mode and around 15 minutes with parallel compilation across modules. In this test, parallel compilation cut the build time by roughly 60% (about 2.5x faster).
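
To put those two numbers in context with the point about maven-shade-plugin, here is a rough back-of-the-envelope sketch in Python. It computes the observed speedup and, using Amdahl's law, the share of the build that would have to be effectively serial to explain it. The worker count of 16 is an assumption (it equals the core count of the machine, not a thread setting confirmed by the test), and the function names are illustrative only.

```python
def speedup(serial_minutes: float, parallel_minutes: float) -> float:
    """Observed speedup of the parallel build over the single-threaded one."""
    return serial_minutes / parallel_minutes


def amdahl_serial_fraction(observed_speedup: float, workers: int) -> float:
    """Serial fraction s implied by Amdahl's law.

    Amdahl's law: S = 1 / (s + (1 - s) / N)
    Solving for s: s = (1/S - 1/N) / (1 - 1/N)
    """
    return (1 / observed_speedup - 1 / workers) / (1 - 1 / workers)


# Figures reported above: 37 min single-threaded, 15 min parallel.
s = speedup(37, 15)
# ASSUMPTION: 16 workers, i.e. one per core on the test machine.
f = amdahl_serial_fraction(s, 16)

print(f"speedup: {s:.2f}x")
print(f"implied serial fraction: {f:.0%}")
```

Under that assumption, a bit more than a third of the build behaves as if it were serial, which would be consistent with a long single-module step such as shading limiting the overall gain. This is only a model, not a measurement of where the time goes.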

> On Dec 15, 2023, at 14:04, Masatake Iwasaki wrote:
>
> > Additionally, Hadoop is the most time-consuming component and a prime
> > candidate for parallel compilation optimization.
>
> Creating shaded client jar by maven-shade-plugin is the
> most time consuming part of building Hadoop.
> The reduced building time in percentage may not be
> significant as expected due to the part, FWIW.
>
> On 2023/12/15 11:33, Masatake Iwasaki wrote:
>> Hi Jialiang,
>>
>> Thanks for bringing this up.
>>
>> It sounds good to improve developers' experience by reducing compilation
>> time.
>> I think we should apply it to the most time-consuming product first
>> then evaluate how much time we can get.
>>
>> It would be better to contribute the patch to each project
>> rather than make it Bigtop private since it is difficult
>> to confirm if there is no problem by ourselves.
>>
>>
>> > https://github.com/apache/bigtop/pull/1212
>>
>> The PR introduces 3rd party extension.
>> Is the extension crucial?
>> How about starting from using built-in feature of Maven?
>> We can evaluate the gain from the extension alone after that.
>> I prefer one patch for doing one thing.
>>
>> Regards,
>> Masatake Iwasaki
>>
>> On 2023/12/15 10:51, Jialiang Cai wrote:
>>> Subject: Discussion on Introducing Parallel Compilation Support for Bigtop
>>> Components
>>>
>>> Dear Bigtop Community Members,
>>>
>>> I hope this message finds you well. I would like to initiate a discussion
>>> regarding the introduction of parallel compilation support for Bigtop
>>> components.
>>>
>>> Background:
>>> Within the components maintained by Bigtop, a significant portion is built
>>> using Java and relies on Maven as the build tool.
>>>
>>> Rationale:
>>> Compiling components that consist of numerous modules can be a
>>> time-consuming process. For instance, some components contain hundreds of
>>> modules, and compiling them one by one consumes a substantial amount of
>>> time. Even when all dependencies are pre-downloaded for a second
>>> compilation, the process remains slow due to the sequential nature of
>>> compilation. Additionally, compiling all components together still results
>>> in sequential compilation, making it challenging to fully leverage CPU
>>> resources and reduce compilation time significantly. Consequently,
>>> repetitive compilation and testing phases impose prolonged waiting periods.
>>>
>>> Proposal:
>>> I propose the introduction of a new parameter that allows users to toggle
>>> parallel compilation for components built using Maven, thus empowering them
>>> to align compilation practices with their specific needs.
>>>
>>> Related Pull Requests (PRs):
>>> https://issues.apache.org/jira/browse/BIGTOP-4044
>>> https://github.com/apache/bigtop/pull/1212
>>>
>>> This new feature can be divided into two main parts:
>>>
>>> The first part entails adding parallel compilation functionality and
>>> enabling it for components that have undergone testing without encountering
>>> additional issues related to parallel compilation. These components may
>>> include Hive, HBase, Flink, ZooKeeper, among others.
>>>
>>> The second part involves enabling parallel compilation for components that
>>> face challenges with parallel compilation and necessitate additional
>>> patches to address Maven's parallel compilation capabilities. These
>>> components may include Ranger, Tez, Hadoop, Spark, and more.
>>>
>>> I welcome your insights, opinions, and suggestions on this matter. Please
>>> feel free to share your thoughts and concerns regarding the introduction of
>>> parallel compilation support for Bigtop components.
>>>
>>> Your contributions to this discussion are highly valued as we work towards
>>> enhancing the efficiency and performance of our build processes.
>>>
>>> Best regards,
>>> jiaLiang
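
For reference, the toggle described in the original proposal could be sketched as below, using only Maven's built-in `-T` (`--threads`) option as suggested earlier in the thread. This is a minimal illustration, not the implementation in the PR: the function name `maven_command` and its `parallel` and `threads` parameters are hypothetical. The value `1C` asks Maven for one thread per CPU core.

```python
from typing import List


def maven_command(goals: List[str], parallel: bool = False,
                  threads: str = "1C") -> List[str]:
    """Build an mvn argument list, adding -T only when parallel is enabled.

    `parallel` and `threads` are hypothetical names for the proposed toggle;
    `-T` itself is Maven's built-in parallel build option.
    """
    cmd = ["mvn"]
    if parallel:
        cmd += ["-T", threads]
    cmd += goals
    return cmd


# Default: sequential build, same as today.
print(" ".join(maven_command(["clean", "install"])))
# Opt-in: parallel build with one thread per core.
print(" ".join(maven_command(["clean", "install"], parallel=True)))
```

Keeping the default sequential means components that still need patches for parallel builds are unaffected unless the user opts in.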
