Thank you for the information. I ran tests on my machine with 16 cores and 
16GB of RAM. With dependencies pre-downloaded, Hadoop takes approximately 
37 minutes to compile in single-threaded mode and around 15 minutes with 
parallel compilation across modules. Parallel compilation therefore cuts 
the build time by more than half in this setup.
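
As a rough sanity check on those two timings, here is a small Python sketch 
of the arithmetic. The Amdahl's-law step is only an assumption on my part: it 
pretends the parallel portion scales perfectly over all 16 cores, which was 
not measured. The variable names are purely illustrative.

```python
# Timings reported above (minutes, dependencies pre-downloaded).
single_threaded_min = 37
parallel_min = 15
cores = 16

# Observed speedup and the share of wall-clock time saved.
speedup = single_threaded_min / parallel_min
time_saved_pct = (1 - parallel_min / single_threaded_min) * 100

# Assumption: Amdahl's law, speedup = 1 / ((1 - p) + p / cores),
# solved for p, the fraction of the build that parallelizes.
# This idealized model ignores I/O, memory limits, and module ordering.
parallel_fraction = (1 - 1 / speedup) / (1 - 1 / cores)

print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved_pct:.1f}%")
print(f"estimated parallel fraction: {parallel_fraction:.2f}")
```

Under that idealized model, a sizeable share of the build still behaves as 
serial work, which is consistent with the point below about the shaded 
client jar step.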

> On Dec 15, 2023, at 14:04, Masatake Iwasaki wrote:
> 
> > Additionally, Hadoop is the most time-consuming component and a prime 
> > candidate for parallel compilation optimization.
> 
> Creating the shaded client jar with maven-shade-plugin is the
> most time-consuming part of building Hadoop.
> Because of that step, the percentage reduction in build time
> may not be as significant as expected, FWIW.
> 
> On 2023/12/15 11:33, Masatake Iwasaki wrote:
>> Hi Jialiang,
>> 
>> Thanks for bringing this up.
>> 
>> It sounds good to improve developers' experience by reducing compilation 
>> time.
>> I think we should apply it to the most time-consuming product first,
>> then evaluate how much time we can save.
>> 
>> It would be better to contribute the patch to each project
>> rather than keep it private to Bigtop, since it is difficult
>> for us to confirm on our own that it causes no problems.
>> 
>> 
>> > https://github.com/apache/bigtop/pull/1212
>> 
>> The PR introduces a third-party extension.
>> Is the extension crucial?
>> How about starting with Maven's built-in feature?
>> We can evaluate the gain from the extension alone after that.
>> I prefer one patch for doing one thing.
>> 
>> Regards,
>> Masatake Iwasaki
>> 
>> On 2023/12/15 10:51, Jialiang Cai wrote:
>>> Subject: Discussion on Introducing Parallel Compilation Support for Bigtop 
>>> Components
>>> 
>>> Dear Bigtop Community Members,
>>> 
>>> I hope this message finds you well. I would like to initiate a discussion 
>>> regarding the introduction of parallel compilation support for Bigtop 
>>> components.
>>> 
>>> Background:
>>> Within the components maintained by Bigtop, a significant portion is built 
>>> using Java and relies on Maven as the build tool.
>>> 
>>> Rationale:
>>> Compiling components that consist of numerous modules can be a 
>>> time-consuming process. For instance, some components contain hundreds of 
>>> modules, and compiling them one by one consumes a substantial amount of 
>>> time. Even when all dependencies are pre-downloaded for a second 
>>> compilation, the process remains slow due to the sequential nature of 
>>> compilation. Additionally, compiling all components together still results 
>>> in sequential compilation, making it challenging to fully leverage CPU 
>>> resources and reduce compilation time significantly. Consequently, 
>>> repetitive compilation and testing phases impose prolonged waiting periods.
>>> 
>>> Proposal:
>>> I propose the introduction of a new parameter that allows users to toggle 
>>> parallel compilation for components built using Maven, thus empowering them 
>>> to align compilation practices with their specific needs.
>>> 
>>> Related Pull Requests (PRs):
>>> https://issues.apache.org/jira/browse/BIGTOP-4044
>>> https://github.com/apache/bigtop/pull/1212
>>> 
>>> This new feature can be divided into two main parts:
>>> 
>>> The first part entails adding parallel compilation functionality and 
>>> enabling it for components that have undergone testing without encountering 
>>> additional issues related to parallel compilation. These components may 
>>> include Hive, HBase, Flink, ZooKeeper, among others.
>>> 
>>> The second part involves enabling parallel compilation for components that 
>>> do not yet build cleanly in parallel and need additional patches before 
>>> Maven's parallel build can be used. These components may include Ranger, 
>>> Tez, Hadoop, Spark, and more.
>>> 
>>> I welcome your insights, opinions, and suggestions on this matter. Please 
>>> feel free to share your thoughts and concerns regarding the introduction of 
>>> parallel compilation support for Bigtop components.
>>> 
>>> Your contributions to this discussion are highly valued as we work towards 
>>> enhancing the efficiency and performance of our build processes.
>>> 
>>> Best regards,
>>> jiaLiang
