Here's a summary of the current progress:

As of now, all components in Bigtop that use Maven for compilation have been 
thoroughly tested. Based on the testing results, the Java-based components in 
Bigtop can be parallel-compiled using Maven without requiring any 
modifications. These components include Hive, HBase, Flink, ZooKeeper, Alluxio, 
Phoenix, Livy, and Zeppelin.

However, Spark, Ranger, Tez, and Hadoop need some patches before they can be 
compiled in parallel.
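For reference, here is a minimal sketch of how a build script might switch 
Maven's built-in parallel mode on or off. It relies only on Maven's standard 
-T (--threads) option. The function name, the "enabled" switch, and the example 
goals are assumptions made for illustration and are not taken from the Bigtop 
PR.

```python
def maven_parallel_args(goals, threads="1C", enabled=True):
    """Build a Maven argument list, adding -T only when parallel mode is on.

    Hypothetical helper for illustration; not part of Bigtop.
    """
    args = ["mvn"]
    if enabled:
        # -T / --threads is Maven's built-in parallel build option.
        # "1C" means one thread per CPU core; a bare number such as "4"
        # requests a fixed thread count.
        args += ["-T", threads]
    return args + list(goals)


if __name__ == "__main__":
    # Example goals are placeholders; each component has its own build goals.
    print(" ".join(maven_parallel_args(["clean", "install", "-DskipTests"])))
```

With enabled=False the same goals run as an ordinary sequential build, which 
is one way to keep parallel compilation opt-in per component.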

The acceleration achieved through parallel compilation primarily falls into two 
categories:

1. During the initial compilation, the parallel download of dependencies 
   significantly reduces the time spent on dependency retrieval.

2. Components with a large number of modules, such as Hadoop, Flink, Ranger, 
   and Alluxio, experience significant performance improvements.

If your machine has sufficient resources, compiling multiple components 
simultaneously, with parallel compilation enabled and dependencies already 
cached locally (that is, not an initial compilation), makes it possible to 
complete the compilation of all components within an hour.

For components compiled using Maven, individual component compilation 
typically takes less than fifteen minutes, with Spark as the exception. Spark 
is slower because Scala itself compiles slowly, and there is currently no 
effective way to speed up Scala compilation when building with Maven.
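
As a rough way to reason about how much a parallel build can gain when part of 
it stays sequential (the slow Scala compilation above, or the shaded client jar 
step mentioned in the quoted reply below), Amdahl's law gives an upper bound. 
The fractions and thread count in this sketch are made-up inputs, not 
measurements from any Bigtop build.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work runs in parallel.

    parallel_fraction: share of the sequential build time that can be
                       parallelized, between 0.0 and 1.0
    workers:           number of threads working on the parallel part
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical inputs: how the bound changes with the parallel share.
    for fraction in (0.5, 0.8, 0.95):
        print(fraction, round(amdahl_speedup(fraction, 8), 2))
```

The bound shows why a long sequential step limits the overall gain no matter 
how many threads are available.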

> On Dec 15, 2023, at 14:04, Masatake Iwasaki <[email protected]> wrote:
> 
> > Additionally, Hadoop is the most time-consuming component and a prime 
> > candidate for parallel compilation optimization.
> 
> Creating shaded client jar by maven-shade-plugin is the
> most time consuming part of building Hadoop.
> The reduced building time in percentage may not be
> significant as expected due to the part, FWIW.
> 
> On 2023/12/15 11:33, Masatake Iwasaki wrote:
>> Hi Jialiang,
>> 
>> Thanks for bringing this up.
>> 
>> It sounds good to improve developers' experience by reducing compilation 
>> time.
>> I think we should apply it to the most time-consuming product first
>> then evaluate how much time we can get.
>> 
>> It would be better to contribute the patch to each project
>> rather than make it Bigtop private since it is difficult
>> to confirm if there is no problem by ourselves.
>> 
>> 
>> > https://github.com/apache/bigtop/pull/1212
>> 
>> The PR introduces 3rd party extension.
>> Is the extension crucial?
>> How about starting from using built-in feature of Maven?
>> We can evaluate the gain from the extension alone after that.
>> I prefer one patch for doing one thing.
>> 
>> Regards,
>> Masatake Iwasaki
>> 
>> On 2023/12/15 10:51, Jialiang Cai wrote:
>>> Subject: Discussion on Introducing Parallel Compilation Support for Bigtop 
>>> Components
>>> 
>>> Dear Bigtop Community Members,
>>> 
>>> I hope this message finds you well. I would like to initiate a discussion 
>>> regarding the introduction of parallel compilation support for Bigtop 
>>> components.
>>> 
>>> Background:
>>> Within the components maintained by Bigtop, a significant portion is built 
>>> using Java and relies on Maven as the build tool.
>>> 
>>> Rationale:
>>> Compiling components that consist of numerous modules can be a 
>>> time-consuming process. For instance, some components contain hundreds of 
>>> modules, and compiling them one by one consumes a substantial amount of 
>>> time. Even when all dependencies are pre-downloaded for a second 
>>> compilation, the process remains slow due to the sequential nature of 
>>> compilation. Additionally, compiling all components together still results 
>>> in sequential compilation, making it challenging to fully leverage CPU 
>>> resources and reduce compilation time significantly. Consequently, 
>>> repetitive compilation and testing phases impose prolonged waiting periods.
>>> 
>>> Proposal:
>>> I propose the introduction of a new parameter that allows users to toggle 
>>> parallel compilation for components built using Maven, thus empowering them 
>>> to align compilation practices with their specific needs.
>>> 
>>> Related Pull Requests (PRs):
>>> https://issues.apache.org/jira/browse/BIGTOP-4044
>>> https://github.com/apache/bigtop/pull/1212
>>> 
>>> This new feature can be divided into two main parts:
>>> 
>>> The first part entails adding parallel compilation functionality and 
>>> enabling it for components that have undergone testing without encountering 
>>> additional issues related to parallel compilation. These components may 
>>> include Hive, HBase, Flink, ZooKeeper, among others.
>>> 
>>> The second part involves enabling parallel compilation for components that 
>>> face challenges with parallel compilation and necessitate additional 
>>> patches to address Maven's parallel compilation capabilities. These 
>>> components may include Ranger, Tez, Hadoop, Spark, and more.
>>> 
>>> I welcome your insights, opinions, and suggestions on this matter. Please 
>>> feel free to share your thoughts and concerns regarding the introduction of 
>>> parallel compilation support for Bigtop components.
>>> 
>>> Your contributions to this discussion are highly valued as we work towards 
>>> enhancing the efficiency and performance of our build processes.
>>> 
>>> Best regards,
>>> jiaLiang
