I greatly appreciate your valuable suggestions and responses.

The Maven patch with the Smart Builder extension is indeed an excellent way to 
transform Maven's default compilation order from breadth-first to depth-first, 
significantly increasing parallel compilation concurrency. I'm aware that the 
mvnd project has been using this extension. Of course, it's also possible to 
revert to Maven's default compilation behavior by removing this extension if 
needed.
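
To make the two modes concrete, here is a minimal sketch of how the extra Maven 
options could be assembled. It assumes Maven's built-in -T/--threads option and, 
when the Smart Builder extension is installed, Maven's --builder option with the 
builder id "smart". The helper name maven_args and its parameters are 
hypothetical and are not taken from the Bigtop PR.

```python
from typing import List, Optional


def maven_args(threads: Optional[str] = None, smart_builder: bool = False) -> List[str]:
    """Return extra mvn options for a build (hypothetical helper).

    threads:       value for Maven's -T option, e.g. "4" or "1C" (one thread
                   per CPU core); None keeps the default sequential build.
    smart_builder: if True, select the Smart Builder via --builder smart.
                   Assumption: the extension is already installed.
    """
    args: List[str] = []
    if threads:
        args += ["-T", threads]
    if smart_builder:
        args += ["--builder", "smart"]
    return args


if __name__ == "__main__":
    # Default Maven behavior: no extra options.
    print(["mvn", "install"] + maven_args())
    # Built-in parallel build only.
    print(["mvn", "install"] + maven_args("1C"))
    # Parallel build scheduled by the Smart Builder extension.
    print(["mvn", "install"] + maven_args("1C", smart_builder=True))
```

Leaving smart_builder off corresponds to evaluating Maven's built-in parallel 
build first and measuring the gain from the extension separately.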

Indeed, it's crucial to submit the corresponding parallel compilation 
optimizations to the respective components. So far, I have conducted testing on 
several components:

Hive, HBase, Flink, and ZooKeeper can be parallel-compiled without requiring 
any modifications. Among them, Flink benefits the most from parallel 
compilation as it comprises around 200 modules.

For other components, I have submitted parallel compilation patches to the 
Ranger and Tez communities:

Ranger's parallel compilation PR: 
[https://issues.apache.org/jira/browse/RANGER-4511]
[https://github.com/apache/ranger/pull/289]


Tez's parallel compilation PR: 
[https://issues.apache.org/jira/browse/TEZ-4520]
[https://github.com/apache/tez/pull/315]

Spark 3.3, by default, uses Maven Shade Plugin version 3.2.4, which triggers an 
infinite loop bug during parallel compilation. 
https://issues.apache.org/jira/browse/MSHADE-413

The upgrade of the Maven Shade Plugin to version 3.5.0, which Spark 3.3 needs, 
has already been applied in Spark 3.5, so no separate PR in the Spark community 
is required.
[https://issues.apache.org/jira/browse/SPARK-44257]

Additionally, Hadoop is the most time-consuming component and a prime candidate 
for parallel compilation optimization. I have made optimization changes and 
have been using the improved Hadoop build for some time. However, due to the 
complexity of the Hadoop project, I am still contemplating how to elegantly 
optimize the dependencies for parallel compilation before submitting it to the 
community.

As for other components like Alluxio, Zeppelin, Solr, and Phoenix, I have not 
yet had the opportunity to test and study them in-depth.

In the first PR, from a design perspective, I have only added parallel patches 
to components that have been thoroughly validated. This approach ensures that 
the compilation behavior of components not yet optimized for parallel 
compilation remains unaffected. This flexibility allows us to release this 
feature independently, without waiting for all components to complete parallel 
compilation optimization.
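
The opt-in design described above can be sketched roughly as follows. This is 
only an illustration of the idea, not the code in the PR: the set VALIDATED, the 
function parallel_options, and its parameters are hypothetical names, and the 
component list simply mirrors the components I reported as tested.

```python
from typing import List

# Components reported in this thread as building in parallel without changes.
# Hypothetical allowlist for illustration; the real list lives in the PR.
VALIDATED = frozenset({"hive", "hbase", "flink", "zookeeper"})


def parallel_options(component: str, enabled: bool, threads: str = "1C") -> List[str]:
    """Return Maven options for one component (hypothetical helper).

    Parallel compilation is applied only when the user turns the toggle on
    AND the component has been validated; everything else keeps Maven's
    default sequential build.
    """
    if enabled and component.lower() in VALIDATED:
        return ["-T", threads]
    return []


if __name__ == "__main__":
    for name in ("flink", "hadoop"):
        print(name, parallel_options(name, enabled=True))
```

With such an allowlist, a component that has not been validated yet gets no 
extra options even when the toggle is on, so its build behaves exactly as 
before.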

I look forward to your feedback and further discussions on this exciting 
endeavor to enhance the compilation process within our open-source community.

Best regards,

jiaLiang

> On Dec 15, 2023, at 10:33, Masatake Iwasaki <[email protected]> wrote:
> 
> Hi Jialiang,
> 
> Thanks for bringing this up.
> 
> It sounds good to improve developers' experience by reducing compilation time.
> I think we should apply it to the most time-consuming product first
> then evaluate how much time we can get.
> 
> It would be better to contribute the patch to each project
> rather than make it Bigtop private since it is difficult
> to confirm if there is no problem by ourselves.
> 
> 
> > https://github.com/apache/bigtop/pull/1212
> 
> The PR introduces 3rd party extension.
> Is the extension crucial?
> How about starting from using built-in feature of Maven?
> We can evaluate the gain from the extension alone after that.
> I prefer one patch for doing one thing.
> 
> Regards,
> Masatake Iwasaki
> 
> On 2023/12/15 10:51, Jialiang Cai wrote:
>> Subject: Discussion on Introducing Parallel Compilation Support for Bigtop 
>> Components
>> 
>> Dear Bigtop Community Members,
>> 
>> I hope this message finds you well. I would like to initiate a discussion 
>> regarding the introduction of parallel compilation support for Bigtop 
>> components.
>> 
>> Background:
>> Within the components maintained by Bigtop, a significant portion is built 
>> using Java and relies on Maven as the build tool.
>> 
>> Rationale:
>> Compiling components that consist of numerous modules can be a 
>> time-consuming process. For instance, some components contain hundreds of 
>> modules, and compiling them one by one consumes a substantial amount of 
>> time. Even when all dependencies are pre-downloaded for a second 
>> compilation, the process remains slow due to the sequential nature of 
>> compilation. Additionally, compiling all components together still results 
>> in sequential compilation, making it challenging to fully leverage CPU 
>> resources and reduce compilation time significantly. Consequently, 
>> repetitive compilation and testing phases impose prolonged waiting periods.
>> 
>> Proposal:
>> I propose the introduction of a new parameter that allows users to toggle 
>> parallel compilation for components built using Maven, thus empowering them 
>> to align compilation practices with their specific needs.
>> 
>> Related Pull Requests (PRs):
>> https://issues.apache.org/jira/browse/BIGTOP-4044
>> https://github.com/apache/bigtop/pull/1212
>> 
>> This new feature can be divided into two main parts:
>> 
>> The first part entails adding parallel compilation functionality and 
>> enabling it for components that have undergone testing without encountering 
>> additional issues related to parallel compilation. These components may 
>> include Hive, HBase, Flink, ZooKeeper, among others.
>> 
>> The second part involves enabling parallel compilation for components that 
>> face challenges with parallel compilation and necessitate additional patches 
>> to address Maven's parallel compilation capabilities. These components may 
>> include Ranger, Tez, Hadoop, Spark, and more.
>> 
>> I welcome your insights, opinions, and suggestions on this matter. Please 
>> feel free to share your thoughts and concerns regarding the introduction of 
>> parallel compilation support for Bigtop components.
>> 
>> Your contributions to this discussion are highly valued as we work towards 
>> enhancing the efficiency and performance of our build processes.
>> 
>> Best regards,
>> jiaLiang
