> Additionally, Hadoop is the most time-consuming component and a prime
> candidate for parallel compilation optimization.
Creating the shaded client jars with maven-shade-plugin is the
most time-consuming part of building Hadoop. Because of that step,
the percentage reduction in build time may not be as significant
as expected, FWIW.
On 2023/12/15 11:33, Masatake Iwasaki wrote:
Hi Jialiang,
Thanks for bringing this up.
Improving developers' experience by reducing compilation time
sounds good.
I think we should apply it to the most time-consuming component first
and then evaluate how much time we save.
It would be better to contribute the patches upstream to each project
rather than keep them private to Bigtop, since it is difficult for us
to confirm on our own that they cause no problems.
> https://github.com/apache/bigtop/pull/1212
The PR introduces a third-party extension.
Is the extension crucial?
How about starting with Maven's built-in feature?
We can evaluate the gain from the extension alone after that.
I prefer one patch doing one thing.
Regards,
Masatake Iwasaki
On 2023/12/15 10:51, Jialiang Cai wrote:
Subject: Discussion on Introducing Parallel Compilation Support for
Bigtop Components
Dear Bigtop Community Members,
I hope this message finds you well. I would like to initiate a
discussion regarding the introduction of parallel compilation support
for Bigtop components.
Background:
Among the components maintained by Bigtop, a significant portion are
written in Java and use Maven as the build tool.
Rationale:
Compiling components that consist of many modules takes a long time.
Some components contain hundreds of modules, and by default Maven
builds them one at a time. Even on a second build, with all
dependencies already downloaded, the process remains slow because
modules are compiled sequentially. Building all components together is
also sequential, so most CPU cores sit idle and compilation time
cannot be reduced significantly. As a result, repeated compile-and-test
cycles mean long waits.
Proposal:
I propose introducing a new parameter that lets users toggle parallel
compilation for components built with Maven, so that they can adapt
the build to their specific needs.
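For illustration only, such a toggle could map onto Maven's built-in
`-T` (`--threads`) option, which lets Maven build independent reactor
modules concurrently: `-T 4` uses four threads, and `-T 1C` uses one
thread per CPU core. The helper `maven_build_args` and its `parallel`
and `threads` parameters below are hypothetical names, not taken from
Bigtop or PR 1212, which may implement the toggle differently.

```python
"""Hypothetical sketch: translate a parallel-build toggle into Maven arguments.

Names here are illustrative assumptions, not Bigtop code.
"""


def maven_build_args(goals, parallel=False, threads="1C", extra_args=None):
    """Return an argv list for invoking Maven.

    When ``parallel`` is true, Maven's built-in ``-T`` option is added.
    ``threads`` may be an absolute count such as 4 or a per-core
    multiplier such as "1C" (one thread per CPU core).
    """
    args = ["mvn"]
    if parallel:
        # -T enables Maven's parallel reactor build.
        args += ["-T", str(threads)]
    args += list(extra_args or [])
    args += list(goals)
    return args


if __name__ == "__main__":
    # Sequential build (Maven's default behavior).
    print(" ".join(maven_build_args(["clean", "install"], extra_args=["-DskipTests"])))
    # Parallel build with one thread per CPU core.
    print(" ".join(maven_build_args(["clean", "install"], parallel=True,
                                    extra_args=["-DskipTests"])))
```

Keeping the toggle off by default preserves the current sequential
behavior, so components that do not yet build correctly in parallel
are unaffected.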
Related JIRA issue and pull request:
https://issues.apache.org/jira/browse/BIGTOP-4044
https://github.com/apache/bigtop/pull/1212
This new feature can be divided into two main parts:
The first part adds the parallel compilation option and enables it for
components that have been tested and show no additional issues when
compiled in parallel, such as Hive, HBase, Flink, and ZooKeeper.
The second part enables parallel compilation for components that do
not yet build correctly in parallel and need additional patches to
work with Maven's parallel build, such as Ranger, Tez, Hadoop, and
Spark, among others.
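The two-part plan can be pictured as a per-component gate: the user's
toggle takes effect only for components already verified to build in
parallel. This is a minimal sketch under that assumption; the set names
and the `parallel_enabled` helper are hypothetical, and only the
component lists come from the proposal.

```python
"""Hypothetical sketch of a phased rollout for parallel Maven builds.

Set and function names are illustrative assumptions, not Bigtop code.
"""

# Part 1: components reported to build correctly in parallel.
PARALLEL_VERIFIED = frozenset({"hive", "hbase", "flink", "zookeeper"})

# Part 2: components that still need patches before parallel builds work.
PARALLEL_NEEDS_PATCHES = frozenset({"ranger", "tez", "hadoop", "spark"})


def parallel_enabled(component: str, requested: bool) -> bool:
    """Honor the user's toggle only for components verified in part 1."""
    return requested and component.lower() in PARALLEL_VERIFIED


if __name__ == "__main__":
    for name in ["hive", "hadoop", "zookeeper", "spark"]:
        print(name, parallel_enabled(name, requested=True))
```

As patches for part 2 land, a component would move from the second set
to the first, without changing the toggle itself.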
I welcome your insights, opinions, and suggestions on this matter.
Please feel free to share your thoughts and concerns regarding the
introduction of parallel compilation support for Bigtop components.
Your contributions to this discussion are highly valued as we work
towards enhancing the efficiency and performance of our build processes.
Best regards,
jiaLiang