Hi all,

The hudi-client module contains the core Hudi abstractions and client logic
for different engines such as Spark, Flink, and Java.  While the previous
effort (HUDI-538 [1]) decoupled the integration with Spark, there is still
considerable code duplication across engines for nearly identical logic,
due to the current interface design.  Some of the logic has also diverged
among engines, making debugging and support difficult.

I propose to further refactor the hudi-client module with the goal of
improving code reuse across engines and reducing the divergence of logic
among them, so that the core Hudi action execution logic is shared across
engines, except for engine-specific transformations.  Such a pattern also
makes it easier to support core Hudi functionality on all engines in the
future.  Specifically,

(1) Abstract the transformation boilerplate inside HoodieEngineContext and
implement the engine-specific data transformation logic in its subclasses.
Type casts can be done inside the engine context.
(2) Create a new HoodieData abstraction for passing input and output along
the flow of execution, and use it in the different Hudi abstractions, e.g.,
HoodieTable, HoodieIOHandle, and BaseActionExecutor, instead of enforcing
type parameters such as RDD<HoodieRecord> and List<HoodieRecord>, which are
one source of duplication.
(3) Extract the execution logic common to multiple engines into the
hudi-client-common module.
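
To make the three items concrete, here is a rough sketch of the shape I
have in mind.  It is only an illustration under assumptions: HoodieData is
the proposed abstraction, but the class names HoodieListData, EngineContext
and JavaEngineContext, and the method names parallelize, map, collectAsList
and rollbackFiles, are hypothetical placeholders, not the existing Hudi API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: names and signatures are illustrative
// assumptions, not the actual hudi-client interfaces.
public class HoodieDataSketch {

  /** Item (2): engine-agnostic handle on a collection of records. */
  interface HoodieData<T> {
    <O> HoodieData<O> map(Function<T, O> func);

    List<T> collectAsList();
  }

  /** Java-engine implementation backed by a List; a Spark one could wrap an RDD. */
  static final class HoodieListData<T> implements HoodieData<T> {
    private final List<T> data;

    HoodieListData(List<T> data) {
      this.data = data;
    }

    @Override
    public <O> HoodieData<O> map(Function<T, O> func) {
      List<O> out = new ArrayList<>(data.size());
      for (T item : data) {
        out.add(func.apply(item));
      }
      return new HoodieListData<>(out);
    }

    @Override
    public List<T> collectAsList() {
      return new ArrayList<>(data);
    }
  }

  /** Item (1): the engine context hides how engine-specific data is created. */
  interface EngineContext {
    <T> HoodieData<T> parallelize(List<T> items);
  }

  static final class JavaEngineContext implements EngineContext {
    @Override
    public <T> HoodieData<T> parallelize(List<T> items) {
      return new HoodieListData<>(items);
    }
  }

  /** Item (3): action logic written once against HoodieData, shared by all engines. */
  static HoodieData<String> rollbackFiles(EngineContext context, List<String> filePaths) {
    // Placeholder transformation standing in for the real rollback work.
    return context.parallelize(filePaths).map(path -> "rolled back " + path);
  }

  public static void main(String[] args) {
    List<String> result =
        rollbackFiles(new JavaEngineContext(), Arrays.asList("f1.parquet", "f2.parquet"))
            .collectAsList();
    System.out.println(result);
  }
}
```

With this shape, supporting another engine would mean adding one
HoodieData implementation and one engine context subclass, while
rollbackFiles stays untouched in the common module.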

As a first step and an exploration of items (1) and (3) above, I've
refactored the rollback actions; the PR is here: [HUDI-2433][2].  In this
PR, I remove all engine-specific rollback packages and keep only one
rollback package in hudi-client-common, adding ~350 LoC while deleting
1.3K LoC.  My next step is to refactor the commit action, which
encompasses item (2) above.

What do you folks think?  Any other suggestions are welcome.

[1] [HUDI-538] [UMBRELLA] Restructuring hudi client module for multi engine
support
https://issues.apache.org/jira/browse/HUDI-538
[2] PR: [HUDI-2433] Refactor rollback actions in hudi-client module
https://github.com/apache/hudi/pull/3664/files

Best,
- Ethan
